This repository is the official implementation of the EMNLP 2024 Findings paper: Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting.
In this implementation, we demonstrate our approach using a subset of the FewRel dataset with 5-way classification (religion, location, competition class, operating system, owned by). The methodology presented here is general and can be readily applied to other relation extraction datasets or different relation types.
```
.
├── data/                                   # Directory containing datasets
│   └── fewrel/                             # FewRel dataset files
│       ├── fewrel_0/                       # Dataset split
│       │   ├── syn_data/                   # Generated synthetic data
│       │   ├── labels.json                 # Relation labels
│       │   └── test.json                   # Test set data
│       └── full_relation_description.json  # Natural language descriptions for all relations
├── generate/                               # Data generation scripts
│   ├── utils.py                            # Utility functions for data generation
│   ├── relation_synonyms.ipynb             # Notebook for generating relation synonyms
│   ├── generate_samples.ipynb              # Notebook for generating samples
│   └── paraphrasing.ipynb                  # Notebook for paraphrasing samples
├── evaluation_results/                     # Directory for evaluation outputs
├── custom_example_selector.py              # Custom selector for examples
├── load_dataset.py                         # Dataset loading utilities
└── inference.ipynb                         # Inference notebook
```
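For orientation, `custom_example_selector.py` supplies the demonstration examples used when prompting. The sketch below shows what such a selector can look like; it is illustrative only, assuming a recent `langchain-core` release (where `BaseExampleSelector` lives in `langchain_core.example_selectors`), and the repository's actual selector may use different selection logic and field names.

```python
# Illustrative sketch only -- not the repository's actual custom_example_selector.py.
# Assumes a recent langchain-core release; adjust the import for older langchain versions.
import random
from typing import Dict, List

from langchain_core.example_selectors import BaseExampleSelector


class PerRelationRandomSelector(BaseExampleSelector):
    """Pick up to k demonstration examples per relation (hypothetical selection logic)."""

    def __init__(self, examples: List[Dict[str, str]], k: int = 1):
        self.examples = examples
        self.k = k

    def add_example(self, example: Dict[str, str]) -> None:
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        # The input is ignored in this sketch; a real selector might rank by similarity.
        by_relation: Dict[str, List[Dict[str, str]]] = {}
        for example in self.examples:
            by_relation.setdefault(example["relation"], []).append(example)
        selected: List[Dict[str, str]] = []
        for group in by_relation.values():
            selected.extend(random.sample(group, min(self.k, len(group))))
        return selected
```

A selector of this kind can then be plugged into a LangChain `FewShotPromptTemplate` to build the few-shot prompts used at inference time.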
- Required Python packages (install from `requirements.txt`):

```bash
pip install -r requirements.txt
```
Required packages include:
- langchain
- openai
- pandas
- numpy
- tqdm
- jupyter
- ipykernel
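The generation and inference notebooks call the OpenAI API through `langchain`/`openai`, so an API key must be available before launching them. One common way to provide it, assuming the standard `OPENAI_API_KEY` environment variable (which both libraries read by default):

```python
# Set the key before launching the notebooks (or export it in your shell instead).
import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder -- replace with your own key
```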
The pipeline consists of three data generation steps, followed by an inference step:
First, run the relation synonyms generation notebook to create variations of the relation descriptions:

```bash
jupyter notebook generate/relation_synonyms.ipynb
```

This notebook generates different ways to express the same relations, enriching the variety of the training data. The output is saved to `data/fewrel/fewrel_0/syn_data/relation_synonyms.json`.
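A quick way to inspect the output, assuming the file maps each relation label to a list of synonym phrases (the exact schema is defined in the notebook):

```python
import json

# Inspect the generated synonyms; assumed schema: {relation_label: [synonym, ...]}.
with open("data/fewrel/fewrel_0/syn_data/relation_synonyms.json") as f:
    relation_synonyms = json.load(f)

for relation, synonyms in relation_synonyms.items():
    print(f"{relation}: {synonyms}")
```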
After generating relation synonyms, run the sample generation notebook:

```bash
jupyter notebook generate/generate_samples.ipynb
```

This notebook creates the base training samples using the previously generated relation synonyms. The output is saved to `data/fewrel/fewrel_0/syn_data/synthetic_samples.json`.
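Conceptually, this step prompts an LLM to write sentences that express a given relation between a head and a tail entity. The sketch below illustrates that idea with the `openai>=1.0` client; the prompt wording, model name, and output handling are assumptions, not the notebook's actual implementation:

```python
# Minimal sketch of LLM-based sample generation -- not the notebook's exact prompt.
# Assumes openai>=1.0 and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()


def generate_samples(relation: str, n: int = 5) -> str:
    prompt = (
        f"Write {n} short sentences. Each sentence must express the relation "
        f"'{relation}' between a head entity and a tail entity, and mark the "
        f"entities as <head>...</head> and <tail>...</tail>."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption -- use whatever model the notebook configures
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(generate_samples("operating system"))
```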
Finally, run the paraphrasing notebook to create more natural variations of the generated samples:

```bash
jupyter notebook generate/paraphrasing.ipynb
```

This notebook enhances the diversity of the training data by rephrasing each sample in different ways. The output is saved to `data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json`.
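A quick sanity check after this step is to compare the base and paraphrased sample files; the sketch below only counts top-level entries, so it works regardless of the per-sample schema:

```python
import json

# Compare the sizes of the base and paraphrased sample files (structure-agnostic check).
with open("data/fewrel/fewrel_0/syn_data/synthetic_samples.json") as f:
    base = json.load(f)
with open("data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json") as f:
    paraphrased = json.load(f)

print(f"base samples: {len(base)}, paraphrased samples: {len(paraphrased)}")
```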
After generating the training data, run inference to generate predictions:

```bash
jupyter notebook inference.ipynb
```

This notebook will:
- Load the generated synthetic data
- Run the model to generate predictions
- Save the results in the `evaluation_results/` directory (a sketch for scoring these results follows below)
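Accuracy can then be computed against `data/fewrel/fewrel_0/test.json`. The sketch below assumes a hypothetical predictions file name, aligned ordering between predictions and test examples, and `prediction`/`label` fields; adapt the path and keys to what the notebook actually writes:

```python
import json

# Hypothetical file name and field names -- adjust to the notebook's actual output.
with open("evaluation_results/predictions.json") as f:
    predictions = json.load(f)
with open("data/fewrel/fewrel_0/test.json") as f:
    test_set = json.load(f)

correct = sum(
    pred["prediction"] == gold["label"] for pred, gold in zip(predictions, test_set)
)
print(f"accuracy: {correct / len(test_set):.3f}")
```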
The pipeline will generate data files at each step (a quick existence check follows the list):

- `relation_synonyms.json`: Contains the generated relation synonyms
- `synthetic_samples.json`: Contains the base training samples
- `argument_synthetic_samples.json`: Contains the final paraphrased training data
- `evaluation_results/`: Contains prediction results from inference
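A small helper to confirm that every step produced its expected artifact (paths taken from the list above):

```python
from pathlib import Path

# Check that every expected pipeline artifact exists.
expected = [
    "data/fewrel/fewrel_0/syn_data/relation_synonyms.json",
    "data/fewrel/fewrel_0/syn_data/synthetic_samples.json",
    "data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json",
    "evaluation_results",
]
for path in expected:
    status = "ok" if Path(path).exists() else "missing"
    print(f"{status:7s} {path}")
```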
- Make sure to run the notebooks in order, as each step depends on the output of the previous one
- Check the generated files after each step to ensure the quality of the output
- You can adjust the generation parameters in each notebook according to your needs
If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{liu2024unleashing,
  title={Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting},
  author={Liu, Siyi and Li, Yang and Li, Jiang and Yang, Shan and Lan, Yunshi},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
  pages={13147--13161},
  year={2024}
}
```