This repository is the official implementation of the EMNLP 2024 Findings paper: Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting.
In this implementation, we demonstrate our approach using a subset of the FewRel dataset with 5-way classification (religion, location, competition class, operating system, owned by). The methodology presented here is general and can be readily applied to other relation extraction datasets or different relation types.
```
.
├── data/                                   # Directory containing datasets
│   └── fewrel/                             # FewRel dataset files
│       ├── fewrel_0/                       # Dataset split
│       │   ├── syn_data/                   # Generated synthetic data
│       │   ├── labels.json                 # Relation labels
│       │   └── test.json                   # Test set data
│       └── full_relation_description.json  # Natural language descriptions for all relations
├── generate/                               # Data generation scripts
│   ├── utils.py                            # Utility functions for data generation
│   ├── relation_synonyms.ipynb             # Notebook for generating relation synonyms
│   ├── generate_samples.ipynb              # Notebook for generating samples
│   └── paraphrasing.ipynb                  # Notebook for paraphrasing samples
├── evaluation_results/                     # Directory for evaluation outputs
├── custom_example_selector.py              # Custom selector for examples
├── load_dataset.py                         # Dataset loading utilities
└── inference.ipynb                         # Inference notebook
```
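For orientation, `custom_example_selector.py` supplies the demonstration examples used when prompting. The sketch below shows what such a selector can look like; it is illustrative only, assuming a recent `langchain-core` release (where `BaseExampleSelector` lives in `langchain_core.example_selectors`), and the repository's actual selector may use different selection logic and field names.

```python
# Illustrative sketch only -- not the repository's actual custom_example_selector.py.
# Assumes a recent langchain-core release; adjust the import for older langchain versions.
import random
from typing import Dict, List

from langchain_core.example_selectors import BaseExampleSelector


class PerRelationRandomSelector(BaseExampleSelector):
    """Pick up to k demonstration examples per relation (hypothetical selection logic)."""

    def __init__(self, examples: List[Dict[str, str]], k: int = 1):
        self.examples = examples
        self.k = k

    def add_example(self, example: Dict[str, str]) -> None:
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        # The input is ignored in this sketch; a real selector might rank by similarity.
        by_relation: Dict[str, List[Dict[str, str]]] = {}
        for example in self.examples:
            by_relation.setdefault(example["relation"], []).append(example)
        selected: List[Dict[str, str]] = []
        for group in by_relation.values():
            selected.extend(random.sample(group, min(self.k, len(group))))
        return selected
```

A selector of this kind can then be plugged into a LangChain `FewShotPromptTemplate` to build the few-shot prompts used at inference time.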
- Required Python packages (install from `requirements.txt`):

```bash
pip install -r requirements.txt
```
Required packages include:
- langchain
- openai
- pandas
- numpy
- tqdm
- jupyter
- ipykernel
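The generation and inference notebooks call the OpenAI API through `langchain`/`openai`, so an API key must be available before launching them. One common way to provide it, assuming the standard `OPENAI_API_KEY` environment variable (which both libraries read by default):

```python
# Set the key before launching the notebooks (or export it in your shell instead).
import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder -- replace with your own key
```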
The pipeline consists of three data generation steps, followed by an inference step:
First, run the relation synonyms generation notebook to create variations of the relation descriptions:

```bash
jupyter notebook generate/relation_synonyms.ipynb
```

This notebook generates different ways to express the same relations, enriching the variety of the training data. The output is saved to `data/fewrel/fewrel_0/syn_data/relation_synonyms.json`.
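A quick way to inspect the output, assuming the file maps each relation label to a list of synonym phrases (the exact schema is defined in the notebook):

```python
import json

# Inspect the generated synonyms; assumed schema: {relation_label: [synonym, ...]}.
with open("data/fewrel/fewrel_0/syn_data/relation_synonyms.json") as f:
    relation_synonyms = json.load(f)

for relation, synonyms in relation_synonyms.items():
    print(f"{relation}: {synonyms}")
```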
After generating relation synonyms, run the sample generation notebook:

```bash
jupyter notebook generate/generate_samples.ipynb
```

This notebook creates the base training samples using the previously generated relation synonyms. The output is saved to `data/fewrel/fewrel_0/syn_data/synthetic_samples.json`.
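Conceptually, this step prompts an LLM to write sentences that express a given relation between a head and a tail entity. The sketch below illustrates that idea with the `openai>=1.0` client; the prompt wording, model name, and output handling are assumptions, not the notebook's actual implementation:

```python
# Minimal sketch of LLM-based sample generation -- not the notebook's exact prompt.
# Assumes openai>=1.0 and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()


def generate_samples(relation: str, n: int = 5) -> str:
    prompt = (
        f"Write {n} short sentences. Each sentence must express the relation "
        f"'{relation}' between a head entity and a tail entity, and mark the "
        f"entities as <head>...</head> and <tail>...</tail>."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption -- use whatever model the notebook configures
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


print(generate_samples("operating system"))
```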
Finally, run the paraphrasing notebook to create more natural variations of the generated samples:

```bash
jupyter notebook generate/paraphrasing.ipynb
```

This notebook enhances the diversity of the training data by rephrasing each sample in different ways. The output is saved to `data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json`.
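A quick sanity check after this step is to compare the base and paraphrased sample files; the sketch below only counts top-level entries, so it works regardless of the per-sample schema:

```python
import json

# Compare the sizes of the base and paraphrased sample files (structure-agnostic check).
with open("data/fewrel/fewrel_0/syn_data/synthetic_samples.json") as f:
    base = json.load(f)
with open("data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json") as f:
    paraphrased = json.load(f)

print(f"base samples: {len(base)}, paraphrased samples: {len(paraphrased)}")
```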
After generating the training data, run inference to generate predictions:

```bash
jupyter notebook inference.ipynb
```

This notebook will:
- Load the generated synthetic data
- Run the model to generate predictions
- Save the results in the `evaluation_results/` directory (a sketch for scoring these results follows below)
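Accuracy can then be computed against `data/fewrel/fewrel_0/test.json`. The sketch below assumes a hypothetical predictions file name, aligned ordering between predictions and test examples, and `prediction`/`label` fields; adapt the path and keys to what the notebook actually writes:

```python
import json

# Hypothetical file name and field names -- adjust to the notebook's actual output.
with open("evaluation_results/predictions.json") as f:
    predictions = json.load(f)
with open("data/fewrel/fewrel_0/test.json") as f:
    test_set = json.load(f)

correct = sum(
    pred["prediction"] == gold["label"] for pred, gold in zip(predictions, test_set)
)
print(f"accuracy: {correct / len(test_set):.3f}")
```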
The pipeline will generate data files at each step (a quick existence check follows the list):

- `relation_synonyms.json`: Contains the generated relation synonyms
- `synthetic_samples.json`: Contains the base training samples
- `argument_synthetic_samples.json`: Contains the final paraphrased training data
- `evaluation_results/`: Contains prediction results from inference
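A small helper to confirm that every step produced its expected artifact (paths taken from the list above):

```python
from pathlib import Path

# Check that every expected pipeline artifact exists.
expected = [
    "data/fewrel/fewrel_0/syn_data/relation_synonyms.json",
    "data/fewrel/fewrel_0/syn_data/synthetic_samples.json",
    "data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json",
    "evaluation_results",
]
for path in expected:
    status = "ok" if Path(path).exists() else "missing"
    print(f"{status:7s} {path}")
```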
- Make sure to run the notebooks in order, as each step depends on the output of the previous one
- Check the generated files after each step to ensure the quality of the output
- You can adjust the generation parameters in each notebook according to your needs
If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{liu2024unleashing,
  title={Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting},
  author={Liu, Siyi and Li, Yang and Li, Jiang and Yang, Shan and Lan, Yunshi},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
  pages={13147--13161},
  year={2024}
}
```