Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting

This repository is the official implementation of the EMNLP 2024 Findings paper: Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting.

In this implementation, we demonstrate our approach on a 5-way subset of the FewRel dataset (religion, location, competition class, operating system, owned by). The methodology is general and can be readily applied to other relation extraction datasets or relation types.

Repository Structure

.
├── data/                              # Directory containing datasets
│   └── fewrel/                        # FewRel dataset files
│       ├── fewrel_0/                  # Dataset split
│       │   ├── syn_data/              # Generated synthetic data
│       │   ├── labels.json            # Relation labels
│       │   └── test.json              # Test set data
│       └── full_relation_description.json  # Natural language descriptions for all relations
├── generate/                          # Data generation scripts
│   ├── utils.py                       # Utility functions for data generation
│   ├── relation_synonyms.ipynb        # Notebook for generating relation synonyms
│   ├── generate_samples.ipynb         # Notebook for generating samples
│   └── paraphrasing.ipynb             # Notebook for paraphrasing samples
├── evaluation_results/                # Directory for evaluation outputs
├── custom_example_selector.py         # Custom selector for examples
├── load_dataset.py                    # Dataset loading utilities
└── inference.ipynb                    # Inference notebook

Environment Setup

Install the required Python packages:

pip install -r requirements.txt

Required packages include:

  • langchain
  • openai
  • pandas
  • numpy
  • tqdm
  • jupyter
  • ipykernel
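
The notebooks call the OpenAI API via the openai and langchain packages, so an API key must be available before running them. Assuming the standard setup, where the openai client reads the OPENAI_API_KEY environment variable:

export OPENAI_API_KEY=<your-api-key>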

Running the Pipeline

The pipeline consists of three data generation steps, followed by an inference step:

1. Generate Relation Synonyms

First, run the relation synonyms generation notebook to create variations of relationship descriptions:

jupyter notebook generate/relation_synonyms.ipynb

This notebook will generate different ways to express the same relations, enriching the variety of the training data. The output will be saved in data/fewrel/fewrel_0/syn_data/relation_synonyms.json.
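
In outline, this step can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's exact code; the prompt wording and the gpt-3.5-turbo model choice are assumptions, and it requires openai>=1.0 with OPENAI_API_KEY set:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
relations = ["religion", "location", "competition class", "operating system", "owned by"]

synonyms = {}
for rel in relations:
    prompt = f'List five alternative phrasings of the relation "{rel}", one per line.'
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption, not fixed by this repo
        messages=[{"role": "user", "content": prompt}],
    )
    content = resp.choices[0].message.content
    synonyms[rel] = [s.strip() for s in content.splitlines() if s.strip()]

with open("data/fewrel/fewrel_0/syn_data/relation_synonyms.json", "w") as f:
    json.dump(synonyms, f, indent=2)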

2. Generate Samples

After generating relation synonyms, run the sample generation notebook:

jupyter notebook generate/generate_samples.ipynb

This notebook creates the base training samples using the previously generated relation synonyms. The output will be saved in data/fewrel/fewrel_0/syn_data/synthetic_samples.json.
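
A minimal sketch of this step, assuming the synonym file from step 1 and an illustrative {relation: [sentences]} output layout with <h>/<t> entity markers (the actual notebook's prompt and format may differ):

import json
from openai import OpenAI

client = OpenAI()
with open("data/fewrel/fewrel_0/syn_data/relation_synonyms.json") as f:
    synonyms = json.load(f)

samples = {}
for rel, phrasings in synonyms.items():
    samples[rel] = []
    for phrase in phrasings:
        prompt = (f'Write one sentence expressing the relation "{phrase}" between two entities. '
                  "Mark the head entity as <h>...</h> and the tail entity as <t>...</t>.")
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumption
            messages=[{"role": "user", "content": prompt}],
        )
        samples[rel].append(resp.choices[0].message.content.strip())

with open("data/fewrel/fewrel_0/syn_data/synthetic_samples.json", "w") as f:
    json.dump(samples, f, indent=2)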

3. Paraphrasing

Finally, run the paraphrasing notebook to create more natural variations of the generated samples:

jupyter notebook generate/paraphrasing.ipynb

This notebook enhances the diversity of the training data by creating different ways to express the same information. The output will be saved in data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json.
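
In the same spirit, a hedged sketch of the paraphrasing step, reusing the illustrative file layout from the previous sketch:

import json
from openai import OpenAI

client = OpenAI()
with open("data/fewrel/fewrel_0/syn_data/synthetic_samples.json") as f:
    samples = json.load(f)

paraphrased = {}
for rel, sents in samples.items():
    paraphrased[rel] = []
    for sent in sents:
        prompt = ("Paraphrase the sentence below, keeping the <h>...</h> and "
                  "<t>...</t> markers intact.\n" + sent)
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumption
            messages=[{"role": "user", "content": prompt}],
        )
        paraphrased[rel].append(resp.choices[0].message.content.strip())

with open("data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json", "w") as f:
    json.dump(paraphrased, f, indent=2)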

4. Running Inference

After generating the training data, you can run inference to generate predictions:

jupyter notebook inference.ipynb

This notebook will:

  • Load the generated synthetic data
  • Run the model to generate predictions (a minimal sketch follows this list)
  • Save the results in the evaluation_results/ directory
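
A minimal sketch of the prediction step, using the synthetic samples as in-context demonstrations. In the repository, demonstration selection is handled by custom_example_selector.py; the first-k-per-relation selection, prompt format, and model choice below are simplifications:

import json
from openai import OpenAI

client = OpenAI()
with open("data/fewrel/fewrel_0/syn_data/argument_synthetic_samples.json") as f:
    syn = json.load(f)
labels = list(syn.keys())

def build_prompt(sentence, k=2):
    # First k demonstrations per relation; the repo's selector is more sophisticated.
    demos = [f"Sentence: {s}\nRelation: {rel}"
             for rel, sents in syn.items() for s in sents[:k]]
    return ("\n\n".join(demos)
            + f"\n\nSentence: {sentence}\nRelation (one of: {', '.join(labels)}):")

def predict(sentence):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption
        messages=[{"role": "user", "content": build_prompt(sentence)}],
    )
    return resp.choices[0].message.content.strip()

print(predict("<h>Photoshop</h> runs on <t>Windows</t>."))  # illustrative test sentence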

Output

The pipeline will generate data files at each step:

  • relation_synonyms.json: Contains the generated relation synonyms
  • synthetic_samples.json: Contains the base training samples
  • argument_synthetic_samples.json: Contains the final paraphrased training data
  • evaluation_results/: Contains prediction results from inference

Notes

  • Run the notebooks in order; each step depends on the output of the previous one
  • Check the generated files after each step to ensure output quality (a quick check is sketched below)
  • Adjust the generation parameters in each notebook to suit your needs
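
For the quality check mentioned above, a quick way to inspect a generated file, assuming the illustrative {relation: [sentences]} layout from the sketches earlier:

import json

with open("data/fewrel/fewrel_0/syn_data/synthetic_samples.json") as f:
    samples = json.load(f)
for rel, sents in samples.items():
    print(f"{rel}: {len(sents)} samples, e.g. {sents[0]!r}")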

Citation

If you use this code in your research, please cite our paper:

@inproceedings{liu2024unleashing,
  title={Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting},
  author={Liu, Siyi and Li, Yang and Li, Jiang and Yang, Shan and Lan, Yunshi},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
  pages={13147--13161},
  year={2024}
}
