neulab/ragged

Description

Retrieval-augmented generation (RAG) enhances language models by integrating external knowledge, but its effectiveness is highly dependent on system configuration. Improper retrieval settings can degrade performance, making RAG less reliable than closed-book generation. In this work, we introduce RAGGED, a framework for systematically evaluating RAG systems across diverse retriever-reader configurations, retrieval depths, and datasets. Our analysis reveals that reader robustness to noise is the key determinant of RAG stability and scalability. Some readers benefit from increased retrieval depth, while others degrade due to their sensitivity to distracting content. Through large-scale experiments on open-domain, multi-hop, and specialized-domain datasets, we show that retrievers, rerankers, and prompts influence performance but do not fundamentally alter these reader-driven trends. By providing a principled framework and new metrics to assess RAG stability and scalability, RAGGED enables systematic evaluation of retrieval-augmented generation systems, guiding future research on optimizing retrieval depth and model robustness.

Installation

To recreate the conda environment, run conda create -n ragged -y python=3.10, activate it, and then install the dependencies with pip install -r requirements.txt.
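For example, assuming you are at the repository root where requirements.txt lives:

```bash
# Create the conda environment, activate it, and install the Python dependencies.
conda create -n ragged -y python=3.10
conda activate ragged
pip install -r requirements.txt
```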

To run and evaluate the retriever, see retriever/README.md.

To run and evaluate the reader, see reader/README.md.

To conduct downstream RAGGED analysis, see analysis_framework/README.md.

Datasets

Our datasets are available on Hugging Face.

1. Download and process corpus datasets

Specify corpus_dir and corpus_name, then see download_data.py for how to download and save the files into the appropriate folders.

For the PubMed corpus used by BioASQ, the corpus name is pubmed.

For the KILT Wikipedia corpus, the corpus name is kilt_wikipedia.

After downloading the datasets, process the corpus into ColBERT format by running python retriever/data_processing/create_corpus_tsv.py --corpus $corpus --corpus_dir $corpus_dir, which outputs ${corpus_dir}/${corpus}/${corpus}.json.
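A minimal sketch of step 1 for the KILT Wikipedia corpus is below. The create_corpus_tsv.py call is taken from the instructions above; the exact interface of download_data.py is not shown here, so run it per its own usage first.

```bash
# Sketch for step 1 (KILT Wikipedia corpus).
corpus=kilt_wikipedia
corpus_dir=/path/to/corpora   # choose your own location

# Download and save the corpus first (see download_data.py for its exact interface),
# then convert it into the ColBERT corpus format:
python retriever/data_processing/create_corpus_tsv.py \
    --corpus "$corpus" \
    --corpus_dir "$corpus_dir"
```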

2. Download query datasets

Specify data_dir and dataset_name, then see download_data.py for how to download the file to ${data_dir}/${dataset_name}.jsonl.

We support Natural Questions (KILT version), HotpotQA (KILT version), and BioASQ11B.

The above files are ready for BM25, but not for ColBERT. To reformat them for ColBERT, run python retriever/data_processing/create_query_tsv.py --data_dir $data_dir --dataset $dataset, which outputs ${data_dir}/${dataset}-queries.tsv.
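A minimal sketch of step 2 follows. The dataset name nq is an assumption for illustration; check download_data.py for the names it actually expects.

```bash
# Sketch for step 2 (Natural Questions, KILT version).
data_dir=/path/to/queries
dataset=nq   # assumed name; see download_data.py for the supported dataset names

# After download_data.py has written ${data_dir}/${dataset}.jsonl, reformat the
# queries for ColBERT; this produces ${data_dir}/${dataset}-queries.tsv.
python retriever/data_processing/create_query_tsv.py \
    --data_dir "$data_dir" \
    --dataset "$dataset"
```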

3. Adapt your own datasets

To adapt your data for BM25, format your corpus and queries as JSONL files as instructed here. To adapt it for ColBERT, format your corpus and query datasets as instructed here. A rough sketch of the JSONL layout is shown below.
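The field names in this sketch (id, contents, question, answers) are hypothetical placeholders; follow the linked instructions for the exact schema expected by the BM25 and ColBERT pipelines.

```bash
# Hypothetical JSONL layout for a custom corpus and query set. The field names below
# are illustrative placeholders, not the required schema -- follow the linked
# BM25/ColBERT instructions for the exact format.
cat > my_corpus.jsonl <<'EOF'
{"id": "doc0", "contents": "First passage of my custom corpus."}
{"id": "doc1", "contents": "Second passage of my custom corpus."}
EOF

cat > my_dataset.jsonl <<'EOF'
{"id": "q0", "question": "What does the first passage describe?", "answers": ["a passage"]}
EOF
```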

Citation

If you use our code, datasets, or concepts from our paper in your research, we would appreciate a citation. Here is an example BibTeX entry for our paper:

@inproceedings{hsiaragged,
  title={RAGGED: Towards Informed Design of Scalable and Stable RAG Systems},
  author={Hsia, Jennifer and Shaikh, Afreen and Wang, Zora Zhiruo and Neubig, Graham},
  booktitle={Forty-second International Conference on Machine Learning}
}

Contact

For any questions, feedback, or discussions regarding this project, please open an issue on the repository or contact the authors.

About

Retrieval Augmented Generation Generalized Evaluation Dataset
