Retrieval Augmented Generation for EIC based document retriever #6
karthik18495
started this conversation in
General
The Project
Welcome to the project, where we aim to build a scalable RAG-based document retriever for the upcoming Electron-Ion Collider (EIC).
The project was first proposed at the AI4EIC workshop held at the Catholic University of America from November 29 to December 1, 2023. It evolved from there into a web application in early March 2024, accompanied by a proceedings paper (https://arxiv.org/abs/2403.15729).
Retrieval Augmented Generation for EIC
This project is under active development; its goal is a RAG-based system for the upcoming EIC.
The RAG pipeline has three main parts: ingestion, retrieval, and content fusion and generation.
Ingestion
Ingestion in Retrieval-Augmented Generation (RAG) is the process of preparing and organizing the data that the model will later draw on. It can be broken down into three main steps: chunking the information, embedding the chunks with an embedding model, and storing the embeddings in a vector database.
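To make the three steps concrete, here is a minimal sketch of how embedding and storage fit together once chunks exist. Everything in it is an assumption made for illustration, not the project's actual implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical stand-in for a real vector database.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words,
    normalized to unit length so a dot product equals cosine similarity."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: a list of (chunk, vector) pairs."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, list[float]]] = []

    def add(self, chunk: str, vector: list[float]) -> None:
        self.entries.append((chunk, vector))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        # Score every stored chunk by cosine similarity (vectors are unit length).
        scored = [
            (sum(a * b for a, b in zip(query_vector, vec)), chunk)
            for chunk, vec in self.entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]


def ingest(chunks: list[str], store: InMemoryVectorStore) -> None:
    """Embed each chunk and store it alongside its vector."""
    for chunk in chunks:
        store.add(chunk, toy_embed(chunk))
```

In a real deployment the toy embedding would be replaced by a trained embedding model and the list by a proper vector database; the shape of the loop in `ingest` is the part that carries over.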
Chunking
This is the first step in the ingestion process. The raw data, which can come in various forms and may be a large corpus of text, is divided into manageable chunks or segments. The size of these chunks can vary depending on the specific requirements of the task at hand. Chunking reduces the complexity of the data and makes it easier for the model to process the information.
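As an illustration, one simple way to chunk text is a fixed-size sliding window over words with some overlap between neighbouring chunks. The sketch below assumes that strategy; the function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical and not taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks give more precise retrieval hits but less context per hit; the overlap is a common way to soften that trade-off. Splitting on sentences or document sections instead of raw word counts is an equally valid choice.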
Retrieval
Content Fusion and Generation
Types of RAG system
A recent survey paper summarizes the types of RAG systems (see footnote 1). Broadly, there are three types of RAG architecture, distinguished by where the LLM is used in the pipeline.
Project Milestones
References
Footnotes
1. Types of RAG