Created during Days 80-83 of my self-study journey.
📁pdf-rag-from-scratch
 └── 📁dev
     └── dev_preprocess_pdf.ipynb -> preprocess the PDF using llama-index
     └── dev_rag.ipynb -> run RAG using the llama-index-preprocessed PDF
 └── lbg_relationship_tnc.pdf
 └── lbg_relationship_tnc_locked.pdf
 └── preprocess_pdf.py -> preprocess the PDF using langchain
 └── rag.py -> run a local RAG chat using the langchain-preprocessed PDF
 └── requirements.txt
1. git clone https://github.com/divakaivan/pdf-rag-from-scratch.git
2. pip install -r requirements.txt
3. python preprocess_pdf.py
-> The PDF must be saved in the same directory as the script. It reads and processes the PDF for you, and outputs a CSV with the embeddings (Note! suitable for up to ~100k embeddings)
4. python rag.py
-> Downloads gemma-2b-it, runs the RAG pipeline, and lets you have a chat with your PDF
5. (Optional) Run the dev versions (dev_preprocess_pdf.ipynb and dev_rag.ipynb), which use llama-index as the PDF reader, and compare the answer quality
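The retrieval step behind the chat in rag.py can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: it assumes each PDF chunk's embedding has already been loaded from the CSV into a NumPy array, and ranks chunks by cosine similarity to the query embedding (the function name `top_k_chunks` and the toy vectors are made up for the example):

```python
import numpy as np

def top_k_chunks(query_emb: np.ndarray, chunk_embs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k chunks most similar to the query, by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)                          # normalize query
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True) # normalize chunks
    scores = c @ q                                                     # cosine similarities
    return list(np.argsort(scores)[::-1][:k])                          # best-first indices

# Toy example: 3 chunk embeddings of dimension 4
chunks = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(top_k_chunks(query, chunks, k=2))  # -> [0, 2]
```

The top-ranked chunks would then be pasted into the LLM prompt as context before generating an answer.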
The files in the dev folder are used for development and rely on llama-index. At the time of writing, llama-index requires an API key; it is currently free, but that may change in the future.
In preprocess_pdf.py and rag.py I use only local, pip-installable libraries.
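Before embedding, the preprocessing step typically splits the PDF text into overlapping chunks so that context is not lost at chunk boundaries. A minimal word-based sketch (the chunk size, overlap, and function name are illustrative assumptions, not the values used in preprocess_pdf.py):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

# A 120-word document yields 3 chunks of up to 50 words each
doc = " ".join(f"word{i}" for i in range(120))
print(len(chunk_text(doc)))  # -> 3
```

Each chunk would then be embedded and written out as one row of the CSV alongside its vector.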
Demo video: rag_video_demo.mp4
- embedding model: mixedbread-ai/mxbai-embed-large-v1
- LLM: google/gemma-2b-it