This project lets you run a fully local Retrieval-Augmented Generation (RAG) chatbot over your own PDFs or web content. Ask questions in natural language and get answers grounded in the actual contents of your documents.
It uses the following tools:
- LangChain for orchestration
- FAISS for semantic vector search
- Ollama to run open-source LLMs locally
- Streamlit for an easy-to-use chat interface
- 📄 Upload PDFs or URLs as your data source
- 🧠 Store document chunks as embeddings in a FAISS vector store
- 🔍 Retrieve relevant content using semantic similarity search
- 💬 Generate context-aware answers via local LLM
- 💻 All running 100% locally
Make sure you have Ollama installed and a compatible model (e.g. granite3.3) downloaded.
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

ollama pull granite3.3
ollama run granite3.3
```
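To confirm the model is also reachable from Python (not just from the CLI), a quick check like the one below works. It assumes the `langchain-ollama` integration package is installed, which is a typical dependency for this stack:

```python
# Optional sanity check: ask the local model a question from Python.
# Assumes the langchain-ollama package is installed and the Ollama server is running.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="granite3.3")  # talks to the local Ollama server
print(llm.invoke("Say hello in one sentence.").content)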
```bash
git clone https://github.com/yourname/rag-chatbot.git
cd rag-chatbot
```
This project uses Python 3.9+.
```bash
pip install -r requirements.txt
```
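The exact dependency pins live in `requirements.txt`; a stack like this typically pulls in packages along these lines (illustrative list, not the authoritative file):

```text
langchain
langchain-community
langchain-ollama
faiss-cpu
streamlit
pypdf
```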
```bash
streamlit run app.py
```
Once the app is running:
- Upload one or more PDF files in the sidebar.
- Ask natural-language questions about their content in the chat interface.
- The chatbot retrieves the relevant sections and answers using that context.
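Under the hood, the chat page follows the standard Streamlit chat pattern: replay the stored messages, read a new question, and append the model's answer. A simplified sketch, where `handle_question` is a stand-in for the RAG handler described below:

```python
import streamlit as st

def handle_question(question: str) -> str:
    """Placeholder for the RAG handler; in the real app this retrieves
    context from FAISS and calls the local LLM."""
    return f"(stub answer for: {question})"

# PDF upload lives in the sidebar, as in the real app.
uploaded_pdfs = st.sidebar.file_uploader("Upload PDFs", type="pdf", accept_multiple_files=True)

# Keep the running conversation in session state and replay it on each rerun.
if "messages" not in st.session_state:
    st.session_state.messages = []
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

# Read a new question, answer it, and record both turns.
if question := st.chat_input("Ask something about your documents"):
    st.chat_message("user").write(question)
    answer = handle_question(question)
    st.chat_message("assistant").write(answer)
    st.session_state.messages += [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]
```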
- ChatUI: The user interface is built with Streamlit, using its built-in chat_message components to create a conversational layout. Users can upload documents in the sidebar and interact with the chatbot in real time.
- LLMRAGHandler: This is the main component that connects everything. It is implemented using LangChain and is responsible for managing the conversation flow, retrieving relevant context from the vector store, formatting prompts using a custom template, calling the LLM, and caching chat history (see the retrieval sketch below this list).
- Vector Store: Stores the documents as vector embeddings in FAISS, a high-speed similarity search library, and retrieves the relevant context for each query.
- LLM: The chatbot runs the Granite 3.3 model locally using Ollama. This means easy setup and prototyping, easy model switching, and full control over your data (everything stays local).
- Conversation Store: To make the chatbot stateful, we store the conversation history in a local file (e.g. JSON). This allows the chat to resume where you left off - even after refreshing the browser (see the persistence sketch below this list).
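To make the component descriptions above concrete, here is a minimal sketch of the ingest-retrieve-answer flow using the same building blocks (LangChain, FAISS, Ollama). The file name, prompt wording, and the `nomic-embed-text` embedding model are assumptions for illustration, not the project's exact code:

```python
# Minimal ingest -> retrieve -> answer sketch (illustrative, not the project's exact code).
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings, ChatOllama
from langchain_core.prompts import ChatPromptTemplate

# 1. Load and chunk the PDF.
docs = PyPDFLoader("my_document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# 2. Embed the chunks and store them in FAISS.
#    "nomic-embed-text" is an assumed local embedding model (pull it with Ollama first).
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector_store = FAISS.from_documents(chunks, embeddings)

# 3. Retrieve the most relevant chunks for a question.
question = "What is the main conclusion of the report?"
context_docs = vector_store.similarity_search(question, k=4)
context = "\n\n".join(doc.page_content for doc in context_docs)

# 4. Format a prompt with the retrieved context and call the local LLM.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOllama(model="granite3.3")
answer = llm.invoke(prompt.format_messages(context=context, question=question))
print(answer.content)
```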
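The conversation store needs nothing more elaborate than reading and writing a JSON file. A sketch of that idea (file name and message format are illustrative):

```python
# Sketch of persisting chat history to a local JSON file so the
# conversation survives a browser refresh (names are illustrative).
import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")

def load_history() -> list[dict]:
    """Return the stored messages, or an empty history on first run."""
    if HISTORY_FILE.exists():
        return json.loads(HISTORY_FILE.read_text(encoding="utf-8"))
    return []

def save_history(messages: list[dict]) -> None:
    """Write the full message list back to disk after every turn."""
    HISTORY_FILE.write_text(json.dumps(messages, indent=2), encoding="utf-8")

# Example usage: append one exchange and persist it.
history = load_history()
history.append({"role": "user", "content": "What does chapter 2 say about costs?"})
history.append({"role": "assistant", "content": "…"})
save_history(history)
```

Running this load/save pair around each chat turn is what lets a browser refresh simply replay the stored messages.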
- Initial PDF parsing and embedding may take a few seconds for large files.
- Latency depends on the chosen LLM.
- Evaluation of answers is qualitative — no scoring function included.
- Runs only locally for easier development
- Use agentic RAG (history-aware retrievers, dynamic tool-calling)
- Tool Calling
- Other Data Sources (Google Drive, Notion, ...)
- Cloud deployment
- UI enhancements and document summarization
MIT License. See LICENSE for details.