IntelliDocs

Overview

IntelliDocs is a Retrieval-Augmented Generation (RAG) project that helps users query and extract information from their PDF documents. It splits each PDF into text chunks, embeds them with Sentence Transformers, retrieves the chunks most semantically similar to a user's query, and uses an LLM to structure the final response, so relevant content can be found quickly even in large PDFs.

Project Objectives

PDF Extraction: Extract text from PDF files while preserving their formatting and structure.
Chunking: Split the extracted text into manageable chunks for efficient querying.
Embedding: Use Sentence Transformers to generate embeddings for the text chunks, enabling semantic similarity searches.
Querying: Build a retrieval system that takes a user's query and returns the most relevant text chunks by semantic similarity.
Structuring: Use an LLM to structure the final response from the retrieved chunks.
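The chunking step can be sketched in plain Python. This is a minimal illustration, not IntelliDocs' own code: the function name `chunk_words`, the word-based splitting, and the default sizes (200 words with a 50-word overlap) are all assumptions. Overlapping consecutive chunks reduces the chance that a sentence split at a chunk boundary loses its surrounding context.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. (Hypothetical helper; the
    defaults are illustrative, not taken from IntelliDocs.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each new chunk's start advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_words("one two three four five six seven", chunk_size=3, overlap=1))
    # ['one two three', 'three four five', 'five six seven']
```

Splitting on words keeps the sketch dependency-free; a real pipeline might instead split on sentences or tokens to match the embedding model's input limit.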

Technologies Used

Programming Language: Python
Libraries:
- fitz (PyMuPDF): for PDF text extraction.
- sentence-transformers: for embedding text chunks.
- Streamlit: for the user interface.
- ChromaDB: as the vector database.
- gRPC: for the server/client interface (server.py, client.py).

Step-by-Step Guide to Clone and Run IntelliDocs

Prerequisites

Ensure you have the following installed on your system:

Python (version 3.12)
uv (Python package and project manager)
Git

Step 1: Clone the Repository

Open your terminal or command prompt and run the following commands:

git clone https://github.com/anishka07/intellidocs.git
cd intellidocs

Step 2: Create the environment with uv

From the repository root, run the following command. It creates a virtual environment and installs the project's dependencies:

uv sync

Step 3: Run IntelliDocs with gRPC or Streamlit

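To use the gRPC interface, start the server:

uv run python server.py

Then, in another terminal, process (index) a PDF with the client:

uv run python client.py process <your-pdf-name>

Query the processed PDF using its generated PDF key. Quote multi-word queries so the shell passes them as a single argument:

uv run python client.py query <your-pdf-key> "<your query>"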
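The shape of the two client commands can be illustrated with a small `argparse` parser. This is a hypothetical sketch, not the contents of `client.py`: the subcommand and argument names (`process`, `pdf`, `query`, `pdf_key`) are assumptions chosen to mirror the commands above. In this sketch the query is a single positional argument, which is why a multi-word query has to be quoted in the shell.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented client.py commands."""
    parser = argparse.ArgumentParser(prog="client.py", description="IntelliDocs gRPC client (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    # `process <pdf>`: index a PDF; the server is assumed to return its key.
    process = sub.add_parser("process", help="index a PDF and obtain its key")
    process.add_argument("pdf", help="name or path of the PDF to index")

    # `query <pdf_key> <query>`: ask a question about an indexed PDF.
    query = sub.add_parser("query", help="query a previously processed PDF")
    query.add_argument("pdf_key", help="key generated when the PDF was processed")
    query.add_argument("query", help="natural-language question (quote it in the shell)")
    return parser


if __name__ == "__main__":
    # Equivalent to: python client.py query abc123 "What is RAG?"
    print(build_parser().parse_args(["query", "abc123", "What is RAG?"]))
```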

To run IntelliDocs with its Streamlit UI instead:

uv run streamlit run ui.py

Streamlit Interface

[Screenshot: User interface]

[Screenshot: Indexing multiple PDFs as input]

[Screenshot: Query response, showing both the structured response and the relevant chunks]

Usage

Input PDF: Upload one or more PDFs through the Streamlit interface.
Querying: Select the PDF to query by its unique generated PDF key, then enter your query.
Results: The system returns the most relevant text chunks from the selected PDF, together with a structured response generated by the LLM.
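The structuring step, in which an LLM turns the retrieved chunks into a structured answer, typically starts by packing the query and the chunks into a single prompt. The README does not show IntelliDocs' prompt or name its LLM, so the template below, the function name `build_prompt`, and the `max_chars` budget are all hypothetical; sending the prompt to a model is left out.

```python
def build_prompt(query: str, chunks: list[str], max_chars: int = 4000) -> str:
    """Hypothetical prompt template: number the retrieved chunks and ask the
    LLM to answer only from them."""
    context_parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        part = f"[{i}] {chunk.strip()}"
        # Rough character budget (separators not counted); chunks are assumed
        # to arrive best-first, so the least relevant ones are dropped.
        if used + len(part) > max_chars:
            break
        context_parts.append(part)
        used += len(part)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the numbered context passages below. "
        "Cite passage numbers, and say so if the answer is not in the context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt("What is RAG?", ["RAG combines retrieval with generation.", "Chunks are embedded."]))
```

Numbering the passages lets the structured answer point back to the relevant chunks, which the UI can show alongside it.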

TODOs

Make the gRPC code more robust
Make the code more dynamic
Build a web application with FastAPI
Dockerize the whole project
