- Copilot App Link (hosted on GCP): http://34.57.239.217:3000/
- Airflow Link (hosted on GCP): http://34.172.235.53:8080/
- FastAPI Link (hosted on GCP): http://34.57.239.217:8000/
- Demo Video URL: https://drive.google.com/file/d/1TYBWW9FVpCXEEEbj6ZN_Judc86lgEhTP/view?usp=sharing
- Google Codelabs: Codelabs
This project builds an end-to-end research tool that combines document processing, vector storage, and multi-agent coordination into a research assistant. Airflow orchestrates the pipeline, Pinecone stores the vectors, and LangGraph coordinates the agents.
The system parses documents with Docling, stores their vectors in Pinecone, and provides an interactive research interface backed by multiple AI agents. Users can query a selected document, find related papers on arXiv, run web searches, and generate research reports.
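The retrieval step described above can be illustrated with a small, dependency-free sketch. This is not the project's Pinecone code: the in-memory `index`, the three-dimensional vectors, and the `top_k` helper are hypothetical stand-ins that only show how a query vector is matched against stored chunk vectors by cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(index, query_vec, k=2):
    """Return the ids of the k stored vectors most similar to query_vec."""
    scored = sorted(index.items(), key=lambda kv: cosine(kv[1], query_vec), reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]

# Hypothetical toy index: chunk id -> embedding.
# Real embeddings have hundreds or thousands of dimensions.
index = {
    "doc1-chunk0": [1.0, 0.0, 0.0],
    "doc1-chunk1": [0.0, 1.0, 0.0],
    "doc2-chunk0": [0.9, 0.1, 0.0],
}

print(top_k(index, [1.0, 0.05, 0.0]))  # → ['doc1-chunk0', 'doc2-chunk0']
```

A managed vector database performs the same nearest-neighbour lookup at scale; the sketch is only meant to make the idea concrete.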
- Docling: Document parsing and structuring
- Pinecone: Vector database for semantic search and retrieval
- LangGraph: Multi-agent system orchestration
- Airflow: Pipeline automation and task orchestration
- Streamlit: User interface for research interactions
- FastAPI: Backend API services
- Arxiv API: Academic paper search and retrieval
- Web Search API: Broader context research capabilities
- RAG: Retrieval-augmented generation for document Q&A
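As an illustration of the academic paper search listed above, the sketch below builds a request URL for the public arXiv API query endpoint. Whether the project calls this endpoint directly or goes through a wrapper library is not stated here, so the `build_arxiv_query` helper and its defaults are assumptions.

```python
from urllib.parse import urlencode

# Public arXiv API query endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(terms, max_results=5):
    """Build a query URL that searches all arXiv fields for `terms`.

    Hypothetical helper: the name and default page size are illustrative.
    """
    params = {
        "search_query": f"all:{terms}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

print(build_arxiv_query("retrieval augmented generation"))
```

The endpoint returns an Atom feed, so a caller would still need to parse the XML to extract titles, authors, and links.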
- Document Processing Pipeline
  - Airflow orchestrates document ingestion
  - Docling parses and structures documents
  - Vectors are stored in Pinecone
- Multi-Agent Research System
  - LangGraph coordinates multiple research agents
  - arXiv agent searches academic papers
  - Web search agent provides broader context
  - RAG agent handles document-specific queries
- User Interaction
  - Interface for document selection
  - Support for 5-6 questions per document
  - Research session tracking
  - Export capabilities for reports and Codelabs
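The division of labour between the three research agents can be sketched in plain Python. This deliberately does not use the LangGraph API: the agent functions return placeholder strings, and the keyword-based `route` policy is a hypothetical simplification of whatever routing the real graph performs.

```python
from typing import Callable, Dict

def arxiv_agent(query: str) -> str:
    # Placeholder: a real agent would search academic papers.
    return f"[arxiv] papers matching: {query}"

def web_agent(query: str) -> str:
    # Placeholder: a real agent would call a web search API.
    return f"[web] search results for: {query}"

def rag_agent(query: str) -> str:
    # Placeholder: a real agent would retrieve chunks of the selected document.
    return f"[rag] answer from the selected document for: {query}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "arxiv": arxiv_agent,
    "web": web_agent,
    "rag": rag_agent,
}

def route(query: str) -> str:
    """Pick an agent name with simple keyword rules (hypothetical policy)."""
    q = query.lower()
    if "paper" in q or "arxiv" in q:
        return "arxiv"
    if "latest" in q or "news" in q:
        return "web"
    return "rag"  # default: answer from the selected document

def answer(query: str) -> str:
    """Dispatch the query to the agent chosen by route()."""
    return AGENTS[route(query)](query)

print(answer("Find papers on retrieval-augmented generation"))
```

In a graph-based orchestrator the routing decision is typically made by a model rather than by keywords; the dictionary dispatch here only shows the shape of the control flow.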
agent/
│
├── research_canvas/
│   ├── agent.py        # ArxivSearchTool, WebSearchTool, RAGSystem, SavePDFTool implementation
│   ├── __init__.py
│   ├── state.py        # AgentState management
│   ├── model.py        # Model configurations
│   └── download.py     # Resource downloading utilities
│
├── chat_outputs/       # Directory for markdown outputs
│   └── *.md            # Chat history markdown files
│
├── reports/            # Directory for PDF reports
│   └── *.pdf           # Generated PDF reports
│
├── config/
│   ├── __init__.py
│   └── settings.py     # Environment and API configurations
│
└── chat/
    ├── __init__.py
    └── chat_node.py    # Main chat implementation
- Docker and Docker Compose
- Python 3.8+
- GCP account
- Pinecone API key
- OpenAI API key (for RAG)
Each component requires specific environment variables:
- Airflow:
airflow/.env
AIRFLOW_UID=50000
AIRFLOW_GID=50000
_AIRFLOW_WWW_USER_USERNAME=admin
_AIRFLOW_WWW_USER_PASSWORD=admin
- Backend:
backend/.env
PINECONE_API_KEY=your_key
OPENAI_API_KEY=your_key
ARXIV_EMAIL=your_email
- Frontend:
frontend/.env
BACKEND_URL=http://backend:8000
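A quick way to catch a missing setting before starting the containers is to check the variables listed above. The sketch below is an assumption about how such a check could look; the `REQUIRED` mapping and the `missing_vars` helper are hypothetical and not part of the repository.

```python
import os

# Variable names taken from the .env files listed above.
REQUIRED = {
    "backend": ["PINECONE_API_KEY", "OPENAI_API_KEY", "ARXIV_EMAIL"],
    "frontend": ["BACKEND_URL"],
}

def missing_vars(component, env=None):
    """Return the required variables for `component` that are unset or empty.

    `env` defaults to the process environment; pass a dict to check
    values parsed from a .env file instead.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED[component] if not env.get(name)]

# Example with an incomplete backend configuration.
print(missing_vars("backend", {"PINECONE_API_KEY": "your_key"}))
# → ['OPENAI_API_KEY', 'ARXIV_EMAIL']
```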
- Clone the repository:
git clone https://github.com/your-username/research-tool
cd research-tool
- Start Airflow:
cd airflow
docker compose up -d
- Start the application (run from the directory that contains the application's Compose file, not airflow/):
docker compose up --build -d
- Access the application:
  - Research Interface: http://localhost:8501
  - Airflow Dashboard: http://localhost:8080
  - Backend API: http://localhost:8000
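Once the containers are running, the three local endpoints listed above can be probed with a short script. This is a minimal sketch using only the standard library; the `is_up` and `check_all` helpers are hypothetical, and treating any response below HTTP 500 as "up" is an assumption rather than the project's own health check.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Local URLs from the setup steps above.
SERVICES = {
    "Research Interface": "http://localhost:8501",
    "Airflow Dashboard": "http://localhost:8080",
    "Backend API": "http://localhost:8000",
}

def is_up(url, timeout=3.0):
    """Return True if the server answers with a status below 500."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except HTTPError as err:
        # The server responded, but with an error status (e.g. 404 or 401).
        return err.code < 500
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False

def check_all(probe=is_up):
    """Probe every service; `probe` can be swapped out for testing."""
    return {name: probe(url) for name, url in SERVICES.items()}
```

Calling `check_all()` returns a dictionary mapping each service name to `True` or `False`, which makes it easy to see which container failed to start.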
- Document Research
  - Select a document from the processed collection
  - Ask up to 6 research questions
  - View responses from multiple agents
- Export Options
  - Generate PDF research reports
  - Export findings in Codelabs format
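The six-question limit per document can be modelled with a small session object. The sketch below is an assumption about how such tracking might work; `ResearchSession`, `MAX_QUESTIONS`, and the choice to raise `ValueError` at the limit are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import List

MAX_QUESTIONS = 6  # upper bound stated in the usage notes above

@dataclass
class ResearchSession:
    """Tracks the questions asked about one selected document."""
    document_id: str
    questions: List[str] = field(default_factory=list)

    def ask(self, question: str) -> int:
        """Record a question and return how many more may still be asked."""
        if len(self.questions) >= MAX_QUESTIONS:
            raise ValueError(f"limit of {MAX_QUESTIONS} questions reached")
        self.questions.append(question)
        return MAX_QUESTIONS - len(self.questions)

session = ResearchSession("doc-1")
print(session.ask("What problem does the paper address?"))  # → 5
```

Keeping the recorded questions on the session object also gives an export step a single place to read the history from.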
The system is deployed on Google Cloud Platform using Docker containers:
- Set up GCP project
- Configure GCP credentials
- Deploy using Cloud Run or GKE
- Developer 1: Pipeline Development & Document Processing
- Developer 2: Multi-Agent System & Integration
- Developer 3: Frontend & Export Functionality
Name | Percentage Contribution |
---|---|
Sarthak Somvanshi | 33.33% |
Yuga Kanse | 33.33% |
Tanvi Inchanalkar | 33.33% |

WE ATTEST THAT WE HAVEN'T USED ANY OTHER STUDENTS' WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK.