Assignment4

AI-Powered Research Tool Documentation

Project Resources

Copilot App Link (hosted on GCP): http://34.57.239.217:3000/
Airflow Link (hosted on GCP): http://34.172.235.53:8080/
FastAPI Link (hosted on GCP): http://34.57.239.217:8000/
Demo Video URL: https://drive.google.com/file/d/1TYBWW9FVpCXEEEbj6ZN_Judc86lgEhTP/view?usp=sharing
Google Codelabs: Codelabs

Goal of the Project

This project builds an end-to-end research assistant that combines document processing, vector storage, and multi-agent interaction. The system uses Airflow for pipeline orchestration, Pinecone for vector storage, and LangGraph for multi-agent coordination.

Project Overview

The system parses documents with Docling, stores the resulting vectors in Pinecone, and provides an interactive research interface powered by multiple AI agents. Users can query a selected document, find relevant papers on arXiv, run web searches, and generate research reports.
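The retrieval step behind document-based research can be illustrated with a small in-memory example. This is only a sketch: it uses toy 3-dimensional vectors and plain cosine similarity, and the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical. It does not call Pinecone, and the similarity metric used by the project's index is an assumption.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best match first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for document-chunk embeddings (hypothetical ids).
toy_index = {
    "doc1-chunk0": [1.0, 0.0, 0.0],
    "doc1-chunk1": [0.7, 0.7, 0.0],
    "doc2-chunk0": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
print([chunk_id for chunk_id, _ in results])
```

In the deployed system a vector database takes the place of `toy_index` and performs the nearest-neighbor search at scale; the retrieved chunks are then passed to the language model as context.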

Key Technologies Involved

Docling: Document parsing and structuring
Pinecone: Vector database for semantic search and retrieval
LangGraph: Multi-agent system orchestration
Airflow: Pipeline automation and task orchestration
Streamlit: User interface for research interactions
FastAPI: Backend API services
arXiv API: Academic paper search and retrieval
Web Search API: Broader context research capabilities
RAG: Retrieval-augmented generation for document Q&A

System Architecture

System Workflow

Document Processing Pipeline
- Airflow orchestrates document ingestion
- Docling parses and structures documents
- Vectors are stored in Pinecone
Multi-Agent Research System
- LangGraph coordinates multiple research agents
- arXiv agent searches academic papers
- Web search agent provides broader context
- RAG agent handles document-specific queries
User Interaction
- Interface for document selection
- Support for 5-6 questions per document
- Research session tracking
- Export capabilities for reports and Codelabs
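The workflow above can be sketched in plain Python to show how a question might be routed to one agent and recorded in a session. This is an illustration under stated assumptions, not the project's LangGraph graph: the agent functions return placeholder strings, `route` uses simple keyword matching in place of whatever decision logic the real system uses, and all names (`ResearchSession`, `ask`, `MAX_QUESTIONS_PER_DOCUMENT`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Upper bound taken from the "5-6 questions per document" note; assumed to be a hard limit.
MAX_QUESTIONS_PER_DOCUMENT = 6


@dataclass
class ResearchSession:
    """Tracks the questions asked about one selected document."""
    document_id: str
    history: List[Dict[str, str]] = field(default_factory=list)


def arxiv_agent(question: str) -> str:
    return f"[arxiv] papers related to: {question}"


def web_agent(question: str) -> str:
    return f"[web] search results for: {question}"


def rag_agent(question: str) -> str:
    return f"[rag] passages from the selected document for: {question}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "arxiv": arxiv_agent,
    "web": web_agent,
    "rag": rag_agent,
}


def route(question: str) -> str:
    """Keyword-based routing; a stand-in for the real coordinator's decision."""
    text = question.lower()
    if "paper" in text or "arxiv" in text:
        return "arxiv"
    if "latest" in text or "news" in text:
        return "web"
    return "rag"


def ask(session: ResearchSession, question: str) -> str:
    """Route one question, record it in the session, and return the answer."""
    if len(session.history) >= MAX_QUESTIONS_PER_DOCUMENT:
        raise RuntimeError("Question limit reached for this document")
    agent_name = route(question)
    answer = AGENTS[agent_name](question)
    session.history.append({"question": question, "agent": agent_name, "answer": answer})
    return answer
```

Keeping the history on the session object is one simple way to support both session tracking and later export, since the same list of question/answer records can be rendered into a report.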

Project Structure

agent/
│ 
├── research_canvas/
│   ├── agent.py            # ArxivSearchTool, WebSearchTool, RAGSystem, SavePDFTool implementations
│   ├── __init__.py
│   ├── state.py            # AgentState management
│   ├── model.py            # Model configurations
│   └── download.py         # Resource downloading utilities
│
├── chat_outputs/           # Directory for markdown outputs
│   └── *.md               # Chat history markdown files
│
├── reports/               # Directory for PDF reports
│   └── *.pdf             # Generated PDF reports
│
├── config/
│   ├── __init__.py
│   └── settings.py        # Environment and API configurations
│
└── chat/
    ├── __init__.py
    └── chat_node.py       # Main chat implementation

Prerequisites

Docker and Docker Compose
Python 3.8+
GCP account
Pinecone API key
OpenAI API key (for RAG)

Environment Setup

Each component requires specific environment variables:

Airflow: airflow/.env

AIRFLOW_UID=50000
AIRFLOW_GID=50000
_AIRFLOW_WWW_USER_USERNAME=admin
_AIRFLOW_WWW_USER_PASSWORD=admin

Backend: backend/.env

PINECONE_API_KEY=your_key
OPENAI_API_KEY=your_key
ARXIV_EMAIL=your_email

Frontend: frontend/.env

BACKEND_URL=http://backend:8000
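A backend settings module could validate these variables at startup so that a missing key fails fast with a clear message. The sketch below is an assumption about how such a check might look, not the contents of config/settings.py; `BackendSettings` and `load_backend_settings` are hypothetical names, and the code assumes the variables have already been loaded into the process environment (for example by Docker Compose).

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from backend/.env above.
REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY", "ARXIV_EMAIL")


@dataclass(frozen=True)
class BackendSettings:
    pinecone_api_key: str
    openai_api_key: str
    arxiv_email: str


def load_backend_settings(env: Mapping[str, str] = os.environ) -> BackendSettings:
    """Read required settings from the environment, failing if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return BackendSettings(
        pinecone_api_key=env["PINECONE_API_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
        arxiv_email=env["ARXIV_EMAIL"],
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary instead of the real process environment.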

Installation Steps

Clone the repository:

git clone https://github.com/your-username/research-tool
cd research-tool

Start Airflow:

cd airflow
docker compose up -d

Start the application (from the repository root):

cd ..
docker compose up --build -d

Access the application:

Research Interface: http://localhost:8501
Airflow Dashboard: http://localhost:8080
Backend API: http://localhost:8000
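After the containers start, a quick way to confirm that the three services are reachable is to test whether their ports accept TCP connections. The following is a minimal sketch using only the standard library; `LOCAL_SERVICES`, `is_port_open`, and `check_services` are hypothetical helper names, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket
from typing import Dict

# Ports taken from the local URLs listed above.
LOCAL_SERVICES: Dict[str, int] = {
    "Research Interface": 8501,
    "Airflow Dashboard": 8080,
    "Backend API": 8000,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> Dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in LOCAL_SERVICES.items()}
```

Calling `check_services()` returns a dictionary keyed by service name; any `False` entry suggests that the corresponding container has not finished starting or failed to start.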

Usage Guide

Document Research
- Select a document from the processed collection
- Ask up to 6 research questions
- View responses from multiple agents
Export Options
- Generate PDF research reports
- Export findings in Codelabs format
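The export step can be pictured as rendering a session's questions and answers into a markdown file, similar to the files kept under chat_outputs/. The sketch below is illustrative only: the heading layout, `render_session_markdown`, and `save_session` are assumptions, and it does not reproduce the project's PDF or Codelabs export.

```python
from datetime import date
from pathlib import Path
from typing import Dict, List


def render_session_markdown(document_title: str, qa_pairs: List[Dict[str, str]], on: date) -> str:
    """Render question/answer pairs as a markdown document with one section per question."""
    lines = [f"# Research notes: {document_title}", "", f"Date: {on.isoformat()}", ""]
    for number, pair in enumerate(qa_pairs, start=1):
        lines.append(f"## Question {number}: {pair['question']}")
        lines.append("")
        lines.append(pair["answer"])
        lines.append("")
    return "\n".join(lines)


def save_session(markdown: str, out_dir: Path, name: str) -> Path:
    """Write the markdown to out_dir/name.md, creating the directory if needed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```

For example, `save_session(render_session_markdown("Sample report", pairs, date.today()), Path("chat_outputs"), "session-1")` would write chat_outputs/session-1.md, where `pairs` is a list of dictionaries with `question` and `answer` keys.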

Deployment

The system is deployed on Google Cloud Platform using Docker containers:

Set up GCP project
Configure GCP credentials
Deploy using Cloud Run or GKE

Contributors

Developer 1: Pipeline Development & Document Processing
Developer 2: Multi-Agent System & Integration
Developer 3: Frontend & Export Functionality

Additional Notes

Name	Percentage Contribution
Sarthak Somvanshi	33.33%
Yuga Kanse	33.33%
Tanvi Inchanalkar	33.33%
WE ATTEST THAT WE HAVEN'T USED ANY OTHER STUDENTS' WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK.
