A modular information retrieval system built from scratch with traditional IR, neural reranking, vector search, and RAG-style generation.
The system exposes a CLI-based client–server interface.
This tool uses the Gemini API, so create a `.env` file containing `GEMINI_API=YOUR_API_KEY`.
## Features

- **Crawler** (see the crawler sketch after this list)
  - Starts from manual seeds or expands dynamically based on queries.
  - Extracts and stores page text into `docs.jsonl`.
- **Indexing** (see the index sketch below)
  - Builds an inverted index (`index.json`).
  - Tracks term frequency, document frequency, and document lengths.
- **Ranking** (see the BM25 sketch below)
  - Implements TF-IDF and BM25.
  - Supports phrase queries and fuzzy matching.
- **Vector Search**
  - Stores dense embeddings using FAISS.
  - Compares query embeddings to document embeddings for semantic search.
- **Hybrid Scoring** (see the merging sketch below)
  - Combines BM25 and FAISS scores with weighted merging.
  - Produces top-k candidates.
- **Reranking**
  - Uses a cross-encoder for fine-grained scoring of top candidates.
- **RAG-style Generation**
  - Passes the top retrieved documents, together with the query, to a generative model (e.g., Gemini).
  - Produces coherent, context-aware answers.
- **Text Preprocessing** (see the preprocessing sketch below)
  - Preprocesses query and document text.
  - Tokenization and stemming.
- **Client–Server Architecture** (see the minimal server sketch below)
  - Server: hosts the index, embeddings, and search logic.
  - Client: CLI that sends queries, receives results, and displays answers.
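A minimal sketch of the crawl-and-store step, assuming the `requests` and `beautifulsoup4` libraries; the real crawler in `packages/crawler` is configured via `crawler_config.json` and is more involved:

```python
import json

import requests                   # assumed HTTP client
from bs4 import BeautifulSoup     # assumed HTML parser

# Hypothetical seed list; the real crawler reads crawler_config.json.
SEEDS = ["https://example.com"]

with open("docs.jsonl", "a", encoding="utf-8") as out:
    for url in SEEDS:
        html = requests.get(url, timeout=10).text
        # Strip markup and keep only the visible page text.
        text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
        out.write(json.dumps({"url": url, "text": text}) + "\n")
```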
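A toy version of the index build, showing the statistics tracked in `index.json`; the field names here are illustrative, not the project's actual schema:

```python
import json
from collections import defaultdict

def build_index(docs):
    """Build a toy inverted index: per-term term frequencies (tf),
    document frequencies (df), and document lengths."""
    postings = defaultdict(dict)       # term -> {doc_id: tf}
    doc_lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # the real code preprocesses first
        doc_lengths[doc_id] = len(tokens)
        for tok in tokens:
            postings[tok][doc_id] = postings[tok].get(doc_id, 0) + 1
    return {
        "postings": dict(postings),
        "df": {t: len(p) for t, p in postings.items()},
        "doc_lengths": doc_lengths,
    }

index = build_index({"d1": "the cat sat", "d2": "the cat sat on the mat"})
with open("index.json", "w") as f:
    json.dump(index, f)
```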
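Continuing that sketch, a standard BM25 scorer over the same index structure; `k1` and `b` are the textbook defaults, not values taken from this project:

```python
import math

def bm25_score(query_terms, doc_id, index, k1=1.5, b=0.75):
    """Score one document against a query with the standard BM25 formula."""
    N = len(index["doc_lengths"])                   # corpus size
    avgdl = sum(index["doc_lengths"].values()) / N  # average doc length
    dl = index["doc_lengths"][doc_id]
    score = 0.0
    for term in query_terms:
        df = index["df"].get(term, 0)
        if df == 0:
            continue                                # term not in corpus
        tf = index["postings"][term].get(doc_id, 0)
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

print(bm25_score(["cat"], "d2", index))  # score "d2" for the query ["cat"]
```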
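A sketch of the weighted merge; the min-max normalization and the `alpha` weight are assumptions rather than the project's actual settings:

```python
def hybrid_merge(bm25_scores, faiss_scores, alpha=0.5, k=10):
    """Merge two {doc_id: score} dicts into a ranked top-k list."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0            # avoid division by zero
        return {d: (s - lo) / span for d, s in scores.items()}

    b, f = normalize(bm25_scores), normalize(faiss_scores)
    merged = {d: alpha * b.get(d, 0.0) + (1 - alpha) * f.get(d, 0.0)
              for d in set(b) | set(f)}
    return sorted(merged.items(), key=lambda x: x[1], reverse=True)[:k]

print(hybrid_merge({"d1": 2.3, "d2": 0.9}, {"d2": 0.8, "d3": 0.5}))
```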
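A minimal preprocessing pass, assuming NLTK's Porter stemmer; the project's actual tokenizer and stemmer may differ:

```python
import re

from nltk.stem import PorterStemmer  # assumes nltk is installed

stemmer = PorterStemmer()

def preprocess(text):
    """Lowercase, tokenize on alphanumeric runs, and stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stemmer.stem(tok) for tok in tokens]

print(preprocess("Running crawlers and building indexes"))
# -> ['run', 'crawler', 'and', 'build', 'index']
```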
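And a minimal client–server round trip using only the standard library; the real `server.py` and `client.py` may use a different framework and wire protocol:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Parse the query from e.g. /?q=your+query
        query = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        results = [{"doc_id": "d1", "score": 1.0}]  # placeholder ranking
        body = json.dumps({"query": query, "results": results}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), SearchHandler).serve_forever()
```

A client can then fetch `http://localhost:8000/?q=your+query` and print the JSON results.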
## How It Works

Documents and queries are converted into dense embeddings using a language model. This allows semantic search (matching meaning, not just words).
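A sketch of that flow, assuming `sentence-transformers` for the embedding model (the model name is illustrative) and a flat inner-product FAISS index over unit-normalized vectors:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Model name is illustrative; the project's embedding model may differ.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["FAISS enables fast vector search.",
        "BM25 ranks documents by term statistics."]
doc_vecs = model.encode(docs, normalize_embeddings=True)

# With unit-normalized vectors, inner product equals cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

query_vec = model.encode(["how does semantic search work"],
                         normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 2)
print(ids[0], scores[0])   # top-2 documents by semantic similarity
```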
A transformer-based model scores the relevance of query–document pairs. Unlike BM25 (bag-of-words), this uses deep contextual understanding of language.
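A sketch of the reranking step, assuming the `CrossEncoder` class from `sentence-transformers`; the model name is illustrative:

```python
from sentence_transformers import CrossEncoder

# Model name is illustrative; the project's reranker may differ.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how does bm25 work"
candidates = ["BM25 ranks documents by term statistics.",
              "FAISS enables fast vector search."]

# Each (query, document) pair is encoded jointly, so the model sees
# full token-level interaction between the two texts.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
print(reranked[0])
```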
A generative model (e.g., Gemini) is given the query plus the retrieved context. It produces a natural-language answer, simulating an intelligent assistant.
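A sketch of the generation step using the `google-generativeai` client and the `GEMINI_API` key from `.env`; the model name and prompt template are assumptions:

```python
import os

import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv()                                   # reads GEMINI_API from .env
genai.configure(api_key=os.environ["GEMINI_API"])

def answer(query, retrieved_docs):
    """Stuff the retrieved passages into the prompt and generate an answer."""
    context = "\n\n".join(retrieved_docs)
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    # Model name is an assumption; swap in whichever Gemini model you use.
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text
```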
## Installation

```bash
git clone https://github.com/kaifkh20/ai-engine
cd ai-engine
pip install -r req.txt
```
Make sure Python 3.8.20 is installed; use pyenv to pin this version locally.
## Usage

```bash
python -m packages.crawler   # run the crawler; needed only on the first run
                             # or after updating crawler_config.json
python server.py
python client.py "your query here"
```
## Requirements

As stated in `req.txt`.
## Future Work

- Scale crawling for larger datasets.
- Web client.
- Extend the client with more commands.
- Benchmark against standard IR datasets.