Notion Archive

Search your Notion content when you can't access Notion itself.

What is Notion Archive?

Notion Archive makes your exported Notion workspace searchable offline. If you can't reach the live workspace (network issues, leaving a company, account problems) or simply want a local copy, you can still search through all your content.

Perfect for:

Keeping access to company knowledge after leaving a job
Offline access to your Notion content
Searching Notion data in environments that don't have Notion access
Building tools that need to search Notion exports without API access

What it does

Parses Notion HTML exports
Generates embeddings using OpenAI or local models
Stores them in a vector database (ChromaDB)
Provides basic search functionality
Extracts some metadata (tags, titles, workspace structure)

Installation

pip install notion-archive

How to use it

1. Export your Notion workspace

In Notion, go to Settings & Members → Settings
Click "Export all workspace content"
Choose "HTML" format (not Markdown)
Download and unzip the file
You'll get a folder like Export-abc123.../
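
Before indexing, it can help to confirm the unzipped folder really contains HTML pages. The sketch below is not part of notion-archive; `count_html_pages` is a hypothetical helper that only uses the standard library, and it assumes sub-pages may sit in nested sub-folders of the export.

```python
from pathlib import Path


def count_html_pages(export_dir):
    """Return the number of .html files found anywhere under export_dir."""
    root = Path(export_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{export_dir} is not an unzipped export folder")
    # Search recursively, since sub-pages may live in nested sub-folders.
    return sum(1 for _ in root.rglob("*.html"))
```

A count of 0 usually means the export was made as Markdown, or the path points somewhere other than the unzipped folder.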

2. Use the library

from notion_archive import NotionArchive

# Initialize with persistent storage
archive = NotionArchive(
    embedding_model="text-embedding-3-large",
    db_path="./my_archive"  # Saves data permanently
)

# Add your export
archive.add_export('./Export-abc123-def456-etc')

# Build the index (skipped automatically if one already exists)
archive.build_index()

# Search (fast once the index has been built)
results = archive.search("meeting notes")
for result in results:
    print(f"{result['title']}: {result['content'][:100]}...")

To force a rebuild:

archive.build_index(force_rebuild=True)  # Rebuilds even if index exists

Embedding Models

# OpenAI (requires API key, costs money)
archive = NotionArchive(embedding_model="text-embedding-3-large")
archive = NotionArchive(embedding_model="text-embedding-3-small")

# Local models (free, slower)
archive = NotionArchive(embedding_model="all-MiniLM-L6-v2")
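
If you want a script that works with or without an OpenAI key, one option is to pick the model name at runtime. This is a minimal sketch: `pick_embedding_model` is a hypothetical helper, not something the library provides, and the choice of `text-embedding-3-small` as the paid default is just an assumption for illustration.

```python
import os


def pick_embedding_model(env=None):
    """Choose an embedding model name based on whether an OpenAI key is set."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "text-embedding-3-small"  # OpenAI: requires API key, costs money
    return "all-MiniLM-L6-v2"  # local model: free, slower


model_name = pick_embedding_model()
# Then pass it to the constructor shown above:
# archive = NotionArchive(embedding_model=model_name)
```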

How it works

You export your Notion workspace as HTML
The parser extracts text and basic metadata
Text gets chunked and turned into embeddings
Embeddings are stored in ChromaDB
Search queries get embedded and matched against stored chunks
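
The steps above can be illustrated with a toy version of the pipeline. This is not the library's code. It assumes word-based chunks with overlap, swaps the real embedding model for a simple bag-of-words vector, and replaces ChromaDB with a linear scan, so it only matches shared words rather than meaning. All function names here are made up for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query, chunks, limit=3):
    """Embed the query and return the `limit` best (score, chunk) pairs."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

A real embedding model would also rank a chunk about "standup summary" highly for the query "meeting notes", which this word-matching stand-in cannot do.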

Limitations

Only works with HTML exports (not live Notion)
No incremental updates - you have to rebuild the index
Metadata extraction is basic (tags, titles, workspace structure)
Search quality depends on your embedding model choice
Large workspaces can be expensive with OpenAI models
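
To get a feel for the cost before building an index with an OpenAI model, a back-of-the-envelope estimate can help. The sketch below is a hypothetical helper, not part of the library. It assumes roughly 4 characters per token (a common rule of thumb for English text), and the price is a parameter you fill in from OpenAI's current pricing page rather than a value hard-coded here.

```python
def estimate_embedding_cost(texts, usd_per_million_tokens, chars_per_token=4.0):
    """Rough cost estimate in USD for embedding `texts` with a paid API.

    chars_per_token is an approximation; real token counts depend on the tokenizer.
    """
    total_chars = sum(len(t) for t in texts)
    est_tokens = total_chars / chars_per_token
    return est_tokens / 1_000_000 * usd_per_million_tokens
```

The estimate covers the one-off indexing pass only. Each search query is embedded too, but queries are short, so that cost is usually negligible by comparison.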

API

# Initialize
archive = NotionArchive(embedding_model="model-name", db_path="./archive_db")

# Add export folder
archive.add_export("./path/to/export")

# Build the search index (skipped if one already exists)
archive.build_index()

# Force rebuild if needed
archive.build_index(force_rebuild=True)

# Check if index exists
if archive.has_index():
    print("Ready to search!")

# Search
results = archive.search("query", limit=10)

# Get index statistics
stats = archive.get_stats()
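
Search results can be turned into compact previews for a terminal. This sketch assumes each result is a dict with 'title' and 'content' keys, as in the usage example earlier; any other fields the library returns are unknown here, and `format_results` is a hypothetical helper.

```python
def format_results(results, preview_chars=100):
    """Turn search results into numbered one-line previews."""
    lines = []
    for rank, result in enumerate(results, start=1):
        title = result.get("title", "(untitled)")
        # Collapse whitespace so multi-line content fits on one line.
        content = " ".join(result.get("content", "").split())
        preview = content[:preview_chars]
        if len(content) > preview_chars:
            preview += "..."
        lines.append(f"{rank}. {title}: {preview}")
    return lines


# Typical use, assuming `archive` was set up as shown above:
# for line in format_results(archive.search("query", limit=10)):
#     print(line)
```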

Requirements

Python 3.8+
A Notion workspace exported as HTML
OpenAI API key if using OpenAI models

Common issues

"No documents found" - Make sure you exported as HTML, not Markdown, and pointed to the unzipped folder.

"OpenAI API error" - Set your API key: export OPENAI_API_KEY=sk-your-key-here

"Memory error" - Large workspaces need lots of RAM. Try using a smaller embedding model or chunking your export.

License

MIT License - see LICENSE file for details.

A simple tool for adding semantic search to your Notion exports.
