Search your Notion content when you can't access Notion itself.
Notion Archive makes your exported Notion workspace searchable offline. When you can't reach live Notion (network issues, leaving a company, account problems) or simply want local access, you can still search through all your content.
Perfect for:
- Keeping access to company knowledge after leaving a job
- Offline access to your Notion content
- Searching Notion data in environments that don't have Notion access
- Building tools that need to search Notion exports without API access
What it does:
- Parses Notion HTML exports
- Generates embeddings using OpenAI or local models
- Stores them in a vector database (ChromaDB)
- Provides basic search functionality
- Extracts some metadata (tags, titles, workspace structure)
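The HTML-parsing step can be sketched with the standard library alone. This is a toy illustration, not the library's actual parser: the `NotionPageText` class and the sample page are hypothetical, standing in for one exported Notion page.

```python
from html.parser import HTMLParser

class NotionPageText(HTMLParser):
    """Collects the <title> and the visible body text from one exported page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())

# Hypothetical snippet standing in for a real exported page.
sample = (
    "<html><head><title>Meeting Notes</title></head>"
    "<body><h1>Meeting Notes</h1><p>Discussed Q3 roadmap.</p></body></html>"
)
parser = NotionPageText()
parser.feed(sample)
print(parser.title)                    # Meeting Notes
print(" ".join(parser.text_parts))
```

The real parser also recovers tags and workspace structure from the export's folder layout; this sketch only shows the title-and-text core of the idea.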
```bash
pip install notion-archive
```
- In Notion, go to Settings & Members → Settings
- Click "Export all workspace content"
- Choose "HTML" format (not Markdown)
- Download and unzip the file
- You'll get a folder like `Export-abc123.../`
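Before indexing, it can help to confirm the unzipped folder really contains HTML pages. A quick sanity check (`./notion-export` is a placeholder; point it at your actual `Export-...` folder):

```python
import os

def count_html_files(export_dir):
    """Walk the export folder and count exported .html pages."""
    return sum(
        name.endswith(".html")
        for _root, _dirs, names in os.walk(export_dir)
        for name in names
    )

# "./notion-export" is a placeholder path for your unzipped export.
print(f"Found {count_html_files('./notion-export')} HTML files")
```

If this prints zero, you most likely exported as Markdown instead of HTML, or pointed at the wrong folder.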
```python
from notion_archive import NotionArchive

# Initialize with persistent storage
archive = NotionArchive(
    embedding_model="text-embedding-3-large",
    db_path="./my_archive",  # Saves data permanently
)

# Add your export
archive.add_export('./Export-abc123-def456-etc')

# Build index (automatically skips if already exists)
archive.build_index()  # Smart - won't rebuild unnecessarily

# Search (always fast after first build)
results = archive.search("meeting notes")
for result in results:
    print(f"{result['title']}: {result['content'][:100]}...")
```
To force a rebuild:

```python
archive.build_index(force_rebuild=True)  # Rebuilds even if index exists
```
```python
# OpenAI (requires API key, costs money)
archive = NotionArchive(embedding_model="text-embedding-3-large")
archive = NotionArchive(embedding_model="text-embedding-3-small")

# Local models (free, slower)
archive = NotionArchive(embedding_model="all-MiniLM-L6-v2")
```
How it works:
- You export your Notion workspace as HTML
- The parser extracts text and basic metadata
- Text is chunked and turned into embeddings
- Embeddings are stored in ChromaDB
- Search queries are embedded and matched against the stored chunks
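The steps above can be sketched end to end in plain Python. This is a minimal illustration of the pipeline's shape, not the library's implementation: it uses a toy bag-of-words embedding in place of a real model, and a plain list in place of ChromaDB.

```python
import math

def chunk(text, size=50):
    """Split text into fixed-size word chunks (real chunkers are smarter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text, vocab):
    """Toy normalized bag-of-words vector; a real system calls an embedding model."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

docs = ["weekly meeting notes for the platform team",
        "recipe for banana bread"]
chunks = [c for d in docs for c in chunk(d)]
vocab = sorted({w for c in chunks for w in c.lower().split()})
store = [(c, embed(c, vocab)) for c in chunks]  # stands in for the vector DB

# A query is embedded the same way and matched against stored chunks.
query_vec = embed("meeting notes", vocab)
best = max(store, key=lambda item: cosine(query_vec, item[1]))
print(best[0])  # weekly meeting notes for the platform team
```

Real embedding models capture meaning rather than exact word overlap, which is what makes the search semantic rather than keyword-based.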
Limitations:
- Only works with HTML exports (not live Notion)
- No incremental updates - you have to rebuild the index to pick up changes
- Metadata extraction is basic (titles, tags, workspace structure)
- Search quality depends on your choice of embedding model
- Large workspaces can be expensive to embed with OpenAI models
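To get a feel for the OpenAI cost, a back-of-the-envelope estimate helps. The per-million-token prices below are assumptions as of this writing, and the 4-characters-per-token ratio is a rough rule of thumb; check OpenAI's current pricing before relying on the numbers.

```python
# Assumed per-million-token prices (USD); verify against OpenAI's pricing page.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def estimate_cost(total_characters, model="text-embedding-3-large"):
    """Rough embedding cost: chars -> tokens (~4 chars/token) -> dollars."""
    tokens = total_characters / 4
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# A workspace with roughly 200 MB of text:
print(f"${estimate_cost(200_000_000):.2f}")  # $6.50
```

Even at these rates, a one-time index build is usually cheap; the cost only becomes noticeable when you rebuild a very large workspace repeatedly.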
```python
# Initialize
archive = NotionArchive(embedding_model="model-name", db_path="./archive_db")

# Add export folder
archive.add_export("./path/to/export")

# Build search index (smart - skips if exists)
archive.build_index()

# Force rebuild if needed
archive.build_index(force_rebuild=True)

# Check if index exists
if archive.has_index():
    print("Ready to search!")

# Search
results = archive.search("query", limit=10)

# Get info
stats = archive.get_stats()
```
Requirements:
- Python 3.8+
- A Notion workspace exported as HTML
- An OpenAI API key if using OpenAI models
`"No documents found"` - Make sure you exported as HTML (not Markdown) and pointed to the unzipped folder.
`"OpenAI API error"` - Set your API key: `export OPENAI_API_KEY=sk-your-key-here`
`"Memory error"` - Large workspaces need a lot of RAM. Try a smaller embedding model, or add and index your export in smaller pieces.
MIT License - see LICENSE file for details.
A simple tool for adding semantic search to your Notion exports.