| 1 | +# Search |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The search system provides full-text search functionality for documentation sites through a two-phase architecture. |
| 6 | +During build time, a Julia-based indexer processes all documentation content and generates a searchable index. |
| 7 | +At runtime, a client-side JavaScript interface searches this pre-built index in real time, running queries in a Web Worker so the UI stays responsive. |
| 8 | + |
| 9 | +## Architecture |
| 10 | + |
| 11 | +The search implementation consists of three primary components operating in sequence: |
| 12 | + |
| 13 | +1. **Build-time Index Generation** - Julia code in `src/html/HTMLWriter.jl` processes documentation content during site generation. |
| 14 | +2. **Client-side Search Interface** - JavaScript code in `assets/html/js/search.js` handles user interactions and search execution. |
| 15 | +3. **Web Worker Processing** - Background thread execution prevents UI blocking during search operations. |
| 16 | + |
| 17 | +## Index Generation Process |
| 18 | + |
| 19 | +### 1. SearchRecord Structure |
| 20 | + |
| 21 | +The core data structure is the `SearchRecord` struct defined in `src/html/HTMLWriter.jl`: |
| 22 | + |
| 23 | +```julia |
| 24 | +struct SearchRecord |
| 25 | + src::String # URL/path to the document |
| 26 | + page::Documenter.Page # Reference to the page object |
| 27 | + fragment::String # URL fragment (for anchored content) |
| 28 | + category::String # Content category (page, section, docstring, etc.) |
| 29 | + title::String # Display title for search results |
| 30 | + page_title::String # Title of the containing page |
| 31 | + text::String # Searchable text content |
| 32 | +end |
| 33 | +``` |
| 34 | + |
| 35 | +### 2. Index Generation Pipeline |
| 36 | + |
| 37 | +The indexer processes documentation content through a multi-stage pipeline during HTML generation: |
| 38 | + |
| 39 | +1. **AST Traversal** - The system walks each page's Markdown abstract syntax tree in `domify(dctx::DCtx)`, defined in `src/html/HTMLWriter.jl`. |
| 40 | +2. **Record Instantiation** - Each content node generates a `SearchRecord` via the `searchrecord()` function in `src/html/HTMLWriter.jl`. |
| 41 | +3. **Content Classification** - Each record is assigned a content category (page, section, docstring, etc.). |
| 42 | +4. **Text Normalization** - The `mdflatten()` function extracts plain text from Markdown structures for indexing. |
| 43 | +5. **Deduplication Pass** - Records that share an identical location are merged to reduce index size. |
| 44 | +6. **JavaScript Serialization** - The processed index is written out as a JavaScript object literal for the client to load. |
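
The deduplication and serialization steps above can be illustrated with a minimal sketch. The real pass runs in Julia at build time; this JavaScript version only mirrors the idea. The helper names `dedupeByLocation` and `serializeIndex` are hypothetical, and the merge policy (joining the text of duplicate records with a space) is an assumption, not something the source confirms.

```javascript
// Hypothetical sketch: merge records that share a location.
// Assumption: duplicate records are merged by concatenating their text.
function dedupeByLocation(records) {
  const byLocation = new Map();
  for (const record of records) {
    const existing = byLocation.get(record.location);
    if (existing === undefined) {
      // Copy the record so the caller's objects are not mutated.
      byLocation.set(record.location, { ...record });
    } else {
      existing.text = `${existing.text} ${record.text}`.trim();
    }
  }
  // A Map preserves insertion order, so first-seen order is kept.
  return [...byLocation.values()];
}

// Emit the records in the same overall shape as the generated index file.
function serializeIndex(records) {
  return `var documenterSearchIndex = ${JSON.stringify({ docs: records })}\n`;
}
```

Keying on `location` matches the description of records "sharing identical locations"; any other merge rule (for example, keeping only the first record) would fit the same structure.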
| 45 | + |
| 46 | +### 3. Index Output |
| 47 | + |
| 48 | +The search index is written to `search_index.js` in the following format: |
| 49 | + |
| 50 | +```javascript |
| 51 | +var documenterSearchIndex = {"docs": [ |
| 52 | + { |
| 53 | + "location": "page.html#fragment", |
| 54 | + "page": "Page Title", |
| 55 | + "title": "Content Title", |
| 56 | + "category": "section", |
| 57 | + "text": "Searchable content text..." |
| 58 | + } |
| 59 | + // ... more records |
| 60 | +]} |
| 61 | +``` |
| 62 | + |
| 63 | +### 4. Content Filtering |
| 64 | + |
| 65 | +The indexer excludes specific node types from search index generation (`src/html/HTMLWriter.jl`): |
| 66 | +- `MetaNode` - Metadata annotation blocks containing non-searchable directives |
| 67 | +- `DocsNodesBlock` - Internal documentation node structures |
| 68 | +- `SetupNode` - Configuration and setup directive blocks |
| 69 | + |
| 70 | +## Client-Side Search Implementation |
| 71 | + |
| 72 | +### 1. Search Architecture |
| 73 | + |
| 74 | +The client-side implementation splits work between two threads so that search computation is isolated from the UI: |
| 75 | + |
| 76 | +- **Main Thread** - Manages user interface event handling, result filtering, and DOM manipulation operations |
| 77 | +- **Web Worker Thread** - Executes search algorithms using the [MiniSearch library](https://lucaong.github.io/minisearch/) without blocking the user interface |
| 78 | + |
| 79 | +### 2. MiniSearch Configuration |
| 80 | + |
| 81 | +The search system uses MiniSearch with the following configuration (`assets/html/js/search.js`): |
| 82 | + |
| 83 | +```javascript |
| 84 | +let index = new MiniSearch({ |
| 85 | + fields: ["title", "text"], // Fields to index |
| 86 | + storeFields: ["location", "title", "text", "category", "page"], // Fields to return |
| 87 | + processTerm: (term) => { |
| 88 | + // Custom term processing with stop words removal |
| 89 | + // Preserves Julia-specific symbols (@, !) |
| 90 | + }, |
| 91 | + tokenize: (string) => string.split(/[\s\-\.]+/), // Custom tokenizer |
| 92 | + searchOptions: { |
| 93 | + prefix: true, // Enable prefix matching |
| 94 | + boost: { title: 100 }, // Boost title matches |
| 95 | + fuzzy: 2 // Enable fuzzy matching |
| 96 | + } |
| 97 | +}); |
| 98 | +``` |
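
The `processTerm` body is elided in the configuration above. As a rough sketch of what such a hook could look like: the stop-word set below is a tiny illustrative subset (not the actual list), and the trimming rules are an assumption based only on the comment about preserving `@` and `!`. MiniSearch discards a term when `processTerm` returns a falsy value, which is why `null` is returned for stop words.

```javascript
// Illustrative subset only; the real stop-word list is much longer.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to"]);

// Sketch of a term processor: lowercase, drop stop words, and trim
// leading/trailing punctuation while keeping '@' and '!' so that
// macro names (@ref) and bang functions (push!) stay searchable.
function processTerm(term) {
  const lowered = term.toLowerCase();
  if (STOP_WORDS.has(lowered)) return null;
  const trimmed = lowered
    .replace(/^[^a-z0-9@!]+/, "")
    .replace(/[^a-z0-9@!]+$/, "");
  return trimmed.length > 0 ? trimmed : null;
}

// Same separator pattern as the configuration above; empty tokens are
// filtered here for convenience (an addition, not part of the source).
function tokenize(text) {
  return text.split(/[\s\-\.]+/).filter((token) => token.length > 0);
}
```

Splitting on `.` means a qualified name such as `Documenter.Anchors.add!` yields the tokens `Documenter`, `Anchors`, and `add!`, so a query for `add!` alone can match it.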
| 99 | + |
| 100 | +### 3. Stop Words |
| 101 | + |
| 102 | +The search engine applies a stop words filter (`assets/html/js/search.js`) derived from the Lunr 2.1.3 library, modified so that semantically important Julia symbols and keywords are not filtered out. |
| 103 | + |
| 104 | +### 4. Search Workflow |
| 105 | + |
| 106 | +#### Main Thread Execution Flow: |
| 107 | +1. **Input Event Processing** - Keystrokes in the search input trigger `input` event listeners |
| 108 | +2. **Worker Thread Communication** - The search request is sent to an available worker via the `postMessage` API |
| 109 | +3. **Result Set Processing** - Results returned by the worker are filtered and rendered into the DOM |
| 110 | +4. **Browser State Management** - The search query and active filters are written to the browser URL parameters |
| 111 | + |
| 112 | +#### Web Worker Execution Flow: |
| 113 | +1. **Query Reception** - Search requests from the main thread arrive via message passing |
| 114 | +2. **Search Algorithm Execution** - MiniSearch performs a full-text search, keeping results with a minimum score of 1 |
| 115 | +3. **Result Set Generation** - Matches are rendered to HTML markup, limited to 200 results per content category |
| 116 | +4. **Response Transmission** - The formatted results are returned to the main thread via message passing |
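
The score threshold and per-category cap in the worker flow above can be sketched as a single pass over the results. `selectResults` is a hypothetical helper. It assumes that results arrive sorted by descending score (as MiniSearch returns them), that the threshold is inclusive, and that uniqueness is judged by `location`; none of these details is confirmed by the source.

```javascript
// Sketch: keep results scoring at least `minScore`, drop repeated
// locations, and cap the number kept in each category.
function selectResults(results, minScore = 1, maxPerCategory = 200) {
  const perCategory = new Map();
  const seenLocations = new Set();
  const selected = [];
  for (const result of results) {
    if (result.score < minScore) continue; // below the score threshold
    if (seenLocations.has(result.location)) continue; // duplicate location
    const count = perCategory.get(result.category) ?? 0;
    if (count >= maxPerCategory) continue; // category already full
    seenLocations.add(result.location);
    perCategory.set(result.category, count + 1);
    selected.push(result);
  }
  return selected;
}
```

Because the input is assumed to be ordered by score, the cap keeps the highest-scoring entries of each category without a separate sort.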
| 117 | + |
| 118 | +### 5. Result Rendering |
| 119 | + |
| 120 | +The search result rendering system generates structured output elements (`assets/html/js/search.js`): |
| 121 | +- **Title Component** - Content titles with syntax highlighting and category classification badges |
| 122 | +- **Text Snippet Component** - Extracted text excerpts with search term highlighting via HTML markup |
| 123 | +- **Navigation Link Component** - Direct URL references to specific content locations within documentation |
| 124 | +- **Context Metadata Component** - Hierarchical page information and document location path data |
| 125 | + |
| 126 | +### 6. Content Filtering System |
| 127 | + |
| 128 | +The search interface implements dynamic category-based result filtering: |
| 129 | +- Filter options are generated automatically from the categories present in the index |
| 130 | +- Users filter results by content type (page, section, docstring, etc.) |
| 131 | +- Filtering runs on the client, so results update immediately without server requests |
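
A minimal sketch of the category filtering described above, assuming each result carries a `category` field as in the index format. The function names are hypothetical, and using `null` to mean "no filter active" is an illustrative convention.

```javascript
// Derive the filter options from whatever categories the results contain.
function availableCategories(results) {
  return [...new Set(results.map((result) => result.category))].sort();
}

// Apply the active filter on the client; `null` means "show everything".
function filterByCategory(results, activeCategory) {
  if (activeCategory === null) return results;
  return results.filter((result) => result.category === activeCategory);
}
```

Both functions operate on data already held in the browser, which is why toggling a filter needs no network round trip.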
| 132 | + |
| 133 | +## Performance Optimizations |
| 134 | + |
| 135 | +### 1. Web Worker Usage |
| 136 | +- Offloads search computation from main thread |
| 137 | +- Maintains UI responsiveness during search operations |
| 138 | +- Handles concurrent search requests efficiently |
| 139 | + |
| 140 | +### 2. Result Limiting |
| 141 | +- Pre-filters to 200 unique results per category |
| 142 | +- Prevents excessive DOM manipulation |
| 143 | +- Reduces memory usage for large documentation sites |
| 144 | + |
| 145 | +### 3. Index Deduplication |
| 146 | +- Merges duplicate entries at build time |
| 147 | +- Reduces index size and network transfer |
| 148 | +- Improves search performance |
| 149 | + |
| 150 | +### 4. Progressive Loading |
| 151 | +- Search index loads asynchronously |
| 152 | +- Fallback handling for missing dependencies |
| 153 | +- Graceful degradation without search functionality |
| 154 | + |
| 155 | +## Configuration Options |
| 156 | + |
| 157 | +### Build-Time Settings |
| 158 | + |
| 159 | +```julia |
| 160 | +# In make.jl |
| 161 | +makedocs( |
| 162 | + # ... other options |
| 163 | + format = Documenter.HTML( |
| 164 | + # Search-related settings |
| 165 | + search_size_threshold_warn = 200_000 # Warn if index > 200KB |
| 166 | + ) |
| 167 | +) |
| 168 | +``` |
| 169 | + |
| 170 | +### Size Thresholds |
| 171 | +- Warning threshold: 200KB by default |
| 172 | +- Large indices may impact page load performance |
| 173 | +- Automatic warnings during build process |
| 174 | + |
| 175 | +## Integration Points |
| 176 | + |
| 177 | +### 1. Asset Management |
| 178 | +- Search JavaScript is bundled with other Documenter assets |
| 179 | +- MiniSearch library loaded from CDN (`__MINISEARCH_VERSION__` placeholder) |
| 180 | +- Dependencies managed through `JSDependencies.jl` |
| 181 | + |
| 182 | +### 2. Theme Integration |
| 183 | +- Search UI styled using Bulma CSS framework |
| 184 | +- Responsive design for mobile devices |
| 185 | +- Dark/light theme support |
| 186 | + |
| 187 | +### 3. URL Routing |
| 188 | +- Search queries persist in URL parameters (`?q=search_term`) |
| 189 | +- Filter states maintained in URL (`?filter=section`) |
| 190 | +- Browser history integration for navigation |
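
The URL persistence listed above can be sketched with the standard `URL` and `URLSearchParams` APIs. The parameter names `q` and `filter` come from the examples above; the helper names and the choice to drop empty parameters are assumptions made for illustration.

```javascript
// Build a URL that carries the current query and filter.
function buildSearchUrl(baseUrl, query, filter) {
  const url = new URL(baseUrl);
  if (query) url.searchParams.set("q", query);
  else url.searchParams.delete("q");
  if (filter) url.searchParams.set("filter", filter);
  else url.searchParams.delete("filter");
  return url.toString();
}

// Recover the search state from a URL, e.g. on page load.
function readSearchState(urlString) {
  const params = new URL(urlString).searchParams;
  return { query: params.get("q") ?? "", filter: params.get("filter") };
}
```

In a browser, the string returned by `buildSearchUrl` could be passed to `history.pushState` or `history.replaceState` so that back and forward navigation restores earlier searches.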
| 191 | + |
| 192 | +## Testing and Benchmarking |
| 193 | + |
| 194 | +### 1. Test Infrastructure |
| 195 | +- Real search testing: `test/search/real_search.jl` |
| 196 | +- Benchmark suite: `test/search/run_benchmarks.jl` |
| 197 | +- Edge case testing: `test/search_edge_cases/` |
| 198 | + |
| 199 | +### 2. Search Validation |
| 200 | +The testing system provides: |
| 201 | +- Index generation validation |
| 202 | +- Search result accuracy verification |
| 203 | +- Performance benchmarking capabilities |
| 204 | +- Edge case handling verification |
| 205 | + |
| 206 | + |