Generate enriched search phrases and natural questions for each hit in the dataset JSON files.
- Install dependencies:
npm install
- Set OpenAI key (required):
export OPENAI_API_KEY=sk-...
- Run generation:
npm run generate:2025
Outputs queries-2025.json with a searchQueries object per hit. An OpenAI API key is required; the script will exit if it's missing.
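The key check described above could look roughly like the following. This is a minimal sketch, not the script's actual code: requireApiKey is a hypothetical helper name, and treating a whitespace-only value as missing is an assumption.

```typescript
// Hypothetical helper; the real script may check the key differently.
function requireApiKey(env: Record<string, string | undefined>): string {
  // Assumption: a blank or whitespace-only value counts as missing.
  const key = env.OPENAI_API_KEY?.trim();
  if (!key) {
    throw new Error("OPENAI_API_KEY is not set");
  }
  return key;
}

// Possible use at startup (exits with a non-zero code when the key is missing):
//   try { requireApiKey(process.env); }
//   catch (err) { console.error(String(err)); process.exit(1); }
```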
You can control batching, concurrency, and incremental saving:
tsx scripts/generateQueries.ts 2025.json queries-2025.json --limit=100 --batchSize=10 --concurrency=5 --saveEvery=2
Flags:
- --limit=N: Process only the first N hits.
- --batchSize=K: Number of hits grouped per batch (default 5).
- --concurrency=C: Parallel requests inside a batch (default = batchSize).
- --saveEvery=B: Persist to disk every B batches (default 1).
- --resume: Resume from an existing output file (matched by id).
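To make the flag semantics concrete, here is a hedged sketch of how batching, the per-batch concurrency cap, and resume-by-id might fit together. All names (Hit, toBatches, runBatch, pendingHits) are hypothetical and not taken from scripts/generateQueries.ts. Applying --limit before skipping already-processed ids is also an assumption.

```typescript
// Hypothetical sketch; the real script may be structured differently.
interface Hit { id: string }

// --batchSize: split the hits into consecutive groups of at most batchSize.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    out.push(items.slice(i, i + batchSize));
  }
  return out;
}

// --concurrency: run at most `concurrency` workers at once inside one batch.
// Results keep the order of the input batch.
async function runBatch<T, R>(
  batch: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(batch.length);
  let next = 0;
  const runner = async (): Promise<void> => {
    while (next < batch.length) {
      const i = next++; // claim the next index before awaiting
      results[i] = await worker(batch[i]);
    }
  };
  const n = Math.max(1, Math.min(concurrency, batch.length));
  await Promise.all(Array.from({ length: n }, () => runner()));
  return results;
}

// --limit and --resume: take the first `limit` hits (assumed to happen first),
// then drop hits whose id already appears in the existing output.
function pendingHits<T extends Hit>(hits: T[], done: Hit[], limit?: number): T[] {
  const seen = new Set(done.map((h) => h.id));
  const scoped = limit === undefined ? hits : hits.slice(0, limit);
  return scoped.filter((h) => !seen.has(h.id));
}
```

With --batchSize=10 --concurrency=5, a sketch like this would hand 10 hits to runBatch at a time and keep at most 5 requests in flight.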
Progress is streamed in-place with elapsed time, ETA, and error count. On each save, the output is written to a .tmp file that is then atomically swapped into place for durability.
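The write-then-swap save can be sketched as below. This is an illustration under assumptions, not the script's actual code: saveAtomically and loadIfExists are hypothetical names, and the temporary file is assumed to sit next to the output as <output>.tmp. A rename is atomic on POSIX systems when both paths are on the same filesystem.

```typescript
import { existsSync, readFileSync, renameSync, writeFileSync } from "node:fs";

// Hypothetical helper: write the full payload to a sibling .tmp file first,
// then rename it over the real output so readers never see a partial file.
function saveAtomically(outputPath: string, data: unknown): void {
  const tmpPath = `${outputPath}.tmp`; // assumed naming of the temporary file
  writeFileSync(tmpPath, JSON.stringify(data, null, 2), "utf8");
  renameSync(tmpPath, outputPath);
}

// Hypothetical counterpart for --resume: read the existing output if present.
function loadIfExists<T>(outputPath: string, fallback: T): T {
  if (!existsSync(outputPath)) return fallback;
  return JSON.parse(readFileSync(outputPath, "utf8")) as T;
}
```

If the process is interrupted mid-write, only the .tmp file is affected and the previous output stays readable.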
After generation, you can flatten all generated phrases and questions into a CSV (one row per anchor, with its associated positive content snippet):
npm run export:2025
# or custom
tsx scripts/exportAnchors.ts queries-2025.json anchors-2025.csv --maxContentChars=3000
CSV Columns:
- hit_id: original section id
- rule: section name/number
- anchor: generated phrase or question
- content: associated markdown snippet (truncated)
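As an illustration of the CSV layout above, a row writer might look like the following. This is a sketch under assumptions: AnchorRow, csvField, and toCsv are hypothetical names, truncation is assumed to be a plain character cut at --maxContentChars, and quoting follows the common convention of doubling embedded quotes.

```typescript
// Hypothetical row shape mirroring the documented columns.
interface AnchorRow {
  hit_id: string;
  rule: string;
  anchor: string;
  content: string;
}

// Quote a field only when it contains a comma, quote, or line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build the CSV text: header line, then one line per anchor row.
function toCsv(rows: AnchorRow[], maxContentChars: number): string {
  const lines = ["hit_id,rule,anchor,content"];
  for (const row of rows) {
    // Assumption: content is simply cut at maxContentChars characters.
    const content = row.content.slice(0, maxContentChars);
    lines.push([row.hit_id, row.rule, row.anchor, content].map(csvField).join(","));
  }
  return lines.join("\n") + "\n";
}
```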