feat: Implement synonym index management and API #2425
base: v30
…oval logic and enhancing test coverage
It seems like some synonyms aren't being triggered when migrating to the new version.
Reproduction Steps
#!/bin/bash
export TYPESENSE_HOST="http://localhost:8108"
export TYPESENSE_API_KEY="xyz"
export COLLECTION_NAME="books"
wait_for_typesense() {
echo "Waiting for Typesense to be ready..."
local max_attempts=30
local attempt=1
while [ $attempt -le $max_attempts ]; do
if curl -s -o /dev/null -w "%{http_code}" "${TYPESENSE_HOST}/health" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" | grep -q "200"; then
echo "Typesense is ready"
return 0
fi
echo " Attempt ${attempt}/${max_attempts}..."
sleep 2
((attempt++))
done
echo "Typesense failed to start after ${max_attempts} attempts"
exit 1
}
# Block until the server answers the health check before sending any requests.
wait_for_typesense
echo "Creating collection schema..."
curl -s -X POST "${TYPESENSE_HOST}/collections" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"name": "'${COLLECTION_NAME}'",
"fields": [
{"name": "id", "type": "string"},
{"name": "title", "type": "string"},
{"name": "author", "type": "string", "facet": true},
{"name": "genre", "type": "string", "facet": true},
{"name": "description", "type": "string"},
{"name": "publication_year", "type": "int32", "facet": true},
{"name": "rating", "type": "float"},
{"name": "pages", "type": "int32"}
],
"default_sorting_field": "rating"
}'
curl -s -X POST "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"id": "1",
"title": "The Great Gatsby",
"author": "F. Scott Fitzgerald",
"genre": "Classic Fiction",
"description": "A classic American novel about the Jazz Age and the American Dream.",
"publication_year": 1925,
"rating": 4.2,
"pages": 180
}'
curl -s -X POST "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"id": "2",
"title": "To Kill a Mockingbird",
"author": "Harper Lee",
"genre": "Literary Fiction",
"description": "A powerful story of racial injustice and childhood innocence in the American South.",
"publication_year": 1960,
"rating": 4.5,
"pages": 376
}'
curl -s -X POST "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"id": "3",
"title": "1984",
"author": "George Orwell",
"genre": "Dystopian Fiction",
"description": "A dystopian novel about totalitarianism and surveillance.",
"publication_year": 1949,
"rating": 4.6,
"pages": 328
}'
curl -s -X POST "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"id": "4",
"title": "Pride and Prejudice",
"author": "Jane Austen",
"genre": "Romance",
"description": "A romantic novel about manners, upbringing, morality, and marriage.",
"publication_year": 1813,
"rating": 4.3,
"pages": 432
}'
curl -s -X POST "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"id": "5",
"title": "The Catcher in the Rye",
"author": "J.D. Salinger",
"genre": "Coming of Age",
"description": "A novel about teenage rebellion and alienation.",
"publication_year": 1951,
"rating": 3.8,
"pages": 277
}'
echo "Documents indexed successfully"
echo "Creating synonyms..."
curl -s -X PUT "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/synonyms/classic-synonyms" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"synonyms": ["classic", "literature", "literary", "masterpiece"]
}'
curl -s -X PUT "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/synonyms/scifi-synonyms" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"synonyms": ["dystopian", "sci-fi", "science fiction", "futuristic"]
}'
curl -s -X PUT "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/synonyms/romance-synonyms" \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
-d '{
"synonyms": ["romance", "romantic", "love story", "love"]
}'
echo "Synonyms created successfully"
sleep 2
echo "Test 1: Searching for 'classic' (should find literary fiction via synonyms)"
curl -s -G "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents/search" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
--data-urlencode "q=classic" \
--data-urlencode "query_by=title,description,genre" | jq -r '.hits[] | "- \(.document.title) by \(.document.author) [\(.document.genre)]"'
echo ""
echo "Test 2: Searching for 'sci-fi' (should find dystopian via synonyms)"
curl -s -G "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents/search" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
--data-urlencode "q=sci-fi" \
--data-urlencode "query_by=title,description,genre" | jq -r '.hits[] | "- \(.document.title) by \(.document.author) [\(.document.genre)]"'
echo ""
echo "Test 3: Searching for 'love story' (should find romance via synonyms)"
curl -s -G "${TYPESENSE_HOST}/collections/${COLLECTION_NAME}/documents/search" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}" \
--data-urlencode "q=love story" \
--data-urlencode "query_by=title,description,genre" | jq -r '.hits[] | "- \(.document.title) by \(.document.author) [\(.document.genre)]"'
This script creates a collection, indexes five documents, and creates three synonyms to test the results.
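To make the three searches easier to compare across builds, the printed hit lists could be turned into pass/fail checks. This is only a sketch: `has_title` and `expect_title` are hypothetical helper names, and the sample response is hand-written in the same shape the script's `jq` filter already relies on (`.hits[].document.title`), not captured from a server.

```shell
#!/bin/bash
# Hypothetical helper: reads a search response on stdin and succeeds only if
# some hit has document.title equal to $1.
has_title() {
  jq -e --arg t "$1" 'any(.hits[]?; .document.title == $t)' >/dev/null
}

# Hypothetical helper: prints PASS or FAIL for one expected title.
# stdin is passed through to has_title.
expect_title() {
  local label="$1" title="$2"
  if has_title "$title"; then
    echo "PASS: ${label} -> ${title}"
  else
    echo "FAIL: ${label} -> ${title} not in hits"
  fi
}

# Hand-written sample response (an assumption, used here so the check runs offline).
sample='{"hits":[{"document":{"title":"1984","author":"George Orwell","genre":"Dystopian Fiction"}}]}'
printf '%s' "$sample" | expect_title "sci-fi" "1984"
```

Against a running server, the output of the Test 2 `curl` call (without its trailing `jq` filter) would be piped into `expect_title` in place of the sample.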
Results
There are some discrepancies: the dystopian and love story synonyms aren't triggered, and "To Kill a Mockingbird" is returned even though its genre is Literary Fiction and there's no synonym there.
Another valuable addition would be to migrate the existing synonyms to new synonym sets associated with that collection, so users are still aware of their older synonyms.
On the new synonyms build, querying the synonym sets endpoint returns an empty array, even though synonyms are being triggered:
curl "http://localhost:8108/synonym_sets/" -X GET \
-H "Content-Type: application/json" \
-H "X-TYPESENSE-API-KEY: ${TYPESENSE_API_KEY}"
[]
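As a rough illustration of the migration suggested above, the legacy per-collection synonyms could be reshaped into one synonym set payload. This is a sketch under two assumptions that should be checked against this PR's API: that the legacy listing (`GET /collections/<name>/synonyms`) returns its entries under a `synonyms` key, and that a synonym set accepts its entries under an `items` key. `to_synonym_set` is a hypothetical helper name, and the sample input is hand-written from the reproduction script rather than taken from a server.

```shell
#!/bin/bash
# Hypothetical helper: reads a legacy synonyms listing on stdin and prints a
# synonym set payload. The "items" key is an assumption about the new API.
to_synonym_set() {
  jq -c '{items: .synonyms}'
}

# Hand-written legacy listing (assumed shape), using one synonym from the script above.
legacy='{"synonyms":[{"id":"romance-synonyms","synonyms":["romance","romantic","love story","love"]}]}'
printf '%s' "$legacy" | to_synonym_set
```

The resulting JSON would then be sent to whichever synonym set endpoint the new build exposes. Only the payload is built here, so the sketch stays independent of the final route names.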
Change Summary
PR Checklist