Commit 4d112a3

Merge update.redisearch into mixed_workload

- Resolved conflict in run.py by combining examples from both branches
- Kept mixed workload functionality (insert_fraction, mixed_workload_seed)
- Added new functionality from update.redisearch (engines-file, describe options)
2 parents 397f874 + b5d1de8 commit 4d112a3

8 files changed, with 1395 additions and 827 deletions.

DOCKER_README.md

Lines changed: 6 additions & 6 deletions
@@ -58,7 +58,7 @@ docker run --rm redis/vector-db-benchmark:latest run.py --describe datasets
 # Basic Redis benchmark (requires local Redis)
 docker run --rm -v $(pwd)/results:/app/results --network=host \
 redis/vector-db-benchmark:latest \
-run.py --host localhost --engines redis-default-simple --dataset random-100
+run.py --host localhost --engines redis-default-simple --datasets random-100
 ```
 
 ## Features
@@ -78,12 +78,12 @@ docker run --rm -v $(pwd)/results:/app/results --network=host \
 ### Redis 8.2 with RediSearch
 ```bash
 # Start Redis 8.2 with built-in vector support
-docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm
+docker run -d --name redis-test -p 6379:6379 redis:8.2-bookworm
 
 # Run benchmark
 docker run --rm -v $(pwd)/results:/app/results --network=host \
 redis/vector-db-benchmark:latest \
-run.py --host localhost --engines redis-default-simple --dataset glove-25-angular
+run.py --host localhost --engines redis-default-simple --datasets glove-25-angular
 ```
 
 
@@ -103,18 +103,18 @@ docker run --rm redis/vector-db-benchmark:latest run.py --describe engines
 # Quick test with small dataset
 docker run --rm -v $(pwd)/results:/app/results --network=host \
 redis/vector-db-benchmark:latest \
-run.py --host localhost --engines redis-default-simple --dataset random-100
+run.py --host localhost --engines redis-default-simple --datasets random-100
 
 # Comprehensive benchmark with multiple configurations
 docker run --rm -v $(pwd)/results:/app/results --network=host \
 redis/vector-db-benchmark:latest \
-run.py --host localhost --engines "*redis*" --dataset glove-25-angular
+run.py --host localhost --engines "*redis*" --datasets glove-25-angular
 
 # With Redis authentication
 docker run --rm -v $(pwd)/results:/app/results --network=host \
 -e REDIS_AUTH=mypassword -e REDIS_USER=myuser \
 redis/vector-db-benchmark:latest \
-run.py --host localhost --engines redis-default-simple --dataset random-100
+run.py --host localhost --engines redis-default-simple --datasets random-100
 ```
 
 ### Results Analysis

README.md

Lines changed: 71 additions & 6 deletions
@@ -112,13 +112,13 @@ For testing with Redis, start a Redis container first:
 
 ```bash
 # Start Redis container
-docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm
+docker run -d --name redis-test -p 6379:6379 redis:8.2-bookworm
 
 # Run benchmark against Redis
 
 docker run --rm -v $(pwd)/results:/app/results --network=host \
 redis/vector-db-benchmark:latest \
-run.py --host localhost --engines redis-default-simple --dataset random-100
+run.py --host localhost --engines redis-default-simple --datasets random-100
 
 # Or use the convenience script
 ./docker-run.sh -H localhost -e redis-default-simple -d random-100
@@ -221,14 +221,23 @@ Run the benchmark:
 
 ```bash
 # Basic usage examples
-python run.py --engines redis-default-simple --dataset random-100
-python run.py --engines redis-default-simple --dataset glove-25-angular
-python run.py --engines "*-m-16-*" --dataset "glove-*"
+python run.py --engines redis-default-simple --datasets random-100
+python run.py --engines redis-default-simple --datasets glove-25-angular
+python run.py --engines "*-m-16-*" --datasets "glove-*"
+
+# Using custom engine configurations from a JSON file
+python run.py --engines-file custom_engines.json --datasets glove-25-angular
+
+# Get information about available engines (with pattern matching)
+python run.py --engines "*redis*" --describe engines --verbose
+
+# Get information about engines from a custom file
+python run.py --engines-file custom_engines.json --describe engines --verbose
 
 # Docker usage (recommended)
 docker run --rm -v $(pwd)/results:/app/results --network=host \
 redis/vector-db-benchmark:latest \
-run.py --host localhost --engines redis-default-simple --dataset random-100
+run.py --host localhost --engines redis-default-simple --datasets random-100
 
 # Get help
 python run.py --help
@@ -237,6 +246,62 @@ python run.py --help
 Command allows you to specify wildcards for engines and datasets.
 Results of the benchmarks are stored in the `./results/` directory.
 
+## Using Custom Engine Configurations
+
+The benchmark tool supports two ways to specify which engine configurations to use:
+
+### 1. Pattern Matching (Default)
+Use the `--engines` flag with wildcard patterns to select configurations from the `experiments/configurations/` directory:
+
+```bash
+python run.py --engines "*redis*" --datasets glove-25-angular
+python run.py --engines "qdrant-m-*" --datasets random-100
+```
+
+### 2. Custom Configuration File
+Use the `--engines-file` flag to specify a JSON file containing custom engine configurations:
+
+```bash
+python run.py --engines-file my_engines.json --datasets glove-25-angular
+```
+
+The JSON file should contain an array of engine configuration objects. Each configuration must have a `name` field and follow the same structure as configurations in `experiments/configurations/`:
+
+```json
+[
+  {
+    "name": "my-custom-redis-config",
+    "engine": "redis",
+    "connection_params": {},
+    "collection_params": {
+      "algorithm": "hnsw",
+      "data_type": "FLOAT32",
+      "hnsw_config": {
+        "M": 16,
+        "DISTANCE_METRIC": "L2",
+        "EF_CONSTRUCTION": 200
+      }
+    },
+    "search_params": [
+      {
+        "parallel": 1,
+        "top": 10,
+        "search_params": {
+          "ef": 100,
+          "data_type": "FLOAT32"
+        }
+      }
+    ],
+    "upload_params": {
+      "parallel": 16,
+      "data_type": "FLOAT32"
+    }
+  }
+]
+```
+
+**Note:** You cannot use both `--engines` and `--engines-file` at the same time.
+
 ## How to update benchmark parameters?
 
 Each engine has a configuration file, which is used to define the parameters for the benchmark.
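
The README change above says an engines file holds a JSON array of configuration objects, each with a `name` field, and that engines can otherwise be selected with wildcard patterns. The sketch below shows one way such a file could be loaded, validated, and filtered. It is not the code in `run.py`: the helper names `load_engine_configs` and `select_by_pattern` are hypothetical, and using `fnmatch` for the wildcard matching is an assumption.

```python
import fnmatch
import json
import tempfile
from pathlib import Path


def load_engine_configs(path):
    """Load engine configurations from a JSON file shaped like the README example.

    Hypothetical helper: it only enforces the two rules the README states,
    namely that the file holds an array and that every entry has a `name`.
    """
    with open(path, encoding="utf-8") as fh:
        configs = json.load(fh)
    if not isinstance(configs, list):
        raise ValueError("engines file must contain a JSON array")
    for index, config in enumerate(configs):
        if not isinstance(config, dict) or "name" not in config:
            raise ValueError(f"configuration #{index} is missing a 'name' field")
    return configs


def select_by_pattern(configs, pattern):
    """Return the configurations whose name matches a shell-style wildcard.

    Assumption: wildcard semantics follow fnmatch; the real tool may differ.
    """
    return [c for c in configs if fnmatch.fnmatchcase(c["name"], pattern)]


# Usage: write a tiny engines file, then load and filter it.
with tempfile.TemporaryDirectory() as tmp_dir:
    engines_file = Path(tmp_dir) / "my_engines.json"
    engines_file.write_text(json.dumps([
        {"name": "my-custom-redis-config", "engine": "redis"},
        {"name": "my-custom-qdrant-config", "engine": "qdrant"},
    ]), encoding="utf-8")
    configs = load_engine_configs(engines_file)
    redis_only = select_by_pattern(configs, "*redis*")

print([c["name"] for c in redis_only])
```

Validating the file up front means a missing `name` is reported before any benchmark run starts.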

benchmark/dataset.py

Lines changed: 14 additions & 0 deletions
@@ -1,6 +1,7 @@
 import os
 import shutil
 import tarfile
+import bz2
 import urllib.request
 import urllib.parse
 from dataclasses import dataclass, field
@@ -201,6 +202,19 @@ def _extract_or_move_file(self, tmp_path, target_path):
             with tarfile.open(tmp_path) as file:
                 file.extractall(target_path)
             os.remove(tmp_path)
+        elif tmp_path.endswith(".bz2"):
+            print(f"Extracting bz2: {tmp_path} -> {target_path}")
+            Path(target_path).parent.mkdir(exist_ok=True)
+            # Remove .bz2 extension from target path if present
+            if str(target_path).endswith(".bz2"):
+                final_target_path = str(target_path)[:-4]  # Remove .bz2
+            else:
+                final_target_path = target_path
+
+            with bz2.BZ2File(tmp_path, 'rb') as f_in:
+                with open(final_target_path, 'wb') as f_out:
+                    shutil.copyfileobj(f_in, f_out)
+            os.remove(tmp_path)
         else:
             print(f"Moving: {tmp_path} -> {target_path}")
             Path(target_path).parent.mkdir(exist_ok=True)
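
The bz2 branch added above can be tried outside the benchmark. The following is a minimal standalone sketch of the same steps: strip a trailing `.bz2` from the target, stream-decompress, then delete the archive. The function name `extract_bz2` is hypothetical, and unlike the commit it passes `parents=True` to `mkdir` so that nested target directories are created.

```python
import bz2
import os
import shutil
import tempfile
from pathlib import Path


def extract_bz2(tmp_path, target_path):
    """Decompress tmp_path into target_path (minus any .bz2 suffix), then delete the archive."""
    target_path = str(target_path)
    # Drop the 4-character ".bz2" suffix so the output carries the real file name.
    final_target_path = target_path[:-4] if target_path.endswith(".bz2") else target_path
    # Assumption: parents=True is added here; the commit uses mkdir(exist_ok=True) only.
    Path(final_target_path).parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj streams in chunks, so the whole file never has to fit in memory.
    with bz2.open(tmp_path, "rb") as f_in, open(final_target_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(tmp_path)
    return final_target_path


# Usage: compress a small payload, extract it, and check the round trip.
with tempfile.TemporaryDirectory() as tmp_dir:
    payload = b"vector data " * 1000
    archive = os.path.join(tmp_dir, "download.bz2")
    with bz2.open(archive, "wb") as fh:
        fh.write(payload)
    extracted = extract_bz2(archive, os.path.join(tmp_dir, "dataset", "vectors.bin.bz2"))
    extracted_name = os.path.basename(extracted)
    round_trip_ok = Path(extracted).read_bytes() == payload
    archive_removed = not os.path.exists(archive)

print(extracted_name, round_trip_ok, archive_removed)
```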

datasets/datasets.json

Lines changed: 42 additions & 2 deletions
@@ -982,13 +982,53 @@
     "vector_count": 100000,
     "description": "Image embeddings"
   },
+  {
+    "name": "dbpedia-openai-1M-512-angular",
+    "vector_size": 512,
+    "distance": "cosine",
+    "type": "h5",
+    "path": "dbpedia-openai-1M-512-angular/dbpedia_openai_1M",
+    "link": "http://benchmarks.redislabs.s3.amazonaws.com/vecsim/dbpedia/dbpedia-openai-1M-text-embedding-3-large-512d.hdf5",
+    "vector_count": 1000000,
+    "description": "Knowledge embeddings"
+  },
+  {
+    "name": "dbpedia-openai-1M-1024-angular",
+    "vector_size": 1024,
+    "distance": "cosine",
+    "type": "h5",
+    "path": "dbpedia-openai-1M-1024-angular/dbpedia_openai_1M",
+    "link": "http://benchmarks.redislabs.s3.amazonaws.com/vecsim/dbpedia/dbpedia-openai-1M-text-embedding-3-large-1024d.hdf5",
+    "vector_count": 1000000,
+    "description": "Knowledge embeddings"
+  },
   {
     "name": "dbpedia-openai-1M-1536-angular",
     "vector_size": 1536,
     "distance": "cosine",
-    "type": "tar",
+    "type": "h5",
     "path": "dbpedia-openai-1M-1536-angular/dbpedia_openai_1M",
-    "link": "https://storage.googleapis.com/ann-filtered-benchmark/datasets/dbpedia_openai_1M.tgz",
+    "link": "http://benchmarks.redislabs.s3.amazonaws.com/vecsim/dbpedia/dbpedia-openai-1M-text-embedding-3-large-1536d.hdf5",
+    "vector_count": 1000000,
+    "description": "Knowledge embeddings"
+  },
+  {
+    "name": "dbpedia-openai-1M-2048-angular",
+    "vector_size": 2048,
+    "distance": "cosine",
+    "type": "h5",
+    "path": "dbpedia-openai-1M-2048-angular/dbpedia_openai_1M",
+    "link": "http://benchmarks.redislabs.s3.amazonaws.com/vecsim/dbpedia/dbpedia-openai-1M-text-embedding-3-large-2048d.hdf5",
+    "vector_count": 1000000,
+    "description": "Knowledge embeddings"
+  },
+  {
+    "name": "dbpedia-openai-1M-3072-angular",
+    "vector_size": 3072,
+    "distance": "cosine",
+    "type": "h5",
+    "path": "dbpedia-openai-1M-3072-angular/dbpedia_openai_1M",
+    "link": "http://benchmarks.redislabs.s3.amazonaws.com/vecsim/dbpedia/dbpedia-openai-1M-text-embedding-3-large-3072d.hdf5",
     "vector_count": 1000000,
     "description": "Knowledge embeddings"
   },
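
The five dbpedia entries in this hunk differ only in their vector dimension, so the pattern can be written as a small generator. This is an illustrative sketch rather than anything in the repository: the helper `dbpedia_entry` and the constant `BASE_URL` are hypothetical names, and the field values are copied from the diff above.

```python
import json

# Common URL prefix of the five links in the diff above.
BASE_URL = "http://benchmarks.redislabs.s3.amazonaws.com/vecsim/dbpedia"


def dbpedia_entry(dim):
    """Build one datasets.json entry for the given embedding dimension."""
    name = f"dbpedia-openai-1M-{dim}-angular"
    return {
        "name": name,
        "vector_size": dim,
        "distance": "cosine",
        "type": "h5",
        "path": f"{name}/dbpedia_openai_1M",
        "link": f"{BASE_URL}/dbpedia-openai-1M-text-embedding-3-large-{dim}d.hdf5",
        "vector_count": 1000000,
        "description": "Knowledge embeddings",
    }


# The dimensions that appear in the commit.
entries = [dbpedia_entry(dim) for dim in (512, 1024, 1536, 2048, 3072)]
print(json.dumps(entries[0], indent=2))
```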
