From 4e95826bec5450044aa8860486de3a40a6644889 Mon Sep 17 00:00:00 2001
From: Sergei Grebnov
Date: Tue, 21 Oct 2025 23:34:56 +0000
Subject: [PATCH 1/2] Document `partition_mode` DuckDB acceleration param (#1194)

* Document `partition_mode` DuckDB acceleration param

* Clarify partition_mode connection pooling details

Updated description of `partition_mode` to clarify connection pooling behavior.

* Update partition_mode description for clarity

Enhanced explanation of 'partition_mode' to clarify performance benefits.
---
 website/docs/components/data-accelerators/duckdb.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/website/docs/components/data-accelerators/duckdb.md b/website/docs/components/data-accelerators/duckdb.md
index 9d44e07fb..a3749c72c 100644
--- a/website/docs/components/data-accelerators/duckdb.md
+++ b/website/docs/components/data-accelerators/duckdb.md
@@ -38,6 +38,8 @@ DuckDB acceleration supports the following optional parameters under `accelerati
 - `duckdb_data_dir` (string, default:`.spice/data/`): Path to the directory the DuckDB database file(s) will be placed in. This is useful when using the `partition_by` acceleration parameter. If both `duckdb_data_dir` and `duckdb_file` are specified, `duckdb_file` will be used and `duckdb_data_dir` will be ignored.
 - `duckdb_memory_limit` (string, default: none): Limits DuckDB's memory usage for instance. Acceptable units are KB, MB, GB, TB (decimal: 1000^i) or KiB, MiB, GiB, TiB (binary: 1024^i). See [DuckDB memory limit documentation](https://duckdb.org/docs/stable/configuration/overview).
 - `duckdb_preserve_insertion_order` (boolean, default: `true`): Controls whether DuckDB preserves the insertion order of rows in tables. When set to `true`, rows are returned in the order they were inserted. See [DuckDB preserve insertion order documentation](https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads#the-preserve_insertion_order-option) and [order preservation documentation](https://duckdb.org/docs/stable/sql/dialect/order_preservation).
+- `partition_mode` (string, default: `files`): Controls how partitioned data is stored. Can only be used with `partition_by`. Set to `tables` to store partitions as separate tables within a single DuckDB database, which reduces resource usage by sharing a single connection pool across all partitions. The default `files` mode creates a separate database file per partition, each with its own connection pool, and generally gives faster query performance.
+- `duckdb_partitioned_write_flush_threshold` (integer, default: `122880`): The number of rows buffered per partition before data is flushed to acceleration storage. Only applicable when using `partition_mode: tables`. A larger value can improve write performance but requires more memory.

 Refer to the [datasets configuration reference](/docs/reference/spicepod/datasets.md#acceleration) for additional supported fields.

From 06ca8360cfef22b35f2cd5b04940316a5e0a6c23 Mon Sep 17 00:00:00 2001
From: Kevin <4733573+kczimm@users.noreply.github.com.>
Date: Fri, 24 Oct 2025 12:08:28 -0500
Subject: [PATCH 2/2] add s3_vectors_index_poll_interval

---
 website/docs/components/vectors/s3_vectors.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/website/docs/components/vectors/s3_vectors.md b/website/docs/components/vectors/s3_vectors.md
index afa82d4bf..82400d417 100644
--- a/website/docs/components/vectors/s3_vectors.md
+++ b/website/docs/components/vectors/s3_vectors.md
@@ -43,6 +43,7 @@ embeddings:
 | `s3_vectors_bucket` | The S3 vectors bucket to use. If `s3_vectors_index` is not specified, an index will be created based on the underlying embedding column. Incompatible with `s3_vectors_arn` | `a-bucket` |
 | `s3_vectors_index` | The name of the s3 vectors index to use or create. Incompatible with `s3_vectors_arn`. | `index-of-important-embeddings` |
 | `s3_vectors_distance_metric` | The distance metric to be used for similarity search. One of: `euclidean`, `cosine`. Default `cosine`. | `euclidean` |
+| `s3_vectors_index_poll_interval` | How often to poll for index updates. Setting an interval avoids excessive API calls. Minimum: 5 seconds. By default, the index is polled on every scan. | `5m` |

 :::warning[Limitations]