
Conversation

@abelcha

@abelcha abelcha commented Jun 19, 2025

Summary

This provides a high-level abstraction for result streaming that matches JavaScript language idioms, alongside the existing chunk-based APIs.
It lets you iterate over query results with for await loops.

Usage Example

const result = await connection.run('SELECT * FROM large_table');

for await (const row of result) {
  console.log(row);
}

Features Added

  • Async Iterator Implementation: Added [Symbol.asyncIterator]() method to DuckDBResult class

Technical Details

  • The async iterator fetches chunks progressively, reducing memory usage for large result sets (see the sketch below)
  • Maintains compatibility with existing DuckDBResult API
  • Properly handles edge cases like empty results and null values
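
Roughly, the strategy looks like the following sketch (written as a free function for illustration, not the exact method this PR adds; it relies only on fetchChunk() and getRows(), which the result and chunk classes already expose):

import { DuckDBResult, DuckDBValue } from '@duckdb/node-api';

// Illustrative sketch: yield rows one chunk at a time, so at most one
// chunk's worth of values is materialized in memory at any point.
async function* iterateRows(result: DuckDBResult): AsyncIterable<DuckDBValue[]> {
  while (true) {
    const chunk = await result.fetchChunk();
    if (!chunk || chunk.rowCount === 0) break; // result exhausted
    for (const row of chunk.getRows()) {
      yield row;
    }
  }
}

The [Symbol.asyncIterator]() method applies the same loop to the result itself, which is what makes the for await example above work.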

Testing

The tests verify:

  • Correct iteration behavior
  • Memory-efficient chunk fetching
  • Proper handling of edge cases
  • Early termination scenarios (see the test sketch below)
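
For illustration, the early-termination case could look like this sketch (assuming the usual DuckDBInstance.create()/connect() setup from @duckdb/node-api; this is not the actual test code in this PR):

import { strictEqual } from 'node:assert';
import { DuckDBInstance } from '@duckdb/node-api';

// Sketch: breaking out of the loop early should just stop consuming the
// result, with no error and no further chunk fetches.
async function testEarlyTermination() {
  const instance = await DuckDBInstance.create(':memory:');
  const connection = await instance.connect();
  const result = await connection.run('SELECT * FROM range(1000000)');

  let count = 0;
  for await (const _row of result) {
    count++;
    if (count === 10) break; // remaining chunks are never fetched
  }
  strictEqual(count, 10);
}

testEarlyTermination().then(() => console.log('early termination ok'));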

@abelcha abelcha changed the title from "Add async iterator" to "Add async iterator on result" Jun 19, 2025
@jraymakers
Contributor

Thanks for the PR! This is a very cool idea.

To make it even better, and to fit in with the rest of the API, it should allow iterating over either row arrays or row objects, and support the raw or converted (to JS, JSON, or custom) variants. To make that maintainable, we'd likely need an async chunk iterator as a building block.

If you'd like to give that a shot, go ahead, or I can try to outline the API I have in mind when I get some time.
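
For concreteness, one shape that building block could take (just a sketch; chunksOf and rowObjectsOf are placeholder names, and it only uses fetchChunk(), getRowObjects(), and deduplicatedColumnNames(), which appear elsewhere in this thread):

import { DuckDBDataChunk, DuckDBResult, DuckDBValue } from '@duckdb/node-api';

// The building block: an async iterator over chunks.
async function* chunksOf(result: DuckDBResult): AsyncIterable<DuckDBDataChunk> {
  while (true) {
    const chunk = await result.fetchChunk();
    if (!chunk || chunk.rowCount === 0) break;
    yield chunk;
  }
}

// One variant layered on top: row objects keyed by deduplicated column name.
// Row arrays, JS/JSON-converted values, etc. would follow the same pattern
// with a different per-chunk call.
async function* rowObjectsOf(result: DuckDBResult): AsyncIterable<Record<string, DuckDBValue>> {
  const names = result.deduplicatedColumnNames();
  for await (const chunk of chunksOf(result)) {
    for (const row of chunk.getRowObjects(names)) {
      yield row;
    }
  }
}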

@abelcha
Author

abelcha commented Jun 27, 2025

I tried wiring up support for all the variants, but it adds a lot of stuff to the codebase, and I feel like that's the kind of call that's yours to make. This is just a minimal version that could serve as a base.

This binding is already a blessing compared to the first one; I'd rather not mess it up.

Performance-wise, I was surprised how much per-row object creation adds up. With a template object plus Object.create for each row I got roughly a 10% improvement, though it's hard to benchmark. At this level it's best to let the consumer choose whether to eat that cost or not.
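
Roughly, the idea is along these lines (one possible reading of "template object + Object.create", with made-up names; the actual gain will depend on the engine and the data):

import { DuckDBResult, DuckDBValue } from '@duckdb/node-api';

// Sketch: build one template holding the column keys, use it as the
// prototype of every row object, and only assign the per-row values.
async function* rowObjectsFast(result: DuckDBResult): AsyncIterable<Record<string, DuckDBValue>> {
  const names = result.deduplicatedColumnNames();
  const template: Record<string, DuckDBValue> = {};
  for (const name of names) template[name] = null;

  while (true) {
    const chunk = await result.fetchChunk();
    if (!chunk || chunk.rowCount === 0) break;
    for (const row of chunk.getRows()) {
      const obj: Record<string, DuckDBValue> = Object.create(template);
      for (let i = 0; i < names.length; i++) {
        obj[names[i]] = row[i];
      }
      yield obj;
    }
  }
}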

I’m working on a more experimental, fully typed high-level DuckDB TypeScript runtime, and this is the UX I’ve landed on based on the select return value:

(screenshot: the typed return value of a select call)

  • I'm mapping BigInt to Number, which simplifies things a lot

@jraymakers
Contributor

Yes, the reason for the variants is to provide a choice between convenience and performance. Generally the column-oriented ones are going to perform better than the row-oriented ones, and raw arrays will perform better than objects, but for small results it doesn't matter, and rows and objects can be convenient at times.

Supporting all the variants without a lot of code duplication that's hard to maintain took some iteration. I think it could be done while also supporting async iterators, but it will take some experimentation, which I haven't had time for yet. (I still hope to, though probably not very soon.)

That library/runtime you're building looks interesting. How are you ensuring the results are correctly typed? I'd like to provide better typing for results, but I haven't discovered a good way yet. (See #140.)

@abelcha
Author

abelcha commented Jul 15, 2025

I follow a similar approach to convex.dev, where intermediate schemas are written to a local .buckdb/ directory.

Either on first execution it inspects .columnTypes() dynamically, or — if you’re in a live environment — it can describe the schema ahead of time (e.g. https://buckdb.pages.dev).

It also codegens phantom types from duckdb_functions() and duckdb_types() to produce full method signatures and static type info for function calls.

Then it uses TS generics to handle joins, CTEs, name aliases, etc. to infer the return value:
src/build.types.ts
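
As a much-simplified illustration of the first step, a schema snapshot can be derived from a zero-row probe query; everything here (the function name, the .buckdb/ file layout, String() as the type rendering) is a sketch, not the actual build.types.ts:

import { mkdirSync, writeFileSync } from 'node:fs';
import { DuckDBConnection } from '@duckdb/node-api';

// Hypothetical sketch: probe a table with a zero-row query, then write a
// column-name -> column-type snapshot that codegen can turn into TS types.
async function snapshotSchema(connection: DuckDBConnection, table: string) {
  // LIMIT 0 still yields column metadata without materializing any rows.
  const result = await connection.run(`SELECT * FROM ${table} LIMIT 0`);
  const names = result.deduplicatedColumnNames();
  const types = result.columnTypes();

  const schema = Object.fromEntries(
    names.map((name, i) => [name, String(types[i])]) // e.g. { "id": "INTEGER", ... }
  );
  mkdirSync('.buckdb', { recursive: true });
  writeFileSync(`.buckdb/${table}.json`, JSON.stringify(schema, null, 2));
}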

@missinglink

missinglink commented Aug 18, 2025

FWIW these simple iterators work well for me:

import { DuckDBResult, DuckDBValue } from '@duckdb/node-api'

class Foo {
  // Yields each row (as an array of values), fetching one chunk at a time.
  static async *iterate (res: DuckDBResult): AsyncIterable<DuckDBValue[]> {
    while (true) {
      const chunk = await res.fetchChunk()
      if (!chunk?.rowCount) break
      for (const row of chunk.getRows()) {
        yield row
      }
    }
  }

  // Yields each row as an object keyed by (deduplicated) column name.
  static async *iterateObjects<T extends Record<string, DuckDBValue>> (res: DuckDBResult): AsyncIterable<T> {
    const columnNames = res.deduplicatedColumnNames()
    while (true) {
      const chunk = await res.fetchChunk()
      if (!chunk?.rowCount) break
      for (const row of chunk.getRowObjects(columnNames)) {
        yield row as T
      }
    }
  }
}

Being able to specify a type for the return values is nice:

const result = await connection.run(`
  SELECT id, name FROM example
`)
const rows = Foo.iterateObjects<{id: number, name: string}>(result)

for await (const row of rows) {
  console.error(row)
}

@jraymakers
Contributor

We integrated the core idea of this PR (the async iterator on DuckDBResult) into this other one: #303

Thanks for the contribution!

@jraymakers jraymakers closed this Sep 28, 2025