DuckDBRowReader: Add class to stream DuckDB results efficiently #303
Conversation
jraymakers left a comment
Thanks for the PR! I definitely support adding something like this to the API.
Another contributor submitted this related PR some time ago, but I haven't yet found the time to integrate it properly into the API by making it more general.
This PR comes closer, but I think we can do even better by combining the two approaches. I'm wondering if we can implement the async iterator protocol directly on DuckDBResult, as done in the above PR, but have it just yield chunks, instead of performing a particular conversion and yielding rows (as the above PR does).
Then, we can use this iterator to add other convenience methods to DuckDBResult that perform the various conversions and also return results row-by-row in addition to chunk-by-chunk. I think that would alleviate the need for a separate DuckDBRowReader class or a new method on DuckDBConnection.
Below are a few detailed comments on this PR, which may or may not be relevant if we adopt the alternate approach described above.
Here are some examples of the API I'm envisioning.

Yield chunks:

```ts
const result = await connection.stream(sql);
for await (const chunk of result) {
  // ...
}
```

Yield rows:

```ts
const result = await connection.stream(sql);
for await (const row of result.yieldRows()) {
  // ...
}
```

Yield row objects:

```ts
const result = await connection.stream(sql);
for await (const rowObject of result.yieldRowObjects()) {
  // ...
}
```

With (general) conversion:

```ts
const result = await connection.stream(sql);
for await (const row of result.yieldRows(converter)) {
  // ...
}
```

With specific conversion:

```ts
const result = await connection.stream(sql);
for await (const row of result.yieldRowsJS()) {
  // ...
}
```

Plus other combinations for converting row objects, returning JSON, etc.
I've removed the method added to `DuckDBConnection`. I think I got all of them and the other combinations that you mentioned, but let me know if I missed one and I'll be happy to add it (or anything else you would like me to add)!
I definitely like where this is headed; it's looking very clean. Just a few comments about the details and the set of methods.
Also: Once we have all the methods in place, it would be good to add some unit tests for each to the existing test file. (I apologize for the single large unit test file. I've been meaning to split it up, but haven't gotten to it yet.)
There currently exists the `DuckDBResultReader`, but the issue that we're running into is that it processes chunks into an internal buffer that's never cleared. For large result sets, this buffer builds and builds until we run out of memory. The case that we have is that we just need the result set row by row without needing to compute any of the column data. This new class does less than `DuckDBResultReader` by not storing the results of the query in an underlying buffer. Instead, each chunk is passed back to the caller, who can handle it as they wish. The interface for processing rows is similar to `DuckDBResultReader`, where the caller can still use converters as they wish to process their data.
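The usage we're after looks roughly like this (a sketch against the existing `@duckdb/node-api` surface; the SQL and the end-of-stream check are illustrative assumptions, not part of this PR):

```ts
import { DuckDBInstance } from '@duckdb/node-api';

const instance = await DuckDBInstance.create(':memory:');
const connection = await instance.connect();

// Stream so that each chunk can be handled and released before the next one
// is fetched, instead of accumulating every chunk in an internal buffer.
const result = await connection.stream('SELECT * FROM range(10000000)');
while (true) {
  const chunk = await result.fetchChunk();
  if (!chunk || chunk.rowCount === 0) {
    break; // assuming a null or empty chunk signals the end of the stream
  }
  for (const row of chunk.getRows()) {
    // handle `row` here; nothing is retained once the chunk is dropped
  }
}
```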
Adds an asyncIterator to the DuckDBResult class to return the raw DuckDBDataChunks. Helper methods to perform common operations such as returning row objects or returning the rows as JSON have been added here so that the caller can iterate over the results in a way that's convenient for them.
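Conceptually, the iterator is just a loop around `fetchChunk`. As a standalone sketch (in the PR it lives on `DuckDBResult` as `[Symbol.asyncIterator]`; the end-of-stream check is an assumption):

```ts
import type { DuckDBDataChunk, DuckDBResult } from '@duckdb/node-api';

// Repeatedly fetch chunks and yield each raw DuckDBDataChunk until the result
// is exhausted; nothing is buffered between iterations.
async function* chunks(result: DuckDBResult): AsyncGenerator<DuckDBDataChunk> {
  while (true) {
    const chunk = await result.fetchChunk();
    if (!chunk || chunk.rowCount === 0) {
      return; // assuming a null or empty chunk means no more data
    }
    yield chunk;
  }
}
```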
`yieldConvertedRows` has been refactored to use `convertRowsFromChunks`. `yieldConvertedRowObjects` now uses `convertRowObjectsFromChunks`, as the previous version of `yieldConvertedRows` did.
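Roughly, the refactored generator converts one chunk per iteration (a sketch, not the PR's exact code; the converter type and the `convertRowsFromChunks` signature shown here are stand-ins for whatever the library actually defines):

```ts
import type { DuckDBDataChunk } from '@duckdb/node-api';

// Stand-ins for the library's converter type and its convertRowsFromChunks
// helper; the real signatures in the codebase may differ.
type ValueConverter<T> = (value: unknown) => T | null;
declare function convertRowsFromChunks<T>(
  chunks: DuckDBDataChunk[],
  converter: ValueConverter<T>,
): (T | null)[][];

// Convert one chunk at a time and yield its rows, so only a single chunk's
// worth of converted values is alive at any point. In the PR this is a method
// on DuckDBResult that iterates `this` rather than a free function.
async function* yieldConvertedRows<T>(
  chunks: AsyncIterable<DuckDBDataChunk>,
  converter: ValueConverter<T>,
): AsyncGenerator<(T | null)[]> {
  for await (const chunk of chunks) {
    for (const row of convertRowsFromChunks([chunk], converter)) {
      yield row;
    }
  }
}
```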
Converts the remaining declarations of `iterator` to just use `this` directly.
These were a bit backwards. `yieldRowsJs` and `yieldRowsJson` now call `yieldConvertedRows`. The object versions of these methods have been added here as `yieldRowObjectJs` and `yieldRowObjectJson` which both call `yieldConvertedRowObjects`.
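So each specific variant becomes a one-line delegation; roughly (everything here apart from the delegation pattern is a hypothetical stand-in for the library's actual converters and generator):

```ts
import type { DuckDBDataChunk } from '@duckdb/node-api';

// Hypothetical stand-ins for the library's converter type, its JS and JSON
// converters, and the general-purpose generator sketched above.
type ValueConverter<T> = (value: unknown) => T | null;
declare const jsConverter: ValueConverter<unknown>;
declare const jsonConverter: ValueConverter<string>;
declare function yieldConvertedRows<T>(
  chunks: AsyncIterable<DuckDBDataChunk>,
  converter: ValueConverter<T>,
): AsyncGenerator<(T | null)[]>;

// Each specific variant just picks a converter and delegates.
function yieldRowsJs(chunks: AsyncIterable<DuckDBDataChunk>) {
  return yieldConvertedRows(chunks, jsConverter);
}

function yieldRowsJson(chunks: AsyncIterable<DuckDBDataChunk>) {
  return yieldConvertedRows(chunks, jsonConverter);
}
```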
The first test added is to ensure that the caller can iterate over the DuckDBResult object to get the data chunks. The last 4 tests added here test the `yieldConvertedRows` and `yieldConvertedRowObjects` along with the `yieldRows*` and `yieldRowObject*` methods since the latter methods call the former methods.
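The chunk-iteration test has roughly this shape (sketched with Node's built-in test runner for illustration; the repo's actual harness and assertions in its single large test file differ):

```ts
import assert from 'node:assert';
import { test } from 'node:test';
import { DuckDBInstance } from '@duckdb/node-api';

test('can iterate over the data chunks of a streamed result', async () => {
  const instance = await DuckDBInstance.create(':memory:');
  const connection = await instance.connect();
  // range(5000) spans more than one chunk, since DuckDB chunks hold up to 2048 rows.
  const result = await connection.stream('SELECT * FROM range(5000)');
  let chunkCount = 0;
  let rowCount = 0;
  for await (const chunk of result) {
    chunkCount++;
    rowCount += chunk.rowCount;
  }
  assert.ok(chunkCount > 1);
  assert.strictEqual(rowCount, 5000);
});
```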
One thing to note that I forgot to add to my commit message is that I had to remove the

I have added tests here, and they all passed on my branch when I ran them on my local machine; however, there is now a conflict since my branch was a bit stale. On my machine, I have rebased my branch onto main and resolved the conflict, but the tests don't pass. Even a fresh pull of the repo from main has many of the tests failing. If you just want me to push my version with the merge conflict resolved, I can, but I wasn't sure about pushing it since it was failing anyway.

EDIT: After writing that last message, I figured it was probably easier on you to just push the rebased version without the conflict; however, I did want to note that the tests now fail on my machine since the rebase. They passed while I was developing this on a week-old stale branch and creating the tests.
It seems the tests pass in this PR. Perhaps something is stale in your local environment? Have you tried
jraymakers left a comment
Looks great! Just a couple comments about some additional test cases.
BTW I was able to pull your fork and run both the build and tests just fine, both on
Your suggestion to run

I added three new tests: one that generates more than one chunk, a test for
From a `DuckDBConnection`, calling `rowReaderStream` will return a `DuckDBRowReader` where the caller can handle each individually processed chunk without having it written to an internal buffer, for more memory-efficient operations. I didn't see any documentation on contributing, but if this is a feature you're interested in supporting, I'd be happy to change the formatting, add tests, change the API of `DuckDBRowReader`, etc., since this is an issue that we're facing at my job.