Skip to content

Conversation

@zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented Feb 28, 2025

Which issue does this PR close?

Rationale for this change

When we only have one batch and size < max_rows, also size < preview limit, we should collect it at the end.

What changes are included in this PR?

When we only have one batch and size < max_rows, also size < preview limit, we should collect it at the end.

Add the collect logic at the end.

Are these changes tested?

Yes

Testing result

> select * from a;
Error during planning: table 'datafusion.public.a' not found
> CREATE EXTERNAL TABLE IF NOT EXISTS lineitem (
        l_orderkey BIGINT,
        l_partkey BIGINT,
        l_suppkey BIGINT,
        l_linenumber INTEGER,
        l_quantity DECIMAL(15, 2),
        l_extendedprice DECIMAL(15, 2),
        l_discount DECIMAL(15, 2),
        l_tax DECIMAL(15, 2),
        l_returnflag VARCHAR,
        l_linestatus VARCHAR,
        l_shipdate DATE,
        l_commitdate DATE,
        l_receiptdate DATE,
        l_shipinstruct VARCHAR,
        l_shipmode VARCHAR,
        l_comment VARCHAR
) STORED AS parquet
LOCATION '/Users/zhuqi/arrow-datafusion/benchmarks/data/tpch_sf10/lineitem';
0 row(s) fetched.
Elapsed 0.007 seconds.

> select l_comment from lineitem limit 1;
+--------------------------+
| l_comment                |
+--------------------------+
| nic packages. regular re |
+--------------------------+
1 row(s) fetched.
Elapsed 0.087 seconds.

Are there any user-facing changes?

@zhuqi-lucas zhuqi-lucas changed the title Datafusion cli one small batch Fix: New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows Feb 28, 2025
@zhuqi-lucas
Copy link
Contributor Author

zhuqi-lucas commented Feb 28, 2025

The bug which is found when i testing the following ticket:

#14909

cc @alamb @2010YOUY01 @xudong963

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuqi-lucas

I wonder if there is any way we can write a test for this logic so we don't accidentally break it again 🤔

@zhuqi-lucas
Copy link
Contributor Author

Thank you @alamb for review, i also add the testing case now!

@zhuqi-lucas
Copy link
Contributor Author

The testing include the following test, which was empty before this PR.

> SELECT * FROM generate_series(1, 5) t1(v1) ORDER BY v1 DESC;
+----+
| v1 |
+----+
| 5  |
| 4  |
| 3  |
| 2  |
| 1  |
+----+
5 row(s) fetched.
Elapsed 0.003 seconds.

@2010YOUY01 2010YOUY01 merged commit 463ef3b into apache:main Feb 28, 2025
24 checks passed
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuqi-lucas and @2010YOUY01 ❤️

alamb added a commit to alamb/datafusion that referenced this pull request Feb 28, 2025
…corner case for only one small batch which lines are less than max_rows (apache#14921)"

This reverts commit 463ef3b.
xudong963 pushed a commit that referenced this pull request Mar 1, 2025
…t, make it totally streaming printing without memory overhead (#14948)

* Revert "Fix: New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows (#14921)"

This reverts commit 463ef3b.

* Revert "Datafusion-cli: Redesign the datafusion-cli execution and print, make it totally streaming printing without memory overhead (#14877)"

This reverts commit 53fc94f.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New Datafusion-cli streaming printing way should handle corner case for only one small batch which lines are less than max_rows

3 participants