Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Oct 15, 2025

Which issue does this PR close?

Rationale for this change

As suggested by @BlakeOrth in #18045 (comment) here is an attempt to improve the output of datafusion object store trace profiling:

What changes are included in this PR?

Update the output format when \object_store_profiling summary is set

Current format (on main, before this PR):

Summaries:
Get
count: 2
duration min: 0.024603s
duration max: 0.031946s
duration avg: 0.028274s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B

New format (after this PR):

DataFusion CLI v50.2.0
> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
1 row(s) fetched.
Elapsed 6.754 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
Summaries:
+-----------+----------+-----------+-----------+-----------+-----------+-------+
| Operation | Metric   | min       | max       | avg       | sum       | count |
+-----------+----------+-----------+-----------+-----------+-----------+-------+
| Get       | duration | 0.031645s | 0.047780s | 0.039713s | 0.079425s | 2     |
| Get       | size     | 8 B       | 34322 B   | 17165 B   | 34330 B   | 2     |
+-----------+----------+-----------+-----------+-----------+-----------+-------+

Are these changes tested?

Yes

Are there any user-facing changes?

Nicer datafusion-cli output

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 15, 2025
@BlakeOrth
Copy link
Contributor

This is a much improved way to display the summaries! I was agonizing over how to use Rust format padding etc. to build out the table, using a RecordBatch to get the formatted output for "free" is clever.

I have a PR queued up that instruments LIST that I was going to submit as soon as

hits main (helps avoid resolving merge conflicts etc). Does it make sense to hold off finalizing these changes until we get LIST instrumented so we can ensure this behaves as expected with multiple operations?

Comment on lines 665 to 676
assert_snapshot!(RequestSummaries::new(&requests), @r"
+-----------+----------+-----------+-----------+-----------+-----------+
| Operation | Metric | min | max | avg | sum |
+-----------+----------+-----------+-----------+-----------+-----------+
| Get | duration | 5.000000s | 5.000000s | 5.000000s | 5.000000s |
| Get | size | 100 B | 100 B | 100 B | 100 B |
+-----------+----------+-----------+-----------+-----------+-----------+
");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test changes are the only thing that's really caused me to pause and think some. I really like how it's reduced the overall size of the test and improved readability. However, I think it comes with the tradeoff that if there was a regression of some sort that caused the test to fail it's going to be a bit harder to hunt down exactly what's happening since it's now behind a display method implementation. I don't think I have a strong opinion on the "right" path forward here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that now the test (implicitly) also relies on display / conversion behavior which might make it harder to track down

However, given the display output is also the end user API (aka what people will actually see) I think this is actually a good thing (it won't do any good if the summarizing logic is correct, but then it is not displayed correctly)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. I think there was some discussion about building a machine-readable output, such as JSON, as well. That output would likely require its own set of test cases as it is though, so I think similar justification applies to both.

@alamb alamb force-pushed the alamb/better_summary_display branch from 1f0d3b3 to 851cdb5 Compare October 16, 2025 13:12
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Oct 16, 2025
@alamb alamb force-pushed the alamb/better_summary_display branch from 851cdb5 to 0677fe9 Compare October 16, 2025 13:29
@alamb alamb force-pushed the alamb/better_summary_display branch from 0677fe9 to 5f5de60 Compare October 16, 2025 13:34
@alamb alamb marked this pull request as ready for review October 16, 2025 13:36
Copy link
Contributor

@BlakeOrth BlakeOrth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small comment that I caught while looking through this. I have no power here, but these changes look good to me!

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb
WDYT should it be documented?

@alamb
Copy link
Contributor Author

alamb commented Oct 17, 2025

Thank you for the review @comphead

Thanks @alamb WDYT should it be documented?

Great suggestion!

@BlakeOrth did document the command here:
https://datafusion.apache.org/user-guide/cli/usage.html#commands

However, it seems to me like maybe an example in the docs would be good. I will add one

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 17, 2025
@alamb alamb added this pull request to the merge queue Oct 17, 2025
@alamb
Copy link
Contributor Author

alamb commented Oct 17, 2025

Thanks @BlakeOrth and @comphead

Merged via the queue into apache:main with commit 5a074ea Oct 17, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants