-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Adds summary output to CLI instrumented object stores #18045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Adds a `RequestSummary` type for the instrumented object store to display summary statistics about instrumented requests - Adds a generic Stats type to track the statistics for the summary - Adds tests for the new code - Adds a basic summary output to the user-facing display when profiling is enabled - Adds docs for new and newly exported public items
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me -- thank you @BlakeOrth
| Summaries: | ||
| Get | ||
| count: 1 | ||
| [SUMMARY_DURATION] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a nice to have future feature, it would be great to have these values be more human readable -- e.g. in stead of 17446363 B have it be 17.4 MB
|
Another potential improvement is to reduce the number of lines used for display: From: To something like but then again maybe that is even harder to read 🤔 |
|
@alamb Yeah, I'm not sure what the best way to display these is in a concise but readable manner. I think in the original issue for this I had also postulated that perhaps a table output, similar to the table information and query results might work. Admittedly I'm probably not the right person to make that decision; UI/UX are not my strong suit. |
I made a PR with this It isn't ready for review yet but I'll get it ready tomorrow |
## Which issue does this PR close? - part of #17207 ## Rationale for this change As suggested by @BlakeOrth in #18045 (comment) here is an attempt to improve the output of datafusion object store trace profiling: ## What changes are included in this PR? Update the output format when `\object_store_profiling summary` is set Current format (on main, before this PR): ```sql Summaries: Get count: 2 duration min: 0.024603s duration max: 0.031946s duration avg: 0.028274s size min: 8 B size max: 34322 B size avg: 17165 B size sum: 34330 B ``` New format (after this PR): ```sql DataFusion CLI v50.2.0 > \object_store_profiling summary ObjectStore Profile mode set to Summary > select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet'; +----------+ | count(*) | +----------+ | 1000000 | +----------+ 1 row(s) fetched. Elapsed 6.754 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Summary, inner: HttpStore Summaries: +-----------+----------+-----------+-----------+-----------+-----------+-------+ | Operation | Metric | min | max | avg | sum | count | +-----------+----------+-----------+-----------+-----------+-----------+-------+ | Get | duration | 0.031645s | 0.047780s | 0.039713s | 0.079425s | 2 | | Get | size | 8 B | 34322 B | 17165 B | 34330 B | 2 | +-----------+----------+-----------+-----------+-----------+-----------+-------+ ``` ## Are these changes tested? Yes ## Are there any user-facing changes? Nicer datafusion-cli output
Which issue does this PR close?
This does not fully close, but is an incremental building block component for:
datafusion-cli] Add a way to see what object store requests are made #17207The full context of how this code is likely to progress can be seen in the POC for this effort:
Rationale for this change
For particularly large requests, in terms of number of objects in a table or large objects, the number of operations for a query may be quite large. In these cases, understanding the aggregate impact of various object store operations is likely the best way to understand the impact those operations had on a particular query. This PR allows users of an instrumented object store to understand and display basic summary statistics related to the
RequestDetailscollected during a query.What changes are included in this PR?
RequestSummarytype for the instrumented object store to display summary statistics about instrumented requestsAre these changes tested?
Yes. The new functionality has tests implemented, aside from testing the actual display output. The functional output can be seen below:
Are there any user-facing changes?
Yes? Just like the previous PR this does change the user-facing output, but there's no API breaking changes.
cc @alamb