-
Notifications
You must be signed in to change notification settings - Fork 1.7k
POC: datafusion-cli instrumented object store #17266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- A WIP/POC of instrumenting the object store backing datafusion-cli operations
Thank you @BlakeOrth -- I tried this out and it looks very nice 👌
I have some ideas on how to make it configurable and easy to see. BRB |
@alamb Awesome, I'm looking forward to hearing your feedback. Just so you can understand where my head was at for this (currently very rough) implementation in terms of configuration:
Any feedback on the above, or general code structure/implementation is obviously welcome as well, so let me know your thoughts. This code is obviously a minimally functional example, but I'd rather incorporate feedback early than need to re-work a bunch of stuff! |
Yes, I agree with this
I think it would be ok to always use the "Instrumented Object REgistry" and then only pass back an instrumented object store if the profiling was enabled
That is perfect!
how about something like this (maybe you can implement one of these, perhaps summary, as the intial PR and we can add the others afterwards) # Prints an object store summary at the end of all results, after the query executed
\profile objectstore summary # prints one line per object store request at the end of all results, after the query executed
\profile objectstore trace # prints one line per object store request when it happens (potentially in the middle of results)
# this can be useful to understand how the requests happen over time
\profile objectstore inline
Indeed -- makes total sense I |
In my mind I was going to have time to work on this myself, but I fear that is not likely to be the case for a while (and I will be out for the next week or so on vacation, though I will be reviewing PRs as much as possible). If you are willing to help push this forward that would be most appreciated |
I'm happy to keep driving this effort as long as it sounds like it's moving in the right direction. My interpretation of your above comments is that we currently are moving along at least mostly the right path, so I should be able to generally make progress. (And do your best to enjoy your vacation! The code will be here when you return) I just pushed some changes the implement the summary output. I get the feeling we're about to learn quite a lot... DataFusion CLI v49.0.1
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
0 row(s) fetched.
Elapsed 2.587 seconds.
List Summary:
count: 1
Get Summary:
count: 288
duration min: 0.058361s
duration max: 0.374491s
duration avg: 0.122724s
size min: 8
size max: 44247
size avg: 18870
size sum: 5434702
List Summary:
count: 1
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/' where vendor_id='CMT';
+-----------+
| count(*) |
+-----------+
| 505603754 |
+-----------+
1 row(s) fetched.
Elapsed 56.057 seconds.
Get Summary:
count: 1126
duration min: 0.062342s
duration max: 1.831455s
duration avg: 0.397414s
size min: 47
size max: 112
size avg: 69
size sum: 78422
List Summary:
count: 4 (caveat: this is a debug build, not
I think this is an interesting thought and I'll take some time to explore this as a solution. Are you thinking the |
Yes indeed -- I think this will be super valuable
Yes that is what I was thinking I agree the implementation would be more complicated, but it would be easier to use (people wouldn't have to run datafusion-cli with a flag and then also have to enable/disable profiling) |
- Introduces object_store_profiling as a command and cli argument, with options of disabled, summary, and trace - Cleans up hacked in println output in favor of carrying the necessary objects in print_options and using the expected Writer for output
pub maxrows: MaxRows, | ||
pub color: bool, | ||
pub object_store_profile_mode: InstrumentedObjectStoreMode, | ||
pub instrumented_registry: Arc<InstrumentedObjectStoreRegistry>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love carrying this as a public member of this struct, but I didn't want to go making too many unnecessary changes without some additional reviewer input. I feel like PrintOptions
might be to the point where it needs a builder or perhaps a dedicated new()
method to better encapsulate its behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree it is time to encapsulate it
Ok(store) | ||
} | ||
|
||
pub mod instrumented { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put this into its own module because with this addition I feel like object_storage
has probably grown past the point of being a single file. It seems like it should probably become a directory with 2 sub-modules, but this change set is already getting large enough I didn't want to go making large organizational changes prior to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that putting the instrumented object store into object_storage/instrumented.rs
seems like a good idea
@alamb I've made some changes that I think get the functional side of the code where it needs to be and pushed those for review. This is still lacking tests and docs, but I thought it would be good to more or less have the architecture and functional code settled before wrapping everything up and marking it as ready for review. Here's a little demo of the output from the current code: $ ./datafusion-cli --object-store-profiling summary DataFusion CLI v49.0.1
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
0 row(s) fetched.
Elapsed 2.573 seconds.
Object Store Profiling
List Summary:
count: 1
Get Summary:
count: 288
duration min: 0.059328s
duration max: 0.714468s
duration avg: 0.128311s
size min: 8 B
size max: 44247 B
size avg: 18870 B
size sum: 5434702 B
List Summary:
count: 1
> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
+------------+
| count(*) |
+------------+
| 1310903963 |
+------------+
1 row(s) fetched.
Elapsed 0.577 seconds.
Object Store Profiling
2025-08-28T18:49:22.066425183+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-08-28T18:49:22.303706449+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-08-28T18:49:22.419878784+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-08-28T18:49:22.493157995+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
List Summary:
count: 4
> \object_store_profiling disabled
ObjectStore Profile mode set to Disabled
> select count(*) from 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/' where vendor_id='CMT';
+-----------+
| count(*) |
+-----------+
| 505603754 |
+-----------+
1 row(s) fetched.
Elapsed 45.531 seconds.
> |
Thanks @BlakeOrth -- I am back and catching up on reviews. I will review this one shortly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @BlakeOrth -- I think this is looking good and we should proceed with implementation. Now that I am back, I am ready/willing to help do so as well.
Also, FYI @nuno-faria as I think you are interested in this area too.
I view this as an important first step towards being able to implement listing file caching reasonably:
Shall we use this PR ?
I think it is possible to break this into a few smaller PRs which might be faster to review:
- Add basic object store instrumentation and plumbing, but only instrument one operation (like
get
orlist
), and set the pattern - Other PRs to fill out the rest of the object store methods.
Also, BTW tried it out but it doesn't seem to be working anymore:
> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
0 row(s) fetched.
Elapsed 2.000 seconds.
Object Store Profiling
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
0 row(s) fetched.
Elapsed 0.688 seconds.
Object Store Profiling
>
pub maxrows: MaxRows, | ||
pub color: bool, | ||
pub object_store_profile_mode: InstrumentedObjectStoreMode, | ||
pub instrumented_registry: Arc<InstrumentedObjectStoreRegistry>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree it is time to encapsulate it
requests: Mutex<Vec<RequestDetails>>, | ||
} | ||
|
||
impl fmt::Display for InstrumentedObjectStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found it a bit confusing that Display
also modified the object store -- maybe we could have an explicit take/display
function 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good point I hadn't even considered when writing this. Having the behavior of Display
be dependent on underlying object state and when you called it is weird implicit behavior.
I'm open to implementation recommendations here. I could see keeping the method as-is but changing the method signature to something that better communicates the underlying state modification and dropping the Display
impl. Another path I can see working is making the Display
impl not modify the underlying state, but push the state modification into another method a user has to call in order to clear the existing entries to be displayed. I'm sure there are other paths as well. One thing I liked about the existing implementation is it's more difficult for a user of the API to accidentally accumulate object store operations from multiple queries because removing them does not require the user to call an additional method.
Ok(store) | ||
} | ||
|
||
pub mod instrumented { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that putting the instrumented object store into object_storage/instrumented.rs
seems like a good idea
It does work when I ran it with the CLI flag:
😍 |
This is pretty cool! We can easily see the effects of the metadata caching: > CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.106 seconds.
Object Store Profiling
2025-09-07T12:17:04.082191800+00:00 operation=Head duration=0.056605s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:04.138839400+00:00 operation=Get duration=0.019972s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:04.158853600+00:00 operation=Get duration=0.019854s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
Get Summary:
count: 2
duration min: 0.019854s
duration max: 0.019972s
duration avg: 0.019913s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B
Head Summary:
count: 1
duration min: 0.056605s
duration max: 0.056605s
duration avg: 0.056605s With the metadata cache, three requests are done: > select * from hits limit 1;
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| WatchID | JavaEnable | Title | GoodEvent | EventTime | EventDate | CounterID | ClientIP | RegionID | UserID | CounterClass | OS | UserAgent | URL | Referer | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL | HID | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash | URLHash | CLID |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| 8374547729199360385 | 1 | d0a2d0b5d181d1822028d0a0d0bed181d181d0b8d18f29202d20d0afd0bdd0b4d0b5d0bad181 | 1 | 1373893805 | 15901 | 62 | 1388530699 | 229 | 3217804679217022550 | 0 | 2 | 5 | 687474703a2f2f6972722e72752f696e6465782e7068703f73686f77616c62756d2f6c6f67696e2d6c656e697961373737373239342c393338333033313330 | 687474703a2f2f6b696e6f706f69736b2e72752f3f7374617465 | 0 | 10813 | 952 | 9500 | 520 | 1638 | 1658 | 37 | 15 | 7 | 373030 | 0 | 0 | 22 | 44efbfbd | 1 | 1 | 0 | 0 | | | 3830428 | -1 | 0 | | 0 | 0 | 1654 | 936 | 135 | 1373857827 | 4 | 1 | 16561 | 0 | 77696e646f7773 | 1601 | 0 | 0 | 0 | 8731137316151599477 | | 4563091 | 0 | 0 | 0 | 0 | 0 | 35 | 1373847066 | 0 | 0 | 0 | 0 | 0 | 1547096432 | -1 | -1 | -1 | 5330 | efbfbd0c | | | 0 | 0 | 0 | 190 | 987 | 55 | 35 | 0 | | 0 | | 4e481c | 0 | | | | | | | | | | | 0 | -1172318462146836803 | 2868770270353813622 | 0 |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
1 row(s) fetched.
Elapsed 0.405 seconds.
Object Store Profiling
2025-09-07T12:17:16.132117800+00:00 operation=Head duration=0.023943s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:16.157671300+00:00 operation=Get duration=0.028673s size=8134104 range: bytes=4-8134107 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:16.157998800+00:00 operation=Get duration=0.068516s size=97589907 range: bytes=77340716-174930622 path=hits_compatible/athena_partitioned/hits_1.parquet
Get Summary:
count: 2
duration min: 0.028673s
duration max: 0.068516s
duration avg: 0.048595s
size min: 8134104 B
size max: 97589907 B
size avg: 52862005 B
size sum: 105724011 B
Head Summary:
count: 1
duration min: 0.023943s
duration max: 0.023943s
duration avg: 0.023943s After disabling the metadata cache, a lot more requests will be done: > set datafusion.runtime.metadata_cache_limit = '0M';
0 row(s) fetched.
Elapsed 0.000 seconds.
Object Store Profiling
> select * from hits limit 1;
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| WatchID | JavaEnable | Title | GoodEvent | EventTime | EventDate | CounterID | ClientIP | RegionID | UserID | CounterClass | OS | UserAgent | URL | Referer | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL | HID | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash | URLHash | CLID |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| 8374547729199360385 | 1 | d0a2d0b5d181d1822028d0a0d0bed181d181d0b8d18f29202d20d0afd0bdd0b4d0b5d0bad181 | 1 | 1373893805 | 15901 | 62 | 1388530699 | 229 | 3217804679217022550 | 0 | 2 | 5 | 687474703a2f2f6972722e72752f696e6465782e7068703f73686f77616c62756d2f6c6f67696e2d6c656e697961373737373239342c393338333033313330 | 687474703a2f2f6b696e6f706f69736b2e72752f3f7374617465 | 0 | 10813 | 952 | 9500 | 520 | 1638 | 1658 | 37 | 15 | 7 | 373030 | 0 | 0 | 22 | 44efbfbd | 1 | 1 | 0 | 0 | | | 3830428 | -1 | 0 | | 0 | 0 | 1654 | 936 | 135 | 1373857827 | 4 | 1 | 16561 | 0 | 77696e646f7773 | 1601 | 0 | 0 | 0 | 8731137316151599477 | | 4563091 | 0 | 0 | 0 | 0 | 0 | 35 | 1373847066 | 0 | 0 | 0 | 0 | 0 | 1547096432 | -1 | -1 | -1 | 5330 | efbfbd0c | | | 0 | 0 | 0 | 190 | 987 | 55 | 35 | 0 | | 0 | | 4e481c | 0 | | | | | | | | | | | 0 | -1172318462146836803 | 2868770270353813622 | 0 |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
1 row(s) fetched.
Elapsed 0.637 seconds.
Object Store Profiling
2025-09-07T12:17:31.352590200+00:00 operation=Head duration=0.020476s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373586300+00:00 operation=Get duration=0.017040s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373610700+00:00 operation=Get duration=0.036302s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373727700+00:00 operation=Get duration=0.056047s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373893500+00:00 operation=Get duration=0.056901s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373614300+00:00 operation=Get duration=0.059208s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373669600+00:00 operation=Get duration=0.061699s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373679500+00:00 operation=Get duration=0.062037s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373698900+00:00 operation=Get duration=0.064385s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373688700+00:00 operation=Get duration=0.064557s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.390647700+00:00 operation=Get duration=0.050088s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373669100+00:00 operation=Get duration=0.071315s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373625700+00:00 operation=Get duration=0.071612s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.409928500+00:00 operation=Get duration=0.035328s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.430810100+00:00 operation=Get duration=0.019518s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373877300+00:00 operation=Get duration=0.076532s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.429786100+00:00 operation=Get duration=0.021806s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.432826500+00:00 operation=Get duration=0.022541s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.435378800+00:00 operation=Get duration=0.020154s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.435720300+00:00 operation=Get duration=0.021886s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.438249100+00:00 operation=Get duration=0.020374s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.438117600+00:00 operation=Get duration=0.021973s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.445244800+00:00 operation=Get duration=0.019013s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.444999300+00:00 operation=Get duration=0.020614s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.441698900+00:00 operation=Get duration=0.024848s size=8134104 range: bytes=4-8134107 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.450412700+00:00 operation=Get duration=0.023832s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.465032100+00:00 operation=Get duration=0.021047s size=97589907 range: bytes=77340716-174930622 path=hits_compatible/athena_partitioned/hits_1.parquet
Get Summary:
count: 26
duration min: 0.017040s
duration max: 0.076532s
duration avg: 0.040025s
size min: 8 B
size max: 97589907 B
size avg: 4082152 B
size sum: 106135971 B
Head Summary:
count: 1
duration min: 0.020476s
duration max: 0.020476s
duration avg: 0.020476s Its also cool to see that it works on the local object store, but the output appears duplicated: > select * from t limit 1;
+---+---+
| k | v |
+---+---+
| 1 | 1 |
+---+---+
1 row(s) fetched.
Elapsed 0.002 seconds.
Object Store Profiling
2025-09-07T12:26:34.064444500+00:00 operation=Head duration=0.000192s path=17266/datafusion-cli/t.parquet
2025-09-07T12:26:34.064800300+00:00 operation=Get duration=0.000099s size=72166 range: bytes=4-72169 path=17266/datafusion-cli/t.parquet
Get Summary:
count: 1
duration min: 0.000099s
duration max: 0.000099s
duration avg: 0.000099s
size min: 72166 B
size max: 72166 B
size avg: 72166 B
size sum: 72166 B
Head Summary:
count: 1
duration min: 0.000192s
duration max: 0.000192s
duration avg: 0.000192s
2025-09-07T12:26:34.064443400+00:00 operation=Head duration=0.000194s path=17266/datafusion-cli/t.parquet
2025-09-07T12:26:34.064799400+00:00 operation=Get duration=0.000100s size=72166 range: bytes=4-72169 path=17266/datafusion-cli/t.parquet
Get Summary:
count: 1
duration min: 0.000100s
duration max: 0.000100s
duration avg: 0.000100s
size min: 72166 B
size max: 72166 B
size avg: 72166 B
size sum: 72166 B
Head Summary:
count: 1
duration min: 0.000194s
duration max: 0.000194s
duration avg: 0.000194s |
@alamb Thanks for the review! I'll take a look into why it's suddenly stopped working (or perhaps it's a "works on my machine" situation, which is also never good).
I'm happy to split and divide up the work in whatever manner you think will be best for reviews. I know that review bandwidth is almost always strained, so let me know how we can make that process the smoothest and I'm happy to facilitate as much as I can. I will note that splitting up the actual instrumentation of the Looking at the changes here, and the comments you left, I can see two easy PRs that can be done immediately to help streamline the implementation. They'd likely be a good first code contribution if any community members are looking for a simple task to pick up!
|
@nuno-faria thanks for giving this a test run! Looking at this output I initially thought you were correct that it was duplicated, but on closer inspection the timestamps and duration of the operations is different between the first and the 2nd case. This leads me to believe the object_store operations may actually get executed twice for some reason. Generating the text output currently leverages a |
- Fixes an issue where the first profiling command would not be recognized if the CLI was not started with profiling enabled
@alamb I've found the bug and fixed this behavior. Although this is one of those scenarios where I'm somewhat questioning how it ever worked correctly in terms of switching modes using commands. Regardless, I'm glad you chose to run a functional test case that I clearly had not run! The issue seems to have been with the initial command to change the state. Curiously, once the profiling had been enabled one time the changes thereafter seemed to work as expected. With the bug fix, the initial command now seems to be working though. /datafusion-cli$ ../target/debug/datafusion-cli DataFusion CLI v49.0.2
> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/';
0 row(s) fetched.
Elapsed 1.906 seconds.
Object Store Profiling
2025-09-08T23:55:35.057244770+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
List Summary:
count: 1
2025-09-08T23:55:35.395591891+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-09-08T23:55:35.630754482+00:00 operation=Get duration=0.100976s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-08T23:55:35.731796892+00:00 operation=Get duration=0.105280s size=38976 range: bytes=222153999-222192974 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-09-08T23:55:35.635551741+00:00 operation=Get duration=0.220758s size=8 range: bytes=217303101-217303108 path=nyc_taxi_rides/data/tripdata_parquet/data-200908.parquet
2025-09-08T23:55:35.632923076+00:00 operation=Get duration=0.238810s size=8 range: bytes=225659957-225659964 path=nyc_taxi_rides/data/tripdata_parquet/data-200904.parquet
2025-09-08T23:55:35.633575925+00:00 operation=Get duration=0.240542s size=8 range: bytes=232847298-232847305 path=nyc_taxi_rides/data/tripdata_parquet/data-200905.parquet
2025-09-08T23:55:35.638022172+00:00 operation=Get duration=0.237329s size=8 range: bytes=235166917-235166924 path=nyc_taxi_rides/data/tripdata_parquet/data-201001.parquet
2025-09-08T23:55:35.634262563+00:00 operation=Get duration=0.244533s size=8 range: bytes=224226567-224226574 path=nyc_taxi_rides/data/tripdata_parquet/data-200906.parquet
. . . Truncated for brevity . . .
2025-09-08T23:55:36.814991494+00:00 operation=Get duration=0.073694s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet
2025-09-08T23:55:36.774456617+00:00 operation=Get duration=0.124367s size=15508 range: bytes=158722835-158738342 path=nyc_taxi_rides/data/tripdata_parquet/data-201612.parquet
2025-09-08T23:55:36.837998603+00:00 operation=Get duration=0.064300s size=18219 range: bytes=200688011-200706229 path=nyc_taxi_rides/data/tripdata_parquet/data-201602.parquet
List Summary:
count: 1
Get Summary:
count: 288
duration min: 0.057396s
duration max: 0.357809s
duration avg: 0.108023s
size min: 8 B
size max: 44247 B
size avg: 18870 B
size sum: 5434702 B
> |
Thank you! @BlakeOrth I am very interested in getting this PR merged in as a precursor to some other work we are doing. Would you be willing to accept patches / commits to this branch (I could add some tests, for example)? |
@alamb Please do! My interest lies in keeping the code moving forward, not hoarding tasks! |
I have not forgotten about this PR -- I am just busy with other projects now. I will come back to this soon (TM) |
@alamb It's not entirely clear to me how we should proceed to keep this effort moving. There's been some discussion of using this PR and polishing it up with tests/docs etc vs splitting it up into several smaller PRs. I'm happy to help facilitate either approach. Do you have an idea on how you'd like to move forward? |
Thanks @BlakeOrth Basically I think the code needs some tests before merging I am suggesting breaking it into smaller PRs to (selfishly) make the review easier (as I can find smaller amounts of time to review PRs but finding large continguous blocks to review larger PRs is challenging) |
@alamb OK, great. I completely understand the desire to have smaller reviews. Would you prefer to reduce the scope of this PR for the first iteration to better preserve the discussion/progression and how it's associated with the work, or should I close this PR and a new one can be opened with reduced scope? |
How about we close this one and start making smaller PRs to build up to it |
Thank you for your patience @BlakeOrth |
@alamb Perfect, I will consider this POC complete at this point and will carve out some time to submit an initial, smaller PR to get us started. As I noted in #17214 (comment) some other efforts that are high priority for me need to be addressed first. My hope is those will be resolved within a week or so, at which point I will be able to get back to these changes with a more concerted effort. I will make sure to tag you once I get the first PR ready and submitted! |
Thank you @BlakeOrth 🙏 |
## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Continued progress filling out the methods that are instrumented for the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around basic list operations into the instrumented object store - Adds test cases for new code ## Are these changes tested? Yes. Example output: ```sql DataFusion CLI v50.2.0 > \object_store_profiling trace ObjectStore Profile mode set to Trace > CREATE EXTERNAL TABLE nyc_taxi_rides STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet'; 0 row(s) fetched. Elapsed 2.679 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.512970085+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet Summaries: List count: 1 Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data) 2025-10-16T18:53:09.929709943+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.106757629+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet 2025-10-16T18:53:10.220555058+00:00 operation=Get duration=0.230604s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet 2025-10-16T18:53:10.226399832+00:00 operation=Get duration=0.263826s size=8 range: bytes=233123927-233123934 path=nyc_taxi_rides/data/tripdata_parquet/data-201104.parquet 2025-10-16T18:53:10.226194195+00:00 operation=Get duration=0.269754s size=8 range: bytes=252843253-252843260 path=nyc_taxi_rides/data/tripdata_parquet/data-201103.parquet . . . 2025-10-16T18:53:11.928787014+00:00 operation=Get duration=0.072248s size=18278 range: bytes=201384109-201402386 path=nyc_taxi_rides/data/tripdata_parquet/data-201509.parquet 2025-10-16T18:53:11.933475464+00:00 operation=Get duration=0.068880s size=17175 range: bytes=195411804-195428978 path=nyc_taxi_rides/data/tripdata_parquet/data-201601.parquet 2025-10-16T18:53:11.949629591+00:00 operation=Get duration=0.065645s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet Summaries: List count: 2 Get count: 288 duration min: 0.060930s duration max: 0.444601s duration avg: 0.133339s size min: 8 B size max: 44247 B size avg: 18870 B size sum: 5434702 B > ``` ## Are there any user-facing changes? No-ish ## cc @alamb
## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Continued progress filling out methods that are instrumented by the instrumented object store ## What changes are included in this PR? - Adds instrumentation around delimited list operations into the instrumented object store - Adds test cases for the new code ## Are these changes tested? Yes, unit tests have been added. Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE overture_partitioned STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/'; 0 row(s) fetched. Elapsed 2.307 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > select count(*) from overture_partitioned; +-----------+ | count(*) | +-----------+ | 446544475 | +-----------+ 1 row(s) fetched. Elapsed 1.932 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2) 2025-10-17T17:05:27.922724180+00:00 operation=List duration=0.132154s path=release/2025-09-24.0/theme=addresses 2025-10-17T17:05:28.054894440+00:00 operation=List duration=0.049048s path=release/2025-09-24.0/theme=addresses/type=address 2025-10-17T17:05:28.104233937+00:00 operation=Get duration=0.053522s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet 2025-10-17T17:05:28.106862343+00:00 operation=Get duration=0.108103s size=8 range: bytes=1017940335-1017940342 path=release/2025-09-24.0/theme=addresses/type=address/part-00003-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet ... 2025-10-17T17:05:28.589084204+00:00 operation=Get duration=0.084737s size=836971 range: bytes=1112791717-1113628687 path=release/2025-09-24.0/theme=addresses/type=address/part-00009-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet Summaries: List count: 2 duration min: 0.049048s duration max: 0.132154s duration avg: 0.090601s Get count: 33 duration min: 0.045500s duration max: 0.162114s duration avg: 0.089775s size min: 8 B size max: 917946 B size avg: 336000 B size sum: 11088026 B > ``` Note that a `LIST` report showing a duration must be a `list_with_delimiter()` call because a standard `list` call does not currently report a duration. ## Are there any user-facing changes? No-ish cc @alamb
## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Further fills out the missing methods that have yet to be instrumented in the instrumented object store. ## What changes are included in this PR? - Adds instrumentation around put_opts - Adds instrumentation around put_multipart - Adds tests for newly instrumented methods ## Are these changes tested? Yes. Unit tests have been added for the new methods Example output: ```sql DataFusion CLI v50.2.0 > CREATE EXTERNAL TABLE test(a bigint, b bigint) STORED AS parquet LOCATION '../../test_table/'; 0 row(s) fetched. Elapsed 0.003 seconds. > \object_store_profiling trace ObjectStore Profile mode set to Trace > INSERT INTO test values (1, 2), (3, 4); +-------+ | count | +-------+ | 2 | +-------+ 1 row(s) fetched. Elapsed 0.007 seconds. Object Store Profiling Instrumented Object Store: instrument_mode: Trace, inner: LocalFileSystem(file:///) 2025-10-17T19:02:15.440246215+00:00 operation=List path=home/blake/open_source_src/datafusion-BlakeOrth/test_table 2025-10-17T19:02:15.444096012+00:00 operation=Put duration=0.000249s size=815 path=home/blake/open_source_src/datafusion-BlakeOrth/test_table/a9pjKBxSOtXZobJO_0.parquet Summaries: List count: 1 Put count: 1 duration min: 0.000249s duration max: 0.000249s duration avg: 0.000249s size min: 815 B size max: 815 B size avg: 815 B size sum: 815 B > ``` (note: I have no idea how to exercise/show a multi-part put operation, or if DataFusion even utilizes multipart puts for large files) ## Are there any user-facing changes? No-ish cc @alamb --------- Co-authored-by: Andrew Lamb <[email protected]>
## Which issue does this PR close? This does not fully close, but is an incremental building block component for: - #17207 The full context of how this code is likely to progress can be seen in the POC for this effort: - #17266 ## Rationale for this change Further fills out method instrumentation ## What changes are included in this PR? - Adds instrumentation to head requests in the instrumented object store - Adds instrumentatin to delete requests in the instrumented object store - Adds tests for new code ## Are these changes tested? Yes. New unit tests have been added. ## Are there any user-facing changes? No-ish ## cc @alamb
Which issue does this PR close?
N/A - POC to recieve initial implementation feedback related to
datafusion-cli
] Add a way to see what object store requests are made #17207Rationale for this change
What changes are included in this PR?
Are these changes tested?
No - This is a POC
cc @alamb