
@BlakeOrth (Contributor) commented:

Which issue does this PR close?

This PR does not fully close an issue; it is an incremental building block for:

The full context of how this code is likely to progress can be seen in the POC for this effort:

Rationale for this change

For queries that make many calls to an instrumented object store, generating the full list of calls plus a summary of those calls can produce thousands of lines of output. Allowing users to see only a summary in these cases helps ensure the instrumented object store does not dominate the output for a query.

What changes are included in this PR?

  • Adds the ability for users to choose a summary-only output for an instrumented object store when using the CLI
  • The existing "enabled" setting, which displays both a summary and detailed usage for each object store call, has been renamed to `Trace` to improve clarity
  • Adds test cases for the summary-only output and modifies existing tests to use `Trace`
  • Updates the user guide docs to reflect the CLI flag and command changes

Are these changes tested?

Yes. Additional unit tests have been added, and the existing integration test has been augmented to exercise the new option(s).

Example functional output:

```shell
./datafusion-cli --object-store-profiling trace
DataFusion CLI v50.2.0
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.532 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Trace, inner: HttpStore
2025-10-14T22:26:13.185625701+00:00 operation=Get duration=0.035335s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-14T22:26:13.221015783+00:00 operation=Get duration=0.045423s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet

Summaries:
Get
count: 2
duration min: 0.035335s
duration max: 0.045423s
duration avg: 0.040379s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B

> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> CREATE EXTERNAL TABLE hits2
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_2.parquet';
0 row(s) fetched.
Elapsed 0.179 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
Summaries:
Get
count: 2
duration min: 0.021558s
duration max: 0.022129s
duration avg: 0.021843s
size min: 8 B
size max: 55508 B
size avg: 27758 B
size sum: 55516 B

>
```
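The per-operation summary shown above (count, min/max/avg duration, and size totals) can be reproduced with a simple fold over the recorded calls. This is an illustrative sketch only, not the actual DataFusion implementation; the struct and method names are assumptions.

```rust
/// Aggregated statistics for one object store operation kind
/// (illustrative only; not the DataFusion type).
#[derive(Debug, Default)]
struct OpSummary {
    count: u64,
    duration_min: f64,
    duration_max: f64,
    duration_sum: f64,
    size_min: u64,
    size_max: u64,
    size_sum: u64,
}

impl OpSummary {
    /// Fold one recorded call into the running statistics.
    fn record(&mut self, duration_s: f64, size_bytes: u64) {
        if self.count == 0 {
            // First sample seeds the minimums.
            self.duration_min = duration_s;
            self.size_min = size_bytes;
        } else {
            self.duration_min = self.duration_min.min(duration_s);
            self.size_min = self.size_min.min(size_bytes);
        }
        self.duration_max = self.duration_max.max(duration_s);
        self.size_max = self.size_max.max(size_bytes);
        self.duration_sum += duration_s;
        self.size_sum += size_bytes;
        self.count += 1;
    }

    fn duration_avg(&self) -> f64 {
        self.duration_sum / self.count as f64
    }
}

fn main() {
    // The two Get calls from the trace output above.
    let mut get = OpSummary::default();
    get.record(0.035335, 8);
    get.record(0.045423, 34322);
    assert_eq!(get.count, 2);
    assert_eq!(get.size_min, 8);
    assert_eq!(get.size_max, 34322);
    assert_eq!(get.size_sum, 34330);
    assert!((get.duration_avg() - 0.040379).abs() < 1e-6);
}
```

Running this against the two `Get` calls from the trace output reproduces the same `count`, `size sum`, and `duration avg` figures shown in the session.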

Are there any user-facing changes?

Yes. An existing user option in the form of a CLI flag and the associated command was changed. The user documentation has been updated to reflect these changes.

cc @alamb
(I believe the previous PR that was merged for this effort was the last major set of core functionality! 🎉 The remaining PRs should all be pretty concise and just fill out the small bits of missing implementation.)

The github-actions bot added the `documentation` (Improvements or additions to documentation) label on Oct 14, 2025.
@BlakeOrth force-pushed the feature/cli_instrument_trace branch from 46460fa to 0ba562f on October 15, 2025, 16:49.
@alamb (Contributor) left a comment:


Thanks @BlakeOrth -- I tried it out, and it works great!

```shell
ObjectStore Profile mode set to Summary
> select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
1 row(s) fetched.
Elapsed 0.595 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
Summaries:
Get
count: 2
duration min: 0.053315s
duration max: 0.056176s
duration avg: 0.054746s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B

> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
1 row(s) fetched.
Elapsed 0.199 seconds.

Object Store Profiling
```

@alamb alamb added this pull request to the merge queue Oct 16, 2025
Merged via the queue into apache:main with commit 3bca1bb Oct 16, 2025
29 checks passed
reggieross pushed a commit to elastiflow/datafusion that referenced this pull request Oct 28, 2025
* Refactor: split test_window_partial_constant_and_set_monotonicity into multiple tests (#17952)

* fix: Ensure ListingTable partitions are pruned when filters are not used (#17958)

* fix: Prune partitions when no filters are defined

* fix: Formatting

* chore: Cargo fmt

* chore: Clippy

* Push Down Filter Subexpressions in Nested Loop Joins as Projections (#17906)

* Check-in NestedLoopJoinProjectionPushDown

* Update Cargo.lock

* Add some comments

* Update slts that are affected by the nl-join-projection-push-down

* please lints

* Move code into projection_pushdown.rs

* Remove explicit coalesce batches

* Docs

* feat: support Spark `concat` string function (#18063)

* chore: Extend backtrace coverage

* fmt

* part2

* feedback

* clippy

* feat: support Spark `concat`

* clippy

* comments

* test

* doc

* Add independent configs for topk/join dynamic filter (#18090)

* Add independent configs for topk/join dynamic filter

* fix ci

* update doc

* fix typo

* Adds Trace and Summary to CLI instrumented stores (#18064)

- Adds the ability for a user to choose a summary only output for an
   instrumented object store when using the CLI
 - The existing "enabled" setting that displays both a summary and a
   detailed usage for each object store call has been renamed to `Trace`
   to improve clarity
 - Adds additional test cases for summary only and modifies existing
   tests to use trace
 - Updates user guide docs to reflect the CLI flag and command line
   changes

* fix: Improve null handling in array_to_string function (#18076)

* fix: Improve null handling in array_to_string function

* chore

* feat: update .asf.yaml configuration settings (#18027)

* Fix extended tests on main to get CI green (#18096)

## Which issue does this PR close?


- Closes https://github.com/apache/datafusion/issues/18084

## Rationale for this change
Some of the extended tests are failing because we have fixed conditional evaluation of `CASE` expressions, so queries that previously (incorrectly) did not pass now do.


## What changes are included in this PR?

Update datafusion-testing pin

## Are these changes tested?

I tested locally with:

```shell
INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests
```

## Are there any user-facing changes?

No

* chore(deps): bump taiki-e/install-action from 2.62.29 to 2.62.31 (#18094)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.62.29 to 2.62.31.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's
releases</a>.</em></p>
<blockquote>
<h2>2.62.31</h2>
<ul>
<li>
<p>Update <code>protoc@latest</code> to 3.33.0.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.3.</p>
</li>
<li>
<p>Update <code>syft@latest</code> to 1.34.1.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.9.</p>
</li>
<li>
<p>Update <code>cargo-shear@latest</code> to 1.6.0.</p>
</li>
</ul>
<h2>2.62.30</h2>
<ul>
<li>
<p>Update <code>vacuum@latest</code> to 0.18.6.</p>
</li>
<li>
<p>Update <code>zizmor@latest</code> to 1.15.2.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>This project adheres to <a href="https://semver.org">Semantic
Versioning</a>.</p>
<!-- raw HTML omitted -->
<h2>[Unreleased]</h2>
<h2>[2.62.31] - 2025-10-16</h2>
<ul>
<li>
<p>Update <code>protoc@latest</code> to 3.33.0.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.3.</p>
</li>
<li>
<p>Update <code>syft@latest</code> to 1.34.1.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.9.</p>
</li>
<li>
<p>Update <code>cargo-shear@latest</code> to 1.6.0.</p>
</li>
</ul>
<h2>[2.62.30] - 2025-10-15</h2>
<ul>
<li>
<p>Update <code>vacuum@latest</code> to 0.18.6.</p>
</li>
<li>
<p>Update <code>zizmor@latest</code> to 1.15.2.</p>
</li>
</ul>
<h2>[2.62.29] - 2025-10-14</h2>
<ul>
<li>
<p>Update <code>zizmor@latest</code> to 1.15.1.</p>
</li>
<li>
<p>Update <code>cargo-nextest@latest</code> to 0.9.106.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.8.</p>
</li>
<li>
<p>Update <code>ubi@latest</code> to 0.8.1.</p>
</li>
</ul>
<h2>[2.62.28] - 2025-10-11</h2>
<ul>
<li>
<p>Update <code>release-plz@latest</code> to 0.3.148.</p>
</li>
<li>
<p>Update <code>cargo-sort@latest</code> to 2.0.2.</p>
</li>
<li>
<p>Update <code>cargo-binstall@latest</code> to 1.15.7.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.2.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/taiki-e/install-action/commit/0005e0116e92d8489d8d96fbff83f061c79ba95a"><code>0005e01</code></a>
Release 2.62.31</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/6936d999d90424ed013e4f325d91e14d7ddba27f"><code>6936d99</code></a>
Update <code>protoc@latest</code> to 3.33.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/ac7ad6efa1b1bb919bcaa357eb1873f328ee07f7"><code>ac7ad6e</code></a>
Update <code>uv@latest</code> to 0.9.3</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/005833aaf18c1621513995406c3bc0397747afc2"><code>005833a</code></a>
Update <code>syft@latest</code> to 1.34.1</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/2b32ff6f3dc99bc9fa6647cbc9f7da71cf979b65"><code>2b32ff6</code></a>
Update <code>mise@latest</code> to 2025.10.9</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/74c0274864f156f487aee04623a20b315fb2125a"><code>74c0274</code></a>
Update <code>cargo-shear@latest</code> to 1.6.0</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/f13d8e15c52b25c79b608d399cc802adc73d83da"><code>f13d8e1</code></a>
Release 2.62.30</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/1034dc55996706645239db97d3ea04f42a708f22"><code>1034dc5</code></a>
Update <code>vacuum@latest</code> to 0.18.6</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/55b5d509b8761e9696e1cfec0d6f66f0655e8fff"><code>55b5d50</code></a>
Update <code>zizmor@latest</code> to 1.15.2</li>
<li>See full diff in <a
href="https://github.com/taiki-e/install-action/compare/5b5de1b4da26ad411330c0454bdd72929bfcbeb2...0005e0116e92d8489d8d96fbff83f061c79ba95a">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.62.29&new-version=2.62.31)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: run extended suite on PRs for critical areas (#18088)

## Which issue does this PR close?


- Closes #.
Related to https://github.com/apache/datafusion/issues/18084

## Rationale for this change

Run the extended suite on PRs for critical areas, to avoid post-merge bug fixing.


## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?


---------

Co-authored-by: Andrew Lamb <[email protected]>

* refactor: add dialect enum (#18043)

## Which issue does this PR close?

- Closes #18042 

## Rationale for this change

This PR introduces a new `Dialect` enum to improve type safety and code maintainability when handling different SQL dialects in DataFusion:

1. Provide compile-time guarantees for dialect handling
2. Improve code readability and self-documentation
3. Enable better IDE support and autocomplete

## What changes are included in this PR?

- Added a new `Dialect` enum to represent supported SQL dialects
- Refactored existing code to use the new enum instead of previous
representations
- Modified tests to work with the new enum-based approach

## Are these changes tested?

Yes

## Are there any user-facing changes?

Yes, this is an API change: the type of the `dialect` field changed from
`String` to `Dialect`
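The shape of such an enum, replacing a free-form `String`, might look like the following sketch. The variant names and the `parse` helper here are assumptions for illustration; DataFusion's actual `Dialect` enum may differ.

```rust
// Hypothetical sketch of a SQL dialect enum; not the actual DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dialect {
    Generic,
    MySql,
    PostgreSql,
    SQLite,
}

impl Dialect {
    /// Parse from the previous string representation. Unknown names now
    /// produce an error instead of silently passing through as a String.
    fn parse(s: &str) -> Result<Self, String> {
        match s.to_ascii_lowercase().as_str() {
            "generic" => Ok(Self::Generic),
            "mysql" => Ok(Self::MySql),
            "postgresql" | "postgres" => Ok(Self::PostgreSql),
            "sqlite" => Ok(Self::SQLite),
            other => Err(format!("unsupported dialect: {other}")),
        }
    }
}

fn main() {
    assert_eq!(Dialect::parse("postgres"), Ok(Dialect::PostgreSql));
    assert_eq!(Dialect::parse("MySQL"), Ok(Dialect::MySql));
    // A typo is caught at parse time rather than propagating as a String.
    assert!(Dialect::parse("postgresq").is_err());
}
```

With an enum, every `match` over dialects is checked exhaustively at compile time, which is the compile-time guarantee the rationale refers to.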

* #17982 Make `nvl` a thin wrapper for `coalesce` (#17991)

## Which issue does this PR close?

- Closes #17982

## Rationale for this change

By making `NVLFunc` a wrapper for `CoalesceFunc` with a more restrictive signature, the implementation automatically benefits from any optimisation work related to `coalesce`.

## What changes are included in this PR?

- Make `NVLFunc` a thin wrapper of `CoalesceFunc`. This seemed like the
simplest way to reuse the coalesce logic, but keep the stricter
signature of `nvl`.
- Add `ScalarUDF::conditional_arguments` as a more precise complement to
`ScalarUDF::short_circuits`. By letting each function expose which
arguments are eager and which are lazy, we provide more precise
information to the optimizer which may enable better optimisation.

## Are these changes tested?

Assumed to be covered by sql logic tests.
Unit tests for the custom implementation were removed since those are no
longer relevant.

## Are there any user-facing changes?

The rewriting of `nvl` to `case when ... then ... else ... end` is
visible in the physical query plan.
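The wrapper relationship can be illustrated with plain `Option` values. This is a semantic sketch only, using hypothetical helper names, not DataFusion's `ScalarUDFImpl` code:

```rust
// Semantic sketch: `nvl(a, b)` behaves like a two-argument `coalesce`.
// Function names here are illustrative, not DataFusion APIs.

/// Two-argument coalesce: the first non-null value wins.
fn coalesce2<T>(args: [Option<T>; 2]) -> Option<T> {
    let [a, b] = args;
    a.or(b)
}

/// `nvl` with its stricter signature (exactly two arguments),
/// delegating entirely to the coalesce logic.
fn nvl<T>(value: Option<T>, default: Option<T>) -> Option<T> {
    coalesce2([value, default])
}

fn main() {
    assert_eq!(nvl(Some(1), Some(2)), Some(1));
    assert_eq!(nvl(None, Some(2)), Some(2));
    assert_eq!(nvl::<i32>(None, None), None);
}
```

Because `nvl` only restricts the argument count and otherwise delegates, any future improvement to the coalesce path (such as the `case when ... then ... else ... end` rewrite mentioned above) applies to `nvl` for free.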

---------

Co-authored-by: Andrew Lamb <[email protected]>

* minor: fix incorrect deprecation version & window docs (#18093)

* chore: use `NullBuffer::union` for Spark `concat` (#18087)

## Which issue does this PR close?


- Closes #.

Followup on
https://github.com/apache/datafusion/pull/18063#pullrequestreview-3341818221

## Rationale for this change

Use cheaper `NullBuffer::union` to apply null mask instead of iterator
approach


## What changes are included in this PR?


## Are these changes tested?


## Are there any user-facing changes?


* feat: support `null_treatment`, `distinct`, and `filter` for window functions in proto (#18024)

## Which issue does this PR close?


- Closes #17417.

## Rationale for this change


- Support `null_treatment`, `distinct`, and `filter` for window function
in proto.
- Support `null_treatment` for aggregate udf in proto.

## What changes are included in this PR?


- [x] Add `null_treatment`, `distinct`, `filter` fields to
`WindowExprNode` message and handle them in `to/from_proto.rs`.
- [x] Add `null_treatment` field to `AggregateUDFExprNode` message and
handle them in `to/from_proto.rs`.
- [ ] Docs update: I'm not sure where to add docs as declared in the
issue description.

## Are these changes tested?


- Add tests to `roundtrip_window` for respectnulls, ignorenulls,
distinct, filter.
- Add tests to `roundtrip_aggregate_udf` for respectnulls, ignorenulls.

## Are there any user-facing changes?

N/A

---------

Co-authored-by: Jeffrey Vo <[email protected]>

* feat: Add percentile_cont aggregate function (#17988)

## Summary

Adds exact `percentile_cont` aggregate function as the counterpart to
the existing `approx_percentile_cont` function.

## What changes were made?

### New Implementation
- Created `percentile_cont.rs` with full implementation
- `PercentileCont` struct implementing `AggregateUDFImpl`
- `PercentileContAccumulator` for standard aggregation
- `DistinctPercentileContAccumulator` for DISTINCT mode
- `PercentileContGroupsAccumulator` for efficient grouped aggregation
- `calculate_percentile` function with linear interpolation

### Features
- **Exact calculation**: Stores all values in memory for precise results
- **WITHIN GROUP syntax**: Supports `WITHIN GROUP (ORDER BY ...)` 
- **Interpolation**: Uses linear interpolation between values
- **All numeric types**: Works with integers, floats, and decimals
- **Ordered-set aggregate**: Properly marked as
`is_ordered_set_aggregate()`
- **GROUP BY support**: Efficient grouped aggregation via
GroupsAccumulator

### Tests
Added comprehensive tests in `aggregate.slt`:
- Error conditions validation
- Basic percentile calculations (0.0, 0.25, 0.5, 0.75, 1.0)
- Comparison with `median` function
- Ascending and descending order
- GROUP BY aggregation
- NULL handling
- Edge cases (empty sets, single values)
- Float interpolation
- Various numeric data types

## Example Usage

```sql
-- Basic usage with WITHIN GROUP syntax
SELECT percentile_cont(0.75) WITHIN GROUP (ORDER BY column_name) 
FROM table_name;

-- With GROUP BY
SELECT category, percentile_cont(0.95) WITHIN GROUP (ORDER BY value)
FROM sales
GROUP BY category;

-- Compare with median (percentile_cont(0.5) == median)
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY price) FROM products;
```

## Performance Considerations

Like `median`, this function stores all values in memory before
computing results. For large datasets or when approximation is
acceptable, use `approx_percentile_cont` instead.
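The linear-interpolation step can be sketched as follows. This is a simplified, hypothetical illustration of the technique; the real `calculate_percentile` additionally handles nulls, ordering direction, and all numeric types:

```rust
/// Exact percentile with linear interpolation over the collected values
/// (simplified sketch, not the DataFusion implementation).
fn percentile_cont(values: &mut [f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }
    values.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Fractional rank within the sorted values.
    let rank = p * (values.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    // Interpolate linearly between the two neighbouring values.
    Some(values[lo] + (values[hi] - values[lo]) * frac)
}

fn main() {
    let mut v = vec![1.0, 2.0, 3.0, 4.0];
    // percentile_cont(0.5) equals the median (interpolated here).
    assert_eq!(percentile_cont(&mut v, 0.5), Some(2.5));
    assert_eq!(percentile_cont(&mut v, 0.0), Some(1.0));
    assert_eq!(percentile_cont(&mut v, 1.0), Some(4.0));
}
```

Storing every value before sorting is what makes the result exact, and also why the memory caveat above applies.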

## Related Issues

Closes #6714

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <[email protected]>

* fix: Re-bump latest datafusion-testing module so extended tests succeed (#18110)

Looks like #17988 accidentally reverted the bump from #18096

* chore(deps): bump taiki-e/install-action from 2.62.31 to 2.62.33 (#18113)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.62.31 to 2.62.33.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's
releases</a>.</em></p>
<blockquote>
<h2>2.62.33</h2>
<ul>
<li>Update <code>mise@latest</code> to 2025.10.10.</li>
</ul>
<h2>2.62.32</h2>
<ul>
<li>
<p>Update <code>syft@latest</code> to 1.34.2.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.18.7.</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's
changelog</a>.</em></p>
<blockquote>
<h1>Changelog</h1>
<p>All notable changes to this project will be documented in this
file.</p>
<p>This project adheres to <a href="https://semver.org">Semantic
Versioning</a>.</p>
<!-- raw HTML omitted -->
<h2>[Unreleased]</h2>
<h2>[2.62.33] - 2025-10-17</h2>
<ul>
<li>Update <code>mise@latest</code> to 2025.10.10.</li>
</ul>
<h2>[2.62.32] - 2025-10-16</h2>
<ul>
<li>
<p>Update <code>syft@latest</code> to 1.34.2.</p>
</li>
<li>
<p>Update <code>vacuum@latest</code> to 0.18.7.</p>
</li>
</ul>
<h2>[2.62.31] - 2025-10-16</h2>
<ul>
<li>
<p>Update <code>protoc@latest</code> to 3.33.0.</p>
</li>
<li>
<p>Update <code>uv@latest</code> to 0.9.3.</p>
</li>
<li>
<p>Update <code>syft@latest</code> to 1.34.1.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.9.</p>
</li>
<li>
<p>Update <code>cargo-shear@latest</code> to 1.6.0.</p>
</li>
</ul>
<h2>[2.62.30] - 2025-10-15</h2>
<ul>
<li>
<p>Update <code>vacuum@latest</code> to 0.18.6.</p>
</li>
<li>
<p>Update <code>zizmor@latest</code> to 1.15.2.</p>
</li>
</ul>
<h2>[2.62.29] - 2025-10-14</h2>
<ul>
<li>
<p>Update <code>zizmor@latest</code> to 1.15.1.</p>
</li>
<li>
<p>Update <code>cargo-nextest@latest</code> to 0.9.106.</p>
</li>
<li>
<p>Update <code>mise@latest</code> to 2025.10.8.</p>
</li>
<li>
<p>Update <code>ubi@latest</code> to 0.8.1.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/taiki-e/install-action/commit/e43a5023a747770bfcb71ae048541a681714b951"><code>e43a502</code></a>
Release 2.62.33</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/2ae4258c3daeaf460c202b95aa4272c1f594d78e"><code>2ae4258</code></a>
Update <code>mise@latest</code> to 2025.10.10</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/e79914c740f0acf092c59adfa2a61d3d2266b6bf"><code>e79914c</code></a>
Release 2.62.32</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/40168eab5f259c94f094865825dbdefd1cf31bbf"><code>40168ea</code></a>
Update <code>syft@latest</code> to 1.34.2</li>
<li><a
href="https://github.com/taiki-e/install-action/commit/6d89b16c494331f0cdbca002e68ea5ab4fa8e3f6"><code>6d89b16</code></a>
Update <code>vacuum@latest</code> to 0.18.7</li>
<li>See full diff in <a
href="https://github.com/taiki-e/install-action/compare/0005e0116e92d8489d8d96fbff83f061c79ba95a...e43a5023a747770bfcb71ae048541a681714b951">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.62.31&new-version=2.62.33)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---


Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding hiop as known user (#18114)

## Which issue does this PR close?


- Doesn't close an issue.

## Rationale for this change


Hi we are hiop, a Serverless Data Logistic Platform.
We use DataFusion as a core part of our backend engine, and it plays a
crucial role in our data infrastructure. Our team members are passionate
about the project and actively try to contribute to its development
(@dariocurr).

We’d love to have Hiop listed among the Known Users to show our support
and help the DataFusion community continue to grow.

## What changes are included in this PR?


Just adding hiop as known user

## Are these changes tested?


## Are there any user-facing changes?


* chore: remove unnecessary `skip_failed_rules` config in slt (#18117)

## Which issue does this PR close?


- Closes #3695
- Closes #3797

## Rationale for this change


I was looking at the issues above, and I don't believe we skip failed
rules in any tests anymore (the default for the config is also
`false`), so I'm filing this PR as a cleanup so we can close the
issues. The only place we still set it is in this `window.slt` test,
after this fix:


https://github.com/apache/datafusion/blob/621a24978a7a9c6d2b27973d1853dbc8776a56b5/datafusion/sqllogictest/test_files/window.slt#L2587-L2611

Which seems intentional.
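For context, the pattern of intentionally enabling the config in an slt
test looks roughly like this (a hypothetical sketch, not the exact
`window.slt` contents):

```sql
-- Temporarily tolerate a failing optimizer rule for one query,
-- then restore the default (sketch; actual query elided):
statement ok
set datafusion.optimizer.skip_failed_rules = true;

-- ... the query that needs a failing optimizer rule tolerated ...

statement ok
set datafusion.optimizer.skip_failed_rules = false;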

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Remove unnecessary `skip_failed_rules` config.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

* move repartition to insta (#18106)

Related https://github.com/apache/datafusion/pull/16324
https://github.com/apache/datafusion/pull/16617

almost there!

* refactor: move ListingTable over to the catalog-listing-table crate (#18080)

## Which issue does this PR close?

- This addresses part of
https://github.com/apache/datafusion/issues/17713
- Closes https://github.com/apache/datafusion/issues/14462


## Rationale for this change

In order to remove the `datafusion` core crate from `proto` as a
dependency, we need to access `ListingTable` but it is within the `core`
crate. There already exists a `datafusion-catalog-listing` which is bare
and appears to be the place this should exist.

## What changes are included in this PR?

Move `ListingTable` and some of its dependent structs over to the
`datafusion-catalog-listing` crate.

There is one dependency I wasn't able to remove from the `core` crate,
which is inferring the listing table configuration options. That is
because within this method it downcasts `Session` to `SessionState`. If
a downstream user ever attempts to implement `Session` themselves, these
methods also would not work. Because it would cause a circular
dependency, we also cannot lift the method we need out of `SessionState`
into `Session`. Instead I took the approach of splitting off the two
methods that require `SessionState` into an extension trait for the
listing table config.

From the git diff this appears to be a large change (+1637/-1519)
however the *vast* majority of that is copying the code from one file
into another. I have added a comment on the significant change.

## Are these changes tested?

Existing unit tests show no regression. This is just a code refactor.

## Are there any user-facing changes?

Users may need to update their use paths.

* refactor: move arrow datasource to new `datafusion-datasource-arrow` crate (#18082)

## Which issue does this PR close?

- This addresses part of
https://github.com/apache/datafusion/issues/17713 but it does not close
it.

## Rationale for this change

In order to remove `core` from `proto` crate, we need `ArrowFormat` to
be available. Similar to the other datasource types (csv, avro, json,
parquet) this splits the Arrow IPC file format into its own crate.

## What changes are included in this PR?

This is a straight refactor. Code is merely moved around.

The size of the diff is the additional files that are required
(cargo.toml, readme.md, etc)

## Are these changes tested?

Existing unit tests.

## Are there any user-facing changes?

Users that include `ArrowSource` may need to update their include paths.
For most, the reexports will cover this need.

* Adds instrumentation to LIST operations in CLI (#18103)

## Which issue does this PR close?

This does not fully close, but is an incremental building block
component for:
 - https://github.com/apache/datafusion/issues/17207

The full context of how this code is likely to progress can be seen in
the POC for this effort:
 - https://github.com/apache/datafusion/pull/17266

## Rationale for this change

Continued progress filling out the methods that are instrumented for the
instrumented object store.

## What changes are included in this PR?

- Adds instrumentation around basic list operations into the
instrumented object store
 - Adds test cases for new code

## Are these changes tested?

Yes.

Example output:
```sql
DataFusion CLI v50.2.0
> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET
LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet';
0 row(s) fetched.
Elapsed 2.679 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data)
2025-10-16T18:53:09.512970085+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet

Summaries:
List
count: 1

Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(altinity-clickhouse-data)
2025-10-16T18:53:09.929709943+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-10-16T18:53:10.106757629+00:00 operation=List path=nyc_taxi_rides/data/tripdata_parquet
2025-10-16T18:53:10.220555058+00:00 operation=Get duration=0.230604s size=8 range: bytes=222192975-222192982 path=nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet
2025-10-16T18:53:10.226399832+00:00 operation=Get duration=0.263826s size=8 range: bytes=233123927-233123934 path=nyc_taxi_rides/data/tripdata_parquet/data-201104.parquet
2025-10-16T18:53:10.226194195+00:00 operation=Get duration=0.269754s size=8 range: bytes=252843253-252843260 path=nyc_taxi_rides/data/tripdata_parquet/data-201103.parquet

. . .

2025-10-16T18:53:11.928787014+00:00 operation=Get duration=0.072248s size=18278 range: bytes=201384109-201402386 path=nyc_taxi_rides/data/tripdata_parquet/data-201509.parquet
2025-10-16T18:53:11.933475464+00:00 operation=Get duration=0.068880s size=17175 range: bytes=195411804-195428978 path=nyc_taxi_rides/data/tripdata_parquet/data-201601.parquet
2025-10-16T18:53:11.949629591+00:00 operation=Get duration=0.065645s size=19872 range: bytes=214807880-214827751 path=nyc_taxi_rides/data/tripdata_parquet/data-201603.parquet

Summaries:
List
count: 2

Get
count: 288
duration min: 0.060930s
duration max: 0.444601s
duration avg: 0.133339s
size min: 8 B
size max: 44247 B
size avg: 18870 B
size sum: 5434702 B

>
```
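The `Summaries` block above is simply a count/min/max/avg/sum roll-up of
the per-call durations and sizes, grouped by operation. A minimal Python
sketch of that kind of aggregation (illustrative only; not DataFusion's
actual implementation, and all names here are invented):

```python
# Sketch: aggregate per-call profiling records into per-operation
# summaries, mimicking the "Summaries" output above (hypothetical code).
from dataclasses import dataclass


@dataclass
class CallRecord:
    operation: str   # e.g. "Get" or "List"
    duration: float  # seconds
    size: int        # bytes


def summarize(records):
    """Group records by operation and compute count, min/max/avg duration, and total size."""
    grouped = {}
    for r in records:
        g = grouped.setdefault(r.operation, {"durations": [], "sizes": []})
        g["durations"].append(r.duration)
        g["sizes"].append(r.size)
    out = {}
    for op, g in grouped.items():
        d, z = g["durations"], g["sizes"]
        out[op] = {
            "count": len(d),
            "duration min": min(d),
            "duration max": max(d),
            "duration avg": sum(d) / len(d),
            "size sum": sum(z),
        }
    return out


records = [
    CallRecord("Get", 0.230604, 8),
    CallRecord("Get", 0.072248, 18278),
    CallRecord("List", 0.132154, 0),
]
print(summarize(records)["Get"]["count"])  # 2
```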


## Are there any user-facing changes?
No-ish

cc @alamb

* feat: spark udf array shuffle (#17674)

## Which issue does this PR close?


## Rationale for this change

support shuffle udf

## What changes are included in this PR?

support shuffle udf

## Are these changes tested?

UT

## Are there any user-facing changes?

No

* make Union::try_new pub (#18125)

## Which issue does this PR close?

- Closes #18126.

## Rationale for this change

It's a useful constructor for users manipulating logical plans where
they know the schemas will match exactly. We already expose other
constructors for `Union`, and constructors for other logical plans.

## What changes are included in this PR?

Makes `Union::try_new` a public function.

## Are these changes tested?

Seems unnecessary.

## Are there any user-facing changes?

The function is now public. Not a breaking change, but going forward
changes to it would be breaking changes for users of the logical plan
API.

* fix: window unparsing (#17367)

## Which issue does this PR close?

- Closes #17360.

## Rationale for this change

In `LogicalPlan::Filter` unparsing, if there's a window expression, it
should be converted to `QUALIFY`.

PostgreSQL requires an alias for a derived table; otherwise it will
complain:
```
ERROR: subquery in FROM must have an alias.
```
This PR fixes that issue at the same time.
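As a hypothetical illustration (table and column names invented), a
filter on a window expression now unparses to `QUALIFY` rather than a
`WHERE` over an unaliased derived table:

```sql
-- Sketch: a plan that filters on a window function, unparsed with
-- QUALIFY (hypothetical table/columns; dialects without QUALIFY get
-- an aliased derived table instead):
SELECT a
FROM t
QUALIFY ROW_NUMBER() OVER (ORDER BY a) = 1;
```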

## What changes are included in this PR?

If a window expression is found, convert the filter to `QUALIFY`.

## Are these changes tested?

UT

## Are there any user-facing changes?

No

---------

Co-authored-by: Jeffrey Vo <[email protected]>

* feat: Support configurable `EXPLAIN ANALYZE` detail level (#18098)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
`EXPLAIN ANALYZE` can be used for profiling and displays the results
alongside the EXPLAIN plan. The issue is that it currently shows too
many low-level details. It would provide a better user experience if
only the most commonly used metrics were shown by default, with more
detailed metrics available through specific configuration options.

### Example
In `datafusion-cli`:
```
> CREATE EXTERNAL TABLE IF NOT EXISTS lineitem
STORED AS parquet
LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem';
0 row(s) fetched.
Elapsed 0.000 seconds.

explain analyze select *
from lineitem
where l_orderkey = 3000000;
```
The parquet reader includes a large number of low-level details:
```
metrics=[output_rows=19813, elapsed_compute=14ns, batches_split=0, bytes_scanned=2147308, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=18, num_predicate_creation_errors=0, page_index_rows_matched=19813, page_index_rows_pruned=729088, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=21.997µs, metadata_load_time=273.83µs, page_index_eval_time=29.915µs, row_pushdown_eval_time=42ns, statistics_eval_time=76.248µs, time_elapsed_opening=4.02146ms, time_elapsed_processing=24.787461ms, time_elapsed_scanning_total=24.17671ms, time_elapsed_scanning_until_data=23.103665ms]
```

I believe only a subset of these is commonly used, for example
`output_rows`, `metadata_load_time`, and how many files/row groups/pages
are pruned, so it would be better to only display the most common ones
by default.

### Existing `VERBOSE` keyword
There is an existing verbose keyword in `EXPLAIN ANALYZE VERBOSE`;
however, it turns on per-partition metrics rather than controlling the
detail level. I think it would be hard to mix that partition control
with the detail level introduced in this PR, so they're kept separate:
the following config is used for the detail level, and the semantics of
`EXPLAIN ANALYZE VERBOSE` are unchanged.

### This PR: configurable explain analyze level
1. Introduced a new config option `datafusion.explain.analyze_level`.
When set to `dev` (the default), all existing metrics are shown. If set
to `summary`, only `BaselineMetrics` are displayed (i.e. `output_rows`
and `elapsed_compute`).
Note that for now we only include `BaselineMetrics` for simplicity; in
follow-up PRs we can figure out the commonly used metrics for each
operator, add them to the `summary` level, and finally make `summary`
the default.
2. Added a `MetricType` field associated with `Metric` for the detail
level (or potentially category in the future). Depending on the
configuration, a certain set of `MetricType`s is shown.
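The filtering idea behind the two points above can be sketched as
follows: each metric carries a type, and rendering keeps only the types
enabled for the configured level. This is a hypothetical Python sketch
(all names invented; not DataFusion's actual API):

```python
# Hypothetical sketch of level-based metric filtering.
from enum import Enum


class MetricType(Enum):
    SUMMARY = "summary"  # commonly used, e.g. output_rows, elapsed_compute
    DEV = "dev"          # low-level details


def visible(metrics, level):
    """Return metric names whose type is enabled for the requested level."""
    enabled = {MetricType.SUMMARY}
    if level == "dev":
        enabled.add(MetricType.DEV)
    return [name for name, mtype in metrics if mtype in enabled]


metrics = [
    ("output_rows", MetricType.SUMMARY),
    ("elapsed_compute", MetricType.SUMMARY),
    ("bytes_scanned", MetricType.DEV),
]
print(visible(metrics, "summary"))  # ['output_rows', 'elapsed_compute']
```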

#### Demo
```
-- continuing the above example
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.

> explain analyze select *
from lineitem
where l_orderkey = 3000000;
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=5, elapsed_compute=25.339µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|                   |   FilterExec: l_orderkey@0 = 3000000, metrics=[output_rows=5, elapsed_compute=81.221µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|                   |     DataSourceExec: file_groups={14 groups: [[Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..11525426], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:11525426..20311205, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:0..2739647], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:2739647..14265073], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:14265073..20193593, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:0..5596906], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:5596906..17122332], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], file_type=parquet, predicate=l_orderkey@0 = 3000000, pruning_predicate=l_orderkey_null_count@2 != row_count@3 AND l_orderkey_min@0 <= 3000000 AND 3000000 <= l_orderkey_max@1, required_guarantees=[l_orderkey in (3000000)], metrics=[output_rows=19813, elapsed_compute=14ns] |
|                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.025 seconds.
```
Only `BaselineMetrics` are shown.


## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
UT

## Are there any user-facing changes?

No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>

* refactor: remove unused `type_coercion/aggregate.rs` functions (#18091)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

N/A

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

There are a few functions in
`datafusion/expr-common/src/type_coercion/aggregates.rs` that are unused
elsewhere in the codebase, likely remnants from before the refactor to
UDFs, so this removes them. Some are still used (`coerce_avg_type()` and
`avg_return_type()`), so these are inlined into the Avg aggregate
function (similar to Sum). Also refactors some window functions to use
already available macros.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Remove some unused functions
- Inline avg coerce & return type logic
- Refactor Spark Avg a bit to remove unnecessary code
- Refactor ntile & nth window functions to use available macros

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

Yes as these functions were publicly exported; however I'm not sure they
were meant to be used by users anyway, given what they do.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

* Add extra case_when benchmarks (#18097)

## Which issue does this PR close?

None

## Rationale for this change

More microbenchmarks make it easier to assess the performance impact of
`CaseExpr` implementation changes.

## What changes are included in this PR?

Add microbenchmarks for `case` expressions that are a bit more
representative for real world queries.

## Are these changes tested?

n/a

## Are there any user-facing changes?

no

* fix: Add dictionary coercion support for numeric comparison operations (#18099)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Fixes comparison errors when using dictionary-encoded types with
comparison functions like NULLIF.

## Rationale for this change

When using dictionary-encoded columns (e.g., `Dictionary(Int32, Utf8)`)
in comparison operations with literals or other types, DataFusion would
throw an error stating the types are not comparable. This was
particularly problematic for functions like `NULLIF`, which rely on
comparison coercion.

The issue was that `comparison_coercion_numeric` didn't handle
dictionary types, even though the general `comparison_coercion` function
did have dictionary support.
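A minimal sketch of a query exercising this path, using DataFusion's
`arrow_cast` function to produce a dictionary-encoded value (the exact
test queries live in the slt file mentioned below):

```sql
-- NULLIF over a dictionary-encoded argument; with the fix, the
-- dictionary value type is coerced for the comparison. Since both
-- sides compare equal, NULLIF yields NULL:
SELECT nullif(arrow_cast('a', 'Dictionary(Int32, Utf8)'), 'a');
```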

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

1. Refactored dictionary comparison logic: extracted common dictionary
coercion logic into `dictionary_comparison_coercion_generic` to avoid
code duplication.
2. Added numeric-specific dictionary coercion: introduced
`dictionary_comparison_coercion_numeric`, which uses numeric-preferring
comparison rules when dealing with dictionary value types.
3. Updated `comparison_coercion_numeric`: added a call to
`dictionary_comparison_coercion_numeric` in the coercion chain to
properly handle dictionary types.
4. Added sqllogictest cases demonstrating the fix works for various
dictionary comparison scenarios.

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Yes, added tests in `datafusion/sqllogictest/test_files/nullif.slt`
covering:
  - Dictionary type compared with string literal
  - String compared with dictionary type
  - Dictionary compared with dictionary

All tests pass with the fix and would fail without it.

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

This is a bug fix that enables previously failing queries to work
correctly. No breaking changes or API modifications.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

* Adds instrumentation to delimited LIST operations in CLI (#18134)

## Which issue does this PR close?

This does not fully close, but is an incremental building block
component for:
 - https://github.com/apache/datafusion/issues/17207

The full context of how this code is likely to progress can be seen in
the POC for this effort:
 - https://github.com/apache/datafusion/pull/17266

## Rationale for this change

Continued progress filling out the methods that are instrumented by the
instrumented object store.

## What changes are included in this PR?

- Adds instrumentation around delimited list operations into the
instrumented object store
 - Adds test cases for the new code

## Are these changes tested?

Yes, unit tests have been added.

Example output:
```sql
DataFusion CLI v50.2.0
> CREATE EXTERNAL TABLE overture_partitioned
STORED AS PARQUET LOCATION 's3://overturemaps-us-west-2/release/2025-09-24.0/theme=addresses/';
0 row(s) fetched.
Elapsed 2.307 seconds.

> \object_store_profiling trace
ObjectStore Profile mode set to Trace
> select count(*) from overture_partitioned;
+-----------+
| count(*)  |
+-----------+
| 446544475 |
+-----------+
1 row(s) fetched.
Elapsed 1.932 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Trace, inner: AmazonS3(overturemaps-us-west-2)
2025-10-17T17:05:27.922724180+00:00 operation=List duration=0.132154s path=release/2025-09-24.0/theme=addresses
2025-10-17T17:05:28.054894440+00:00 operation=List duration=0.049048s path=release/2025-09-24.0/theme=addresses/type=address
2025-10-17T17:05:28.104233937+00:00 operation=Get duration=0.053522s size=8 range: bytes=1070778162-1070778169 path=release/2025-09-24.0/theme=addresses/type=address/part-00000-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet
2025-10-17T17:05:28.106862343+00:00 operation=Get duration=0.108103s size=8 range: bytes=1017940335-1017940342 path=release/2025-09-24.0/theme=addresses/type=address/part-00003-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet

...

2025-10-17T17:05:28.589084204+00:00 operation=Get duration=0.084737s size=836971 range: bytes=1112791717-1113628687 path=release/2025-09-24.0/theme=addresses/type=address/part-00009-52872134-68de-44a6-822d-15fa29a0f606-c000.zstd.parquet

Summaries:
List
count: 2
duration min: 0.049048s
duration max: 0.132154s
duration avg: 0.090601s

Get
count: 33
duration min: 0.045500s
duration max: 0.162114s
duration avg: 0.089775s
size min: 8 B
size max: 917946 B
size avg: 336000 B
size sum: 11088026 B

>
```
Note that a `LIST` report showing a duration must be a
`list_with_delimiter()` call because a standard `list` call does not
currently report a duration.

## Are there any user-facing changes?

No-ish

cc @alamb

* feat: add fp16 support to Substrait (#18086)

## Which issue does this PR close?

- Closes #16298

## Rationale for this change

Float16 is an Arrow type. Substrait serialization for the type is
defined in
https://github.com/apache/arrow/blame/main/format/substrait/extension_types.yaml
as part of Arrow. We should support it.

This picks up where https://github.com/apache/datafusion/pull/16793
leaves off.

## What changes are included in this PR?

Support for converting DataType::Float16 to/from Substrait.
Support for converting ScalarValue::Float16 to/from Substrait.

## Are these changes tested?

Yes

## Are there any user-facing changes?

Yes.

The `SubstraitProducer` trait received a new method (`register_type`)
which downstream implementors will need to provide an implementation
for. The example custom producer has been updated with a default
implementation.

One public method that changed is
[`datafusion_substrait::logical_plan::producer::from_empty_relation`](https://docs.rs/datafusion-substrait/50.2.0/datafusion_substrait/logical_plan/producer/fn.from_empty_relation.html).
I'm not sure whether that function is meant to be part of the public API:
it is undocumented (though perhaps only because its purpose is obvious),
and it returns a `Rel`, which is a fairly internal structure.

* fix(substrait): schema errors for Aggregates with no groupings (#17909)

## Which issue does this PR close?
Closes https://github.com/apache/datafusion/issues/16590

## Rationale for this change
When consuming Substrait plans containing aggregates with no groupings,
we would see the following error
```
Error: Substrait("Named schema must contain names for all fields")
```

The Substrait plan had one _fewer_ field than DataFusion expected, because
DataFusion was adding an extra "__grouping_id" column to the output of the
Aggregate node. This happens when the

https://github.com/apache/datafusion/blob/daeb6597a0c7344735460bb2dce13879fd89d7bd/datafusion/expr/src/logical_plan/plan.rs#L3418
condition is true.

A natural followup question to this is "Why are we creating an Aggregate
with a single empty GroupingSet for the group by, instead of just
leaving the group by entirely?".

## What changes are included in this PR?
Instead of setting group_exprs to a vector containing a single empty
grouping set, leave group_exprs empty entirely. This means the
`is_grouping_set` condition is not triggered, so the DataFusion schema
matches the Substrait schema.
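
As a self-contained illustration of the mismatch, here is a heavily simplified model of how the aggregate output schema width depends on the group expressions. The types and logic are hypothetical stand-ins, not DataFusion's actual `Expr`/`Aggregate` code; only the "single grouping-set expression appends `__grouping_id`" behavior is taken from the description above.

```rust
// Hypothetical, simplified model: a single GroupingSet group expression
// triggers an extra internal `__grouping_id` field, while an empty
// group-expression list does not.
enum GroupExpr {
    Column(&'static str),
    GroupingSet(Vec<Vec<&'static str>>),
}

fn output_fields(group_exprs: &[GroupExpr], aggr_exprs: &[&'static str]) -> Vec<String> {
    let mut fields: Vec<String> = Vec::new();
    for g in group_exprs {
        match g {
            GroupExpr::Column(name) => fields.push((*name).to_string()),
            GroupExpr::GroupingSet(sets) => {
                for set in sets {
                    for name in set {
                        if !fields.contains(&(*name).to_string()) {
                            fields.push((*name).to_string());
                        }
                    }
                }
            }
        }
    }
    // Mirrors the `is_grouping_set` condition referenced above.
    let is_grouping_set = matches!(group_exprs, [GroupExpr::GroupingSet(_)]);
    if is_grouping_set {
        fields.push("__grouping_id".to_string());
    }
    fields.extend(aggr_exprs.iter().map(|a| a.to_string()));
    fields
}

fn main() {
    // Before the fix: a single empty grouping set sneaks in `__grouping_id`.
    let before = output_fields(&[GroupExpr::GroupingSet(vec![vec![]])], &["count(*)"]);
    // After the fix: no group expressions at all, so no extra field.
    let after = output_fields(&[], &["count(*)"]);
    assert_eq!(before, vec!["__grouping_id", "count(*)"]);
    assert_eq!(after, vec!["count(*)"]);
    println!("before={before:?} after={after:?}");
}
```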

## Are these changes tested?
Yes

I have added direct tests via example Substrait plans

## Are there any user-facing changes?
Substrait plans that were not consumable before are now consumable.

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Improve datafusion-cli object store profiling summary display (#18085)

## Which issue does this PR close?

- part of https://github.com/apache/datafusion/issues/17207

## Rationale for this change

As suggested by @BlakeOrth in
https://github.com/apache/datafusion/pull/18045#issuecomment-3403692516
here is an attempt to improve the output of datafusion object store
trace profiling:

## What changes are included in this PR?

Update the output format when `\object_store_profiling summary` is set

Current format (on main, before this PR):
```sql
Summaries:
Get
count: 2
duration min: 0.024603s
duration max: 0.031946s
duration avg: 0.028274s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B
```


New format (after this PR):

```sql
DataFusion CLI v50.2.0
> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> select count(*) from 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
1 row(s) fetched.
Elapsed 6.754 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: HttpStore
Summaries:
+-----------+----------+-----------+-----------+-----------+-----------+-------+
| Operation | Metric   | min       | max       | avg       | sum       | count |
+-----------+----------+-----------+-----------+-----------+-----------+-------+
| Get       | duration | 0.031645s | 0.047780s | 0.039713s | 0.079425s | 2     |
| Get       | size     | 8 B       | 34322 B   | 17165 B   | 34330 B   | 2     |
+-----------+----------+-----------+-----------+-----------+-----------+-------+
```
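
The per-operation statistics shown in that table (min/max/avg/sum/count) can be maintained with a simple streaming accumulator updated once per call. This is a generic, hedged sketch of the idea, not datafusion-cli's implementation; it replays the two `Get` sizes from the example output above.

```rust
// Generic running-summary accumulator for one metric (e.g. request size
// in bytes); not datafusion-cli's code.
#[derive(Debug, Default)]
struct Summary {
    count: u64,
    sum: u64,
    min: Option<u64>,
    max: Option<u64>,
}

impl Summary {
    fn record(&mut self, value: u64) {
        self.count += 1;
        self.sum += value;
        self.min = Some(self.min.map_or(value, |m| m.min(value)));
        self.max = Some(self.max.map_or(value, |m| m.max(value)));
    }

    fn avg(&self) -> Option<u64> {
        (self.count > 0).then(|| self.sum / self.count)
    }
}

fn main() {
    // The two Get sizes from the example session above: 8 B and 34322 B.
    let mut size = Summary::default();
    size.record(8);
    size.record(34_322);
    assert_eq!(size.min, Some(8));
    assert_eq!(size.max, Some(34_322));
    assert_eq!(size.sum, 34_330);
    assert_eq!(size.avg(), Some(17_165));
    println!("{size:?}");
}
```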



## Are these changes tested?
Yes
## Are there any user-facing changes?
Nicer datafusion-cli output

* test: `to_timestamp(double)` for vectorized input (#18147)

## Which issue does this PR close?

- Closes #16678.

## Rationale for this change

The issue was fixed in #16639; this PR just adds a test case for it.

## What changes are included in this PR?

Add a test case for `to_timestamp(double)` with vectorized input.
Similar to the one presented in the issue.

## Are these changes tested?

Yes

## Are there any user-facing changes?

No

* Fix `concat_elements_utf8view` capacity initialization. (#18003)

## Which issue does this PR close?

- Relates to #17857 (See
https://github.com/apache/datafusion/issues/17857#issuecomment-3368519097)

## Rationale for this change

The capacity calculation was replaced with `left.len()` (assuming
`left.len()` and `right.len()` are equal), because the `with_capacity`
argument refers to the number of views (or strings), not to the number of
bytes.
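
The same distinction exists for plain `Vec::with_capacity` in std Rust: capacity counts elements (here, string handles), not the bytes those elements reference. A small stdlib-only illustration, independent of arrow-rs:

```rust
fn main() {
    let left = ["short", "a much longer string than the other"];
    let right = ["x", "y"];

    // Reserve one slot per output element. Each slot holds a fixed-size
    // String handle, so the byte lengths of the inputs are irrelevant to
    // how many slots we need.
    let mut out: Vec<String> = Vec::with_capacity(left.len());
    for (l, r) in left.iter().zip(right.iter()) {
        out.push(format!("{l}{r}"));
    }

    assert_eq!(out.len(), 2);
    // The reservation covers the element count, regardless of byte sizes.
    assert!(out.capacity() >= left.len());
    println!("{out:?}");
}
```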

## Are these changes tested?

The function is already covered by tests.

## Are there any user-facing changes?
No

* Use < instead of = in case benchmark predicates, use Integers (#18144)

## Which issue does this PR close?

- Followup to #18097

## Rationale for this change

The last benchmark was essentially identical to the second-to-last one:
its predicate incorrectly used `=` instead of `<`.

## What changes are included in this PR?

- Adjust the operator in the case predicates to `<`
- Adds two additional benchmarks covering `case x when ...`

## Are these changes tested?

Verified with debugger.

## Are there …