Skip to content

Conversation

@zhuqi-lucas
Copy link
Contributor

@zhuqi-lucas zhuqi-lucas commented Nov 5, 2025

Which issue does this PR close?

Rationale for this change

When our internal project to use datafusion, we found it default to enable encryption even for latest datafusion.

Problem Analysis

In datafusion/sqllogictest/Cargo.toml:

datafusion = { workspace = true, default-features = true, features = ["avro", "parquet_encryption"] }

The Problem:
datafusion-sqllogictest depends on datafusion with default-features = true
And, sqllogictest is the member of datafusion, it will default to encryption when we use datafusion.

[workspace]
members = [
    "datafusion/common",
    "datafusion/common-runtime",
    "datafusion/catalog",
    "datafusion/catalog-listing",
    "datafusion/datasource",
    "datafusion/datasource-avro",
    "datafusion/datasource-csv",
    "datafusion/datasource-json",
    "datafusion/datasource-parquet",
    "datafusion/core",
    "datafusion/expr",
    "datafusion/expr-common",
    "datafusion/execution",
    "datafusion/ffi",
    "datafusion/functions",
    "datafusion/functions-aggregate",
    "datafusion/functions-aggregate-common",
    "datafusion/functions-table",
    "datafusion/functions-nested",
    "datafusion/functions-window",
    "datafusion/functions-window-common",
    "datafusion/optimizer",
    "datafusion/physical-expr",
    "datafusion/physical-expr-adapter",
    "datafusion/physical-expr-common",
    "datafusion/physical-optimizer",
    "datafusion/pruning",
    "datafusion/physical-plan",
    "datafusion/proto",
    "datafusion/proto/gen",
    "datafusion/proto-common",
    "datafusion/proto-common/gen",
    "datafusion/session",
    "datafusion/spark",
    "datafusion/sql",
    "datafusion/sqllogictest",
    "datafusion/substrait",
    "datafusion-cli",
    "datafusion-examples",
    "datafusion-examples/examples/ffi/ffi_example_table_provider",
    "datafusion-examples/examples/ffi/ffi_module_interface",
    "datafusion-examples/examples/ffi/ffi_module_loader",
    "test-utils",
    "benchmarks",
    "datafusion/macros",
    "datafusion/doc",
]
exclude = ["dev/depcheck"]
resolver = "2"

What changes are included in this PR?

Fixed above.

Are these changes tested?

Yes

Are there any user-facing changes?

Make encryption a feature instead of default.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 5, 2025
@github-actions github-actions bot added the development-process Related to development process of DataFusion label Nov 5, 2025
Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Also, the documentation said this feature is optional, so there is no need to update the doc.

- `parquet_encryption`: support for using [Parquet Modular Encryption]

@zhuqi-lucas
Copy link
Contributor Author

datafusion/README.md

Thank you @2010YOUY01 for review and quick checking.

@Jefffrey
Copy link
Contributor

Jefffrey commented Nov 5, 2025

And, sqllogictest is the member of datafusion, it will default to encryption when we use datafusion.

Could you help me understand this point?

@zhuqi-lucas
Copy link
Contributor Author

And, sqllogictest is the member of datafusion, it will default to encryption when we use datafusion.

Could you help me understand this point?

Sure, let me clarify that point, correct me if i am wrong, but this PR works in our production now, it fixed the problem. Our use case is to use workspace which including the datafusion default features.

Because datafusion-sqllogictest is part of the same workspace, and in its Cargo.toml it depends on datafusion with:

datafusion = { workspace = true, default-features = true, features = ["avro", "parquet_encryption"] }

Cargo feature unification means that if any crate in the workspace enables a feature for a dependency, that feature becomes enabled for the entire build. So even if users import datafusion separately, as long as they depend on this workspace, the parquet_encryption feature is still activated through sqllogictest.

In other words, sqllogictest enabling parquet_encryption causes datafusion to be compiled with encryption enabled by default for anyone building the workspace.

That’s why it appears as if encryption is "enabled by default" from the user perspective.

This PR changes the default behavior so that encryption is only included when explicitly requested.

export TPCH_DATA=`realpath datafusion/sqllogictest/test_files/tpch/data`
cargo test plan_q --package datafusion-benchmarks --profile ci --features=ci -- --test-threads=1
INCLUDE_TPCH=true cargo test --features backtrace --profile ci --package datafusion-sqllogictest --test sqllogictests
INCLUDE_TPCH=true cargo test --features backtrace, parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests
Copy link
Contributor

@Jefffrey Jefffrey Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this fix in other places too? e.g.

cargo test --features backtrace --profile release-nonlto --test sqllogictests -- --include-sqlite

(Also does having the space there between the comma and parquet_encryption impact anything?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this command is reused, I think it's better to put them inside a .sh script.

Copy link
Contributor Author

@zhuqi-lucas zhuqi-lucas Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do this fix in other places too? e.g.

Good catch, the extended test should be consistent.

If this command is reused, I think it's better to put them inside a .sh script.

This is a good future improvement for common logic.

@zhuqi-lucas
Copy link
Contributor Author

Thanks @Jefffrey for review, I added the extend test also consistent with sqllogictest now.

@alamb alamb changed the title Datafuson default will enable encryption, we should disable it by default Disable parquet_encryption by default in datafusion-sqllogictests Nov 5, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me -- thank you @zhuqi-lucas

I think this change only applies to people who use datafusion-sqllogictest in their project

chrono = { workspace = true, optional = true }
clap = { version = "4.5.50", features = ["derive", "env"] }
datafusion = { workspace = true, default-features = true, features = ["avro", "parquet_encryption"] }
datafusion = { workspace = true, default-features = true, features = ["avro"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider enabling the default-features too 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb for review, do you mean the default-features to false? Here it use true already, i think enable default-features for datafusion here is ok for sqllogictest.

Copy link
Contributor

@alamb alamb Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was suggesting turning off the default features.

I think given this says default-features=true, that means that anyone who uses datafusion-sqllogictest will get all default features, and there will not be any way to turn the default features off

Copy link
Contributor Author

@zhuqi-lucas zhuqi-lucas Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @alamb , i think default features are ok, i just tried to set false here, the sqllogictest will fail in my local test.

And here we remove here for parquet_encryption because it removed from datafusion default, also i think the encryption should be always optional instead of the default.

Or we can set to false, if we can easily find the minimal features here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do anything more on this PR

Or we can set to false, if we can easily find the minimal features here?

Yeah that would probably be the ideal outcome (we could file a ticket to track the idea) I don't think it is particularly high priority

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed a ticket to track this, thanks!
#18506

@zhuqi-lucas
Copy link
Contributor Author

zhuqi-lucas commented Nov 5, 2025

This makes sense to me -- thank you @zhuqi-lucas

I think this change only applies to people who use datafusion-sqllogictest in their project

Right, thank you @alamb for double check.

@zhuqi-lucas zhuqi-lucas added this pull request to the merge queue Nov 6, 2025
Merged via the queue into apache:main with commit f2437d1 Nov 6, 2025
32 checks passed
@zhuqi-lucas
Copy link
Contributor Author

Thank you @2010YOUY01 @Jefffrey @alamb @xudong963 for review!

@xudong963
Copy link
Member

@zhuqi-lucas It seems that the extend tests failed after the PR

@Jefffrey
Copy link
Contributor

Jefffrey commented Nov 6, 2025

Maybe my suggestion here was incorrect #18492 (comment)

😅

@zhuqi-lucas
Copy link
Contributor Author

@xudong963 @Jefffrey I am checking the extend tests fails.

@zhuqi-lucas
Copy link
Contributor Author

Created the fix PR, this is expected since we make parquet_encryption optional feature now, so the extended test should also add this optional feature: cc @xudong963 @Jefffrey
#18507

github-merge-queue bot pushed a commit that referenced this pull request Nov 6, 2025
## Which issue does this PR close?

Now we make parquet_encryption optional feature

- Closes
#18492 (comment)

## Rationale for this change

Make the extended test to add this feature.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
…pache#18492)

## Which issue does this PR close?

- Closes [apache#18490](apache#18490)

## Rationale for this change

When our internal project to use datafusion, we found it default to
enable encryption even for latest datafusion.

Problem Analysis

In datafusion/sqllogictest/Cargo.toml:
```rust
datafusion = { workspace = true, default-features = true, features = ["avro", "parquet_encryption"] }
```
The Problem:
datafusion-sqllogictest depends on datafusion with default-features =
true
And, sqllogictest is the member of datafusion, it will default to
encryption when we use datafusion.

```rust
[workspace]
members = [
    "datafusion/common",
    "datafusion/common-runtime",
    "datafusion/catalog",
    "datafusion/catalog-listing",
    "datafusion/datasource",
    "datafusion/datasource-avro",
    "datafusion/datasource-csv",
    "datafusion/datasource-json",
    "datafusion/datasource-parquet",
    "datafusion/core",
    "datafusion/expr",
    "datafusion/expr-common",
    "datafusion/execution",
    "datafusion/ffi",
    "datafusion/functions",
    "datafusion/functions-aggregate",
    "datafusion/functions-aggregate-common",
    "datafusion/functions-table",
    "datafusion/functions-nested",
    "datafusion/functions-window",
    "datafusion/functions-window-common",
    "datafusion/optimizer",
    "datafusion/physical-expr",
    "datafusion/physical-expr-adapter",
    "datafusion/physical-expr-common",
    "datafusion/physical-optimizer",
    "datafusion/pruning",
    "datafusion/physical-plan",
    "datafusion/proto",
    "datafusion/proto/gen",
    "datafusion/proto-common",
    "datafusion/proto-common/gen",
    "datafusion/session",
    "datafusion/spark",
    "datafusion/sql",
    "datafusion/sqllogictest",
    "datafusion/substrait",
    "datafusion-cli",
    "datafusion-examples",
    "datafusion-examples/examples/ffi/ffi_example_table_provider",
    "datafusion-examples/examples/ffi/ffi_module_interface",
    "datafusion-examples/examples/ffi/ffi_module_loader",
    "test-utils",
    "benchmarks",
    "datafusion/macros",
    "datafusion/doc",
]
exclude = ["dev/depcheck"]
resolver = "2"
```

## What changes are included in this PR?

Fixed above.

## Are these changes tested?

Yes

## Are there any user-facing changes?

Make encryption a feature instead of default.
codetyri0n pushed a commit to codetyri0n/datafusion that referenced this pull request Nov 11, 2025
…#18507)

## Which issue does this PR close?

Now we make parquet_encryption optional feature

- Closes
apache#18492 (comment)

## Rationale for this change

Make the extended test to add this feature.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

development-process Related to development process of DataFusion sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Datafuson default will enable encryption, we should disable it by default.

5 participants