-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Disable parquet_encryption by default in datafusion-sqllogictests
#18492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Also, the documentation said this feature is optional, so there is no need to update the doc.
Line 131 in b52a81d
| - `parquet_encryption`: support for using [Parquet Modular Encryption] |
Thank you @2010YOUY01 for review and quick checking. |
Could you help me understand this point? |
Sure, let me clarify that point, correct me if i am wrong, but this PR works in our production now, it fixed the problem. Our use case is to use workspace which including the datafusion default features. Because datafusion-sqllogictest is part of the same workspace, and in its Cargo.toml it depends on datafusion with: datafusion = { workspace = true, default-features = true, features = ["avro", "parquet_encryption"] }Cargo feature unification means that if any crate in the workspace enables a feature for a dependency, that feature becomes enabled for the entire build. So even if users import datafusion separately, as long as they depend on this workspace, the parquet_encryption feature is still activated through sqllogictest. In other words, sqllogictest enabling parquet_encryption causes datafusion to be compiled with encryption enabled by default for anyone building the workspace. That’s why it appears as if encryption is "enabled by default" from the user perspective. This PR changes the default behavior so that encryption is only included when explicitly requested. |
.github/workflows/rust.yml
Outdated
| export TPCH_DATA=`realpath datafusion/sqllogictest/test_files/tpch/data` | ||
| cargo test plan_q --package datafusion-benchmarks --profile ci --features=ci -- --test-threads=1 | ||
| INCLUDE_TPCH=true cargo test --features backtrace --profile ci --package datafusion-sqllogictest --test sqllogictests | ||
| INCLUDE_TPCH=true cargo test --features backtrace, parquet_encryption --profile ci --package datafusion-sqllogictest --test sqllogictests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do this fix in other places too? e.g.
datafusion/.github/workflows/extended.yml
Line 172 in 8363c89
| cargo test --features backtrace --profile release-nonlto --test sqllogictests -- --include-sqlite |
(Also does having the space there between the comma and parquet_encryption impact anything?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this command is reused, I think it's better to put them inside a .sh script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do this fix in other places too? e.g.
Good catch, the extended test should be consistent.
If this command is reused, I think it's better to put them inside a .sh script.
This is a good future improvement for common logic.
|
Thanks @Jefffrey for review, I added the extend test also consistent with sqllogictest now. |
parquet_encryption by default in datafusion-sqllogictests
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me -- thank you @zhuqi-lucas
I think this change only applies to people who use datafusion-sqllogictest in their project
| chrono = { workspace = true, optional = true } | ||
| clap = { version = "4.5.50", features = ["derive", "env"] } | ||
| datafusion = { workspace = true, default-features = true, features = ["avro", "parquet_encryption"] } | ||
| datafusion = { workspace = true, default-features = true, features = ["avro"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider enabling the default-features too 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alamb for review, do you mean the default-features to false? Here it use true already, i think enable default-features for datafusion here is ok for sqllogictest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was suggesting turning off the default features.
I think given this says default-features=true, that means that anyone who uses datafusion-sqllogictest will get all default features, and there will not be any way to turn the default features off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @alamb , i think default features are ok, i just tried to set false here, the sqllogictest will fail in my local test.
And here we remove here for parquet_encryption because it removed from datafusion default, also i think the encryption should be always optional instead of the default.
Or we can set to false, if we can easily find the minimal features here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do anything more on this PR
Or we can set to false, if we can easily find the minimal features here?
Yeah that would probably be the ideal outcome (we could file a ticket to track the idea) I don't think it is particularly high priority
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed a ticket to track this, thanks!
#18506
Right, thank you @alamb for double check. |
|
Thank you @2010YOUY01 @Jefffrey @alamb @xudong963 for review! |
|
@zhuqi-lucas It seems that the extend tests failed after the PR |
|
Maybe my suggestion here was incorrect #18492 (comment) 😅 |
|
@xudong963 @Jefffrey I am checking the extend tests fails. |
|
Created the fix PR, this is expected since we make parquet_encryption optional feature now, so the extended test should also add this optional feature: cc @xudong963 @Jefffrey |
## Which issue does this PR close? Now we make parquet_encryption optional feature - Closes #18492 (comment) ## Rationale for this change Make the extended test to add this feature. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
…pache#18492) ## Which issue does this PR close? - Closes [apache#18490](apache#18490) ## Rationale for this change When our internal project to use datafusion, we found it default to enable encryption even for latest datafusion. Problem Analysis In datafusion/sqllogictest/Cargo.toml: ```rust datafusion = { workspace = true, default-features = true, features = ["avro", "parquet_encryption"] } ``` The Problem: datafusion-sqllogictest depends on datafusion with default-features = true And, sqllogictest is the member of datafusion, it will default to encryption when we use datafusion. ```rust [workspace] members = [ "datafusion/common", "datafusion/common-runtime", "datafusion/catalog", "datafusion/catalog-listing", "datafusion/datasource", "datafusion/datasource-avro", "datafusion/datasource-csv", "datafusion/datasource-json", "datafusion/datasource-parquet", "datafusion/core", "datafusion/expr", "datafusion/expr-common", "datafusion/execution", "datafusion/ffi", "datafusion/functions", "datafusion/functions-aggregate", "datafusion/functions-aggregate-common", "datafusion/functions-table", "datafusion/functions-nested", "datafusion/functions-window", "datafusion/functions-window-common", "datafusion/optimizer", "datafusion/physical-expr", "datafusion/physical-expr-adapter", "datafusion/physical-expr-common", "datafusion/physical-optimizer", "datafusion/pruning", "datafusion/physical-plan", "datafusion/proto", "datafusion/proto/gen", "datafusion/proto-common", "datafusion/proto-common/gen", "datafusion/session", "datafusion/spark", "datafusion/sql", "datafusion/sqllogictest", "datafusion/substrait", "datafusion-cli", "datafusion-examples", "datafusion-examples/examples/ffi/ffi_example_table_provider", "datafusion-examples/examples/ffi/ffi_module_interface", "datafusion-examples/examples/ffi/ffi_module_loader", "test-utils", "benchmarks", "datafusion/macros", "datafusion/doc", ] exclude = ["dev/depcheck"] resolver = "2" ``` ## What changes are included in this PR? Fixed above. ## Are these changes tested? Yes ## Are there any user-facing changes? Make encryption a feature instead of default.
…#18507) ## Which issue does this PR close? Now we make parquet_encryption optional feature - Closes apache#18492 (comment) ## Rationale for this change Make the extended test to add this feature. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
Which issue does this PR close?
Rationale for this change
When our internal project to use datafusion, we found it default to enable encryption even for latest datafusion.
Problem Analysis
In datafusion/sqllogictest/Cargo.toml:
The Problem:
datafusion-sqllogictest depends on datafusion with default-features = true
And, sqllogictest is the member of datafusion, it will default to encryption when we use datafusion.
What changes are included in this PR?
Fixed above.
Are these changes tested?
Yes
Are there any user-facing changes?
Make encryption a feature instead of default.