Skip to content

Conversation

@miroim
Copy link
Member

@miroim miroim commented Aug 12, 2025

Which issue does this PR close?

Rationale for this change

as discussion from @alamb on: #16649 (comment)

What changes are included in this PR?

  • Removed the parquet_encryption feature from the default features.
  • Added documentation to ParquetEncryptionOptions explaining that the parquet_encryption feature flag must be enabled to use Parquet encryption.
  • Enabled the parquet_encryption feature in the datafusion-cli.
  • Enabled the parquet_encryption feature in the datafusion/sqllogictest.
  • Enabled the parquet_encryption feature in the datafusion-examples.
  • Updated the Crate features in README.

Are these changes tested?

doc test

Are there any user-facing changes?

Change parquet_encryption feature from the default feature to non-default

@github-actions github-actions bot added core Core DataFusion crate common Related to common crate labels Aug 12, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 12, 2025
@alamb
Copy link
Contributor

alamb commented Aug 12, 2025

CI failure tracked by

@alamb
Copy link
Contributor

alamb commented Aug 12, 2025

Updating to get clean CI run

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me -- thank you @miroim

I think we need to document the new config as well as add a new CI test to make sure the crate works without the feature. For example:

- name: Check datafusion (parquet)
run: cargo check --profile ci --no-default-features -p datafusion --features=parquet

And compile datafusion with the parquet and parquet_encryption features.

cc @adamreeve and @corwinjoy I would be interested to hear your opinion on this change

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 14, 2025
@corwinjoy
Copy link
Contributor

This change looks reasonable to me.

@adamreeve
Copy link
Contributor

👍 makes sense to me too.

There is already a cargo check run in CI that just enables the parquet_encryption feature:

run: cargo check --profile ci --no-default-features -p datafusion --features=parquet_encryption

I think we need to enable parquet_encryption for the cargo test run here though:

--features serde,avro,json,backtrace,integration-tests

@miroim
Copy link
Member Author

miroim commented Aug 15, 2025

Should we also enable parquet_encryption for the cargo test (macos-aarch64) here?

run: cargo test --profile ci --exclude datafusion-cli --workspace --lib --tests --bins --features avro,json,backtrace,integration-tests

@alamb
Copy link
Contributor

alamb commented Aug 15, 2025

Should we also enable parquet_encryption for the cargo test (macos-aarch64) here?

run: cargo test --profile ci --exclude datafusion-cli --workspace --lib --tests --bins --features avro,json,backtrace,integration-tests

I don't have any strong opinions -- I don't think it is necessary

@alamb
Copy link
Contributor

alamb commented Aug 15, 2025

I think we need to enable parquet_encryption for the cargo test run here though:

I agree this is needed to ensure the feature keeps working

We also need to check that the flags work in isolation as described in #17137 (review)

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Aug 15, 2025
Comment on lines 223 to 224
- name: Check parquet encryption (parquet, parquet_encryption)
run: cargo check --profile ci --no-default-features -p datafusion --features=parquet,parquet_encryption
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be redundant but then I noticed that the parquet_encryption feature doesn't actually depend on the parquet feature. Maybe we should remove this extra check run as the one above already tests the parquet_encryption feature, and update datafusion/core/Cargo.toml so that the parquet_encryption feature depends on the parquet feature rather than just dep:parquet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @adamreeve for review

@alamb alamb added the api change Changes the API exposed to users of the crate label Aug 18, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again @miroim and @adamreeve

@alamb
Copy link
Contributor

alamb commented Aug 22, 2025

I merged up from main and minimized this diff for clarity about what actually changed.

@alamb alamb merged commit 73ad492 into apache:main Aug 22, 2025
29 checks passed
@alamb
Copy link
Contributor

alamb commented Aug 22, 2025

Thanks again @miroim

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate common Related to common crate core Core DataFusion crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make parquet_encryption a non-default feature

4 participants