-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat: Parquet modular encryption #16351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…uite, column encryption is broken.
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
…operties to use references.
… "." instead of "::"
2. Fixed unused header warning in config.rs. 3. Fix test case in encryption.rs to call conversion to ConfigFileDecryption properties correctly.
Co-authored-by: Copilot <[email protected]>
Add an example to read and write encrypted parquet files.
a3f0721
to
6a77842
Compare
…main Signed-off-by: Corwin Joy <[email protected]>
…xample Signed-off-by: Corwin Joy <[email protected]>
Signed-off-by: Adam Reeve <[email protected]>
store: &dyn ObjectStore, | ||
meta: &ObjectMeta, | ||
size_hint: Option<usize>, | ||
decryption_properties: Option<&FileDecryptionProperties>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a breaking change (and same for fetch_statistics
). Should we introduce a new method with this parameter added to avoid breaking the API? It is documented as "subject to change" though so maybe this is ok...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this probably needs to change anyway to handle having an encryption
feature.
.unwrap(), | ||
column_specific_options, | ||
key_value_metadata: Default::default(), | ||
crypto: Default::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we aren't going to support conversion to protobuf yet, we should probably raise an error in impl TryFrom<&TableParquetOptions> for protobuf::TableParquetOptions
if any crypto options are set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we? I see that key_value_metadata
is always set to empty here despite being defined in datafusion/proto-common/src/generated/prost.rs. So, it's not clear to me that we should throw an error here.
Use ConfigFileDecryptionProperties in ParquetReadOptions Signed-off-by: Corwin Joy <[email protected]>
Signed-off-by: Corwin Joy <[email protected]>
Signed-off-by: Corwin Joy <[email protected]>
Signed-off-by: Corwin Joy <[email protected]>
I merged up to resolve a conflict |
/// See ConfigFileEncryptionProperties and ConfigFileDecryptionProperties in datafusion/common/src/config.rs | ||
/// These can be set via 'format.crypto', for example: | ||
/// ```sql | ||
/// OPTIONS ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @corwinjoy -- this looks wonderful to me
https://github.com/apache/datafusion/actions/runs/15936215612/job/44956521967?pr=16351 That is a new failure for me
I'll trigger it again and see if I can reproduce |
Thanks again @corwinjoy / @adamreeve and everyone else. This is great |
Thanks @alamb much appreciated for the review and helpful feedback! We hope to have a followup PR soon with a config to make encryption optional. |
Thank you @corwinjoy |
Which issue does this PR close?
What changes are included in this PR?
This PR adds support for encryption in DataFusion’s Parquet implementation. The changes introduce new configuration options for file encryption and decryption properties, update various components (including proto conversion, file reading/writing, and tests), and add an end-to-end encrypted Parquet example.
Are these changes tested?
Tests and examples have been added to demonstrate and test functionality versus Parquet modular encryption. These could use feedback since there may be additional DataFusion usage cases that should be covered.
Are there any user-facing changes?
Additional options have been added to allow encryption/decryption configuration. We are soliciting additional feedback on how to handle key columns in a way that best fits the existing API.
Catalog of changes via copilot
Show a summary per file