-
Notifications
You must be signed in to change notification settings - Fork 1k
Add arrow-avro support for Duration type and minor fixes for UUID decoding #7889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Fixed `Uuid` support, now represented as `Utf8` in Arrow and added testing logic. - Added `Duration` support, mapped to Arrow's `IntervalMonthDayNano`, with schema handling, decoding, and integration tests. - Updated `Cargo.toml` to include the `uuid` crate as a dev dependency for UUID checking. - Added integration tests with the new `duration_uuid.avro` test file.
749d435 to
e2faf46
Compare
Co-authored-by: Matthijs Brobbel <[email protected]>
- Changed `Uuid` from `Utf8` back to `FixedSizeBinary(16)` for proper Arrow UUID representation. - Removed `uuid` crate dependency. - Updated schema handling, decoding logic, and relevant tests for the new `Uuid` type. - Added utility functions and tests to parse UUID strings into binary format.
Co-authored-by: Matthijs Brobbel <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jecsand838. I think it would be good to get at least one more review, because I'm not familiar with this crate.
- Introduced `canonical_extension_types` feature for standardized UUID handling. - Added `Uuid` crate dependency for parsing and validating UUIDs. - Updated `field_with_name` method to support canonical UUID representation. - Removed custom UUID parsing logic and replaced it with `Uuid` crate functionality. - Updated `Cargo.toml` accordingly.
66dd25a to
aa00f95
Compare
@mbrobbel Thank you for the solid review and great suggestions. @alamb @scovich Would either of you be able to provide the additional review(s) if you get a chance? |
f21d1ec to
5c56183
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me - thanks @jecsand838 and @mbrobbel
Which issue does this PR close?
Part of Add Avro Support #4886
Related to Avro codec enhancements #6965
Rationale for this change
The
arrow-avrocrate currently lacks support for the Avrodurationtype, which is a standard and commonly used type in Avro schemas. This omission prevents users from reading Avro files containing duration types, limiting the crate's utility.This change introduces support for decoding Avro duration types by mapping them to the Arrow
Intervaltype. This is a logical and efficient representation. Implementing this feature brings thearrow-avrocrate closer to full Avro specification compliance and makes it more robust for real-world use cases.What changes are included in this PR?
This PR contains:
utf8type to better align with the Avro specificationduration_uuid.avrofile created using this python script: https://gist.github.com/jecsand838/cbdaaf581af78f357778bf87d2f3cf15Are these changes tested?
Yes, this PR includes for integration and unit tests covering these modifications.
Are there any user-facing changes?
N/A
Follow-Up PRs
test_duration_uuidonce Added duration_uuid.avro file arrow-testing#108 is merged in.