@jecsand838 jecsand838 commented Jul 10, 2025

Which issue does this PR close?

Rationale for this change

The arrow-avro crate currently lacks support for the Avro duration type, which is a standard and commonly used type in Avro schemas. This omission prevents users from reading Avro files containing duration types, limiting the crate's utility.

This change adds support for decoding Avro duration values by mapping them to the Arrow Interval type (`IntervalMonthDayNano`), which, like the Avro type, keeps the month, day, and sub-day components separate. Implementing this feature brings the arrow-avro crate closer to full Avro specification compliance and makes it more robust for real-world use cases.
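
To make the mapping concrete, here is a minimal sketch of how a single duration value could be decoded. It is not the code from this PR. It assumes only the Avro specification's layout for `duration` (a 12-byte `fixed` holding three little-endian unsigned 32-bit integers: months, days, milliseconds) and uses a hypothetical `MonthDayNano` struct as a stand-in for Arrow's `IntervalMonthDayNano` value. The function name `decode_avro_duration` and the choice to reject values that do not fit in `i32` are assumptions made for illustration.

```rust
/// Hypothetical stand-in for an Arrow `IntervalMonthDayNano` value:
/// 32-bit months, 32-bit days, 64-bit nanoseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanoseconds: i64,
}

/// Decode one Avro `duration` value (12 bytes, three little-endian u32s:
/// months, days, milliseconds) into a month/day/nanosecond triple.
pub fn decode_avro_duration(bytes: &[u8]) -> Result<MonthDayNano, String> {
    if bytes.len() != 12 {
        return Err(format!("duration needs 12 bytes, got {}", bytes.len()));
    }
    // Read the little-endian u32 that starts at byte offset `i`.
    let le_u32 = |i: usize| {
        u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]])
    };
    let (months, days, millis) = (le_u32(0), le_u32(4), le_u32(8));

    // Avro stores unsigned values; the target fields are signed.
    // Assumption for this sketch: out-of-range values are an error.
    if months > i32::MAX as u32 {
        return Err(format!("months value {} does not fit in i32", months));
    }
    if days > i32::MAX as u32 {
        return Err(format!("days value {} does not fit in i32", days));
    }

    Ok(MonthDayNano {
        months: months as i32,
        days: days as i32,
        // u32::MAX milliseconds scaled to nanoseconds still fits in an i64.
        nanoseconds: i64::from(millis) * 1_000_000,
    })
}
```

For example, the bytes `[1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]` decode to 1 month, 2 days, and 3,000,000 nanoseconds.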

What changes are included in this PR?

This PR contains:

Are these changes tested?

Yes, this PR includes integration and unit tests covering these modifications.

Are there any user-facing changes?

N/A

Follow-Up PRs

  1. PR to update `test_duration_uuid` once arrow-testing#108 ("Added duration_uuid.avro file") is merged.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 10, 2025
- Fixed `Uuid` support, now represented as `Utf8` in Arrow and added testing logic.
- Added `Duration` support, mapped to Arrow's `IntervalMonthDayNano`, with schema handling, decoding, and integration tests.
- Updated `Cargo.toml` to include the `uuid` crate as a dev dependency for UUID checking.
- Added integration tests with the new `duration_uuid.avro` test file.
@jecsand838 jecsand838 force-pushed the avro-codec-duration branch from 749d435 to e2faf46 Compare July 10, 2025 01:43
Co-authored-by: Matthijs Brobbel <[email protected]>
- Changed `Uuid` from `Utf8` back to `FixedSizeBinary(16)` for proper Arrow UUID representation.
- Removed `uuid` crate dependency.
- Updated schema handling, decoding logic, and relevant tests for the new `Uuid` type.
- Added utility functions and tests to parse UUID strings into binary format.
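
As an illustration of the string-to-binary step mentioned in the commit above, the following is a minimal std-only sketch of parsing a hyphenated UUID string into the 16 bytes that a `FixedSizeBinary(16)` value would hold. It is not the utility added in this PR (which a later commit replaced with the `uuid` crate). The function name `parse_uuid` and the decision to accept only the 36-character hyphenated form are assumptions.

```rust
/// Parse a hyphenated UUID string (8-4-4-4-12 hex digits) into 16 bytes.
/// Hypothetical helper for illustration; only the 36-character form is accepted.
pub fn parse_uuid(s: &str) -> Result<[u8; 16], String> {
    let b = s.as_bytes();
    if b.len() != 36 {
        return Err(format!("expected 36 characters, got {}", b.len()));
    }
    let mut out = [0u8; 16];
    let mut nibbles = 0usize; // number of hex digits consumed so far
    for (i, &c) in b.iter().enumerate() {
        // Hyphens must sit at these fixed positions in the canonical form.
        if i == 8 || i == 13 || i == 18 || i == 23 {
            if c != b'-' {
                return Err(format!("expected '-' at position {}", i));
            }
            continue;
        }
        let v = match c {
            b'0'..=b'9' => c - b'0',
            b'a'..=b'f' => c - b'a' + 10,
            b'A'..=b'F' => c - b'A' + 10,
            _ => return Err(format!("invalid hex digit at position {}", i)),
        };
        // Two hex digits per byte, high nibble first.
        if nibbles % 2 == 0 {
            out[nibbles / 2] = v << 4;
        } else {
            out[nibbles / 2] |= v;
        }
        nibbles += 1;
    }
    Ok(out)
}
```

The resulting bytes are in the same order as the hex digits in the string, so `"00112233-4455-6677-8899-aabbccddeeff"` yields bytes `0x00` through `0xff` in sequence.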
@jecsand838 jecsand838 requested a review from mbrobbel July 10, 2025 21:41
@jecsand838 jecsand838 requested a review from mbrobbel July 11, 2025 09:50
@mbrobbel mbrobbel left a comment

Thanks @jecsand838. I think it would be good to get at least one more review, because I'm not familiar with this crate.

- Introduced `canonical_extension_types` feature for standardized UUID handling.
- Added `Uuid` crate dependency for parsing and validating UUIDs.
- Updated `field_with_name` method to support canonical UUID representation.
- Removed custom UUID parsing logic and replaced it with `Uuid` crate functionality.
- Updated `Cargo.toml` accordingly.
@jecsand838 jecsand838 force-pushed the avro-codec-duration branch from 66dd25a to aa00f95 Compare July 11, 2025 15:12
@jecsand838

> Thanks @jecsand838. I think it would be good to get at least one more review, because I'm not familiar with this crate.

@mbrobbel Thank you for the solid review and great suggestions.

@alamb @scovich Would either of you be able to provide the additional review(s) if you get a chance?

@jecsand838 jecsand838 force-pushed the avro-codec-duration branch from f21d1ec to 5c56183 Compare July 11, 2025 19:16
@alamb alamb left a comment

Looks good to me - thanks @jecsand838 and @mbrobbel

@alamb alamb merged commit 02e06c5 into apache:main Jul 14, 2025
24 checks passed
@jecsand838 jecsand838 deleted the avro-codec-duration branch July 14, 2025 17:55