Skip to content

Conversation

jecsand838
Copy link
Contributor

This adds new fixed256_decimal.avro, fixed_length_decimal_legacy_32.avro, int128_decimal.avro, int256_decimal.avro files which will be used to test arrow-avro.

Here's the python script used to generate the new Avro file: https://gist.github.com/3890349bdb33082a3e8fdcae3257eef7.git

Part of apache/arrow-rs#4886

@jecsand838 jecsand838 changed the title Add new Avro files for decimal data types. Add new Avro files for decimal data types Sep 1, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jecsand838 -- this makes sense to me

@alamb
Copy link
Contributor

alamb commented Sep 8, 2025

This PR now sadly has a conflict

alamb pushed a commit to apache/arrow-rs that referenced this pull request Sep 8, 2025
# Which issue does this PR close?

- Part of #4886

# Rationale for this change

Apache Avro’s `decimal` logical type annotates either `bytes` or `fixed`
and carries `precision` and `scale`. Implementations should reject
invalid combinations such as `scale > precision`, and the underlying
bytes are the two’s‑complement big‑endian representation of the unscaled
integer. On the Arrow side, Rust now exposes first‑class `Decimal32`,
`Decimal64`, `Decimal128`, and `Decimal256` data types with documented
maximum precisions (9, 18, 38, 76 respectively). Until now, `arrow-avro`
decoded all Avro decimals to 128/256‑bit Arrow decimals, even when a
narrower type would suffice.

# What changes are included in this PR?

**`arrow-avro/src/codec.rs`**

* Map `Codec::Decimal(precision, scale, _size)` to Arrow’s
`Decimal32`/`64`/`128`/`256` **by precision**, preferring the narrowest
type (≤9→32, ≤18→64, ≤38→128, otherwise 256).
* Strengthen decimal attribute parsing:
  * Error if `scale > precision`.
  * Error if `precision` exceeds Arrow’s maximum (Decimal256).
* If Avro uses `fixed`, check that declared `precision` fits the byte
width (≤4→max 9, ≤8→18, ≤16→38, ≤32→76).
* Update docstring of `Codec::Decimal` to mention `Decimal32`/`64`. 

**`arrow-avro/src/reader/record.rs`**

* Add `Decoder::Decimal32` and `Decoder::Decimal64` variants with
corresponding builders (`Decimal32Builder`, `Decimal64Builder`).
* Builder selection:

* If Avro uses **fixed**: choose by size (≤4→Decimal32, ≤8→Decimal64,
≤16→Decimal128, ≤32→Decimal256).
* If Avro uses **bytes**: choose by declared precision (≤9/≤18/≤38/≤76).
* Implement decode paths that sign‑extend Avro’s two’s‑complement
payload to 4/8 bytes and append values to the new builders; update
`append_null`/`flush` for 32/64‑bit decimals.

**`arrow-avro/src/reader/mod.rs` (tests)**

* Expand `test_decimal` to assert that:

* bytes‑backed decimals with precision 4 map to `Decimal32`; precision
10 map to `Decimal64`;
  * legacy fixed\[8] decimals map to `Decimal64`;
  * fixed\[16] decimals map to `Decimal128`.
* Add a nulls path test for bytes‑backed `Decimal32`.

# Are these changes tested?

Yes. Unit tests under `arrow-avro/src/reader/mod.rs` construct expected
`Decimal32Array`/`Decimal64Array`/`Decimal128Array` with
`with_precision_and_scale`, and compare against batches decoded from
Avro files (including legacy fixed and bytes‑backed cases). The tests
also exercise small batch sizes to cover buffering paths; a new Avro
data file is added for higher‑width decimals.

New Avro test file details:
- test/data/int256_decimal.avro # bytes logicalType:
decimal(precision=76, scale=10)
- test/data/fixed256_decimal.avro # fixed[32] logicalType:
decimal(precision=76, scale=10)
- test/data/fixed_length_decimal_legacy_32.avro # fixed[4] logicalType:
decimal(precision=9, scale=2)
- test/data/int128_decimal.avro # bytes logicalType:
decimal(precision=38, scale=2)

These new Avro test files were created using this script:
https://gist.github.com/jecsand838/3890349bdb33082a3e8fdcae3257eef7

There is also an arrow-testing PR for these new files:
apache/arrow-testing#112

# Are there any user-facing changes?

N/A due to `arrow-avro` not being public.
@alamb alamb merged commit 0b1aad0 into apache:master Sep 17, 2025
@alamb
Copy link
Contributor

alamb commented Sep 17, 2025

Thanks @jecsand838

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants