@Kontinuation Kontinuation commented Mar 13, 2025

Which issue does this PR close?

Closes #1520.

Rationale for this change

I found this problem while working on #1511: the null bits were not written correctly, which caused test failures. This patch is an attempt to fix it.

This patch aims only to fix correctness problems. As #1190 (comment) pointed out, the fast BatchWriter may write the full data buffer for sliced Utf8 arrays, so there are still some performance implications when working with sliced arrays.

What changes are included in this PR?

Correctly take slicing indices and length into account when writing BooleanBuffers. This applies to null bits of all arrays, and the values of boolean arrays.
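
To illustrate the kind of bug this addresses, here is a minimal, self-contained sketch of copying a sliced bit-packed buffer. It is not the code in this PR: `get_bit` and `pack_sliced_bits` are hypothetical helper names, and the sketch assumes only that bits are packed LSB-first within each byte, as in Arrow validity bitmaps. A sliced buffer shares the underlying bytes with its parent and carries a bit offset and a length, so a writer that copies the underlying bytes directly emits bits that belong to rows outside the slice.

```rust
/// Read bit `i` from an LSB-first packed bitmap.
/// Hypothetical helper for illustration only.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    ((bytes[i / 8] >> (i % 8)) & 1) == 1
}

/// Copy `len` bits starting at bit `offset` of `src` into a new bitmap
/// whose first bit corresponds to the first row of the slice.
/// Hypothetical helper; bit-by-bit for clarity rather than speed.
fn pack_sliced_bits(src: &[u8], offset: usize, len: usize) -> Vec<u8> {
    // Round up to whole bytes; unused trailing bits stay zero.
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(src, offset + i) {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}
```

For example, with underlying bytes `[0b1010_1100, 0b0000_0111]`, a slice at bit offset 2 with length 8 should be written as the single byte `0xEB`, whereas copying the underlying bytes as-is would write `0xAC, 0x07` and shift every null bit by two rows.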

How are these changes tested?

Added a new round-trip test for sliced record batches.

@Kontinuation Kontinuation force-pushed the fix-codec-write-nulls branch from 1816f03 to 05a4b04 Compare March 13, 2025 04:06
@Kontinuation Kontinuation marked this pull request as ready for review March 13, 2025 04:50
codecov-commenter commented Mar 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.97%. Comparing base (f09f8af) to head (05a4b04).
Report is 251 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1522      +/-   ##
============================================
+ Coverage     56.12%   58.97%   +2.84%     
- Complexity      976     1028      +52     
============================================
  Files           119      122       +3     
  Lines         11743    12268     +525     
  Branches       2251     2309      +58     
============================================
+ Hits           6591     7235     +644     
+ Misses         4012     3875     -137     
- Partials       1140     1158      +18     

@kazuyukitanimura kazuyukitanimura left a comment

Thank you @Kontinuation

@kazuyukitanimura kazuyukitanimura merged commit 2b5b918 into apache:main Mar 25, 2025
79 checks passed
@kazuyukitanimura
Thanks @Kontinuation, merged.

Development

Successfully merging this pull request may close these issues.

Incorrect null handling in fast shuffle encoder

3 participants