FFI: data corruption on boolean Array with non-zero offset

### Background
We encountered a [new issue](https://github.com/apache/datafusion-comet/pull/2040#issuecomment-3111563959) while trying to upgrade to DF49 where a test using the `first` or `last` aggregations when ignoring nulls returns incorrect results. DF49 did have some changes to these code paths, but I think this is just exposing a Comet issue, rather than a DF issue. After digging into it, the data coming out of the partial aggregation is correct on the native side with 3 columns showing a single row for one of the parititions that is `[2, null, false]` but when we import the vector on the Spark side the value changes to `[2, null, true]` which breaks the final aggregation and produces wrong results. When I dig into the FFI snapshot of the problematic batches, the only difference I found was that `offset` was non-zero (1 in this case).

More discussion on a proposed DataFusion workaround: https://github.com/apache/datafusion/pull/16918.

Digging into past issues, `offset` with FFI between C and Rust with Java seems problematic...
https://github.com/apache/arrow-rs/issues/3671
https://github.com/apache/arrow-rs/pull/3675
https://github.com/apache/arrow-rs/issues/5959
...and perhaps most importantly...
https://github.com/apache/arrow-java/issues/88

From what I've inferred in issue discussion and documentation, a non-zero `offset` is not guaranteed to be supported by all FFI consumers:
https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.offset
> Producers MAY specify that they will only produce 0-offset arrays to ease implementation of consumer code. Consumers MAY decide not to support non-0-offset arrays, but they should document this limitation.

I don't currently see anything in the Arrow Java documentation that says it doesn't support non-zero `offset` but the previously linked issues are concerning.

### Proposed fix

I have a test branch locally that just does a `take` on any arrays with a non-zero offset before sending them over the JNI boundary with FFI. Running `make test` locally with a debug print if a non-zero offset array occurs makes this seem like a very rare code path. I will likely open a PR later today with this workaround, but wanted an issue to add a comment in the code related and to collect discussion.

### Steps to reproduce

It's slightly non-deterministic due to the first/last function behavior, but this test with DF49 reproduces almost every time for me:

https://github.com/apache/datafusion-comet/pull/2040#issuecomment-3111563959

### Expected behavior

A `false` should not flip to `true` when crossing the JNI boundary with Arrow FFI.

### Additional context

A fuzzer between Arrow-rs and Arrow Java that just does JNI calls back and forth to FFI batches would be amazing, but is perhaps outside of the scope of the Comet repo.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

FFI: data corruption on boolean Array with non-zero offset #2051

Background

Proposed fix

Steps to reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

FFI: data corruption on boolean Array with non-zero offset #2051

Description

Background

Proposed fix

Steps to reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions