-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix array_has_all and array_has_any with empty array #15039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@jayzhan211 Could you help approve the test workflows and review the PR? Thanks in advance |
| ComparisonType::All => true, | ||
| ComparisonType::Any => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird to me that All is true and Any is false but I agree it matches the descriptions:
array_has_all: Returns true if all elements of sub-array exist in array.
array_has_any: Returns true if any elements exist in both arrays.
I guess, from a set perspective, we have "true if the intersection of the two sets has the same length as needle" (all) and "true if the intersection of the two sets is non-empty" (any).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D select array_has_any([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_any(main.list_value(1, 2, 3), main.list_value()) │
│ boolean │
├────────────────────────────────────────────────────────────┤
│ false │
└────────────────────────────────────────────────────────────┘
D select array_has_all([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_all(main.list_value(1, 2, 3), main.list_value()) │
│ boolean │
├────────────────────────────────────────────────────────────┤
│ true │
└────────────────────────────────────────────────────────────┘
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes. I just meant it was strange intuition (I was rambling). I was not arguing correctness. I agree this is good.
| ComparisonType::All => true, | ||
| ComparisonType::Any => false, | ||
| }; | ||
| let result = BooleanArray::from(vec![result_value; haystack.len()]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using BooleanBuffer::new_set and BooleanBuffer::new_unset should be slightly more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
if needle.values().len() == 0 {
return match comparison_type {
ComparisonType::All => Ok(Arc::new(BooleanBuffer::new_set(haystack.len()))),
ComparisonType::Any => Ok(Arc::new(BooleanBuffer::new_unset(haystack.len())))
};
}
Runs into
the trait bound `BooleanBuffer: arrow::array::Array` is not satisfied
the following other types implement trait `arrow::array::Array`:
&T
BooleanArray
DictionaryArray<T>
FixedSizeBinaryArray
FixedSizeListArray
GenericByteArray<T>
GenericByteViewArray<T>
GenericListArray<OffsetSize>
and 10 others
required for the cast from `std::sync::Arc<BooleanBuffer>` to `std::sync::Arc<dyn arrow::array::Array>`rustc[Click for full compiler diagnostic](rust-analyzer-diagnostics-view:/diagnostic%20message%20[0]?0#file:///Users/lu/Projects/Github/Work/datafusion/datafusion/functions-nested/src/array_has.rs)
So confusing about the different types loll
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use BooleanArray::from(BooleanBuffer::new_/* ... */)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! Changed to use BooleanBuffer
Weijun-H
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @LuQQiu
Co-authored-by: Alex Huang <[email protected]>
| false false false false | ||
| false false false false | ||
|
|
||
| query BB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this query passes on main for me. In other words I don't think the test covers the bug fix 🤔
DataFusion CLI v46.0.0
> select array_has_all(make_array(1,2,3), []),
array_has_any(make_array(1,2,3), []);
+--------------------------------------------------------------------+--------------------------------------------------------------------+
| array_has_all(make_array(Int64(1),Int64(2),Int64(3)),make_array()) | array_has_any(make_array(Int64(1),Int64(2),Int64(3)),make_array()) |
+--------------------------------------------------------------------+--------------------------------------------------------------------+
| true | false |
+--------------------------------------------------------------------+--------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.024 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that! Sorry not quite familar with the testing system yet
main cargo run --bin datafusion-cli
DataFusion CLI v46.0.0
> select array_has_all(make_array('1','2','3'),[]), array_has_any(make_array('1','2','3'),[]);
Arrow error: Invalid argument error: RowConverter column schema mismatch, expected Utf8 got Int64
will add this test case
|
This PR appears to have some build / CI failures that are preventing it from merging so marking it as drat @LuQQiu can you please resolve the issues so it can be merged? |
|
THanks again ! |
|
Thanks for all the reviews and suggestions! |
Which issue does this PR close?
What changes are included in this PR?
Fix array_has_any and array_has_all with empty array input
Are these changes tested?
Added the unit test in array.slt
Are there any user-facing changes?
NA