Skip to content

Conversation

@LuQQiu
Copy link
Contributor

@LuQQiu LuQQiu commented Mar 6, 2025

Which issue does this PR close?

What changes are included in this PR?

Fix array_has_any and array_has_all with empty array input

Are these changes tested?

Added the unit test in array.slt

Are there any user-facing changes?

NA

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 6, 2025
@LuQQiu
Copy link
Contributor Author

LuQQiu commented Mar 6, 2025

@jayzhan211 Could you help approve the test workflows and review the PR? Thanks in advance

Comment on lines 444 to 445
ComparisonType::All => true,
ComparisonType::Any => false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's weird to me that All is true and Any is false but I agree it matches the descriptions:

array_has_all: Returns true if all elements of sub-array exist in array.
array_has_any: Returns true if any elements exist in both arrays.

I guess, from a set perspective, we have "true if the intersection of the two sets has the same length as needle" (all) and "true if the intersection of the two sets is non-empty" (any).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D select array_has_any([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_any(main.list_value(1, 2, 3), main.list_value()) │
│                          boolean                           │
├────────────────────────────────────────────────────────────┤
│ false                                                      │
└────────────────────────────────────────────────────────────┘
D select array_has_all([1,2,3], []);
┌────────────────────────────────────────────────────────────┐
│ array_has_all(main.list_value(1, 2, 3), main.list_value()) │
│                          boolean                           │
├────────────────────────────────────────────────────────────┤
│ true                                                       │
└────────────────────────────────────────────────────────────┘

LGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes. I just meant it was strange intuition (I was rambling). I was not arguing correctness. I agree this is good.

ComparisonType::All => true,
ComparisonType::Any => false,
};
let result = BooleanArray::from(vec![result_value; haystack.len()]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using BooleanBuffer::new_set and BooleanBuffer::new_unset should be slightly more efficient.

Copy link
Contributor Author

@LuQQiu LuQQiu Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using

    if needle.values().len() == 0 {
        return match comparison_type {
            ComparisonType::All => Ok(Arc::new(BooleanBuffer::new_set(haystack.len()))),
            ComparisonType::Any =>  Ok(Arc::new(BooleanBuffer::new_unset(haystack.len())))
        };
    }

Runs into

the trait bound `BooleanBuffer: arrow::array::Array` is not satisfied
the following other types implement trait `arrow::array::Array`:
  &T
  BooleanArray
  DictionaryArray<T>
  FixedSizeBinaryArray
  FixedSizeListArray
  GenericByteArray<T>
  GenericByteViewArray<T>
  GenericListArray<OffsetSize>
and 10 others
required for the cast from `std::sync::Arc<BooleanBuffer>` to `std::sync::Arc<dyn arrow::array::Array>`rustc[Click for full compiler diagnostic](rust-analyzer-diagnostics-view:/diagnostic%20message%20[0]?0#file:///Users/lu/Projects/Github/Work/datafusion/datafusion/functions-nested/src/array_has.rs)

So confusing about the different types loll

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use BooleanArray::from(BooleanBuffer::new_/* ... */)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! Changed to use BooleanBuffer

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @LuQQiu

false false false false
false false false false

query BB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this query passes on main for me. In other words I don't think the test covers the bug fix 🤔

DataFusion CLI v46.0.0
> select array_has_all(make_array(1,2,3), []),
       array_has_any(make_array(1,2,3), []);
+--------------------------------------------------------------------+--------------------------------------------------------------------+
| array_has_all(make_array(Int64(1),Int64(2),Int64(3)),make_array()) | array_has_any(make_array(Int64(1),Int64(2),Int64(3)),make_array()) |
+--------------------------------------------------------------------+--------------------------------------------------------------------+
| true                                                               | false                                                              |
+--------------------------------------------------------------------+--------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.024 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that! Sorry not quite familar with the testing system yet

        main  cargo run --bin datafusion-cli                                                                     
DataFusion CLI v46.0.0
> select array_has_all(make_array('1','2','3'),[]), array_has_any(make_array('1','2','3'),[]);
Arrow error: Invalid argument error: RowConverter column schema mismatch, expected Utf8 got Int64

will add this test case

@alamb
Copy link
Contributor

alamb commented Mar 17, 2025

This PR appears to have some build / CI failures that are preventing it from merging so marking it as drat

@LuQQiu can you please resolve the issues so it can be merged?

@alamb alamb marked this pull request as draft March 17, 2025 16:04
@LuQQiu LuQQiu marked this pull request as ready for review March 20, 2025 20:45
@LuQQiu
Copy link
Contributor Author

LuQQiu commented Mar 20, 2025

@alamb @Weijun-H Thanks for the review!

@alamb alamb merged commit afa5c8f into apache:main Mar 23, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Mar 23, 2025

THanks again !

@LuQQiu
Copy link
Contributor Author

LuQQiu commented Mar 24, 2025

Thanks for all the reviews and suggestions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

array_has_any(column, []) with empty array throws RowConverter column schema mismatch, expected Utf8 got Int64

6 participants