Skip to content

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Closes #7835

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Oct 21, 2023
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from ac9ffc3 to e27d7f9 Compare November 12, 2023 02:27
@github-actions github-actions bot removed logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Nov 12, 2023
@jayzhan211
Copy link
Contributor Author

wait on #7862

@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from e27d7f9 to 5a5a88d Compare December 9, 2023 11:13
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from 5a5a88d to a006450 Compare December 14, 2023 12:37
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 16, 2023
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch 2 times, most recently from e57eca5 to 1c82d51 Compare December 18, 2023 00:21
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch 2 times, most recently from addf685 to bd98b9a Compare January 6, 2024 06:19
let should_fail_on_seralize: Vec<ScalarValue> = vec![
// Should fail due to empty values
ScalarValue::Struct(
Some(vec![]),

This comment was marked as outdated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to round_trip_scalar_values, since it is able to serialized

        ScalarValue::try_from(&DataType::Struct(Fields::from(vec![
            Field::new("a", DataType::Int32, true),
            Field::new("a", DataType::Boolean, false),
        ])))
        .unwrap(),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is no need to test serializing an empty array as it isn't a valid input anyways

@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch 2 times, most recently from c786df7 to 60f4d2a Compare January 10, 2024 13:27
@jayzhan211 jayzhan211 marked this pull request as ready for review January 10, 2024 13:48
@jayzhan211
Copy link
Contributor Author

@alamb Ready for review!

"| |",
"| {a: , b: } |",
"| {a: , b: {ba: , bb: }} |",
"| {a: 1, b: } |",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no way to construct StructArray like the left-hand side.

explain select struct(1, 2.3, 'abc');
----
logical_plan
Projection: Struct({c0:Int64(1),c1:Float64(2.3),c2:Utf8("abc")}) AS struct(Int64(1),Float64(2.3),Utf8("abc"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test format

.into_iter()
.map(|(name, scalar)| (Field::new(name, scalar.data_type(), false), scalar))
.unzip();
// Wrapper for ScalarValue::Struct that checks the length of the arrays, without nulls
Copy link
Contributor Author

@jayzhan211 jayzhan211 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Remove these two wrappers, no longer needed after changing to Scalar<T>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these still TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we haven't changed to Scalar yet.

let struct_type = DataType::Struct(Fields::from(fields));
let mut column_wise_ordering_values = vec![];
let num_columns = fields.len();
for i in 0..num_columns {
Copy link
Contributor Author

@jayzhan211 jayzhan211 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a better design for StructArray (previous design is based on old ScalarValue::Struct). I avoid changing the logic or data structure in this PR.

May benefit #8558?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a better design for StructArray (previous design is based on old ScalarValue::Struct). I avoid changing the logic or data structure in this PR.

May benefit #8558?

I don't think it will benefit #8558 (won't harm either). However it will be a better change anyway.

}

/// Return a `null` literal representing a struct type like: `{ a: bool }`
// / Return a `null` literal representing a struct type like: `{ a: bool }`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// / Return a `null` literal representing a struct type like: `{ a: bool }`
/// Return a `null` literal representing a struct type like: `{ a: bool }`


Ok(ordering_columns_per_row)
} else {
exec_err!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: internal_err

@jayzhan211 jayzhan211 requested a review from alamb January 13, 2024 02:10
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from 3f53ead to 98afb20 Compare January 29, 2024 11:56
@jayzhan211
Copy link
Contributor Author

Rebase

@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from e431707 to a756a8b Compare January 30, 2024 14:14
@alamb
Copy link
Contributor

alamb commented Feb 2, 2024

@jayzhan211 -- is this PR ready for a review?

@jayzhan211
Copy link
Contributor Author

@jayzhan211 -- is this PR ready for a review?

Yes, it keeps getting conflicts, but I think you can take a first scan, unless the conflicts are critical

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@jayzhan211 jayzhan211 force-pushed the arrayref-for-scalarvalue-struct branch from a756a8b to 4088750 Compare February 5, 2024 13:15
@jayzhan211
Copy link
Contributor Author

Rebase

@alamb
Copy link
Contributor

alamb commented Feb 5, 2024

Sorry -- starting to look now

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good @jayzhan211 -- thank you both for the PR as well as for sticking with it for so long

I had a few comments about how to improve the implementation by using arrow kernels, but I also think we could merge this as is and then implement those improvements as a follow on PR if you prefer.

Again, thank you for your patience.

}

impl OrderSensitiveArrayAggAccumulator {
fn evaluate_orderings(&self) -> Result<ScalarValue> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can file a follow on ticket to track this idea

let should_fail_on_seralize: Vec<ScalarValue> = vec![
// Should fail due to empty values
ScalarValue::Struct(
Some(vec![]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is no need to test serializing an empty array as it isn't a valid input anyways

);
}

let mut valid = BooleanBufferBuilder::new(arrays.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using https://docs.rs/arrow/latest/arrow/compute/kernels/concat/index.html here? I think you should be able to simply concat the arrays together without having to have special handling (and if concat doesn't support StructArray we can potentially file a ticket upstream in arrow-rs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.into_iter()
.map(|(name, scalar)| (Field::new(name, scalar.data_type(), false), scalar))
.unzip();
// Wrapper for ScalarValue::Struct that checks the length of the arrays, without nulls
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these still TODO?

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
@alamb
Copy link
Contributor

alamb commented Feb 6, 2024

Thanks @jayzhan211 -- this is looking great. There are a few more outstanding suggestions, but I think we could do them as follow on PRs -- shall I merge this one?

@jayzhan211
Copy link
Contributor Author

Thanks @jayzhan211 -- this is looking great. There are a few more outstanding suggestions, but I think we could do them as follow on PRs -- shall I merge this one?

Sure!

@alamb alamb merged commit 8413da8 into apache:main Feb 7, 2024
@alamb
Copy link
Contributor

alamb commented Feb 7, 2024

Thanks again @jayzhan211

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change ScalarValue::Struct to store ArrayRef

4 participants