Skip to content

Conversation

uros-db
Copy link
Contributor

@uros-db uros-db commented May 23, 2024

What changes were proposed in this pull request?

String lowercase/uppercase conversion in UTF8_BINARY_LCASE now works using ICU default locale, similar to how other ICU collations currently work in Spark.

Why are the changes needed?

All collations apart from UTF8_BINARY should use the same interface (UCharacter) that utilizes ICU toLowerCase/toUpperCase implementation, rather than mixing JVM & ICU implementations.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests and e2e sql tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@uros-db uros-db changed the title [SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation May 23, 2024
@github-actions github-actions bot added the SQL label May 23, 2024
@uros-db uros-db changed the title [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation May 24, 2024
@uros-db uros-db changed the title [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation May 27, 2024
@uros-db uros-db requested a review from mkaravel May 31, 2024 12:20
@uros-db uros-db requested a review from dbatomic June 5, 2024 07:58
@uros-db uros-db changed the title [SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE & ICU collations Jun 5, 2024
Copy link
Contributor

@mkaravel mkaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please add a few more interesting test cases for uppercasing in this PR or a follow up one.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 61fd936 Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants