[SPARK-40798][SQL] Alter partition should verify value follow storeAssignmentPolicy #38257
Conversation
Force-pushed from bc270cc to 997f078.
Force-pushed from 997f078 to 92a7f42.
change partition column from int type to string to make the test work
also cc @MaxGekk @cloud-fan

@ulysses-you Thanks for the ping. Could you add a legacy flag and migration guide?

@gengliangwang added legacy config and migration guide
docs/sql-migration-guide.md (Outdated)
```diff
- - Since Spark 3.4, Spark will do validation for partition spec follows the behavior of `spark.sql.storeAssignmentPolicy` which may cause exception if type conversion error, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column p is int type. To restore the legacy behavior, set `spark.sql.legacy.skipPartitionSpecTypeValidation` to `true`.
+ - Since Spark 3.4, Spark will do validation for partition spec in ADD PARTITION to follow the behavior of `spark.sql.storeAssignmentPolicy` which may cause an exception if type conversion fails, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column `p` is int type. To restore the legacy behavior, set `spark.sql.legacy.skipPartitionSpecTypeValidation` to `true`.
```
| .doc("When true, skip do validation for partition spec. E.g., " + | |
| .doc("When true, skip validation for partition spec in ADD PARTITION. E.g., " + |
the name is weird as the input is already a String. How about `normalizePartitionStringValue`?
let's use `checkError`
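For reference, a minimal sketch of what this suggestion amounts to, assuming the test suite mixes in Spark's `SparkFunSuite`/`QueryTest` helpers. The error class is taken from the `CAST_INVALID_INPUT` failure quoted later in this thread; the exception type and parameter map are illustrative rather than the exact merged assertion:

```scala
import org.apache.spark.SparkNumberFormatException

// Hedged sketch: assert on the error class rather than on raw message text.
// Parameter names follow the CAST_INVALID_INPUT template and are illustrative.
checkError(
  exception = intercept[SparkNumberFormatException] {
    sql("ALTER TABLE t ADD PARTITION (p='aaa')")
  },
  errorClass = "CAST_INVALID_INPUT",
  parameters = Map(
    "expression" -> "'aaa'",
    "sourceType" -> "\"STRING\"",
    "targetType" -> "\"INT\"",
    "ansiConfig" -> "\"spark.sql.ansi.enabled\""))
```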
```diff
- val SKIP_PARTITION_SPEC_TYPE_VALIDATION =
+ val SKIP_TYPE_VALIDATION_ON_ALTER_PARTITION =
```
| buildConf("spark.sql.legacy.skipPartitionSpecTypeValidation") | |
| buildConf("spark.sql.legacy.skipTypeValidationOnAlterPartition") |
@cloud-fan @gengliangwang addressed comments
Force-pushed from aff1a46 to 8de1b94.
thanks, merging to master!
…BLE PARTITION test

### What changes were proposed in this pull request?

This PR is a followup of #38257 that fixes the test case of v2 when ANSI is enabled. When ANSI mode is enabled, it fails (https://github.com/apache/spark/actions/runs/3353657619/jobs/5556624621) as below:

```
- ALTER TABLE .. ADD PARTITION using V2 catalog V2 command: SPARK-40798: Alter partition should verify partition value - legacy *** FAILED *** (22 milliseconds)
  org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value 'aaa' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
  == SQL(line 1, position 1) ==
  ALTER TABLE test_catalog.ns.tbl ADD PARTITION (p='aaa')
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:164)
  at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
  at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2(Cast.scala:927)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2$adapted(Cast.scala:927)
  at org.apache.spark.sql.catalyst.expressions.Cast.buildCast(Cast.scala:588)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$1(Cast.scala:927)
  at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:1285)
  at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:526)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$.$anonfun$convertToPartIdent$1(ResolvePartitionSpec.scala:81)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$.convertToPartIdent(ResolvePartitionSpec.scala:78)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$.org$apache$spark$sql$catalyst$analysis$ResolvePartitionSpec$$resolvePartitionSpec(ResolvePartitionSpec.scala:71)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$$anonfun$apply$2$$anonfun$applyOrElse$1.applyOrElse(ResolvePartitionSpec.scala:48)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$$anonfun$apply$2$$anonfun$applyOrElse$1.applyOrElse(ResolvePartitionSpec.scala:41)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:159)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:200)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:200)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:211)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:216)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
```

### Why are the changes needed?

To recover the broken ANSI build.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually ran this test case locally.

Closes #38443 from HyukjinKwon/SPARK-40798-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
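In short, the legacy-flag test cannot pass with an int partition column under ANSI mode: `ResolvePartitionSpec` evaluates the cast of the partition value eagerly, and the ANSI cast of `'aaa'` to `INT` throws before the skip flag can matter. A hedged sketch of the failure mode (it assumes a v2 catalog registered as `test_catalog`, as in the test fixture above, so it is not runnable standalone):

```scala
// Hedged sketch: even with the legacy flag on, the ANSI cast itself fails
// while the partition spec is being resolved, as in the stack trace above.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.conf.set("spark.sql.legacy.skipTypeValidationOnAlterPartition", "true")
spark.sql("ALTER TABLE test_catalog.ns.tbl ADD PARTITION (p='aaa')")
// => SparkNumberFormatException [CAST_INVALID_INPUT]: 'aaa' cannot be cast to INT
```

This is consistent with the earlier review note about switching the test's partition column from int type to string.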
docs/sql-migration-guide.md (excerpt under review):

```
- Valid hexadecimal strings should include only allowed symbols (0-9A-Fa-f).
- Valid values for `fmt` are case-insensitive `hex`, `base64`, `utf-8`, `utf8`.
- Since Spark 3.4, Spark throws only `PartitionsAlreadyExistException` when it creates partitions but some of them exist already. In Spark 3.3 or earlier, Spark can throw either `PartitionsAlreadyExistException` or `PartitionAlreadyExistsException`.
- Since Spark 3.4, Spark will do validation for partition spec in ALTER PARTITION to follow the behavior of `spark.sql.storeAssignmentPolicy` which may cause an exception if type conversion fails, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column `p` is int type. To restore the legacy behavior, set `spark.sql.legacy.skipPartitionSpecTypeValidation` to `true`.
```
The conf should be `spark.sql.legacy.skipTypeValidationOnAlterPartition`.
nice catch
just made a PR :-) #38667
…t migration guide

### What changes were proposed in this pull request?

This PR is a followup of #38257 to fix a typo from `spark.sql.legacy.skipPartitionSpecTypeValidation` to `spark.sql.legacy.skipTypeValidationOnAlterPartition`.

### Why are the changes needed?

To show users the correct configuration name for legacy behaviours.

### Does this PR introduce _any_ user-facing change?

No, doc-only.

### How was this patch tested?

N/A

Closes #38667 from HyukjinKwon/SPARK-40798-followup3.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…signmentPolicy

### What changes were proposed in this pull request?

Extract the insertion field-cast check into a method so that we can validate partition values in PartitioningUtils.normalizePartitionSpec.

### Why are the changes needed?

Insertion follows the behavior of the config `spark.sql.storeAssignmentPolicy`, which by default fails if the value cannot be cast to the target data type. Altering a partition should also follow it. For example:

```SQL
CREATE TABLE t (c int) USING PARQUET PARTITIONED BY(p int);

-- This DDL should fail but worked:
ALTER TABLE t ADD PARTITION(p='aaa');

-- FAILED, which follows spark.sql.storeAssignmentPolicy
INSERT INTO t PARTITION(p='aaa') SELECT 1
```

### Does this PR introduce _any_ user-facing change?

Yes, the added partition value will follow the behavior of `storeAssignmentPolicy`. To restore the previous behavior, set spark.sql.legacy.skipPartitionSpecTypeValidation = true;

### How was this patch tested?

Added a test.

Closes apache#38257 from ulysses-you/verify-partition.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
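To make the user-facing change concrete, a hedged spark-shell sketch follows; the table name is illustrative, and the legacy flag is shown under its final name `spark.sql.legacy.skipTypeValidationOnAlterPartition` from the follow-up rename above:

```scala
// After this change, an invalid partition value is rejected up front,
// mirroring INSERT's behavior under spark.sql.storeAssignmentPolicy.
spark.sql("CREATE TABLE t (c INT) USING parquet PARTITIONED BY (p INT)")
spark.sql("ALTER TABLE t ADD PARTITION (p='aaa')")  // now fails, like INSERT

// Opt back into the pre-3.4 behavior with the legacy flag (final name,
// per the follow-up rename in this thread).
spark.conf.set("spark.sql.legacy.skipTypeValidationOnAlterPartition", "true")
spark.sql("ALTER TABLE t ADD PARTITION (p='aaa')")  // accepted as before
```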