
Conversation

@ulysses-you (Contributor) commented Oct 14, 2022

### What changes were proposed in this pull request?

Extract the insertion field cast-check method so that we can validate partition values at `PartitioningUtils.normalizePartitionSpec`.

### Why are the changes needed?

Insertion follows the behavior of the config `spark.sql.storeAssignmentPolicy`, which by default fails if the value cannot be cast to the target data type. Altering a partition should also follow it. For example:

```SQL
CREATE TABLE t (c int) USING PARQUET PARTITIONED BY(p int);

-- This DDL should fail but worked:
ALTER TABLE t ADD PARTITION(p='aaa');

-- FAILED, which follows spark.sql.storeAssignmentPolicy
INSERT INTO t PARTITION(p='aaa') SELECT 1
```

### Does this PR introduce _any_ user-facing change?

Yes, the added partition value will follow the behavior of `spark.sql.storeAssignmentPolicy`.

To restore the previous behavior, set `spark.sql.legacy.skipPartitionSpecTypeValidation` to `true`.
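
For illustration, a minimal sketch of restoring the old behavior from a Spark shell (note that the flag was renamed during review to `spark.sql.legacy.skipTypeValidationOnAlterPartition`; see the review comments below):

```scala
// Illustrative only: with the legacy flag enabled, the unconvertible partition
// value is accepted again instead of failing type validation.
spark.sql("SET spark.sql.legacy.skipTypeValidationOnAlterPartition=true")
spark.sql("ALTER TABLE t ADD PARTITION (p='aaa')") // succeeds under the legacy flag
```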

### How was this patch tested?

Added a test.

@HyukjinKwon (Member):

cc @gengliangwang

Contributor Author (review comment):

Changed the partition column from int to string type to make the test work.

@ulysses-you (Contributor Author):

also cc @MaxGekk @cloud-fan

@gengliangwang (Member):

@ulysses-you Thanks for the ping. Could you add a legacy flag and migration guide?

@ulysses-you (Contributor Author):

@gengliangwang added legacy config and migration guide

github-actions bot added the DOCS label Oct 18, 2022
Contributor (review comment):

Suggested change:
```diff
-- Since Spark 3.4, Spark will do validation for partition spec follows the behavior of `spark.sql.storeAssignmentPolicy` which may cause exception if type conversion error, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column p is int type. To restore the legacy behavior, set `spark.sql.legacy.skipPartitionSpecTypeValidation` to `true`.
+- Since Spark 3.4, Spark will do validation for partition spec in ADD PARTITION to follow the behavior of `spark.sql.storeAssignmentPolicy` which may cause an exception if type conversion fails, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column `p` is int type. To restore the legacy behavior, set `spark.sql.legacy.skipPartitionSpecTypeValidation` to `true`.
```

Contributor (review comment):

Suggested change:
```diff
-.doc("When true, skip do validation for partition spec. E.g., " +
+.doc("When true, skip validation for partition spec in ADD PARTITION. E.g., " +
```

Contributor (review comment):

The name is weird as the input is already a `String`. How about `normalizePartitionStringValue`?
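
For reference, a hypothetical sketch of what such a helper could look like (`Cast` and `Literal` are Catalyst expressions; the merged signature may differ):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.StructField

// Hypothetical sketch only: the real helper applies the cast dictated by
// spark.sql.storeAssignmentPolicy (an ANSI cast by default, which throws on
// 'aaa' -> INT); a plain Cast is used here for brevity and yields null instead.
def normalizePartitionStringValue(value: String, field: StructField): String = {
  val casted = Cast(Literal(value), field.dataType).eval()
  if (casted == null) null else casted.toString
}
```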

Contributor (review comment):

Let's use `checkError`.
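
A rough sketch of the suggested pattern, with the error class and parameters taken from the ANSI failure quoted later in this thread (exact values are illustrative):

```scala
// Illustrative use of the checkError test helper: assert on the error class and
// its message parameters rather than matching raw error message text.
checkError(
  exception = intercept[SparkNumberFormatException] {
    sql("ALTER TABLE t ADD PARTITION (p='aaa')")
  },
  errorClass = "CAST_INVALID_INPUT",
  parameters = Map(
    "expression" -> "'aaa'",
    "sourceType" -> "\"STRING\"",
    "targetType" -> "\"INT\"",
    "ansiConfig" -> "\"spark.sql.ansi.enabled\""))
```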

Member (review comment):

Suggested change:
```diff
-val SKIP_PARTITION_SPEC_TYPE_VALIDATION =
+val SKIP_TYPE_VALIDATION_ON_ALTER_PARTITION =
```

Member (review comment):

Suggested change:
```diff
-buildConf("spark.sql.legacy.skipPartitionSpecTypeValidation")
+buildConf("spark.sql.legacy.skipTypeValidationOnAlterPartition")
```

@ulysses-you (Contributor Author):

@cloud-fan @gengliangwang addressed comments

@cloud-fan (Contributor):

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9140795 Oct 24, 2022
@ulysses-you ulysses-you deleted the verify-partition branch October 27, 2022 02:24
HyukjinKwon added a commit that referenced this pull request Oct 31, 2022
…BLE PARTITION test

### What changes were proposed in this pull request?

This PR is a followup of #38257 that fixes the v2 test case when ANSI is enabled.

When ANSI mode is enabled, it fails (https://github.com/apache/spark/actions/runs/3353657619/jobs/5556624621) as below:

```
- ALTER TABLE .. ADD PARTITION using V2 catalog V2 command: SPARK-40798: Alter partition should verify partition value - legacy *** FAILED *** (22 milliseconds)
  org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value 'aaa' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 1) ==
ALTER TABLE test_catalog.ns.tbl ADD PARTITION (p='aaa')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:164)
  at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
  at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2(Cast.scala:927)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2$adapted(Cast.scala:927)
  at org.apache.spark.sql.catalyst.expressions.Cast.buildCast(Cast.scala:588)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$1(Cast.scala:927)
  at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:1285)
  at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:526)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$.$anonfun$convertToPartIdent$1(ResolvePartitionSpec.scala:81)
  at scala.collection.immutable.List.map(List.scala:293)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$.convertToPartIdent(ResolvePartitionSpec.scala:78)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$.org$apache$spark$sql$catalyst$analysis$ResolvePartitionSpec$$resolvePartitionSpec(ResolvePartitionSpec.scala:71)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$$anonfun$apply$2$$anonfun$applyOrElse$1.applyOrElse(ResolvePartitionSpec.scala:48)
  at org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec$$anonfun$apply$2$$anonfun$applyOrElse$1.applyOrElse(ResolvePartitionSpec.scala:41)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:159)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:200)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:200)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:211)
  at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:216)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
```
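
A hypothetical sketch of how such a test can pin the configs to exercise the legacy path deterministically (`withSQLConf` is Spark's test helper; the actual change in this followup may differ):

```scala
// Hypothetical sketch: pin ANSI off for the legacy-flag test case so that the
// 'aaa' -> INT cast yields null instead of throwing under ANSI mode.
withSQLConf(
    "spark.sql.ansi.enabled" -> "false",
    "spark.sql.legacy.skipTypeValidationOnAlterPartition" -> "true") {
  sql("ALTER TABLE test_catalog.ns.tbl ADD PARTITION (p='aaa')")
}
```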

### Why are the changes needed?

To recover the broken ANSI build.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Manually ran this test case locally.

Closes #38443 from HyukjinKwon/SPARK-40798-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Review context from the migration guide:

- Valid hexadecimal strings should include only allowed symbols (0-9A-Fa-f).
- Valid values for `fmt` are case-insensitive `hex`, `base64`, `utf-8`, `utf8`.
- Since Spark 3.4, Spark throws only `PartitionsAlreadyExistException` when it creates partitions but some of them exist already. In Spark 3.3 or earlier, Spark can throw either `PartitionsAlreadyExistException` or `PartitionAlreadyExistsException`.
- Since Spark 3.4, Spark will do validation for partition spec in ALTER PARTITION to follow the behavior of `spark.sql.storeAssignmentPolicy` which may cause an exception if type conversion fails, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column `p` is int type. To restore the legacy behavior, set `spark.sql.legacy.skipPartitionSpecTypeValidation` to `true`.
Contributor (review comment):

The conf should be `spark.sql.legacy.skipTypeValidationOnAlterPartition`.

Member (review comment):

nice catch

Member (review comment):

just made a PR :-) #38667

HyukjinKwon added a commit that referenced this pull request Nov 16, 2022
…t migration guide

### What changes were proposed in this pull request?

This PR is a followup of #38257 to fix a typo from `spark.sql.legacy.skipPartitionSpecTypeValidation` to `spark.sql.legacy.skipTypeValidationOnAlterPartition`.

### Why are the changes needed?

To show users the correct configuration name for legacy behaviours.

### Does this PR introduce _any_ user-facing change?

No, doc-only.

### How was this patch tested?

N/A

Closes #38667 from HyukjinKwon/SPARK-40798-followup3.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Nov 24, 2022
…t migration guide
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…signmentPolicy
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…BLE PARTITION test
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…t migration guide
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
…t migration guide
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…t migration guide