Revert " Closes #10 from gengliangwang/revert." #11

gengliangwang · 2022-09-06T23:41:32Z

This reverts commit b72c63e.

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

…n properly

### What changes were proposed in this pull request?

Make `ResolveRelations` handle plan id properly

### Why are the changes needed?

Bug fix for Spark Connect; it won't affect classic Spark SQL.

Before this PR:

```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with

```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```

This happens because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one:

```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

added ut

### Was this patch authored or co-authored using generative AI tooling?

ci

Closes apache#45214 from zhengruifeng/connect_fix_read_join.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
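The plan-id mix-up described in the commit message above can be illustrated with a toy model. This is only a sketch of the idea, not Spark's actual `ResolveRelations` code: the classes `Unresolved` and `Resolved` and the functions `resolve_naive` and `resolve_with_plan_id` are hypothetical names invented for this example. It assumes a cache keyed by table name, and it assumes the fix amounts to re-tagging the cached plan with the plan id of the node that requested it.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Unresolved:
    """Hypothetical stand-in for an unresolved relation tagged with a plan id."""
    table: str
    plan_id: int


@dataclass(frozen=True)
class Resolved:
    """Hypothetical stand-in for a resolved relation tagged with a plan id."""
    table: str
    plan_id: int


def resolve_naive(rel: Unresolved, cache: Dict[str, Resolved]) -> Resolved:
    # Cache keyed by table name only: a later lookup gets back the plan
    # tagged with the FIRST caller's plan id.
    if rel.table not in cache:
        cache[rel.table] = Resolved(rel.table, rel.plan_id)
    return cache[rel.table]


def resolve_with_plan_id(rel: Unresolved, cache: Dict[str, Resolved]) -> Resolved:
    # Still reuse the cached plan, but re-tag the returned copy with the
    # plan id of the node that asked for it.
    if rel.table not in cache:
        cache[rel.table] = Resolved(rel.table, rel.plan_id)
    return replace(cache[rel.table], plan_id=rel.plan_id)


# Mirrors the log above: plan 9 and plan 7 both read test_table_1,
# and plan 9 is resolved first.
requests = [Unresolved("test_table_1", 9), Unresolved("test_table_1", 7)]

naive_cache: Dict[str, Resolved] = {}
naive_ids = [resolve_naive(r, naive_cache).plan_id for r in requests]

fixed_cache: Dict[str, Resolved] = {}
fixed_ids = [resolve_with_plan_id(r, fixed_cache).plan_id for r in requests]

print(naive_ids)  # [9, 9]: plan 7 disappears, so 'id cannot be resolved against it
print(fixed_ids)  # [9, 7]: each relation keeps its own plan id
```

In the naive variant no node carries plan id 7 after resolution, which matches the `Can not resolve 'id with plan 7` message in the log.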

…plan properly

### What changes were proposed in this pull request?

Make `ResolveRelations` handle plan id properly

Cherry-pick bugfix apache#45214 to 3.5.

### Why are the changes needed?

Bug fix for Spark Connect; it won't affect classic Spark SQL.

Before this PR:

```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with

```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```

This happens because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one:

```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

added ut

### Was this patch authored or co-authored using generative AI tooling?

ci

Closes apache#46291 from zhengruifeng/connect_fix_read_join_35.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>

…r `postgreSQL/float4.sql` and `postgreSQL/int8.sql`

### What changes were proposed in this pull request?

This PR regenerates the Java 21 golden files for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` to fix the Java 21 daily test.

### Why are the changes needed?

Fix Java 21 daily test:

- https://github.com/apache/spark/actions/runs/10823897095/job/30030200710

```
[info] - postgreSQL/float4.sql *** FAILED *** (1 second, 100 milliseconds)
[info]   postgreSQL/float4.sql
[info]   Expected "...arameters" : {
[info]       "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]       "]expression" : "'N A ...", but got "...arameters" : {
[info]       "[]expression" : "'N A ..." Result did not match for query #11
[info]   SELECT float('N A N') (SQLQueryTestSuite.scala:663)
...
[info] - postgreSQL/int8.sql *** FAILED *** (2 seconds, 474 milliseconds)
[info]   postgreSQL/int8.sql
[info]   Expected "...arameters" : {
[info]       "[ansiConfig" : "\"spark.sql.ansi.enabled\"",
[info]       "]sourceType" : "\"BIG...", but got "...arameters" : {
[info]       "[]sourceType" : "\"BIG..." Result did not match for query #66
[info]   SELECT CAST(q1 AS int) FROM int8_tbl WHERE q2 <> 456 (SQLQueryTestSuite.scala:663)
...
[info] *** 2 TESTS FAILED ***
[error] Failed: Total 3559, Failed 2, Errors 0, Passed 3557, Ignored 4
[error] Failed tests:
[error]   org.apache.spark.sql.SQLQueryTestSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GitHub Actions
- Manually checked: `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"` with Java 21, all tests passed

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#48089 from LuciferYang/SPARK-49578-FOLLOWUP.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

…e` building

### What changes were proposed in this pull request?

This PR aims to add `libwebp-dev` to recover `spark-rm/Dockerfile` building.

### Why are the changes needed?

`Apache Spark` release docker image compilation has been broken for last 7 days due to the SparkR package compilation.

- https://github.com/apache/spark/actions/workflows/release.yml
- https://github.com/apache/spark/actions/runs/17425825244

```
#11 559.4 No package 'libwebpmux' found
...
#11 559.4 -------------------------- [ERROR MESSAGE] ---------------------------
#11 559.4 <stdin>:1:10: fatal error: ft2build.h: No such file or directory
#11 559.4 compilation terminated.
#11 559.4 --------------------------------------------------------------------
#11 559.4 ERROR: configuration failed for package 'ragg'
```

### Does this PR introduce _any_ user-facing change?

No, this is a fix for Apache Spark release tool.

### How was this patch tested?

Manually build.

```
$ cd dev/create-release/spark-rm
$ docker build .
```

**BEFORE**

```
...
Dockerfile:83
--------------------
  82 |     # See more in SPARK-39959, roxygen2 < 7.2.1
  83 | >>> RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown',  \
  84 | >>>     'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow',  \
  85 | >>>     'ggplot2', 'mvtnorm', 'statmod', 'xml2'), repos='https://cloud.r-project.org/')" && \
  86 | >>>     Rscript -e "devtools::install_version('roxygen2', version='7.2.0', repos='https://cloud.r-project.org')" && \
  87 | >>>     Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')" && \
  88 | >>>     Rscript -e "devtools::install_version('pkgdown', version='2.0.1', repos='https://cloud.r-project.org')" && \
  89 | >>>     Rscript -e "devtools::install_version('preferably', version='0.4', repos='https://cloud.r-project.org')"
  90 |
--------------------
ERROR: failed to build: failed to solve:
```

**AFTER**

```
...
 => [ 6/22] RUN add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/'    3.8s
 => [ 7/22] RUN Rscript -e "install.packages(c('devtools', 'knitr', 'markdown', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow',    892.2s
 => [ 8/22] RUN add-apt-repository ppa:pypy/ppa    15.3s
...
```

After merging this PR, we can validate via the daily release dry-run CI.

- https://github.com/apache/spark/actions/workflows/release.yml

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52290 from dongjoon-hyun/SPARK-53539.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

Revert " Closes #10 from gengliangwang/revert."

0d0744a

This reverts commit b72c63e.

gengliangwang closed this Sep 6, 2022

github-actions bot added BUILD INFRA labels Sep 7, 2022
