@wankunde
Contributor

What changes were proposed in this pull request?

This PR tries to improve the performance of the InferFiltersFromConstraints rule by avoiding the generation of too many constraints.

For example:

  test("Expression explosion when analyze test") {
    // Reset rule timings so the dump at the end covers only this test.
    RuleExecutor.resetMetrics()
    Seq((1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
      .toDF("a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
        "k", "l", "m", "n")
      .write.saveAsTable("test")
    val df = spark.table("test")
    // A single filter whose predicate references all 14 columns.
    val df2 = df.filter("a+b+c+d+e+f+g+h+i+j+k+l+m+n > 100")
    // Alias every column, so each attribute in the predicate has an equivalent alias.
    val df3 = df2.select('a as 'a1, 'b as 'b1,
      'c as 'c1, 'd as 'd1, 'e as 'e1, 'f as 'f1,
      'g as 'g1, 'h as 'h1, 'i as 'i1, 'j as 'j1,
      'k as 'k1, 'l as 'l1, 'm as 'm1, 'n as 'n1)
    // Join the aliased plan back to the filtered plan.
    val df4 = df3.join(df2, df3("a1") === df2("a"))
    df4.explain(true)
    logWarning(RuleExecutor.dumpTimeSpent())
  }
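
To make the "too many constraints" concern concrete, here is a minimal sketch in plain Scala. It is not Spark code and not the change made in this PR. It assumes, as an illustration only, that the blow-up comes from alias substitution: for every alias, each existing constraint is kept and a copy is added with the original column replaced by its alias. The object name `ConstraintExplosionSketch` and the helper `expandWithAliases` are hypothetical.

```scala
// Toy model, not Spark code: a constraint is just the list of column names it references.
object ConstraintExplosionSketch {
  type Constraint = Vector[String]

  // For each (column -> alias) pair, keep every constraint seen so far and add a
  // copy in which `column` is replaced by `alias`. Each alias can double the set.
  def expandWithAliases(base: Constraint, aliases: Seq[(String, String)]): Set[Constraint] =
    aliases.foldLeft(Set(base)) { case (acc, (column, alias)) =>
      acc ++ acc.map(_.map(c => if (c == column) alias else c))
    }

  def main(args: Array[String]): Unit = {
    // Columns a..n, mirroring the 14-column predicate in the test above.
    val cols = ('a' to 'n').map(_.toString).toVector
    // Aliases a1..n1, mirroring the select in the test above.
    val aliases = cols.map(c => c -> (c + "1"))
    val variants = expandWithAliases(cols, aliases)
    println(s"${cols.size} columns -> ${variants.size} constraint variants")
  }
}
```

Under this assumption, a single predicate over 14 aliased columns expands to 2^14 = 16384 equivalent variants, which is the kind of growth the test above is meant to provoke.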

Why are the changes needed?

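To improve InferFiltersFromConstraints performance. For the example above, the metrics below show the rule's total time dropping from about 4.53 s to about 0.045 s (roughly 100x), and the total analyzer/optimizer time dropping from 5.02 s to 0.56 s.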

Before this PR:

=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1187
Total time: 5.022786805 seconds

Rule                                                                 Effective Time / Total Time  Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints  4528820409 / 4529498144      1 / 2
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog         0 / 38907142                 0 / 13
Combined[org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$InConversion, org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings, org.apache.spark.sql.catalyst.analysis.DecimalPrecision, org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$FunctionArgumentConversion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$ConcatCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$MapZipWithCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$EltCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CaseWhenCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$IfCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$StackCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$Division, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$IntegralDivision, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$ImplicitTypeCasts, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$DateTimeOperations, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$WindowFrameCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$StringLiteralCoercion]  0 / 30035714                 0 / 13
org.apache.spark.sql.execution.datasources.SchemaPruning             0 / 20202429                 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation   0 / 15898208                 0 / 8
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences    7497131 / 15098789           2 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations     11633805 / 13755605          1 / 13

After this PR:

=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 1187
Total time: 0.559125361 seconds

Rule                                                                 Effective Time / Total Time  Effective Runs / Total Runs

org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints  44387973 / 45044872          1 / 2
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog         0 / 40652311                 0 / 13
Combined[org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$InConversion, org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings, org.apache.spark.sql.catalyst.analysis.DecimalPrecision, org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$FunctionArgumentConversion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$ConcatCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$MapZipWithCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$EltCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CaseWhenCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$IfCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$StackCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$Division, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$IntegralDivision, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$ImplicitTypeCasts, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$DateTimeOperations, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$WindowFrameCoercion, org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$StringLiteralCoercion]  0 / 30068620                 0 / 13
org.apache.spark.sql.execution.datasources.SchemaPruning             0 / 20810353                 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation   0 / 19485336                 0 / 8
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences    8476540 / 16209891           2 / 13
org.apache.spark.sql.execution.datasources.FindDataSourceTable       10826285 / 14306609          1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations     11935867 / 14163328          1 / 13

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests.

@github-actions github-actions bot added the SQL label May 12, 2021
@wankunde wankunde changed the title from "Improve InferFiltersFromConstraints rule performance when parsing spark sql" to "[SPARK-35379][SQL]Improve InferFiltersFromConstraints rule performance when parsing spark sql" May 12, 2021
@AmplabJenkins

Can one of the admins verify this patch?

@wankunde
Contributor Author

The pull request update check failed on the Java 11 build with Maven.

I think this error has nothing to do with this PR.

Can anyone help me?

Exception in thread "main" java.lang.StackOverflowError
scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
	at scala.reflect.internal.Trees$UnderConstructionTransformer.transform(Trees.scala:1741)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
	at scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:477)
	at scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
	at scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:477)
scala.reflect.internal.Trees.itransform(Trees.scala:1383)
	at scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
scala.reflect.internal.Trees.itransform$(Trees.scala:1374)
	at scala.reflect.api.Trees$Transformer.$anonfun$transformStats$1(Trees.scala:2597)
scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
	at scala.reflect.api.Trees$Transformer.transformStats(Trees.scala:2595)
scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
	at scala.reflect.internal.Trees.itransform(Trees.scala:1404)
scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
	at scala.reflect.internal.Trees.itransform$(Trees.scala:1374)
scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:51)
	at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.scala$reflect$internal$Trees$UnderConstructionTransformer$$super$transform(ExplicitOuter.scala:212)
	at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
scala.reflect.internal.Trees$UnderConstructionTransformer.transform(Trees.scala:1745)
	at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
	at scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:51)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
	at scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.scala$reflect$internal$Trees$UnderConstructionTransformer$$super$transform(ExplicitOuter.scala:212)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:437)
	at scala.reflect.internal.Trees$UnderConstructionTransformer.transform(Trees.scala:1745)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
	at scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
scala.reflect.internal.Trees.itransform(Trees.scala:1410)
	at scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
scala.reflect.internal.Trees.itransform$(Trees.scala:1374)
	at scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:477)
scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
	at scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
	at scala.reflect.internal.Trees.itransform(Trees.scala:1383)
scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
	at scala.reflect.internal.Trees.itransform$(Trees.scala:1374)
scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:51)
	at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.scala$reflect$internal$Trees$UnderConstructionTransformer$$super$transform(ExplicitOuter.scala:212)
	at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:28)
scala.reflect.internal.Trees$UnderConstructionTransformer.transform(Trees.scala:1745)
	at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2563)
scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
	at scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:51)
scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
	at scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.scala$reflect$internal$Trees$UnderConstructionTransformer$$super$transform(ExplicitOuter.scala:212)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:477)
	at scala.reflect.internal.Trees$UnderConstructionTransformer.transform(Trees.scala:1745)
scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
	at scala.reflect.internal.Trees$UnderConstructionTransformer.transform$(Trees.scala:1740)
	at scala.tools.nsc.transform.ExplicitOuter$OuterPathTransformer.transform(ExplicitOuter.scala:289)
	at scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:437)
	at scala.tools.nsc.transform.ExplicitOuter$ExplicitOuterTransformer.transform(ExplicitOuter.scala:330)
	at scala.reflect.internal.Trees.itransform(Trees.scala:1410)
	at scala.reflect.internal.Trees.itransform$(Trees.scala:1374)

@wankunde
Contributor Author

This PR seems to be a duplicate of #30894, which has not been merged.

@tanelk @maropu @HyukjinKwon @gengliangwang

@HyukjinKwon
Member

Can you help review #30894, since that PR is already open and in progress?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Aug 21, 2021
@github-actions github-actions bot closed this Aug 22, 2021