[SPARK-1914] [SQL] Simplify CountFunction not to traverse to evaluate all child expressions. #861

ueshin · 2014-05-23T10:43:43Z

CountFunction should count up only if the child's evaluated value is not null.

Because the current implementation traverses the expression tree and evaluates every child expression, it counts a row whenever any expression in the tree is non-null, even when the child itself evaluates to null.
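The difference between the two behaviours can be illustrated with a minimal, self-contained sketch. This is not the Catalyst code touched by this patch: `Expr`, `Col`, `Add`, `buggyCount`, and `fixedCount` are hypothetical names invented for illustration, and rows are modelled as plain maps. The sketch assumes a null-propagating `Add`, so `a + b` is null whenever either input is null.

```scala
object CountSketch {
  // Toy expression tree (hypothetical; NOT Spark's Catalyst classes).
  sealed trait Expr {
    def eval(row: Map[String, Any]): Any
    def children: Seq[Expr]
    // Pre-order traversal: this node followed by all of its descendants.
    def flatten: Seq[Expr] = this +: children.flatMap(_.flatten)
  }

  // Column reference; a missing or null column evaluates to null.
  case class Col(name: String) extends Expr {
    def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
    def children: Seq[Expr] = Nil
  }

  // Null-propagating integer addition.
  case class Add(left: Expr, right: Expr) extends Expr {
    def eval(row: Map[String, Any]): Any =
      (left.eval(row), right.eval(row)) match {
        case (l: Int, r: Int) => l + r
        case _                => null
      }
    def children: Seq[Expr] = Seq(left, right)
  }

  // Behaviour described as the bug: evaluate every node in the tree and
  // count the row if ANY of them is non-null.
  def buggyCount(expr: Expr, rows: Seq[Map[String, Any]]): Long =
    rows.count(row => expr.flatten.exists(_.eval(row) != null)).toLong

  // Intended behaviour: evaluate only the child expression itself and
  // count the row if that single value is non-null.
  def fixedCount(expr: Expr, rows: Seq[Map[String, Any]]): Long =
    rows.count(row => expr.eval(row) != null).toLong

  val rows: Seq[Map[String, Any]] = Seq(
    Map("a" -> 1, "b" -> 2),      // a + b = 3
    Map("a" -> 1, "b" -> null),   // a + b = null, but a is non-null
    Map("a" -> null, "b" -> null) // everything is null
  )

  def main(args: Array[String]): Unit = {
    val expr = Add(Col("a"), Col("b"))
    println(s"buggy: ${buggyCount(expr, rows)}, fixed: ${fixedCount(expr, rows)}")
  }
}
```

In this sketch the second row is the interesting one: `a + b` is null, yet the traversal-based count still includes the row because the sub-expression `a` is non-null.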


AmplabJenkins · 2014-05-23T10:47:58Z

Merged build triggered.

AmplabJenkins · 2014-05-23T10:48:06Z

Merged build started.

AmplabJenkins · 2014-05-23T12:08:52Z

Merged build finished.

AmplabJenkins · 2014-05-23T12:08:53Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15164/

Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala

AmplabJenkins · 2014-05-26T03:32:58Z

Merged build triggered.

AmplabJenkins · 2014-05-26T03:33:05Z

Merged build started.

ueshin · 2014-05-26T03:35:48Z

Merged master to fix conflicts.

AmplabJenkins · 2014-05-26T04:47:04Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-26T04:47:05Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15200/

liancheng · 2014-05-26T07:14:13Z

LGTM. Added steps to reproduce this bug in sbt hive/console in the JIRA issue description.

This should be a blocking issue for Spark 1.0 release. @rxin @marmbrus

liancheng · 2014-05-26T07:15:58Z

And, good catch @ueshin, thanks very much!

rxin · 2014-05-26T07:17:04Z

Thanks. I've merged this in master & branch-1.0.

… all child expressions.

`CountFunction` should count up only if the child's evaluated value is not null.

Because it traverses to evaluate all child expressions, even if the child is null, it counts up if one of the all children is not null.

Author: Takuya UESHIN <[email protected]>

Closes #861 from ueshin/issues/SPARK-1914 and squashes the following commits:

3b37315 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-1914
2afa238 [Takuya UESHIN] Simplify CountFunction not to traverse to evaluate all child expressions.

(cherry picked from commit d6395d8)
Signed-off-by: Reynold Xin <[email protected]>

… all child expressions.

`CountFunction` should count up only if the child's evaluated value is not null.

Because it traverses to evaluate all child expressions, even if the child is null, it counts up if one of the all children is not null.

Author: Takuya UESHIN <[email protected]>

Closes apache#861 from ueshin/issues/SPARK-1914 and squashes the following commits:

3b37315 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-1914
2afa238 [Takuya UESHIN] Simplify CountFunction not to traverse to evaluate all child expressions.

Co-authored-by: Egor Krivokon <>

…861)

* [SPARK-36183][SQL][FOLLOWUP] Fix push down limit 1 through Aggregate

### What changes were proposed in this pull request?

Use `Aggregate.aggregateExpressions` instead of `Aggregate.output` when pushing down limit 1 through Aggregate.

For example:
```scala
spark.range(10).selectExpr("id % 5 AS a", "id % 5 AS b").write.saveAsTable("t1")
spark.sql("SELECT a, b, a AS alias FROM t1 GROUP BY a, b LIMIT 1").explain(true)
```

Before this pr:
```
== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- !Project [a#227L, b#228L, alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet
```

After this pr:
```
== Optimized Logical Plan ==
GlobalLimit 1
+- LocalLimit 1
   +- Project [a#227L, b#228L, a#227L AS alias#226L]
      +- LocalLimit 1
         +- Relation default.t1[a#227L,b#228L] parquet
```

### Why are the changes needed?

Fix bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #35286 from wangyum/SPARK-36183-2.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

(cherry picked from commit 9b12571)

Co-authored-by: Egor Krivokon <>

Simplify CountFunction not to traverse to evaluate all child expressions.

2afa238

Merge branch 'master' into issues/SPARK-1914

3b37315

Conflicts: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala

asfgit closed this in d6395d8 May 26, 2014

agirish pushed a commit to HPEEzmeral/apache-spark that referenced this pull request May 5, 2022

MapR [SPARK-927] Update Hive in Spark-3.1.2 (apache#861)

38158e3

Co-authored-by: Egor Krivokon <>

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024

MapR [SPARK-927] Update Hive in Spark-3.1.2 (apache#861)

12e6c0b

Co-authored-by: Egor Krivokon <>

mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025

MapR [SPARK-927] Update Hive in Spark-3.1.2 (apache#861)

502bae3

Co-authored-by: Egor Krivokon <>
