Skip to content

Conversation

@xwu0226
Copy link
Contributor

@xwu0226 xwu0226 commented Oct 28, 2015

The root cause is that when spark.sql.hive.convertMetastoreParquet=true by default, the cached InMemoryRelation of the ParquetRelation can not be looked up from the cachedData of CacheManager because the key comparison fails even though it is the same LogicalPlan representing the Subquery that wraps the ParquetRelation.
The solution in this PR is overriding the LogicalPlan.sameResult function in Subquery case class to eliminate subquery node first before directly comparing the child (ParquetRelation), which will find the key to the cached InMemoryRelation.

@marmbrus
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Oct 28, 2015

Test build #44527 has finished for PR 9326 at commit 402d8e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Oct 28, 2015

Actually, sameResult will remove SubQuery nodes. I think the problem is that cleanRight.cleanArgs == cleanLeft.cleanArgs returns false when two LogicalRelations are considered generating the same result (because of expectedOutputAttributes that we added in #8889 ). I think the right fix is to correctly override the cleanArgs field of LogicalRelation to exclude expectedOutputAttributes.

@xwu0226
Copy link
Contributor Author

xwu0226 commented Oct 29, 2015

@yhuai Thanks so much for your comments and suggestions!
I have thought through your suggestions and override cleanArgs field in LogicalRelation, that returns the BaseRelation (in this case, ParquetRelation) without the expectedOutputAttributes.. I removed the original fix from Subquery and pushed the change.

Let me know what you think. Thank you again!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new line above this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will do.

@yhuai
Copy link
Contributor

yhuai commented Oct 29, 2015

Just a minor comment about the format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xwu0226
Copy link
Contributor Author

xwu0226 commented Oct 29, 2015

@yhuai Just pushed again and also removed some unnecessary imports that I used for debugging today and forgot to remove.
Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I will. Thanks!

@xwu0226
Copy link
Contributor Author

xwu0226 commented Oct 29, 2015

@yhuai Just pushed. Please help take a look, thanks!

@SparkQA
Copy link

SparkQA commented Oct 29, 2015

Test build #44564 has finished for PR 9326 at commit 3f25d6a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 29, 2015

Test build #44562 has finished for PR 9326 at commit 5ed1429.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 29, 2015

Test build #44563 has finished for PR 9326 at commit fb70954.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 29, 2015

Test build #44568 has finished for PR 9326 at commit 88edd14.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Oct 29, 2015

LGTM. Merging to master and branch 1.5.

@asfgit asfgit closed this in f7a51de Oct 29, 2015
asfgit pushed a commit that referenced this pull request Oct 29, 2015
The root cause is that when spark.sql.hive.convertMetastoreParquet=true by default, the cached InMemoryRelation of the ParquetRelation can not be looked up from the cachedData of CacheManager because the key comparison fails even though it is the same LogicalPlan representing the Subquery that wraps the ParquetRelation.
The solution in this PR is overriding the LogicalPlan.sameResult function in Subquery case class to eliminate subquery node first before directly comparing the child (ParquetRelation), which will find the key  to the cached InMemoryRelation.

Author: xin Wu <[email protected]>

Closes #9326 from xwu0226/spark-11246-commit.

(cherry picked from commit f7a51de)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
@xwu0226
Copy link
Contributor Author

xwu0226 commented Oct 29, 2015

@yhuai Thank you very much for merging it.

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 3, 2015
The root cause is that when spark.sql.hive.convertMetastoreParquet=true by default, the cached InMemoryRelation of the ParquetRelation can not be looked up from the cachedData of CacheManager because the key comparison fails even though it is the same LogicalPlan representing the Subquery that wraps the ParquetRelation.
The solution in this PR is overriding the LogicalPlan.sameResult function in Subquery case class to eliminate subquery node first before directly comparing the child (ParquetRelation), which will find the key  to the cached InMemoryRelation.

Author: xin Wu <[email protected]>

Closes apache#9326 from xwu0226/spark-11246-commit.

(cherry picked from commit f7a51de)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants