[SPARK-11246] [SQL] Table cache for Parquet broken in 1.5 #9326
ok to test
Test build #44527 has finished for PR 9326 at commit
Actually,
@yhuai Thanks so much for your comments and suggestions! Let me know what you think. Thank you again!
Add a new line above this line.
Thanks! Will do.
Just a minor comment about the format.
@yhuai Just pushed again, and also removed some unnecessary imports that I added for debugging today and forgot to remove.
also remove this?
Yes. I will. Thanks!
@yhuai Just pushed. Please help take a look, thanks!
Test build #44564 has finished for PR 9326 at commit
Test build #44562 has finished for PR 9326 at commit
Test build #44563 has finished for PR 9326 at commit
Test build #44568 has finished for PR 9326 at commit
LGTM. Merging to master and branch 1.5.
The root cause is that when spark.sql.hive.convertMetastoreParquet=true (the default), the cached InMemoryRelation of the ParquetRelation cannot be looked up from the cachedData of CacheManager: the key comparison fails even though both sides are the same LogicalPlan, a Subquery that wraps the ParquetRelation.
The solution in this PR is to override LogicalPlan.sameResult in the Subquery case class so that the Subquery node is eliminated first and the child (ParquetRelation) is compared directly, which finds the key to the cached InMemoryRelation.
Author: xin Wu <[email protected]>
Closes #9326 from xwu0226/spark-11246-commit.
(cherry picked from commit f7a51de)
Signed-off-by: Yin Huai <[email protected]>
Conflicts: sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
@yhuai Thank you very much for merging it.
The root cause is that when spark.sql.hive.convertMetastoreParquet=true (the default), the cached InMemoryRelation of the ParquetRelation cannot be looked up from the cachedData of CacheManager: the key comparison fails even though both sides are the same LogicalPlan, a Subquery that wraps the ParquetRelation.
The solution in this PR is to override LogicalPlan.sameResult in the Subquery case class so that the Subquery node is eliminated first and the child (ParquetRelation) is compared directly, which finds the key to the cached InMemoryRelation.
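For readers who want to see the failure mode in isolation, here is a minimal, self-contained sketch of the idea. It is a toy model, not the code in this PR and not Spark's actual classes: the names `Plan`, `stripSubqueries`, `naiveSameResult`, the simplified `CacheManager`, and the table path are all assumptions made for illustration. It only shows why a structural comparison misses a cache entry when one side is wrapped in a `Subquery`, and how ignoring the wrapper makes the lookup succeed.

```scala
// Toy model only: these types imitate the names in the description above
// and are NOT Spark's real LogicalPlan, Subquery, or CacheManager classes.
object CacheLookupSketch {
  sealed trait Plan
  // Hypothetical stand-in for ParquetRelation, identified by a path.
  final case class ParquetRelation(path: String) extends Plan
  // Hypothetical stand-in for Subquery: an alias wrapping a child plan.
  final case class Subquery(alias: String, child: Plan) extends Plan

  // Remove any outer Subquery wrappers, leaving the plan they wrap.
  @annotation.tailrec
  def stripSubqueries(plan: Plan): Plan = plan match {
    case Subquery(_, child) => stripSubqueries(child)
    case other              => other
  }

  // Naive comparison: plain structural equality, so a wrapper defeats it.
  def naiveSameResult(a: Plan, b: Plan): Boolean = a == b

  // Comparison in the spirit of the fix: ignore Subquery wrappers on both sides.
  def sameResult(a: Plan, b: Plan): Boolean =
    stripSubqueries(a) == stripSubqueries(b)

  // Toy cache keyed by plan; the comparison function is injected so both
  // behaviours can be observed side by side.
  final class CacheManager(compare: (Plan, Plan) => Boolean) {
    private var cachedData = Vector.empty[(Plan, String)]

    def cache(plan: Plan, data: String): Unit =
      cachedData = cachedData :+ ((plan, data))

    def lookup(plan: Plan): Option[String] =
      cachedData.find { case (key, _) => compare(plan, key) }.map(_._2)
  }

  def main(args: Array[String]): Unit = {
    val relation = ParquetRelation("/warehouse/parquet_tbl") // hypothetical path
    val cachedKey = Subquery("parquet_tbl", relation)

    val naive = new CacheManager(naiveSameResult)
    naive.cache(cachedKey, "InMemoryRelation(parquet_tbl)")
    println(naive.lookup(relation)) // lookup with the bare relation

    val fixed = new CacheManager(sameResult)
    fixed.cache(cachedKey, "InMemoryRelation(parquet_tbl)")
    println(fixed.lookup(relation)) // same lookup, wrappers ignored
  }
}
```

In this sketch the first lookup misses because the key is a `Subquery` and the query plan is a bare relation, while the second lookup finds the entry. Stripping wrappers on both sides is a design choice of the sketch so that the comparison does not depend on which side carries the `Subquery` node.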