Skip to content

Conversation

@liancheng
Copy link
Contributor

This PR adds a private[sql] method metadata to SparkPlan, which can be used to describe detail information about a physical plan during visualization. Specifically, this PR uses this method to provide details of PhysicalRDDs translated from a data source relation. For example, a ParquetRelation converted from Hive metastore table default.psrc is now shown as the following screenshot:

image

And here is the screenshot for a regular ParquetRelation (not converted from Hive metastore table) loaded from a really long path:

output

@liancheng
Copy link
Contributor Author

cc @andrewor14 @yhuai

@SparkQA
Copy link

SparkQA commented Nov 26, 2015

Test build #46765 has finished for PR 10004 at commit b6a32c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Nov 26, 2015

I wouldn't have the top level name be "PhysicalRDD" - that term just makes no sense to users.

I'd consider just putting "Orc" there. If it is a Hive table, just saying "Scan: table name" might work too".

BTW - if the path is super long, is the box going to be expanded to super wide as well?

@liancheng
Copy link
Contributor Author

@rxin Updated, together with a new screenshot in the PR description. Now it shows ParquetRelation and OrcRelation etc. instead of PhysicalRDD. Hive metastore table name is also attached. For the long input paths scenario, yes it makes the box super wide. So in the last commit, I truncate the text if it's longer than 50 characters. This is somewhat brutal, but I'm not sure about the best practice here. Maybe doing the truncation in JS side is a better idea?

@SparkQA
Copy link

SparkQA commented Nov 27, 2015

Test build #46825 has finished for PR 10004 at commit 8cbf0ff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 28, 2015

Test build #46833 has finished for PR 10004 at commit 6925eb3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 28, 2015

Test build #46835 has finished for PR 10004 at commit 6925eb3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the paths string because it's now shown as part of the metadata in both simpleString and visualized plan node.

@SparkQA
Copy link

SparkQA commented Nov 28, 2015

Test build #46841 has finished for PR 10004 at commit 5d12dd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

Can we hide PushedFilters if there are no pushed filters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the type to this val? Right now it's not super clear what it is

@liancheng liancheng force-pushed the spark-12012.physical-rdd-metadata branch from 5d12dd7 to 6891876 Compare December 1, 2015 09:38
@liancheng
Copy link
Contributor Author

Comments addressed.

@rxin
Copy link
Contributor

rxin commented Dec 1, 2015

Is it possible to only show the short text in the normal box, and then when hover, show the full path? I'm very concerned about long paths.

cc @zsxwing

@SparkQA
Copy link

SparkQA commented Dec 1, 2015

Test build #46959 has finished for PR 10004 at commit 6891876.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

@rxin One problem I found is that, the tooltip box is of fixed width. This means that the full path can never be fully observed if it's too long. But I agree that we should keep the normal box relatively smaller. I'll just remove the metadata entries in the normal box.

@SparkQA
Copy link

SparkQA commented Dec 2, 2015

Test build #47050 has finished for PR 10004 at commit 863df7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nongli
Copy link
Contributor

nongli commented Dec 8, 2015

Can you add a screenshot if the path is really long?

@liancheng
Copy link
Contributor Author

@nongli Uploaded new screenshot and rebased to master.

@liancheng liancheng force-pushed the spark-12012.physical-rdd-metadata branch from 863df7c to 02448a6 Compare December 9, 2015 08:49
@SparkQA
Copy link

SparkQA commented Dec 9, 2015

Test build #47422 has finished for PR 10004 at commit 02448a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor Author

I know that the current visual effect of super long paths isn't perfect, but at least it doesn't introduce super wide plan nodes. And this is also how our current UI handles super long text.

@rxin
Copy link
Contributor

rxin commented Dec 9, 2015

I'm going to merge this into master/1.6.

@asfgit asfgit closed this in 6e1c55e Dec 9, 2015
@rxin
Copy link
Contributor

rxin commented Dec 9, 2015

hm i had trouble cherrypicking it into 1.6. Can you do it yourself?

liancheng added a commit to liancheng/spark that referenced this pull request Dec 10, 2015
…visualizing SQL query plan

This PR adds a `private[sql]` method `metadata` to `SparkPlan`, which can be used to describe detail information about a physical plan during visualization. Specifically, this PR uses this method to provide details of `PhysicalRDD`s translated from a data source relation. For example, a `ParquetRelation` converted from Hive metastore table `default.psrc` is now shown as the following screenshot:

![image](https://cloud.githubusercontent.com/assets/230655/11526657/e10cb7e6-9916-11e5-9afa-f108932ec890.png)

And here is the screenshot for a regular `ParquetRelation` (not converted from Hive metastore table) loaded from a really long path:

![output](https://cloud.githubusercontent.com/assets/230655/11680582/37c66460-9e94-11e5-8f50-842db5309d5a.png)

Author: Cheng Lian <[email protected]>

Closes apache#10004 from liancheng/spark-12012.physical-rdd-metadata.
@liancheng
Copy link
Contributor Author

Since there's a conflict and 1.6 is in RC phase, I opened #10250 to backport this one to branch-1.6, so that it's tested on Jenkins.

@liancheng liancheng deleted the spark-12012.physical-rdd-metadata branch December 10, 2015 08:13
asfgit pushed a commit that referenced this pull request Dec 10, 2015
…tadata when visualizing SQL query plan

This PR backports PR #10004 to branch-1.6

It adds a private[sql] method metadata to SparkPlan, which can be used to describe detail information about a physical plan during visualization. Specifically, this PR uses this method to provide details of PhysicalRDDs translated from a data source relation.

Author: Cheng Lian <[email protected]>

Closes #10250 from liancheng/spark-12012.for-1.6.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants