[SPARK-6943][WIP][Alternative] Show RDD DAG visualization on stage UI #5728

andrewor14 · 2015-04-27T23:53:01Z

Closing this because the necessary closure cleaner fixes that #5729 blocks on are now merged.

This commit provides a mechanism to set and unset the call scope around each RDD operation defined in RDD.scala. This is useful for tagging an RDD with the scope in which it is created. This will be extended to similar methods in SparkContext.scala and other relevant files in a future commit.

This includes the scope field that we added in previous commits, and the parent IDs for tracking the lineage through the listener API.

It turns out that the previous scope information is insufficient for producing a valid dot file. In particular, the scope hierarchy was missing, but crucial to differentiate between a parent RDD being in the same encompassing scope and it being in a completely distinct scope. Also, unique scope identifiers are needed to simplify the code significantly. This commit further adds the translation logic in a UI listener that converts RDDInfos to dot files.

The previous "working" implementation frequently ran into NotSerializableExceptions. Why? ClosureCleaner doesn't like closures being wrapped in other closures, and these closures are simply not cleaned (details are intentionally omitted here). This commit reimplements scoping through annotations. All methods that should be scoped are now annotated with @RDDScope. Then, on creation, each RDD derives its scope from the stack trace, similar to how it derives its call site. This is the cleanest approach that bypasses NotSerializableExceptions with least significant limitations.

Just a small code re-organization.

Before this commit, this patch relies on a JavaScript version of GraphViz that was compiled from C. Even the minified version of this resource was ~2.5M. The main motivation for switching away from this library, however, is that this is a complete black box of which we have absolutely no control. It is not at all extensible, and if something breaks we will have a hard time understanding why. The new library, dagre-d3, is not perfect either. It does not officially support clustering of nodes; for certain large graphs, the clusters will have a lot of unnecessary whitespace. A few in the dagre-d3 community are looking into a solution, but until then we will have to live with this (minor) inconvenience.

For instance, this adds ability to throw away old stage graphs.

SparkQA · 2015-04-28T00:10:36Z

Test build #31077 has finished for PR 5728 at commit 09d361e.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds no public classes.
This patch removes the following dependencies:
- RoaringBitmap-0.4.5.jar
- activation-1.1.jar
- akka-actor_2.10-2.3.4-spark.jar
- akka-remote_2.10-2.3.4-spark.jar
- akka-slf4j_2.10-2.3.4-spark.jar
- aopalliance-1.0.jar
- arpack_combined_all-0.1.jar
- avro-1.7.7.jar
- breeze-macros_2.10-0.11.2.jar
- breeze_2.10-0.11.2.jar
- chill-java-0.5.0.jar
- chill_2.10-0.5.0.jar
- commons-beanutils-1.7.0.jar
- commons-beanutils-core-1.8.0.jar
- commons-cli-1.2.jar
- commons-codec-1.10.jar
- commons-collections-3.2.1.jar
- commons-compress-1.4.1.jar
- commons-configuration-1.6.jar
- commons-digester-1.8.jar
- commons-httpclient-3.1.jar
- commons-io-2.1.jar
- commons-lang-2.5.jar
- commons-lang3-3.3.2.jar
- commons-math-2.1.jar
- commons-math3-3.4.1.jar
- commons-net-2.2.jar
- compress-lzf-1.0.0.jar
- config-1.2.1.jar
- core-1.1.2.jar
- curator-client-2.4.0.jar
- curator-framework-2.4.0.jar
- curator-recipes-2.4.0.jar
- gmbal-api-only-3.0.0-b023.jar
- grizzly-framework-2.1.2.jar
- grizzly-http-2.1.2.jar
- grizzly-http-server-2.1.2.jar
- grizzly-http-servlet-2.1.2.jar
- grizzly-rcm-2.1.2.jar
- groovy-all-2.3.7.jar
- guava-14.0.1.jar
- guice-3.0.jar
- hadoop-annotations-2.2.0.jar
- hadoop-auth-2.2.0.jar
- hadoop-client-2.2.0.jar
- hadoop-common-2.2.0.jar
- hadoop-hdfs-2.2.0.jar
- hadoop-mapreduce-client-app-2.2.0.jar
- hadoop-mapreduce-client-common-2.2.0.jar
- hadoop-mapreduce-client-core-2.2.0.jar
- hadoop-mapreduce-client-jobclient-2.2.0.jar
- hadoop-mapreduce-client-shuffle-2.2.0.jar
- hadoop-yarn-api-2.2.0.jar
- hadoop-yarn-client-2.2.0.jar
- hadoop-yarn-common-2.2.0.jar
- hadoop-yarn-server-common-2.2.0.jar
- ivy-2.4.0.jar
- jackson-annotations-2.4.0.jar
- jackson-core-2.4.4.jar
- jackson-core-asl-1.8.8.jar
- jackson-databind-2.4.4.jar
- jackson-jaxrs-1.8.8.jar
- jackson-mapper-asl-1.8.8.jar
- jackson-module-scala_2.10-2.4.4.jar
- jackson-xc-1.8.8.jar
- jansi-1.4.jar
- javax.inject-1.jar
- javax.servlet-3.0.0.v201112011016.jar
- javax.servlet-3.1.jar
- javax.servlet-api-3.0.1.jar
- jaxb-api-2.2.2.jar
- jaxb-impl-2.2.3-1.jar
- jcl-over-slf4j-1.7.10.jar
- jersey-client-1.9.jar
- jersey-core-1.9.jar
- jersey-grizzly2-1.9.jar
- jersey-guice-1.9.jar
- jersey-json-1.9.jar
- jersey-server-1.9.jar
- jersey-test-framework-core-1.9.jar
- jersey-test-framework-grizzly2-1.9.jar
- jets3t-0.7.1.jar
- jettison-1.1.jar
- jetty-util-6.1.26.jar
- jline-0.9.94.jar
- jline-2.10.4.jar
- jodd-core-3.6.3.jar
- json4s-ast_2.10-3.2.10.jar
- json4s-core_2.10-3.2.10.jar
- json4s-jackson_2.10-3.2.10.jar
- jsr305-1.3.9.jar
- jtransforms-2.4.0.jar
- jul-to-slf4j-1.7.10.jar
- kryo-2.21.jar
- log4j-1.2.17.jar
- lz4-1.2.0.jar
- management-api-3.0.0-b012.jar
- mesos-0.21.0-shaded-protobuf.jar
- metrics-core-3.1.0.jar
- metrics-graphite-3.1.0.jar
- metrics-json-3.1.0.jar
- metrics-jvm-3.1.0.jar
- minlog-1.2.jar
- netty-3.8.0.Final.jar
- netty-all-4.0.23.Final.jar
- objenesis-1.2.jar
- opencsv-2.3.jar
- oro-2.0.8.jar
- paranamer-2.6.jar
- parquet-column-1.6.0rc3.jar
- parquet-common-1.6.0rc3.jar
- parquet-encoding-1.6.0rc3.jar
- parquet-format-2.2.0-rc1.jar
- parquet-generator-1.6.0rc3.jar
- parquet-hadoop-1.6.0rc3.jar
- parquet-jackson-1.6.0rc3.jar
- protobuf-java-2.4.1.jar
- protobuf-java-2.5.0-spark.jar
- py4j-0.8.2.1.jar
- pyrolite-2.0.1.jar
- quasiquotes_2.10-2.0.1.jar
- reflectasm-1.07-shaded.jar
- scala-compiler-2.10.4.jar
- scala-library-2.10.4.jar
- scala-reflect-2.10.4.jar
- scalap-2.10.4.jar
- scalatest_2.10-2.2.1.jar
- slf4j-api-1.7.10.jar
- slf4j-log4j12-1.7.10.jar
- snappy-java-1.1.1.7.jar
- spark-bagel_2.10-1.4.0-SNAPSHOT.jar
- spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
- spark-core_2.10-1.4.0-SNAPSHOT.jar
- spark-graphx_2.10-1.4.0-SNAPSHOT.jar
- spark-launcher_2.10-1.4.0-SNAPSHOT.jar
- spark-mllib_2.10-1.4.0-SNAPSHOT.jar
- spark-network-common_2.10-1.4.0-SNAPSHOT.jar
- spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
- spark-repl_2.10-1.4.0-SNAPSHOT.jar
- spark-sql_2.10-1.4.0-SNAPSHOT.jar
- spark-streaming_2.10-1.4.0-SNAPSHOT.jar
- spire-macros_2.10-0.7.4.jar
- spire_2.10-0.7.4.jar
- stax-api-1.0.1.jar
- stream-2.7.0.jar
- tachyon-0.6.4.jar
- tachyon-client-0.6.4.jar
- uncommons-maths-1.2.2a.jar
- unused-1.0.0.jar
- xmlenc-0.52.jar
- xz-1.0.jar
- zookeeper-3.4.5.jar

SparkQA · 2015-04-28T01:38:13Z

Test build #726 has finished for PR 5728 at commit 09d361e.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.
This patch adds the following new dependencies:
- RoaringBitmap-0.4.5.jar
- activation-1.1.jar
- akka-actor_2.10-2.3.4-spark.jar
- akka-remote_2.10-2.3.4-spark.jar
- akka-slf4j_2.10-2.3.4-spark.jar
- aopalliance-1.0.jar
- arpack_combined_all-0.1.jar
- avro-1.7.7.jar
- breeze-macros_2.10-0.11.2.jar
- breeze_2.10-0.11.2.jar
- chill-java-0.5.0.jar
- chill_2.10-0.5.0.jar
- commons-beanutils-1.7.0.jar
- commons-beanutils-core-1.8.0.jar
- commons-cli-1.2.jar
- commons-codec-1.10.jar
- commons-collections-3.2.1.jar
- commons-compress-1.4.1.jar
- commons-configuration-1.6.jar
- commons-digester-1.8.jar
- commons-httpclient-3.1.jar
- commons-io-2.1.jar
- commons-lang-2.5.jar
- commons-lang3-3.3.2.jar
- commons-math-2.1.jar
- commons-math3-3.4.1.jar
- commons-net-2.2.jar
- compress-lzf-1.0.0.jar
- config-1.2.1.jar
- core-1.1.2.jar
- curator-client-2.4.0.jar
- curator-framework-2.4.0.jar
- curator-recipes-2.4.0.jar
- gmbal-api-only-3.0.0-b023.jar
- grizzly-framework-2.1.2.jar
- grizzly-http-2.1.2.jar
- grizzly-http-server-2.1.2.jar
- grizzly-http-servlet-2.1.2.jar
- grizzly-rcm-2.1.2.jar
- groovy-all-2.3.7.jar
- guava-14.0.1.jar
- guice-3.0.jar
- hadoop-annotations-2.2.0.jar
- hadoop-auth-2.2.0.jar
- hadoop-client-2.2.0.jar
- hadoop-common-2.2.0.jar
- hadoop-hdfs-2.2.0.jar
- hadoop-mapreduce-client-app-2.2.0.jar
- hadoop-mapreduce-client-common-2.2.0.jar
- hadoop-mapreduce-client-core-2.2.0.jar
- hadoop-mapreduce-client-jobclient-2.2.0.jar
- hadoop-mapreduce-client-shuffle-2.2.0.jar
- hadoop-yarn-api-2.2.0.jar
- hadoop-yarn-client-2.2.0.jar
- hadoop-yarn-common-2.2.0.jar
- hadoop-yarn-server-common-2.2.0.jar
- ivy-2.4.0.jar
- jackson-annotations-2.4.0.jar
- jackson-core-2.4.4.jar
- jackson-core-asl-1.8.8.jar
- jackson-databind-2.4.4.jar
- jackson-jaxrs-1.8.8.jar
- jackson-mapper-asl-1.8.8.jar
- jackson-module-scala_2.10-2.4.4.jar
- jackson-xc-1.8.8.jar
- jansi-1.4.jar
- javax.inject-1.jar
- javax.servlet-3.0.0.v201112011016.jar
- javax.servlet-3.1.jar
- javax.servlet-api-3.0.1.jar
- jaxb-api-2.2.2.jar
- jaxb-impl-2.2.3-1.jar
- jcl-over-slf4j-1.7.10.jar
- jersey-client-1.9.jar
- jersey-core-1.9.jar
- jersey-grizzly2-1.9.jar
- jersey-guice-1.9.jar
- jersey-json-1.9.jar
- jersey-server-1.9.jar
- jersey-test-framework-core-1.9.jar
- jersey-test-framework-grizzly2-1.9.jar
- jets3t-0.7.1.jar
- jettison-1.1.jar
- jetty-util-6.1.26.jar
- jline-0.9.94.jar
- jline-2.10.4.jar
- jodd-core-3.6.3.jar
- json4s-ast_2.10-3.2.10.jar
- json4s-core_2.10-3.2.10.jar
- json4s-jackson_2.10-3.2.10.jar
- jsr305-1.3.9.jar
- jtransforms-2.4.0.jar
- jul-to-slf4j-1.7.10.jar
- kryo-2.21.jar
- log4j-1.2.17.jar
- lz4-1.2.0.jar
- management-api-3.0.0-b012.jar
- mesos-0.21.0-shaded-protobuf.jar
- metrics-core-3.1.0.jar
- metrics-graphite-3.1.0.jar
- metrics-json-3.1.0.jar
- metrics-jvm-3.1.0.jar
- minlog-1.2.jar
- netty-3.8.0.Final.jar
- netty-all-4.0.23.Final.jar
- objenesis-1.2.jar
- opencsv-2.3.jar
- oro-2.0.8.jar
- paranamer-2.6.jar
- parquet-column-1.6.0rc3.jar
- parquet-common-1.6.0rc3.jar
- parquet-encoding-1.6.0rc3.jar
- parquet-format-2.2.0-rc1.jar
- parquet-generator-1.6.0rc3.jar
- parquet-hadoop-1.6.0rc3.jar
- parquet-jackson-1.6.0rc3.jar
- protobuf-java-2.4.1.jar
- protobuf-java-2.5.0-spark.jar
- py4j-0.8.2.1.jar
- pyrolite-2.0.1.jar
- quasiquotes_2.10-2.0.1.jar
- reflectasm-1.07-shaded.jar
- scala-compiler-2.10.4.jar
- scala-library-2.10.4.jar
- scala-reflect-2.10.4.jar
- scalap-2.10.4.jar
- scalatest_2.10-2.2.1.jar
- slf4j-api-1.7.10.jar
- slf4j-log4j12-1.7.10.jar
- snappy-java-1.1.1.7.jar
- spark-bagel_2.10-1.4.0-SNAPSHOT.jar
- spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
- spark-core_2.10-1.4.0-SNAPSHOT.jar
- spark-graphx_2.10-1.4.0-SNAPSHOT.jar
- spark-launcher_2.10-1.4.0-SNAPSHOT.jar
- spark-mllib_2.10-1.4.0-SNAPSHOT.jar
- spark-network-common_2.10-1.4.0-SNAPSHOT.jar
- spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
- spark-repl_2.10-1.4.0-SNAPSHOT.jar
- spark-sql_2.10-1.4.0-SNAPSHOT.jar
- spark-streaming_2.10-1.4.0-SNAPSHOT.jar
- spire-macros_2.10-0.7.4.jar
- spire_2.10-0.7.4.jar
- stax-api-1.0.1.jar
- stream-2.7.0.jar
- tachyon-0.6.4.jar
- tachyon-client-0.6.4.jar
- uncommons-maths-1.2.2a.jar
- unused-1.0.0.jar
- xmlenc-0.52.jar
- xz-1.0.jar
- zookeeper-3.4.5.jar

SparkQA · 2015-04-28T01:53:03Z

Test build #31082 has finished for PR 5728 at commit 52187fc.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

andrewor14 · 2015-04-28T02:27:32Z

retest this please

SparkQA · 2015-04-28T03:43:24Z

Test build #31102 has finished for PR 5728 at commit 52187fc.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

andrewor14 · 2015-05-03T00:28:54Z

Closing this in favor of #5729.

Andrew Or added 14 commits April 16, 2015 17:33

Add a few missing scopes to certain RDD methods

a9ed4f9

Expose the necessary information in RDDInfo

5143523

This includes the scope field that we added in previous commits, and the parent IDs for tracking the lineage through the listener API.

First working implementation of visualization with vis.js

f22f337

Revert a few unintended style changes

494d5c2

Move RDD scope util methods and logic to its own file

6a7cdca

Just a small code re-organization.

Merge branch 'master' of github.com:apache/spark into viz

5e22946

Merge branch 'master' of github.com:apache/spark into viz

fe7816f

Fill in documentation + miscellaneous minor changes

8dd5af2

For instance, this adds ability to throw away old stage graphs.

Embed the viz in the UI in a toggleable manner

71281fa

Add ID to node label (minor)

09d361e

andrewor14 changed the title ~~[SPARK-6943][WIP] Show RDD DAG visualization on stage UI~~ [SPARK-6943][WIP][Alternative] Show RDD DAG visualization on stage UI Apr 27, 2015

andrewor14 mentioned this pull request Apr 28, 2015

[SPARK-6943][SPARK-6944] DAG visualization on SparkUI #5729

Closed

Rat excludes

52187fc

andrewor14 closed this May 3, 2015

andrewor14 deleted the viz branch May 3, 2015 00:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[SPARK-6943][WIP][Alternative] Show RDD DAG visualization on stage UI #5728

[SPARK-6943][WIP][Alternative] Show RDD DAG visualization on stage UI #5728

Uh oh!

andrewor14 commented Apr 27, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

andrewor14 commented Apr 28, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

andrewor14 commented May 3, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

[SPARK-6943][WIP][Alternative] Show RDD DAG visualization on stage UI #5728

[SPARK-6943][WIP][Alternative] Show RDD DAG visualization on stage UI #5728

Uh oh!

Conversation

andrewor14 commented Apr 27, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

andrewor14 commented Apr 28, 2015

Uh oh!

SparkQA commented Apr 28, 2015

Uh oh!

andrewor14 commented May 3, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants