
Conversation

@cloud-fan (Contributor) commented Dec 14, 2017

What changes were proposed in this pull request?

In #19681 we introduced a new interface called AppStatusPlugin, to register listeners and set up the UI for both live applications and the history server.

However, I think it's overkill for the live UI. For example, we should not register SQLListener if users are not using SQL functions. Previously, we registered the SQLListener and set up the SQL tab when the SparkSession was first created, which indicates users are going to use SQL functions. But in #19681 we register the SQL listener during SparkContext creation. The same reasoning applies to streaming.

I think we should keep the previous behavior, and only use this new interface for the history server.

To reflect this change, I also rename the new interface to SparkHistoryUIPlugin.

This PR also refines the tests for the SQL listener.

How was this patch tested?

Existing tests.


@vanzin (Contributor) commented Dec 14, 2017

I intentionally created that interface to be used both by live applications and the SHS. What actual problem are you running into?

> we should not register SQLListener if users are not using SQL functions

Why not? That listener is basically a no-op if you're not running any SQL.

@SparkQA commented Dec 14, 2017

Test build #84922 has finished for PR 19981 at commit ba38723.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

I don't think AppStatusPlugin is well designed to support both live applications and the history server. The fact that SQLAppStatusPlugin needs flags to decide whether to register the listener in setupListeners or setupUI looks pretty hacky.

I'm also not sure why we need a plugin interface for the live UI. The SQL and streaming UIs are implemented pretty cleanly. I do agree that an interface for the history server is very useful, and that's why we had SparkHistoryListenerFactory before.

@vanzin (Contributor) commented Dec 14, 2017

> I'm also not sure why we need a plugin interface for the live UI.

So your solution to that is to have completely separate code for both cases? I really prefer to have a single place to go to understand how the listeners and the UI are initialized, even if the SQL plugin implementation is sub-optimal.

Again, you're stating a matter of preference, but you haven't explained what problem this is causing. Not using the interface doesn't mean you get rid of the conditionals, it just means the conditionals are implied by having the code that installs the listener and UI for live applications live in different places.

@cloud-fan (Contributor Author)

The problem is that it took me a while to understand the new code, because of the weird design of AppStatusPlugin. At the very least we should have a separate plugin interface for the live UI. Code readability is important to Spark; we should not sacrifice it without real benefits.

On the other hand, I think it's better not to register a listener if it's unnecessary. SQLListener is fine because it's a no-op for non-SQL events, but can we guarantee that in all other places, like streaming? (This needs confirmation from @zsxwing @marmbrus.)

And this is also an internal behavior change (the timing of registering SQL listeners), which may prevent us from catching Spark events in the SQL listener in the future. Do we have a good reason to change it? Having a central place for live UI setup doesn't seem like a strong reason to me.

@vanzin (Contributor) commented Dec 14, 2017

> Code readability is important to Spark, we should not sacrifice it without real benefits.

I agree, but I also think that duplicating code in disjoint places hurts readability, not helps it.

> but can we guarantee it in all other places like streaming

Those are not being installed with this interface, are they?

> which may stop us from catching Spark events in the SQL listener in the future

The listener is now being installed earlier than before (when the context starts, instead of later), so it's less likely that it will miss events.

As I mentioned in the other PR you commented on, I do understand that the code here is not super pretty. But I really don't think going back to the previous way is the right thing.

For example, a way to simplify the plugin code is to change the way visibility of the SQL tab is done. Either always show it (simpler), or add a new method in SparkUITab that says whether the tab is visible, and have the SQL tab override it. Either of those would simplify the plugin code a lot (setupListener would just install the listener, always, setupUI would just add the tab, always).
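A rough sketch of the second suggestion above (note: the `visible` hook is only the proposal, not existing Spark API; class shapes are simplified for illustration):

```scala
// Sketch: add a visibility hook to SparkUITab (hypothetical method).
abstract class SparkUITab(parent: SparkUI, prefix: String)
    extends WebUITab(parent, prefix) {
  // Tabs are visible by default; subclasses may override.
  def visible: Boolean = true
}

// The SQL tab overrides it, showing the tab only once SQL data exists.
class SQLTab(statusStore: SQLAppStatusStore, parent: SparkUI)
    extends SparkUITab(parent, "SQL") {
  override def visible: Boolean = statusStore.executionsCount() > 0
}

// WebUI.getTabs would then filter on the hook:
//   def getTabs: Seq[WebUITab] = tabs.filter(_.visible)
```

With this, setupListeners always installs the listener and setupUI always adds the tab; the conditional lives in one place, in the tab itself.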

@cloud-fan (Contributor Author)

> Those are not being installed with this interface, are they?

We are going to, right? Otherwise creating the interface just for SQL doesn't align with your goal of centralizing this part.

> But I really don't think going back to the previous way is the right thing.

I do think the previous code is cleaner than the current one. This may be due to personal taste, so we need more feedback from other people.

> The listener is now being installed earlier than before (when the context starts, instead of later), so it's less likely that it will miss events.

I'm talking about not catching events if users are not using SQL functions. Again, this is an internal behavior change that can affect Spark-event handling logic inside the SQL listener that may be added in the future. I'm not saying we can't change it, but we need a proper reason.

@vanzin (Contributor) commented Dec 14, 2017

> We are going to, right? Otherwise creating the interface just for SQL doesn't align with your goal of centralizing this part.

At some point, probably, but supporting streaming UIs in the SHS requires a lot more work before we even reach that point. There are fundamental problems with event logs and streaming apps that need to be resolved first.

> not catching events if users are not using SQL functions

Not sure why that would be an issue - or rather, why that's different from what's always been the case. Even if you have a SparkSession today, you can run non-SQL jobs using the underlying context, and those will generate events that will be delivered to the SQL listener, and it has to deal with them (as it does).

> I do think the previous code is cleaner than the current one.

What do you think about either of my suggestions to simplify the code?

@cloud-fan (Contributor Author)

> Not sure why that would be an issue - or rather, why that's different from what's always been the case.

It's possible that people write Spark applications with a Spark SQL dependency but never use SQL functions (they just create a SparkContext). This can happen if someone builds a library on top of Spark and Spark SQL, but its users only use the non-SQL APIs.

> What do you think about either of my suggestions to simplify the code?

I do think we should only create the SQL tab when users actually use SQL functions, and the same should apply to the SQL listener.
Previously they were consistent: we registered the listener and set up the UI when creating the SparkSession. Now they are not: we register the listener during SparkContext creation, and set up the UI after the first SQL execution starts.

No offense, but I would reject this PR if I were reviewing it, because of this behavior change.

@vanzin (Contributor) commented Dec 14, 2017

> It's possible that people write Spark applications with a Spark SQL dependency, but never use SQL

Yes, and the SQL listener will ignore all the events, as it should even in the case that SQL was used, because users are allowed to run SQL and non-SQL workloads in the same application.

> I do think we should only create the SQL tab when users actually use SQL functions.

Create or show? If it's not shown, the user-visible behavior is the same, isn't it? That was the second of my suggestions.

> because of this behavior change.

The behavior change is all internal and not visible to users, and that was the goal of the change.

@cloud-fan (Contributor Author)

An internal behavior change also needs careful review; I'd like to wait for feedback from others.

But is the previous code really so hacky that it's worth a refactor with a new interface? https://github.com/apache/spark/pull/19981/files#diff-42e78d37f5dcb2a1576f83b53bbf4b55R88 looks pretty nice to me, and so does the streaming one in StreamingManager. If you can find a way to make the code better, I'm totally fine with that. But before then, can we fall back to the previous code, which is obviously better than the current one?

@vanzin (Contributor) commented Dec 14, 2017

It's less about cleanliness and more about discoverability IMO. Answer the question quickly: where is the SQL UI initialized?

  • my code: in the AppStatusPlugin implementation
  • your code: it depends. And the two pieces of code don't even live in the same file or share any code.

Again, I'll make the same suggestion as before: if you move the SQL tab visibility calculation to a new method in the SQL tab itself (and add a filter in WebUI.getTabs to only show visible tabs), the code will be simpler than either your or my version, and the user-visible behavior will remain the same.

This would be the plugin code for both live and SHS:

```scala
override def setupListeners(
    conf: SparkConf,
    store: ElementTrackingStore,
    addListenerFn: SparkListener => Unit,
    live: Boolean): Unit = {
  addListenerFn(new SQLAppStatusListener(conf, store, live, None))
}

override def setupUI(ui: SparkUI): Unit = {
  // Look up the live listener, if any (None when replaying in the history server).
  val listener = ui.sc.map { /* call LiveListenerBus.findListenersByClass */ }
  new SQLTab(new SQLAppStatusStore(ui.store.store, listener), ui)
}
```

Plus you can delete a bunch of code from SQLAppStatusListener.

@cloud-fan (Contributor Author) commented Dec 15, 2017

Following the existing convention of Spark SQL, classes under the execution package are meant to be private and don't need the private[sql] modifier.

@cloud-fan (Contributor Author)

I just realized this is important. Previously we could use SharedState.listener to query the SQL status. SharedState is actually a semi-public interface, as it's marked Unstable in SparkSession.sharedState.

Sometimes for debugging I just type spark.sharedState.listener.xxx in the Spark shell and check some status, but that's impossible now after #19681.

There might be other people like me who love this ability; we should not just remove it for future UI discoverability (I think it's just SQL and streaming, really not a big gain).
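For context, the kind of ad-hoc inspection being described looks roughly like this in the Spark shell (the old `listener` call is illustrative of the removed API; the `statusStore` form uses the accessors quoted elsewhere in this review):

```scala
// Old style: poke the live SQL listener directly from the shell.
// spark.sharedState.listener.xxx

// After the rework, the equivalent inspection goes through the KV-store-backed API:
spark.sharedState.statusStore.executionsList().foreach { e =>
  println(s"execution ${e.executionId}: jobs=${e.jobs}")
}
```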

cc @hvanhovell @viirya @gatorsmile

@gengliangwang (Member) commented Dec 15, 2017

+1, this is a behavior change. It was good to have SharedState.listener, which is the only entry point to fetch SQL metrics without a web browser.
Also, SQL is such an important and frequently used feature. I don't see any problem with creating the SQL listener at SparkContext initialization, which is consistent with SHS V1.

Member:

+1

Member:

Makes sense to me.

@vanzin (Contributor) commented Dec 15, 2017

Even if you add the property back, it won't be the same listener. There's no way to keep the old API without keeping all of the old code.

> SharedState is actually a semi-public interface

It's in sql.internal. That's not semi-public.

Contributor Author:

At least it's developer-facing. As a developer, I don't care about the name changing or the API changing; I just want the same functionality.

Contributor:

Sure, it's fine if you want to expose it. But I'm pointing out that it's pretty weird to expose a class in a ".internal" package through the API. Those are not documented and don't go through MiMa checks, so there are absolutely zero guarantees about them.

Contributor Author:

That's how it works: those things are not critical to end users, and it's OK to break them if no one objects.

Contributor Author:

Also, this is consistent with SparkContext.statusStore; we need central places to query the core/SQL/streaming status.

Contributor:

I'd use @Private instead of @Unstable for the SparkSession method, that's all I'm saying. It more clearly maps to what you and the code are saying.

Contributor Author:

only set this config for this suite.

@vanzin (Contributor) commented Dec 15, 2017

I commented on the other PR where you mentioned this, but I still don't get what this is changing. I don't see any global state being overridden by override protected def sparkConf. This test suite only extends traits (e.g. SharedSQLContext, which extends SharedSparkSession), and those only keep suite-level state, not global state.

There are other tests that do the same thing.

This is also too late to do this; by the time this code runs, all listeners have been initialized and have read the default value of LIVE_ENTITY_UPDATE_PERIOD.

Contributor Author:

I prefer the testing style from before #19681, which just called the event-handling methods of the listener directly, instead of going indirectly through an intermediate replay bus.

Contributor Author:

Previously we did not attach the testing listener to the Spark event bus; fixed now.

Contributor Author:

ditto, but I don't know why this passed before...

Contributor:

This passed before because the listener was automatically being added to the bus using the plugin interface you've removed in this PR.

Contributor:

In fact, this will probably still pass if you restore the LIVE_ENTITY_UPDATE_PERIOD config in the session, since you'll still have a listener in the shared session (it should still be installed automatically by your code, just not using the plugin).

Contributor Author:

Instead of totally disabling this test because of an unimplemented feature, I'd like to still run it, just a little slower.

Contributor:

This test is re-enabled in #19751.

Contributor Author:

OK, let me revert this part.

@SparkQA commented Dec 15, 2017

Test build #84943 has finished for PR 19981 at commit 88fdff2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

retest this please

Member:

There seemed to be a global SQL listener in SparkSession before. Now we create a new SQL listener for each SharedState? I think SharedState should be reused, but I'm not sure whether there is any case where we create more than one SharedState.

Contributor Author:

SharedState is effectively a singleton in Spark SQL. If you look at the old code, the global listener in SparkSession was initialized by SharedState.

@SparkQA commented Dec 15, 2017

Test build #84947 has finished for PR 19981 at commit 88fdff2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 15, 2017

Test build #84967 has finished for PR 19981 at commit bc300f9.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member)

retest this please

@SparkQA commented Dec 16, 2017

Test build #84994 has finished for PR 19981 at commit bc300f9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 18, 2017

Test build #85047 has finished for PR 19981 at commit 9f5d21f.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member)

retest this please

1 similar comment
@cloud-fan (Contributor Author)

retest this please

@SparkQA commented Dec 18, 2017

Test build #85064 has finished for PR 19981 at commit 9f5d21f.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 18, 2017

Test build #85065 has finished for PR 19981 at commit a81917c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
 * An interface for creating history listeners (to replay event logs) defined in other
 * modules like SQL, and setting up the UI of the plugin to rebuild the history UI.
 */
private[spark] trait SparkHistoryUIPlugin {
```
Contributor:

This is not a UI plugin. It's also only marginally related to this source file.

It should remain in the .status package. If you really feel strongly about the existing name, you can use a different name (e.g. "AppHistoryServerPlugin", or something that doesn't explicitly say "UI" or "Listener").
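The suggested shape would be something like the following sketch (method names and signatures are inferred from the discussion, not quoted verbatim from the patch):

```scala
// Sketch of the renamed interface, kept in the .status package.
private[spark] trait AppHistoryServerPlugin {
  // Create listeners used to replay event logs into the history store.
  def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener]

  // Rebuild the module's UI (e.g. the SQL tab) from the replayed store.
  def setupUI(ui: SparkUI): Unit
}
```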

```scala
private def statusStore: SQLAppStatusStore = {
  new SQLAppStatusStore(sparkContext.statusStore.store)

protected def currentExecutionIds(): Set[Long] = {
  spark.sharedState.statusStore.executionsList.map(_.executionId).toSet
```
Contributor:

You can just call statusStore no?

```scala
Thread.sleep(100)
val executionData = statusStore.executionsList().headOption
finished = executionData.isDefined && executionData.get.jobs.values
  .forall(_ == JobExecutionStatus.SUCCEEDED)
```
Contributor:

This is not the same check as before. It assumes that onJobEnd is called after onExecutionEnd, which might be the case now, but was explicitly mentioned as not being always the case in the original SQLListener code.

```scala
var finished = false
while (!finished) {
  Thread.sleep(100)
  val executionData = statusStore.executionsList().headOption
```
Contributor:

For consistency with the checks below you should be checking .lastOption.


```scala
class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTestUtils {

class SQLAppStatusListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTestUtils {
```
Contributor:

The reason I didn't rename this class is because it contains tests that have nothing to do with the listener itself (like the test for SPARK-18462), and doing the proper thing (break those tests out into a separate suite) would be too noisy for the original change (and would also be pretty noisy here).

Contributor Author:

Then it's an existing problem of SQLListenerSuite, as it previously didn't only test SQLListener. That should not stop us from renaming it after we rename SQLListener.

```scala
val statusStore = new SQLAppStatusStore(sc.statusStore.store)
assert(statusStore.executionsList().size <= 50)

val statusStore = spark.sharedState.statusStore
assert(statusStore.executionsCount() == 200)
```
Contributor:

This is now wrong, isn't it? The configuration explicitly says "50". Since this test is not being run, you should leave the code as before (with just needed changes, if any, for it to compile).

Contributor Author:

do we really need to limit the UI data for history server? cc @vanzin

Contributor:

Yes, both because it's the old behavior, and to limit the app's history data growth. Also because the UI code itself doesn't scale to arbitrarily large lists of things like jobs and stages.

Contributor Author:

Previously we didn't test the live data for stages.

@SparkQA commented Dec 19, 2017

Test build #85091 has finished for PR 19981 at commit 60421ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 19, 2017

Test build #85093 has finished for PR 19981 at commit 5b64f88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

cc @vanzin any more comments?

@vanzin (Contributor) left a comment:

The test changes are fine; I still don't really agree with your reasoning for changing the way the plugin works, but that's pretty low in my list of things to worry about right now. So go ahead with what you think is best.

```scala
import config._

private var sparkVersion = SPARK_VERSION
private val sparkVersion = SPARK_VERSION
```
Contributor:

Actually this is a bug; the version should be read from SparkListenerLogStart when it's in the event log. Feel free to file a separate bug.


```scala
override def setupUI(ui: SparkUI): Unit = {
  val kvStore = ui.store.store
  new SQLTab(new SQLAppStatusStore(kvStore), ui)
```
Contributor:

You shouldn't be adding the UI if there is no SQL-related data in the store.

@gengliangwang (Member) left a comment:

LGTM

@SparkQA commented Dec 21, 2017

Test build #85227 has finished for PR 19981 at commit 94ee3c7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

retest this please

@SparkQA commented Dec 21, 2017

Test build #85238 has finished for PR 19981 at commit 94ee3c7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member)

retest this please

@SparkQA commented Dec 21, 2017

Test build #85251 has finished for PR 19981 at commit 94ee3c7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

retest this please

@SparkQA commented Dec 21, 2017

Test build #85263 has finished for PR 19981 at commit 94ee3c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

Thanks, merging to master!
