
Conversation

@cloud-fan (Contributor) commented Dec 14, 2017

What changes were proposed in this pull request?

In #19681 we introduced a new interface called AppStatusPlugin, to register listeners and set up the UI for both live applications and the history server.

However, I think it's overkill for the live UI. For example, we should not register SQLListener if users are not using SQL functions. Previously, we registered the SQLListener and set up the SQL tab when the SparkSession was first created, which indicates users are going to use SQL functions. But in #19681 we register the SQL listener during SparkContext creation. The same reasoning applies to streaming.

I think we should keep the previous behavior, and only use this new interface for the history server.

To reflect this change, I also rename the new interface to SparkHistoryUIPlugin.

This PR also refines the tests for the SQL listener.

How was this patch tested?

Existing tests.


@vanzin (Contributor) commented Dec 14, 2017

I intentionally created that interface to be used both by live applications and the SHS. What actual problem are you running into?

> we should not register SQLListener if users are not using SQL functions

Why not? That listener is basically a no-op if you're not running any SQL.

@SparkQA commented Dec 14, 2017

Test build #84922 has finished for PR 19981 at commit ba38723.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

I don't think AppStatusPlugin is well designed to support both live applications and the history server. The fact that SQLAppStatusPlugin needs flags to decide whether to register the listener in setupListeners or setupUI looks pretty hacky.

I'm also not sure why we need a plugin interface for the live UI. The SQL and streaming UIs are implemented pretty cleanly. I do agree that an interface for the history server is very useful, and that's why we had SparkHistoryListenerFactory before.

@vanzin (Contributor) commented Dec 14, 2017

> I'm also not sure why we need a plugin interface for the live UI.

So your solution to that is to have completely separate code for both cases? I really prefer to have a single place to go to understand how the listeners and the UI are initialized, even if the SQL plugin implementation is sub-optimal.

Again, you're stating a matter of preference, but you haven't explained what problem this is causing. Not using the interface doesn't mean you get rid of the conditionals, it just means the conditionals are implied by having the code that installs the listener and UI for live applications live in different places.

@cloud-fan (Contributor Author)

The problem is that it took me a while to understand the new code, because of the weird design of AppStatusPlugin. At the very least we should have a separate plugin interface for the live UI. Code readability is important to Spark; we should not sacrifice it without real benefits.

On the other hand, I think it's better not to register a listener if it's unnecessary. SQLListener is fine because it's a no-op for non-SQL events, but can we guarantee that in all other places, like streaming? (This needs confirmation from @zsxwing @marmbrus.)

And this is also an internal behavior change (the timing of registering SQL listeners), which may prevent us from catching Spark events in the SQL listener in the future. Do we have a good reason to change it? Having a central place for live UI setup doesn't seem like a strong reason to me.

@vanzin (Contributor) commented Dec 14, 2017

> Code readability is important to Spark, we should not sacrifice it without real benefits.

I agree, but I also think that duplicating code in disjoint places hurts readability, not helps it.

> but can we guarantee it in all other places like streaming

Those are not being installed with this interface, are they?

> which may stop us from catching Spark events in the SQL listener in the future

The listener is now being installed earlier than before (when the context starts, instead of later), so it's less likely that it will miss events.

As I mentioned in the other PR you commented on, I do understand that the code here is not super pretty. But I really don't think going back to the previous way is the right thing.

For example, a way to simplify the plugin code is to change the way visibility of the SQL tab is done. Either always show it (simpler), or add a new method in SparkUITab that says whether the tab is visible, and have the SQL tab override it. Either of those would simplify the plugin code a lot (setupListener would just install the listener, always, setupUI would just add the tab, always).
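A rough sketch of the second suggestion above (note: the `visible` hook is only the proposal, not existing Spark API; class shapes are simplified for illustration):

```scala
// Sketch: add a visibility hook to SparkUITab (hypothetical method).
abstract class SparkUITab(parent: SparkUI, prefix: String)
    extends WebUITab(parent, prefix) {
  // Tabs are visible by default; subclasses may override.
  def visible: Boolean = true
}

// The SQL tab overrides it, showing the tab only once SQL data exists.
class SQLTab(statusStore: SQLAppStatusStore, parent: SparkUI)
    extends SparkUITab(parent, "SQL") {
  override def visible: Boolean = statusStore.executionsCount() > 0
}

// WebUI.getTabs would then filter on the hook:
//   def getTabs: Seq[WebUITab] = tabs.filter(_.visible)
```

With this, setupListeners always installs the listener and setupUI always adds the tab; the conditional lives in one place, in the tab itself.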

@cloud-fan (Contributor Author)

> Those are not being installed with this interface, are they?

We are going to, right? Otherwise creating the interface just for SQL doesn't align with your goal of centralizing this part.

> But I really don't think going back to the previous way is the right thing.

I do think the previous code is cleaner than the current one. This may be due to personal taste, so we need more feedback from other people.

> The listener is now being installed earlier than before (when the context starts, instead of later), so it's less likely that it will miss events.

I'm talking about not catching events if users are not using SQL functions. Again, this is an internal behavior change that can affect Spark-event handling logic inside the SQL listener that may be added in the future. I'm not saying we can't change it, but we need a proper reason.

@vanzin (Contributor) commented Dec 14, 2017

> We are going to, right? Otherwise creating the interface just for SQL doesn't align with your goal of centralizing this part.

At some point, probably, but supporting streaming UIs in the SHS requires a lot more work before we even reach that point. There are fundamental problems with event logs and streaming apps that need to be resolved first.

> not catching events if users are not using SQL functions

Not sure why that would be an issue - or rather, why that's different from what's always been the case. Even if you have a SparkSession today, you can run non-SQL jobs using the underlying context, and those will generate events that will be delivered to the SQL listener, and it has to deal with them (as it does).

> I do think the previous code is cleaner than the current one.

What do you think about either of my suggestions to simplify the code?

@cloud-fan (Contributor Author)

> Not sure why that would be an issue - or rather, why that's different from what's always been the case.

It's possible that people write Spark applications with a Spark SQL dependency but never use SQL functions (they just create a SparkContext). This can happen if someone builds a library on top of Spark and Spark SQL, but its users only use the non-SQL APIs.

> What do you think about either of my suggestions to simplify the code?

I do think we should only create the SQL tab when users actually use SQL functions, and the same should apply to the SQL listener.
Previously they were consistent: we registered the listener and set up the UI when creating the SparkSession. Now they are not: we register the listener during SparkContext creation, and set up the UI after the first SQL execution starts.

No offense, but I would reject this PR if I were reviewing it, because of this behavior change.

@vanzin (Contributor) commented Dec 14, 2017

> It's possible that people write Spark applications with a Spark SQL dependency, but never use SQL

Yes, and the SQL listener will ignore all the events, as it should even in the case that SQL was used, because users are allowed to run SQL and non-SQL workloads in the same application.

> I do think we should only create the SQL tab when users actually use SQL functions.

Create or show? If it's not shown, the user-visible behavior is the same, isn't it? That was the second of my suggestions.

> because of this behavior change.

The behavior change is all internal and not visible to users, and that was the goal of the change.

@cloud-fan (Contributor Author)

An internal behavior change also needs careful review; I'd like to wait for feedback from others.

But is the previous code really so hacky that it's worth a refactor with a new interface? https://github.com/apache/spark/pull/19981/files#diff-42e78d37f5dcb2a1576f83b53bbf4b55R88 looks pretty nice to me, and so does the streaming one in StreamingManager. If you can find a way to make the code better, I'm totally fine with that. But before then, can we fall back to the previous code, which is obviously better than the current one?

@vanzin (Contributor) commented Dec 14, 2017

It's less about cleanliness and more about discoverability IMO. Answer the question quickly: where is the SQL UI initialized?

  • my code: in the AppStatusPlugin implementation
  • your code: it depends. And the two pieces of code don't even live in the same file or share any code.

Again, I'll make the same suggestion as before: if you move the SQL tab visibility calculation to a new method in the SQL tab itself (and add a filter in WebUI.getTabs to only show visible tabs), the code will be simpler than either your or my version, and the user-visible behavior will remain the same.

This would be the plugin code for both live and SHS:

```scala
override def setupListeners(
    conf: SparkConf,
    store: ElementTrackingStore,
    addListenerFn: SparkListener => Unit,
    live: Boolean): Unit = {
  addListenerFn(new SQLAppStatusListener(conf, store, live, None))
}

override def setupUI(ui: SparkUI): Unit = {
  // Look up the live listener, if any (None when replaying in the history server).
  val listener = ui.sc.map { /* call LiveListenerBus.findListenersByClass */ }
  new SQLTab(new SQLAppStatusStore(ui.store.store, listener), ui)
}
```

Plus you can delete a bunch of code from SQLAppStatusListener.

@cloud-fan (Contributor Author) commented Dec 15, 2017

Following the existing convention of Spark SQL, classes under the execution package are meant to be private and don't need the private[sql] modifier.

@cloud-fan (Contributor Author)

I just realized this is important. Previously we could use SharedState.listener to query the SQL status. SharedState is actually a semi-public interface, as it's marked Unstable in SparkSession.sharedState.

Sometimes for debugging I just type spark.sharedState.listener.xxx in the Spark shell and check some status, but that's impossible now after #19681.

There might be other people like me who love this ability; we should not just remove it for future UI discoverability (I think it's just SQL and streaming, really not a big gain).
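For context, the kind of ad-hoc inspection being described looks roughly like this in the Spark shell (the old `listener` call is illustrative of the removed API; the `statusStore` form uses the accessors quoted elsewhere in this review):

```scala
// Old style: poke the live SQL listener directly from the shell.
// spark.sharedState.listener.xxx

// After the rework, the equivalent inspection goes through the KV-store-backed API:
spark.sharedState.statusStore.executionsList().foreach { e =>
  println(s"execution ${e.executionId}: jobs=${e.jobs}")
}
```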

cc @hvanhovell @viirya @gatorsmile

@gengliangwang (Member) commented Dec 15, 2017

+1, this is a behavior change. It was good to have SharedState.listener, which is the only entry point to fetch SQL metrics without a web browser.
Also, SQL is such an important and frequently used feature. I don't see any problem with creating the SQL listener at SparkContext initialization, which is consistent with SHS V1.

Member:

+1

Member:

Makes sense to me.

@vanzin (Contributor) commented Dec 15, 2017

Even if you add the property back, it won't be the same listener. There's no way to keep the old API without keeping all of the old code.

> SharedState is actually a semi-public interface

It's in sql.internal. That's not semi-public.

Contributor Author:

At least it's developer-facing. As a developer, I don't care about the name changing or the API changing; I just want the same functionality.

Contributor:

Sure, it's fine if you want to expose it. But I'm pointing out that it's pretty weird to expose a class in a ".internal" package through the API. Those are not documented and don't go through MiMa checks, so there are absolutely zero guarantees about them.

Contributor Author:

That's how it works: those things are not critical to end users, and it's OK to break them if no one objects.

Contributor Author:

Also, this is consistent with SparkContext.statusStore; we need central places to query the core/SQL/streaming status.

Contributor:

I'd use @Private instead of @Unstable for the SparkSession method, that's all I'm saying. It more clearly maps to what you and the code are saying.

Contributor Author:

only set this config for this suite.

@vanzin (Contributor) commented Dec 15, 2017

I commented on the other PR where you mentioned this, but I still don't get what this is changing. I don't see any global state being overridden by override protected def sparkConf. This test suite only extends traits (e.g. SharedSQLContext, which extends SharedSparkSession), and those only keep suite-level state, not global state.

There are other tests that do the same thing.

This is also too late to do this; by the time this code runs, all listeners have been initialized and have read the default value of LIVE_ENTITY_UPDATE_PERIOD.

Contributor Author:

I prefer the testing style from before #19681, which just called the event-handling methods of the listener directly, instead of going indirectly through an intermediate replay bus.

Contributor Author:

Previously we did not attach the testing listener to the Spark event bus; fixed now.

Contributor Author:

ditto, but I don't know why this passed before...

Contributor:

This passed before because the listener was automatically being added to the bus using the plugin interface you've removed in this PR.

Contributor:

In fact, this will probably still pass if you restore the LIVE_ENTITY_UPDATE_PERIOD config in the session, since you'll still have a listener in the shared session (it should still be installed automatically by your code, just not using the plugin).

Contributor Author:

Instead of totally disabling this test because of an unimplemented feature, I'd like to still run it, just a little slower.

Contributor:

This test is re-enabled in #19751.

Contributor Author:

OK, let me revert this part.

@SparkQA commented Dec 15, 2017

Test build #84943 has finished for PR 19981 at commit 88fdff2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

retest this please

Member:

There seemed to be a global SQL listener in SparkSession before. Now we create a new SQL listener for each SharedState? I think SharedState should be reused, but I'm not sure whether there is any case where we create more than one SharedState.

Contributor Author:

SharedState is effectively a singleton in Spark SQL. If you look at the old code, the global listener in SparkSession was initialized by SharedState.

@SparkQA commented Dec 15, 2017

Test build #84947 has finished for PR 19981 at commit 88fdff2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 15, 2017

Test build #84967 has finished for PR 19981 at commit bc300f9.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member)

retest this please

@SparkQA commented Dec 16, 2017

Test build #84994 has finished for PR 19981 at commit bc300f9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 18, 2017

Test build #85047 has finished for PR 19981 at commit 9f5d21f.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member)

retest this please

1 similar comment
@cloud-fan (Contributor Author)

retest this please

@SparkQA commented Dec 18, 2017

Test build #85064 has finished for PR 19981 at commit 9f5d21f.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 18, 2017

Test build #85065 has finished for PR 19981 at commit a81917c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
 * An interface for creating history listeners (to replay event logs) defined in other
 * modules like SQL, and setting up the UI of the plugin to rebuild the history UI.
 */
private[spark] trait SparkHistoryUIPlugin {
```
Contributor:

This is not a UI plugin. It's also only marginally related to this source file.

It should remain in the .status package. If you really feel strongly about the existing name, you can use a different name (e.g. "AppHistoryServerPlugin", or something that doesn't explicitly say "UI" or "Listener").
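The suggested shape would be something like the following sketch (method names and signatures are inferred from the discussion, not quoted verbatim from the patch):

```scala
// Sketch of the renamed interface, kept in the .status package.
private[spark] trait AppHistoryServerPlugin {
  // Create listeners used to replay event logs into the history store.
  def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener]

  // Rebuild the module's UI (e.g. the SQL tab) from the replayed store.
  def setupUI(ui: SparkUI): Unit
}
```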

```scala
private def statusStore: SQLAppStatusStore = {
  new SQLAppStatusStore(sparkContext.statusStore.store)

protected def currentExecutionIds(): Set[Long] = {
  spark.sharedState.statusStore.executionsList.map(_.executionId).toSet
```
Contributor:

You can just call statusStore no?

```scala
Thread.sleep(100)
val executionData = statusStore.executionsList().headOption
finished = executionData.isDefined && executionData.get.jobs.values
  .forall(_ == JobExecutionStatus.SUCCEEDED)
```
Contributor:

This is not the same check as before. It assumes that onJobEnd is called after onExecutionEnd, which might be the case now, but was explicitly mentioned as not being always the case in the original SQLListener code.

```scala
var finished = false
while (!finished) {
  Thread.sleep(100)
  val executionData = statusStore.executionsList().headOption
```
Contributor:

For consistency with the checks below you should be checking .lastOption.


```scala
class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTestUtils {

class SQLAppStatusListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTestUtils {
```
Contributor:

The reason I didn't rename this class is because it contains tests that have nothing to do with the listener itself (like the test for SPARK-18462), and doing the proper thing (break those tests out into a separate suite) would be too noisy for the original change (and would also be pretty noisy here).

Contributor Author:

Then it's an existing problem of SQLListenerSuite, as it previously didn't only test SQLListener. That should not stop us from renaming it after we rename SQLListener.

```scala
val statusStore = new SQLAppStatusStore(sc.statusStore.store)
assert(statusStore.executionsList().size <= 50)

val statusStore = spark.sharedState.statusStore
assert(statusStore.executionsCount() == 200)
```
Contributor:

This is now wrong, isn't it? The configuration explicitly says "50". Since this test is not being run, you should leave the code as before (with just needed changes, if any, for it to compile).

Contributor Author:

do we really need to limit the UI data for history server? cc @vanzin

Contributor:

Yes, both because it's the old behavior, and to limit the app's history data growth. Also because the UI code itself doesn't scale to arbitrarily large lists of things like jobs and stages.

Contributor Author:

Previously we didn't test the live data for stages.

@SparkQA commented Dec 19, 2017

Test build #85091 has finished for PR 19981 at commit 60421ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 19, 2017

Test build #85093 has finished for PR 19981 at commit 5b64f88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

cc @vanzin any more comments?

@vanzin (Contributor) left a comment:

The test changes are fine; I still don't really agree with your reasoning for changing the way the plugin works, but that's pretty low in my list of things to worry about right now. So go ahead with what you think is best.

```scala
import config._

private var sparkVersion = SPARK_VERSION
private val sparkVersion = SPARK_VERSION
```
Contributor:

Actually this is a bug; the version should be read from SparkListenerLogStart when it's in the event log. Feel free to file a separate bug.


```scala
override def setupUI(ui: SparkUI): Unit = {
  val kvStore = ui.store.store
  new SQLTab(new SQLAppStatusStore(kvStore), ui)
```
Contributor:

You shouldn't be adding the UI if there is no SQL-related data in the store.

@gengliangwang (Member) left a comment:

LGTM

@SparkQA commented Dec 21, 2017

Test build #85227 has finished for PR 19981 at commit 94ee3c7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

retest this please

@SparkQA commented Dec 21, 2017

Test build #85238 has finished for PR 19981 at commit 94ee3c7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang (Member)

retest this please

@SparkQA commented Dec 21, 2017

Test build #85251 has finished for PR 19981 at commit 94ee3c7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

retest this please

@SparkQA commented Dec 21, 2017

Test build #85263 has finished for PR 19981 at commit 94ee3c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor Author)

Thanks, merging to master!
