[SPARK-22786][SQL] only use AppStatusPlugin in history server #19981
Conversation
I intentionally created that interface to be used both by live applications and the SHS. What actual problem are you running into?
Why not? That listener is basically a no-op if you're not running any SQL.
Test build #84922 has finished for PR 19981 at commit
I'm also not sure why we need a plugin interface for the live UI. The SQL and streaming UIs are implemented pretty well and cleanly. I do agree that an interface for the history server is very useful, and that's why we have
So your solution to that is to have completely separate code for both cases? I really prefer to have a single place to go to understand how the listeners and the UI are initialized, even if the SQL plugin implementation is sub-optimal. Again, you're stating a matter of preference, but you haven't explained what problem this is causing. Not using the interface doesn't mean you get rid of the conditionals; it just means the conditionals are implied by having the code that installs the listener and UI for live applications live in different places.
So the problem is, it takes me a while to understand the new code, because of the weird design. On the other hand, I think it's better not to register a listener if it's unnecessary. This is also an internal behavior change (the timing of registering SQL listeners), which may stop us from catching Spark events in the SQL listener in the future. Do we have a good reason to change it? Having a central place for live UI setup doesn't seem like a strong reason to me.
I agree, but I also think that duplicating code in disjoint places hurts readability, not helps it.
Those are not being installed with this interface, are they?
The listener is now installed earlier than before (when the context starts, instead of later), so it's less likely to miss events. As I mentioned in the other PR you commented on, I do understand that the code here is not super pretty. But I really don't think going back to the previous way is the right thing. For example, one way to simplify the plugin code is to change how the visibility of the SQL tab is decided: either always show it (simpler), or add a new method in
We are going to, right? Otherwise creating the interface just for SQL doesn't align with your goal to centralize this part.
I do think the previous code is cleaner than the current one. This may be due to personal taste, so we need more feedback from other people.
I'm talking about not catching events if users are not using SQL functions. Again, this is an internal behavior change that can affect Spark event catching logic inside the SQL listener, which may be added in the future. I'm not saying we can't change it, but it needs a proper reason.
At some point, probably, but supporting streaming UIs in the SHS requires a lot more work before we even reach that point. There are fundamental problems with event logs and streaming apps that need to be resolved first.
Not sure why that would be an issue - or rather, why that's different from what's always been the case. Even if you have a SparkSession today, you can run non-SQL jobs using the underlying context, and those will generate events that will be delivered to the SQL listener, and it has to deal with them (as it does).
What do you think about either of my suggestions to simplify the code?
It's possible that people write Spark applications with a Spark SQL dependency but don't use SQL functions (just create
I do think we should only create the SQL tab when users actually use SQL functions. The same should apply to the SQL listener. No offense, but I would reject that PR if I were reviewing it, because of this behavior change.
Yes, and the SQL listener will ignore all the events, as it should even in the case that SQL was used, because users are allowed to run SQL and non-SQL workloads in the same application.
Create or show? If it's not shown the user-visible behavior is the same, isn't it? That was the second of my suggestions.
The behavior change is all internal and not visible to users, and that was the goal of the change.
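The point about the SQL listener ignoring non-SQL events can be sketched with stand-in types (JobStart and SqlOnlyListener here are simplified illustrations, not Spark's real listener API; the property key follows the one Spark uses for SQL executions): a listener that only tracks jobs tagged with the SQL execution-id property safely ignores everything else.

```scala
// Stand-in event type; Spark's real SparkListenerJobStart carries job
// properties similarly, but this is a simplified sketch.
case class JobStart(jobId: Int, props: Map[String, String])

class SqlOnlyListener {
  private var tracked = List.empty[Int]

  // Only jobs tagged with the SQL execution-id property are tracked;
  // events from non-SQL jobs are ignored, as the discussion describes.
  def onJobStart(event: JobStart): Unit =
    if (event.props.contains("spark.sql.execution.id")) {
      tracked = event.jobId :: tracked
    }

  def trackedJobs: List[Int] = tracked.reverse
}

def runDemo(): List[Int] = {
  val listener = new SqlOnlyListener
  listener.onJobStart(JobStart(1, Map("spark.sql.execution.id" -> "0")))
  listener.onJobStart(JobStart(2, Map.empty)) // non-SQL job: ignored
  listener.trackedJobs
}
```

This is why running non-SQL workloads alongside SQL in the same application is harmless to the listener either way.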
An internal behavior change also needs careful review; I'd like to wait for feedback from others. But is the previous code really so hacky that it's worth a refactor with a new interface? https://github.com/apache/spark/pull/19981/files#diff-42e78d37f5dcb2a1576f83b53bbf4b55R88 looks pretty nice to me, and so is the streaming one in
It's less about cleanliness and more about discoverability, IMO. Try to answer this question quickly: where is the SQL UI initialized?
Again, I'll make the same suggestion as before: if you move the SQL tab visibility calculation to a new method in the SQL tab itself (and add a filter in ), this would be the plugin code for both live and SHS. Plus you can delete a bunch of code from
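The suggestion above could look roughly like this (StatusStore, SqlTab, and SqlUiPlugin are hypothetical stand-ins, not Spark's actual classes): the tab computes its own visibility from the store, so the same plugin body serves both the live UI and the history server.

```scala
trait StatusStore { def executionCount: Long }

// The tab decides its own visibility, so the plugin needs no special casing.
class SqlTab(store: StatusStore) {
  def shouldShow: Boolean = store.executionCount > 0
}

class SqlUiPlugin {
  // Same code path whether the store was filled by a live listener or by
  // replaying an event log in the history server.
  def setupUI(store: StatusStore): Option[SqlTab] = {
    val tab = new SqlTab(store)
    if (tab.shouldShow) Some(tab) else None
  }
}

def tabCreated(count: Long): Boolean =
  new SqlUiPlugin().setupUI(new StatusStore { def executionCount: Long = count }).isDefined
```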
Following the existing convention of Spark SQL, classes under the execution package are meant to be private and don't need the private[sql]
I just realized this is important. Previously we could use SharedState.listener to query the SQL status. SharedState is actually a semi-public interface, as it's marked Unstable in SparkSession.sharedState.
Sometimes for debugging I just type spark.sharedState.listener.xxx in the Spark shell and check some status, but that's impossible now after #19681.
There might be other people like me who love this ability; we should not just remove it for future UI discoverability (I think it's just SQL and streaming, really not a big gain).
+1, this is a behavior change. It was good to have SharedState.listener, which was the only way to fetch SQL metrics without a web browser.
Also, SQL is such an important and frequently used feature; I don't see any problem with creating the SQL listener during SparkContext initialization, which is consistent with SHS V1.
+1
Makes sense to me.
Even if you add the property back, it won't be the same listener. There's no way to keep the old API without keeping all of the old code.
SharedState is actually a semi-public interface
It's in sql.internal. That's not semi-public.
At least it's developer-facing. As a developer I don't care about the name changing or the API changing; I just want the same functionality.
Sure, it's fine if you want to expose it. But I'm pointing out that it's pretty weird to expose a class in a ".internal" package through the API. Those are neither documented nor covered by MiMa checks, so there are absolutely zero guarantees about them.
That's how it works: those things are not critical to end users, and it's OK to break them if no one objects.
Also, this is consistent with SparkContext.statusStore; we need central places to query the core/SQL/streaming status.
I'd use @Private instead of @Unstable for the SparkSession method, that's all I'm saying. It more clearly maps to what you and the code are saying.
only set this config for this suite.
I commented on the other PR where you mentioned this, but I still don't get what this is changing. I don't see any global state being overridden by override protected def sparkConf. This test suite only extends traits (e.g. SharedSQLContext, which extends SharedSparkSession), and those only keep suite-level state, not global state.
There are other tests that do the same thing.
It's also too late to do this; by the time this code runs, all listeners have been initialized and have read the default value for LIVE_ENTITY_UPDATE_PERIOD.
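The timing issue can be illustrated with a minimal stand-in (Conf and LiveListener are hypothetical names, not Spark classes): a listener that reads its update period once at construction never sees a config value set afterwards, which is why the setting must be in the suite's sparkConf before the session starts.

```scala
import scala.collection.mutable

class Conf {
  private val settings = mutable.Map.empty[String, String]
  def set(k: String, v: String): Unit = settings(k) = v
  def get(k: String, default: String): String = settings.getOrElse(k, default)
}

// The update period is captured once, when the listener is constructed.
class LiveListener(conf: Conf) {
  val updatePeriodMs: Long = conf.get("live.entity.update.period", "100").toLong
}

def setAfterInit(): Long = {
  val conf = new Conf
  val listener = new LiveListener(conf)      // reads the default: 100
  conf.set("live.entity.update.period", "0") // too late; listener keeps 100
  listener.updatePeriodMs
}

def setBeforeInit(): Long = {
  val conf = new Conf
  conf.set("live.entity.update.period", "0") // set before construction
  new LiveListener(conf).updatePeriodMs
}
```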
I prefer the testing style before #19681, which just called the event handling methods of the listener directly, instead of going through an intermediate replay bus.
Previously we did not attach the testing listener to the Spark event bus; fixed now.
ditto, but I don't know why this passed before...
This passed before because the listener was automatically being added to the bus using the plugin interface you've removed in this PR.
In fact this will probably still pass if you restore the LIVE_ENTITY_UPDATE_PERIOD config in the session, since you'll still have a listener in the shared session (it should still be installed automatically by your code, just not via the plugin).
Instead of totally disabling this test because of an unimplemented feature, I'd like to still run it, just a little slower.
This test is re-enabled in #19751.
ok let me revert this part
Test build #84943 has finished for PR 19981 at commit
retest this please
There seemed to be a global SQL listener on SparkSession before. Now we create a new SQL listener for each SharedState? I think SharedState should be reused, but I'm not sure whether there's any case where we create more than one SharedState.
SharedState is kind of a singleton in Spark SQL. If you look at the old code, the global listener in SparkSession is initialized by SharedState.
Test build #84947 has finished for PR 19981 at commit
Test build #84967 has finished for PR 19981 at commit
retest this please
Test build #84994 has finished for PR 19981 at commit
Test build #85047 has finished for PR 19981 at commit
retest this please |
retest this please
Test build #85064 has finished for PR 19981 at commit
Test build #85065 has finished for PR 19981 at commit
```scala
 * An interface for creating history listeners (to replay event logs) defined in other modules like
 * SQL, and setup the UI of the plugin to rebuild the history UI.
 */
private[spark] trait SparkHistoryUIPlugin {
```
This is not a UI plugin. It's also only marginally related to this source file.
It should remain in the .status package. If you really feel strongly about the existing name, you can use a different name (e.g. "AppHistoryServerPlugin" or something that doesn't explicitly say "UI" or "listener").
```scala
private def statusStore: SQLAppStatusStore = {
  new SQLAppStatusStore(sparkContext.statusStore.store)

protected def currentExecutionIds(): Set[Long] = {
  spark.sharedState.statusStore.executionsList.map(_.executionId).toSet
```
You can just call statusStore, no?
```scala
Thread.sleep(100)
val executionData = statusStore.executionsList().headOption
finished = executionData.isDefined && executionData.get.jobs.values
  .forall(_ == JobExecutionStatus.SUCCEEDED)
```
This is not the same check as before. It assumes that onJobEnd is called after onExecutionEnd, which might be the case now, but was explicitly mentioned as not always being the case in the original SQLListener code.
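A check that doesn't depend on the onJobEnd/onExecutionEnd ordering might look like this (ExecutionData is a simplified stand-in for the execution record in the status store, not Spark's actual class): require both that the execution itself has completed and that every recorded job succeeded, instead of inferring completion from job statuses alone.

```scala
// Simplified stand-in for the execution data kept in the status store.
case class ExecutionData(completionTime: Option[Long], jobs: Map[Int, String])

def isFinished(exec: Option[ExecutionData]): Boolean =
  exec.exists { e =>
    // Both conditions must hold, so the order in which the execution-end
    // and job-end events arrived doesn't matter; also require at least one
    // recorded job so an early execution-end alone doesn't count as done.
    e.completionTime.isDefined && e.jobs.nonEmpty &&
      e.jobs.values.forall(_ == "SUCCEEDED")
  }
```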
```scala
var finished = false
while (!finished) {
  Thread.sleep(100)
  val executionData = statusStore.executionsList().headOption
```
For consistency with the checks below you should be checking .lastOption.
```scala
class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTestUtils {

class SQLAppStatusListenerSuite extends SparkFunSuite with SharedSQLContext with JsonTestUtils {
```
The reason I didn't rename this class is that it contains tests that have nothing to do with the listener itself (like the test for SPARK-18462), and doing the proper thing (breaking those tests out into a separate suite) would be too noisy for the original change (and would also be pretty noisy here).
Then it's an existing problem of SQLListenerSuite, since previously it didn't only test SQLListener either. This should not stop us from renaming it after we rename SQLListener.
```scala
val statusStore = new SQLAppStatusStore(sc.statusStore.store)
assert(statusStore.executionsList().size <= 50)

val statusStore = spark.sharedState.statusStore
assert(statusStore.executionsCount() == 200)
```
This is now wrong, isn't it? The configuration explicitly says "50". Since this test is not being run, you should leave the code as it was before (with just the changes needed, if any, for it to compile).
Do we really need to limit the UI data for the history server? cc @vanzin
Yes, both because it's the old behavior, and to limit the app's history data growth. Also because the UI code itself doesn't scale to arbitrarily large lists of things like jobs and stages.
Previously we didn't test the live data for stages.
Test build #85091 has finished for PR 19981 at commit
Test build #85093 has finished for PR 19981 at commit
cc @vanzin any more comments?
The test changes are fine; I still don't really agree with your reasoning for changing the way the plugin works, but that's pretty low in my list of things to worry about right now. So go ahead with what you think is best.
```scala
import config._

private var sparkVersion = SPARK_VERSION
private val sparkVersion = SPARK_VERSION
```
Actually this is a bug; the version should be read from SparkListenerLogStart when it's in the event log. Feel free to file a separate bug.
Filed https://issues.apache.org/jira/browse/SPARK-22854; I'll do it later.
```scala
override def setupUI(ui: SparkUI): Unit = {
  val kvStore = ui.store.store
  new SQLTab(new SQLAppStatusStore(kvStore), ui)
```
You shouldn't be adding the UI if there is no SQL-related data in the store.
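The guard asked for above could be sketched like this (EventStore and its countSqlExecutions member are hypothetical stand-ins for the history server's KV store, not Spark's real API): check whether replaying the event log produced any SQL execution entries before attaching the tab.

```scala
// Stand-in for a KV store filled by replaying an application's event log.
trait EventStore { def countSqlExecutions: Long }

// Only attach the SQL tab when the application actually ran SQL queries.
def setupSqlTab(store: EventStore, attachTab: () => Unit): Boolean =
  if (store.countSqlExecutions > 0) { attachTab(); true }
  else false

def demoSetup(executions: Long): Boolean = {
  var attached = false
  setupSqlTab(new EventStore { def countSqlExecutions: Long = executions },
    () => attached = true)
  attached
}
```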
LGTM
Test build #85227 has finished for PR 19981 at commit
retest this please
Test build #85238 has finished for PR 19981 at commit
retest this please
Test build #85251 has finished for PR 19981 at commit
retest this please
Test build #85263 has finished for PR 19981 at commit
thanks, merging to master!
What changes were proposed in this pull request?
In #19681 we introduced a new interface called AppStatusPlugin, to register listeners and set up the UI for both the live UI and the history UI. However, I think it's overkill for the live UI. For example, we should not register SQLListener if users are not using SQL functions. Previously we registered the SQLListener and set up the SQL tab when a SparkSession was first created, which indicates that users are going to use SQL functions. But in #19681, we register the SQL listener during SparkContext creation. The same thing applies to streaming too. I think we should keep the previous behavior, and only use this new interface for the history server.
To reflect this change, I also rename the new interface to SparkHistoryUIPlugin.
This PR also refines the tests for the SQL listener.
How was this patch tested?
Existing tests.