[SPARK-27446][R] Use existing spark conf if available. #24353
Conversation
  def init(): (Int, RAuthHelper) = {
-   val conf = new SparkConf()
+   val conf = Option(SparkEnv.get).map(_.conf).getOrElse(new SparkConf())
Hi, @MrBago.
Could you add a unit test case, please? That will prevent a future regression. Maybe in RBackendSuite.scala?
I was trying to think of a way to test this. One idea is a "visibleForTesting" method that returns the connection timeout, so we could verify that passing the conf sets the correct value. But that seems somewhat indirect and requires an API just for testing.
What do you think?
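Something like this rough sketch, living inside RBackend (hypothetical helper name; RBackend does not expose such a hook today), assuming the standard spark.r.backendConnectionTimeout key:

    import org.apache.spark.{SparkConf, SparkEnv}

    // Hypothetical test-only accessor: resolve the conf the same way the patched
    // init() does, then surface the connection timeout so a suite could assert
    // that values from an existing SparkEnv are actually picked up.
    private[r] def connectionTimeoutForTesting(): Int = {
      val conf = Option(SparkEnv.get).map(_.conf).getOrElse(new SparkConf())
      // 6000 is the documented default for spark.r.backendConnectionTimeout
      conf.getInt("spark.r.backendConnectionTimeout", 6000)
    }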
Test build #104529 has finished for PR 24353 at commit
I'm not sure about this. What are the exact steps to reproduce the problem?
IIRC this is done this way because the config is needed before the session is initialized (and therefore before it can be set by the user programmatically). The user can still set/change it with the spark conf default properties file.
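For example, a user could still control the R backend timeout through the default properties file (illustrative value; the key is the standard SparkR setting):

    # conf/spark-defaults.conf
    spark.r.backendConnectionTimeout   6000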
srowen left a comment:
The change itself seems reasonable.
@felixcheung that's a good point. Our Spark deployment is unusual in that we have SparkR connect to an existing driver process instead of having it start its own. These changes should be safe for existing code paths because we always fall back to creating a new SparkConf.
HyukjinKwon left a comment:
Looks good to me.
For adding tests, I don't feel strongly since it looks tricky, but ideally it would be good to add them.
OK, that's fine.
Merged to master.
What changes were proposed in this pull request?
The RBackend and RBackendHandler create new conf objects that don't pick up conf values from the existing SparkSession and therefore always use the default conf values instead of values specified by the user.
In this fix we check whether the Spark env already exists and, if so, get the conf from there; otherwise we fall back to creating a new conf. This follows the pattern used in other places, including this:
spark/core/src/main/scala/org/apache/spark/api/r/BaseRRunner.scala
Line 261 in 3725b13
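In sketch form, the pattern mirrors the diff above (not a new API):

    import org.apache.spark.{SparkConf, SparkEnv}

    // Reuse the conf from an already-running SparkEnv when one exists
    // (e.g. when SparkR attaches to an existing driver process); otherwise
    // fall back to a fresh SparkConf with default values.
    val conf = Option(SparkEnv.get).map(_.conf).getOrElse(new SparkConf())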
How was this patch tested?