-
Couldn't load subscription status.
- Fork 28.9k
[SPARK-18547][core] Propagate I/O encryption key when executors register. #15981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to modify this to avoid going over the 10 argument limit the style checker imposes. Which is also a reason why the key is not stored in the SparkEnv itself.
|
/cc @zsxwing @JoshRosen |
|
Test build #69029 has finished for PR 15981 at commit
|
…ter. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation credential propagation, this change explicitly distributes the key using the messages exchanged between driver and executor during registration. When RPC encryption is enabled, this means key propagation is also secure. This allows shuffle encryption to work in non-YARN mode, which means that it's easier to write unit tests for areas of the code that are affected by the feature. The key is stored in the SecurityManager; because there are many instances of that class used in the code, the key is only guaranteed to exist in the instance managed by the SparkEnv. This path was chosen to avoid storing the key in the SparkConf, which would risk having the key being written to disk as part of the configuration (as, for example, is done when starting YARN applications). Test by new and existing unit tests (which were moved from the YARN module to core), and by running apps with shuffle encryption enabled.
|
Test build #69033 has finished for PR 15981 at commit
|
|
Ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it sends the key over the wire, I think it's better to throw an exception if securityManager.isSaslEncryptionEnabled() is false.
| conf: SparkConf, | ||
| encryptionKey: Option[Array[Byte]]) { | ||
|
|
||
| def this(defaultSerializer: Serializer, conf: SparkConf) = this(defaultSerializer, conf, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not use the parameter default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I thought I was calling this from Java code but must have removed it. I guess I can use a default argument now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm calling this from UnsafeShuffleWriterSuite.java in the accompanying change, and default values make it hard to call the code from Java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for your clarifying.
| val port = conf.getInt("spark.executor.port", 0) | ||
| val env = SparkEnv.createExecutorEnv( | ||
| conf, executorId, slaveInfo.getHostname, port, cpusPerTask, isLocal = false) | ||
| conf, executorId, slaveInfo.getHostname, port, cpusPerTask, None, isLocal = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means mesos is not suppported. Right? Could you throw an exception if IO encryption is enabled for mesos?
We don't do that now, and there's a whole lot of things that need to configured correctly for things to be really secure (like HDFS and YARN encryption). It would also make unit tests slightly more complicated (or at least their set up). I think it's fine to just have a warning here. |
Sounds good. Just make sure you mention it when updating the doc. |
I cannot find the warning log in your latest commit. Otherwise, LGTM. |
|
Looks like I removed the warning from SparkEnv at some point, let me add it back. |
|
Test build #69277 has finished for PR 15981 at commit
|
|
Test build #69273 has finished for PR 15981 at commit
|
|
LGTM. The build for the latest commit passed. Merging to master and 2.1. |
…ter. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation credential propagation, this change explicitly distributes the key using the messages exchanged between driver and executor during registration. When RPC encryption is enabled, this means key propagation is also secure. This allows shuffle encryption to work in non-YARN mode, which means that it's easier to write unit tests for areas of the code that are affected by the feature. The key is stored in the SecurityManager; because there are many instances of that class used in the code, the key is only guaranteed to exist in the instance managed by the SparkEnv. This path was chosen to avoid storing the key in the SparkConf, which would risk having the key being written to disk as part of the configuration (as, for example, is done when starting YARN applications). Tested by new and existing unit tests (which were moved from the YARN module to core), and by running apps with shuffle encryption enabled. Author: Marcelo Vanzin <[email protected]> Closes #15981 from vanzin/SPARK-18547. (cherry picked from commit 8b325b1) Signed-off-by: Shixiong Zhu <[email protected]>
…ter. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation credential propagation, this change explicitly distributes the key using the messages exchanged between driver and executor during registration. When RPC encryption is enabled, this means key propagation is also secure. This allows shuffle encryption to work in non-YARN mode, which means that it's easier to write unit tests for areas of the code that are affected by the feature. The key is stored in the SecurityManager; because there are many instances of that class used in the code, the key is only guaranteed to exist in the instance managed by the SparkEnv. This path was chosen to avoid storing the key in the SparkConf, which would risk having the key being written to disk as part of the configuration (as, for example, is done when starting YARN applications). Tested by new and existing unit tests (which were moved from the YARN module to core), and by running apps with shuffle encryption enabled. Author: Marcelo Vanzin <[email protected]> Closes apache#15981 from vanzin/SPARK-18547.
…ter. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation credential propagation, this change explicitly distributes the key using the messages exchanged between driver and executor during registration. When RPC encryption is enabled, this means key propagation is also secure. This allows shuffle encryption to work in non-YARN mode, which means that it's easier to write unit tests for areas of the code that are affected by the feature. The key is stored in the SecurityManager; because there are many instances of that class used in the code, the key is only guaranteed to exist in the instance managed by the SparkEnv. This path was chosen to avoid storing the key in the SparkConf, which would risk having the key being written to disk as part of the configuration (as, for example, is done when starting YARN applications). Tested by new and existing unit tests (which were moved from the YARN module to core), and by running apps with shuffle encryption enabled. Author: Marcelo Vanzin <[email protected]> Closes apache#15981 from vanzin/SPARK-18547.
This change modifies the method used to propagate encryption keys used during
shuffle. Instead of relying on YARN's UserGroupInformation credential propagation,
this change explicitly distributes the key using the messages exchanged between
driver and executor during registration. When RPC encryption is enabled, this means
key propagation is also secure.
This allows shuffle encryption to work in non-YARN mode, which means that it's
easier to write unit tests for areas of the code that are affected by the feature.
The key is stored in the SecurityManager; because there are many instances of
that class used in the code, the key is only guaranteed to exist in the instance
managed by the SparkEnv. This path was chosen to avoid storing the key in the
SparkConf, which would risk having the key being written to disk as part of the
configuration (as, for example, is done when starting YARN applications).
Tested by new and existing unit tests (which were moved from the YARN module to
core), and by running apps with shuffle encryption enabled.