Conversation

@vanzin (Contributor) commented Nov 22, 2016

This change modifies how encryption keys used during shuffle are propagated.
Instead of relying on YARN's UserGroupInformation credential propagation, it
explicitly distributes the key in the messages exchanged between driver and
executor during registration. When RPC encryption is enabled, this means key
propagation is also secure.

This allows shuffle encryption to work in non-YARN mode, which means that it's
easier to write unit tests for areas of the code that are affected by the feature.

The key is stored in the SecurityManager; because there are many instances of
that class in the code, the key is only guaranteed to exist in the instance
managed by the SparkEnv. This approach was chosen to avoid storing the key in
the SparkConf, which would risk the key being written to disk as part of the
configuration (as is done, for example, when starting YARN applications).

Tested by new and existing unit tests (which were moved from the YARN module to
core), and by running apps with shuffle encryption enabled.
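The registration-time key distribution described above can be sketched roughly as follows. All names here are hypothetical stand-ins for illustration (the actual change involves SparkEnv, SecurityManager, and the driver/executor registration messages); this is a sketch of the idea, not Spark's implementation.

```java
import java.security.SecureRandom;
import java.util.Optional;

// Sketch: the driver generates the shuffle encryption key once, hands it to
// each executor inside the registration reply, and the executor stores it in
// its local security-manager instance instead of reading it from the config.
final class KeyPropagationSketch {

    // Driver side: generate the key when the application starts.
    static byte[] createAppKey(int bits) {
        byte[] key = new byte[bits / 8];
        new SecureRandom().nextBytes(key);
        return key;
    }

    // The registration reply carries the key alongside the app config, so it
    // travels over the same channel as the rest of the RPC traffic; with RPC
    // encryption enabled, the key itself is encrypted in transit.
    record AppConfigSketch(Optional<byte[]> ioEncryptionKey) {}

    // Executor side: install the received key into the executor's security
    // holder; it is never written into SparkConf, so it cannot leak to disk
    // with the serialized configuration.
    static final class SecurityHolder {
        private Optional<byte[]> key = Optional.empty();
        void setIoEncryptionKey(Optional<byte[]> k) { key = k; }
        Optional<byte[]> getIoEncryptionKey() { return key; }
    }
}
```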

@vanzin (Author) commented on the diff:

I had to modify this to avoid going over the 10-argument limit the style checker imposes, which is also a reason why the key is not stored in the SparkEnv itself.

vanzin commented Nov 22, 2016

/cc @zsxwing @JoshRosen


SparkQA commented Nov 22, 2016

Test build #69029 has finished for PR 15981 at commit 7ed0d7c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SparkAppConfig(

Marcelo Vanzin added 2 commits November 22, 2016 13:38
…ter.


SparkQA commented Nov 23, 2016

Test build #69033 has finished for PR 15981 at commit b7486cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


vanzin commented Nov 28, 2016

Ping.

@zsxwing (Member) left a comment:

Since it sends the key over the wire, I think it's better to throw an exception if securityManager.isSaslEncryptionEnabled() is false.

conf: SparkConf,
encryptionKey: Option[Array[Byte]]) {

def this(defaultSerializer: Serializer, conf: SparkConf) = this(defaultSerializer, conf, None)
@zsxwing (Member) commented:

nit: why not use a default parameter value?

@vanzin (Author) replied:

Hmm, I thought I was calling this from Java code but must have removed it. I guess I can use a default argument now.

@vanzin (Author) replied:

Actually, I'm calling this from UnsafeShuffleWriterSuite.java in the accompanying change, and default values make it hard to call the code from Java.

@zsxwing (Member) replied:

I see. Thanks for clarifying.
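The interop point discussed above can be illustrated with a sketch. Scala default parameter values compile to synthetic `$default$N` methods that Java callers would have to invoke by hand, so an explicit overload is friendlier to a Java test such as UnsafeShuffleWriterSuite. The class below is a hypothetical stand-in, not the real SerializerManager:

```java
// Sketch of the Java-friendly pattern: an explicit overload mirroring
// Scala's `def this(defaultSerializer, conf) = this(defaultSerializer, conf, None)`
// secondary constructor, which Java code can call directly.
final class SerializerManagerSketch {
    private final byte[] encryptionKey;  // null plays the role of Scala's None

    // Primary constructor: takes the optional key explicitly.
    SerializerManagerSketch(byte[] encryptionKey) {
        this.encryptionKey = encryptionKey;
    }

    // Explicit overload for callers (e.g. Java tests) that have no key;
    // with a Scala default argument there would be no such plain overload.
    SerializerManagerSketch() {
        this(null);
    }

    boolean encryptionEnabled() {
        return encryptionKey != null;
    }
}
```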

val port = conf.getInt("spark.executor.port", 0)
val env = SparkEnv.createExecutorEnv(
-  conf, executorId, slaveInfo.getHostname, port, cpusPerTask, isLocal = false)
+  conf, executorId, slaveInfo.getHostname, port, cpusPerTask, None, isLocal = false)
@zsxwing (Member) commented:

This means Mesos is not supported, right? Could you throw an exception if I/O encryption is enabled on Mesos?


vanzin commented Nov 28, 2016

> Since it sends the key over the wire, I think it's better to throw an exception if securityManager.isSaslEncryptionEnabled() is false.

We don't do that now, and there are a whole lot of things that need to be configured correctly for things to be really secure (like HDFS and YARN encryption). It would also make unit tests slightly more complicated (or at least their setup). I think it's fine to just have a warning here.


zsxwing commented Nov 28, 2016

> I think it's fine to just have a warning here.

Sounds good. Just make sure you mention it when updating the doc.


zsxwing commented Nov 28, 2016

> I think it's fine to just have a warning here.

I cannot find the warning log in your latest commit. Otherwise, LGTM.


vanzin commented Nov 29, 2016

Looks like I removed the warning from SparkEnv at some point, let me add it back.
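The agreed-on warning could look roughly like the sketch below: if an I/O encryption key is about to be sent while RPC encryption is off, log a warning rather than failing, since overall security depends on several other settings anyway. All names here are hypothetical; this is not the actual SparkEnv code.

```java
import java.util.Optional;
import java.util.logging.Logger;

// Sketch of the "warn, don't throw" decision discussed in this thread.
final class KeyTransferGuard {
    private static final Logger LOG = Logger.getLogger("KeyTransferGuard");

    /** Returns true if a warning was emitted. */
    static boolean warnIfInsecure(Optional<byte[]> key, boolean rpcEncryptionEnabled) {
        if (key.isPresent() && !rpcEncryptionEnabled) {
            LOG.warning("I/O encryption enabled without RPC encryption: "
                + "keys will be sent over the network in the clear.");
            return true;
        }
        return false;
    }
}
```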


SparkQA commented Nov 29, 2016

Test build #69277 has finished for PR 15981 at commit d97f27b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Nov 29, 2016

Test build #69273 has finished for PR 15981 at commit e76dd15.

  • This patch fails from timeout after a configured wait of `250m`.
  • This patch merges cleanly.
  • This patch adds no public classes.


zsxwing commented Nov 29, 2016

LGTM. The build for the latest commit passed. Merging to master and 2.1.

asfgit pushed a commit that referenced this pull request Nov 29, 2016
…ter.

Author: Marcelo Vanzin <[email protected]>

Closes #15981 from vanzin/SPARK-18547.

(cherry picked from commit 8b325b1)
Signed-off-by: Shixiong Zhu <[email protected]>
@asfgit asfgit closed this in 8b325b1 Nov 29, 2016
@vanzin vanzin deleted the SPARK-18547 branch November 29, 2016 17:28
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
…ter.

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ter.
