Skip to content

Conversation

@tgravescs
Copy link
Contributor

What changes were proposed in this pull request?

Clarify documentation about security.

How was this patch tested?

None, just documentation

@tgravescs
Copy link
Contributor Author

@vanzin @srowen

docs/security.md Outdated
{:toc}

# Spark RPC
# Spark Security Overview
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might even frame this as "Security: N Things You Need To Know" or something along those lines. This is more specifically an important TL;DR and call to action.

docs/security.md Outdated
Spark supports multiple deployments types and each one supports different levels of security. Not
all deployment types will be secure in all environments and none are secure by default. Be
sure to evaluate your environment, what Spark supports, and take the appropriate measure to secure
your Spark deployment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: period at end. I do like highlighting security issues. This reads now like a simple disclaimer: security isn't on, it's your responsibility, Spark isn't secure out of the box, we can't secure everything for you, don't blame us, etc. That may be true but I think it's more constructive here to make this a list of N key security issues and what to know, as forward pointers into the document. Like, along the lines of:

  • Enable spark.authenticate.secret to encrypt communication with the master
  • Note that REST API isn't secured
  • Use firewall rules to restrict access to key ports
  • ...

... with links

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this but this is very specific to a users environment. The rest of the doc has the things you listed. We aren't responsible for securing other things in their environment so I wasn't wanting to make it sound like it was a comprehensive list.

docs/security.md Outdated
sure to evaluate your environment, what Spark supports, and take the appropriate measure to secure
your Spark deployment

There are many different types of security concerns, Spark does not necessarily protect against
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: concerns, -> concerns.
And similarly in a couple other sentences.

@SparkQA
Copy link

SparkQA commented Oct 26, 2018

Test build #98090 has finished for PR 22852 at commit 8b4aaf5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Oct 26, 2018

I get it, the "it's your responsibility" stance, and it is. For any risk there's a sentence in this doc we could point to and say, "see, told you". If we're going to make a change here, adding another para saying "below, we told you so" isn't additive. Especially if we're trying to use this change to actively mitigate security issues. More useful is a cheat-sheet, TL;DR, simply enumerating the top things you don't want to miss. I think it's more useful than redundant. I can take a crack at that too.

@SparkQA
Copy link

SparkQA commented Oct 26, 2018

Test build #98091 has finished for PR 22852 at commit 1320795.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor Author

the intention is not a we told you so, its meant to grab their attention and to get people to think about it because in the end it is their responsibility in my opinion.

I'm fine if you want to take a crack at listing a few things or adding a table of the bullets we have in the below sections. I just want to make sure we also say that this is not a comprehensive list. Many users who don't read all the docs look at the cheat sheet and stop there and think its comprehensive. There is no way we could list all variances of peoples environments.

@vanzin
Copy link
Contributor

vanzin commented Oct 26, 2018

I'm fine with this, although I wonder if having a stronger wording about Spark standalone just not being secure would be better. Even if you enable auth, everybody needs to know the same auth secret, which is not optimal. It gives you a little bit of security, but not much. No isolation or anything.

I'm also generally skeptical that people read and follow these things, but at least it's there. :-)

(And just to comment on a previous comment by Sean, just enabling auth does not enable encryption.)

@tgravescs
Copy link
Contributor Author

we can add stronger wording for standalone if you want, I know the text was recently updated (I believe by you) to have the below:

For other resource managers, spark.authenticate.secret must be configured on each of the nodes. This secret will be shared by all the daemons and applications, so this deployment configuration is not as secure as the above, especially when considering multi-tenant clusters. In this configuration, a user with the secret can effectively impersonate any other user.

Do you have specific suggestion on where you want to put that? The reason I didn't put stronger was because if you are running it in isolated one client environment then the authentication part via secret doesn't matter that much.

@vanzin
Copy link
Contributor

vanzin commented Oct 26, 2018

I looked at the existing docs after I wrote my comment and saw that paragraph. I think it's fine, maybe just in need of some update now that we have k8s, although I'm not sure yet how spark.authenticate behaves on k8s.

It's probably ok as is. It could be clarified a little bit (e.g. mentioning explicitly the Master and Worker daemons) but not a big deal.

@tgravescs
Copy link
Contributor Author

so I had filed a jira to update mesos docs more detail about security things (https://issues.apache.org/jira/browse/SPARK-25024) which I need to follow up on, but I didn't file one for k8s. It would be good to have one for k8s if its not clear as well.

@srowen
Copy link
Member

srowen commented Oct 26, 2018

I don't feel strongly about it; go ahead.

If someone lands on this page, do they pretty easily come away with the impression they need to set spark.authenticate and network security if they care about security? if so, great. If the text is just adding to the text they might skip over, maybe revise it. That's how I think about it.

I think you can make edits for Mesos and K8S here too.

@tgravescs
Copy link
Contributor Author

I would rather see someone more familiar with K8s that uses it document it.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgravescs
Copy link
Contributor Author

I would be fine with adding it more places, including perhaps the overview page: http://spark.apache.org/docs/latest/ and quick start pages.

Perhaps we should agree upon the wording here first though. I'm not exactly sure where this pr stands honestly. @srowen are you going to put up a different one with wording you prefer?

If someone lands on this page, do they pretty easily come away with the impression they need to set spark.authenticate and network security if they care about security?

Everyone reads text slightly different and I'm by no means a doc expert, so I'm definitely open to reword if there is consensus on it.

@srowen
Copy link
Member

srowen commented Oct 29, 2018

A quick pointer to security issues in other key places sounds good. As long as it is increasing the chance users understand the specific issue and isn't more general text to skip past, it is helping

@tgravescs
Copy link
Contributor Author

Updated to have a section on security in the quickstart and overview, let me know what you think and if wording needs updated. If this ok I can followup with something on the website

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98267 has finished for PR 22852 at commit a4616bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Oct 30, 2018

I think these are good changes. In a separate PR for the versions-specific docs, we could add a similar note to https://spark.apache.org/docs/latest/spark-standalone.html as much of the security concern is around the standalone master.

@tgravescs
Copy link
Contributor Author

I can add a note here for deployments here and then we can do version specific ones after

@tgravescs
Copy link
Contributor Author

added sections to the resource manager sections.

@SparkQA
Copy link

SparkQA commented Oct 30, 2018

Test build #98269 has finished for PR 22852 at commit ebf4789.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Oct 30, 2018

lgtm

@tgravescs
Copy link
Contributor Author

If no other comments, I'll commit this? I'll leave it open for a bit longer

asfgit pushed a commit that referenced this pull request Nov 2, 2018
## What changes were proposed in this pull request?

Clarify documentation about security.

## How was this patch tested?

None, just documentation

Closes #22852 from tgravescs/SPARK-25023.

Authored-by: Thomas Graves <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
(cherry picked from commit c00186f)
Signed-off-by: Thomas Graves <[email protected]>
@tgravescs
Copy link
Contributor Author

merged to master and 2.4

@asfgit asfgit closed this in c00186f Nov 2, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Clarify documentation about security.

## How was this patch tested?

None, just documentation

Closes apache#22852 from tgravescs/SPARK-25023.

Authored-by: Thomas Graves <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
## What changes were proposed in this pull request?

Clarify documentation about security.

## How was this patch tested?

None, just documentation

Closes apache#22852 from tgravescs/SPARK-25023.

Authored-by: Thomas Graves <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
(cherry picked from commit c00186f)
Signed-off-by: Thomas Graves <[email protected]>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
## What changes were proposed in this pull request?

Clarify documentation about security.

## How was this patch tested?

None, just documentation

Closes apache#22852 from tgravescs/SPARK-25023.

Authored-by: Thomas Graves <[email protected]>
Signed-off-by: Thomas Graves <[email protected]>
(cherry picked from commit c00186f)
Signed-off-by: Thomas Graves <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants