- 
                Notifications
    
You must be signed in to change notification settings  - Fork 28.9k
 
[SPARK-25023] Clarify Spark security documentation #22852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
        
          
                docs/security.md
              
                Outdated
          
        
      | {:toc} | ||
| 
               | 
          ||
| # Spark RPC | ||
| # Spark Security Overview | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might even frame this as "Security: N Things You Need To Know" or something along those lines. This is more specifically an important TL;DR and call to action.
        
          
                docs/security.md
              
                Outdated
          
        
      | Spark supports multiple deployments types and each one supports different levels of security. Not | ||
| all deployment types will be secure in all environments and none are secure by default. Be | ||
| sure to evaluate your environment, what Spark supports, and take the appropriate measure to secure | ||
| your Spark deployment | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: period at end. I do like highlighting security issues. This reads now like a simple disclaimer: security isn't on, it's your responsibility, Spark isn't secure out of the box, we can't secure everything for you, don't blame us, etc. That may be true but I think it's more constructive here to make this a list of N key security issues and what to know, as forward pointers into the document. Like, along the lines of:
- Enable 
spark.authenticate.secretto encrypt communication with the master - Note that REST API isn't secured
 - Use firewall rules to restrict access to key ports
 - ...
 
... with links
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this but this is very specific to a users environment. The rest of the doc has the things you listed. We aren't responsible for securing other things in their environment so I wasn't wanting to make it sound like it was a comprehensive list.
        
          
                docs/security.md
              
                Outdated
          
        
      | sure to evaluate your environment, what Spark supports, and take the appropriate measure to secure | ||
| your Spark deployment | ||
| 
               | 
          ||
| There are many different types of security concerns, Spark does not necessarily protect against | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: concerns, -> concerns.
And similarly in a couple other sentences.
| 
           Test build #98090 has finished for PR 22852 at commit  
  | 
    
| 
           I get it, the "it's your responsibility" stance, and it is. For any risk there's a sentence in this doc we could point to and say, "see, told you". If we're going to make a change here, adding another para saying "below, we told you so" isn't additive. Especially if we're trying to use this change to actively mitigate security issues. More useful is a cheat-sheet, TL;DR, simply enumerating the top things you don't want to miss. I think it's more useful than redundant. I can take a crack at that too.  | 
    
| 
           Test build #98091 has finished for PR 22852 at commit  
  | 
    
| 
           the intention is not a we told you so, its meant to grab their attention and to get people to think about it because in the end it is their responsibility in my opinion. I'm fine if you want to take a crack at listing a few things or adding a table of the bullets we have in the below sections. I just want to make sure we also say that this is not a comprehensive list. Many users who don't read all the docs look at the cheat sheet and stop there and think its comprehensive. There is no way we could list all variances of peoples environments.  | 
    
| 
           I'm fine with this, although I wonder if having a stronger wording about Spark standalone just not being secure would be better. Even if you enable auth, everybody needs to know the same auth secret, which is not optimal. It gives you a little bit of security, but not much. No isolation or anything. I'm also generally skeptical that people read and follow these things, but at least it's there. :-) (And just to comment on a previous comment by Sean, just enabling auth does not enable encryption.)  | 
    
| 
           we can add stronger wording for standalone if you want, I know the text was recently updated (I believe by you) to have the below: 
 Do you have specific suggestion on where you want to put that? The reason I didn't put stronger was because if you are running it in isolated one client environment then the authentication part via secret doesn't matter that much.  | 
    
| 
           I looked at the existing docs after I wrote my comment and saw that paragraph. I think it's fine, maybe just in need of some update now that we have k8s, although I'm not sure yet how spark.authenticate behaves on k8s. It's probably ok as is. It could be clarified a little bit (e.g. mentioning explicitly the Master and Worker daemons) but not a big deal.  | 
    
| 
           so I had filed a jira to update mesos docs more detail about security things (https://issues.apache.org/jira/browse/SPARK-25024) which I need to follow up on, but I didn't file one for k8s. It would be good to have one for k8s if its not clear as well.  | 
    
| 
           I don't feel strongly about it; go ahead. If someone lands on this page, do they pretty easily come away with the impression they need to set spark.authenticate and network security if they care about security? if so, great. If the text is just adding to the text they might skip over, maybe revise it. That's how I think about it. I think you can make edits for Mesos and K8S here too.  | 
    
| 
           I would rather see someone more familiar with K8s that uses it document it.  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this also be in https://github.com/apache/spark-website/blob/asf-site/security.md?
| 
           I would be fine with adding it more places, including perhaps the overview page: http://spark.apache.org/docs/latest/ and quick start pages. Perhaps we should agree upon the wording here first though. I'm not exactly sure where this pr stands honestly. @srowen are you going to put up a different one with wording you prefer? 
 Everyone reads text slightly different and I'm by no means a doc expert, so I'm definitely open to reword if there is consensus on it.  | 
    
| 
           A quick pointer to security issues in other key places sounds good. As long as it is increasing the chance users understand the specific issue and isn't more general text to skip past, it is helping  | 
    
| 
           Updated to have a section on security in the quickstart and overview, let me know what you think and if wording needs updated. If this ok I can followup with something on the website  | 
    
| 
           Test build #98267 has finished for PR 22852 at commit  
  | 
    
| 
           I think these are good changes. In a separate PR for the versions-specific docs, we could add a similar note to https://spark.apache.org/docs/latest/spark-standalone.html as much of the security concern is around the standalone master.  | 
    
| 
           I can add a note here for deployments here and then we can do version specific ones after  | 
    
| 
           added sections to the resource manager sections.  | 
    
| 
           Test build #98269 has finished for PR 22852 at commit  
  | 
    
| 
           lgtm  | 
    
| 
           If no other comments, I'll commit this? I'll leave it open for a bit longer  | 
    
## What changes were proposed in this pull request? Clarify documentation about security. ## How was this patch tested? None, just documentation Closes #22852 from tgravescs/SPARK-25023. Authored-by: Thomas Graves <[email protected]> Signed-off-by: Thomas Graves <[email protected]> (cherry picked from commit c00186f) Signed-off-by: Thomas Graves <[email protected]>
| 
           merged to master and 2.4  | 
    
## What changes were proposed in this pull request? Clarify documentation about security. ## How was this patch tested? None, just documentation Closes apache#22852 from tgravescs/SPARK-25023. Authored-by: Thomas Graves <[email protected]> Signed-off-by: Thomas Graves <[email protected]>
## What changes were proposed in this pull request? Clarify documentation about security. ## How was this patch tested? None, just documentation Closes apache#22852 from tgravescs/SPARK-25023. Authored-by: Thomas Graves <[email protected]> Signed-off-by: Thomas Graves <[email protected]> (cherry picked from commit c00186f) Signed-off-by: Thomas Graves <[email protected]>
## What changes were proposed in this pull request? Clarify documentation about security. ## How was this patch tested? None, just documentation Closes apache#22852 from tgravescs/SPARK-25023. Authored-by: Thomas Graves <[email protected]> Signed-off-by: Thomas Graves <[email protected]> (cherry picked from commit c00186f) Signed-off-by: Thomas Graves <[email protected]>
What changes were proposed in this pull request?
Clarify documentation about security.
How was this patch tested?
None, just documentation