From 8b4aaf5a85f7f925baf7365283e950b9d7676a4b Mon Sep 17 00:00:00 2001
From: Thomas Graves
Date: Fri, 26 Oct 2018 13:45:58 +0000
Subject: [PATCH 1/5] [SPARK-25023] Clarify Spark security documentation

---
 docs/security.md | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/docs/security.md b/docs/security.md
index ffae683df6256..dfa9546ceeb85 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -6,7 +6,20 @@ title: Security
 * This will become a table of contents (this text will be scraped).
 {:toc}
 
-# Spark RPC
+# Spark Security Overview
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Spark supports multiple deployment types and each one supports different levels of security. Not
+all deployment types will be secure in all environments and none are secure by default. Be
+sure to evaluate your environment, what Spark supports, and take the appropriate measures to secure
+your Spark deployment
+
+There are many different types of security concerns, Spark does not necessarily protect against
+all things. Listed below are some of the things Spark supports, also check the deployment
+documentation for the type of deployment you are using for deployment-specific settings. Anything
+not documented, Spark does not support.
+
+# Spark RPC (Communication protocol between Spark processes)
 
 ## Authentication
 
@@ -123,7 +136,7 @@ The following table describes the different options available for configuring th
 
 Spark supports encrypting temporary data written to local disks. This covers shuffle files, shuffle
 spills and data blocks stored on disk (for both caching and broadcast variables). It does not cover
 encrypting output data generated by applications with APIs such as `saveAsHadoopFile` or
-`saveAsTable`.
+`saveAsTable`. It also may not cover temporary files created explicitly by the user.
 
 The following settings cover enabling encryption for data written to disk:

From c359ef922fdee1354ae7abe07f02c5ef0b75e5db Mon Sep 17 00:00:00 2001
From: Thomas Graves
Date: Fri, 26 Oct 2018 14:24:06 +0000
Subject: [PATCH 2/5] review feedback

---
 docs/security.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/security.md b/docs/security.md
index dfa9546ceeb85..9083b50f51063 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -6,7 +6,7 @@ title: Security
 * This will become a table of contents (this text will be scraped).
 {:toc}
 
-# Spark Security Overview
+# Spark Security: Things You Need To Know
 
 Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
 Spark supports multiple deployment types and each one supports different levels of security. Not
@@ -14,8 +14,8 @@ all deployment types will be secure in all environments and none are secure by d
 sure to evaluate your environment, what Spark supports, and take the appropriate measures to secure
 your Spark deployment
 
-There are many different types of security concerns, Spark does not necessarily protect against
-all things. Listed below are some of the things Spark supports, also check the deployment
+There are many different types of security concerns. Spark does not necessarily protect against
+all things. Listed below are some of the things Spark supports. Also check the deployment
 documentation for the type of deployment you are using for deployment-specific settings. Anything
 not documented, Spark does not support.
 
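The settings patch 1 documents (RPC authentication and local disk encryption) are plain Spark configuration properties. As a minimal sketch of how they can be enabled from application code — the master URL, app name, and secret below are placeholders, not part of the patch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: enable RPC authentication, RPC encryption, and encryption
// of temporary data written to local disk. The property names are Spark's
// documented security settings; everything else is a placeholder.
val conf = new SparkConf()
  .setAppName("security-sketch")                 // placeholder app name
  .setMaster("local[4]")                         // placeholder master URL
  .set("spark.authenticate", "true")             // RPC authentication, off by default
  .set("spark.authenticate.secret", "change-me") // pre-shared secret (handled differently on YARN)
  .set("spark.network.crypto.enabled", "true")   // encrypt RPC traffic
  .set("spark.io.encryption.enabled", "true")    // encrypt shuffle files and spills

val sc = new SparkContext(conf)
```

As the patch notes, `spark.io.encryption.enabled` covers shuffle files, spills, and blocks cached on disk, but not output written through APIs such as `saveAsHadoopFile` or `saveAsTable`, and possibly not temporary files the application creates itself.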
From 13207950f5a187943cb201220b59698d653d9398 Mon Sep 17 00:00:00 2001
From: Thomas Graves
Date: Fri, 26 Oct 2018 14:25:44 +0000
Subject: [PATCH 3/5] add period

---
 docs/security.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/security.md b/docs/security.md
index 9083b50f51063..2f7fa9c6179f4 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -12,7 +12,7 @@ Security in Spark is OFF by default. This could mean you are vulnerable to attac
 Spark supports multiple deployment types and each one supports different levels of security. Not
 all deployment types will be secure in all environments and none are secure by default. Be
 sure to evaluate your environment, what Spark supports, and take the appropriate measures to secure
-your Spark deployment
+your Spark deployment.
 
 There are many different types of security concerns. Spark does not necessarily protect against
 all things. Listed below are some of the things Spark supports. Also check the deployment

From a4616bf8cd5adfb94ad9146d1f1d620f213ab041 Mon Sep 17 00:00:00 2001
From: Thomas Graves
Date: Tue, 30 Oct 2018 13:57:34 +0000
Subject: [PATCH 4/5] Add security section to overview and quickstart

---
 docs/index.md       | 5 +++++
 docs/quick-start.md | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/docs/index.md b/docs/index.md
index d269f54c73439..ac38f1d4c53c2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -10,6 +10,11 @@ It provides high-level APIs in Java, Scala, Python and R,
 and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including [Spark SQL](sql-programming-guide.html) for SQL and structured data processing,
 [MLlib](ml-guide.html) for machine learning, [GraphX](graphx-programming-guide.html) for graph processing, and [Spark Streaming](streaming-programming-guide.html).
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) before downloading and running Spark.
+
 # Downloading
 
 Get Spark from the [downloads page](https://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions.
diff --git a/docs/quick-start.md b/docs/quick-start.md
index ef7af6c3f6cec..28186c11887fc 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -17,6 +17,11 @@ you can download a package for any version of Hadoop.
 Note that, before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs are replaced by Dataset, which is strongly-typed like an RDD, but with richer optimizations under the hood. The RDD interface is still supported, and you can get a more detailed reference at the [RDD programming guide](rdd-programming-guide.html).
 However, we highly recommend you to switch to use Dataset, which has better performance than RDD. See the [SQL programming guide](sql-programming-guide.html) to get more information about Dataset.
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) before running Spark.
+
 # Interactive Analysis with the Spark Shell
 
 ## Basics
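The "OFF by default" warning that patch 4 adds to the overview and quick start is easy to check from a session. A small sketch, assuming only a local Spark installation (the app name and master below are arbitrary):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("default-security-check") // arbitrary
  .master("local[2]")                // arbitrary
  .getOrCreate()

// spark.authenticate is unset out of the box, so the fallback "false" is
// returned: security really is off until explicitly enabled.
println(spark.conf.get("spark.authenticate", "false"))

spark.stop()
```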
+
 # Interactive Analysis with the Spark Shell
 
 ## Basics

From ebf47895c07b33684d5a206ba37d1ac2aaed36a5 Mon Sep 17 00:00:00 2001
From: Thomas Graves
Date: Tue, 30 Oct 2018 14:38:38 +0000
Subject: [PATCH 5/5] add security section to the resource manager docs

---
 docs/running-on-kubernetes.md | 5 +++++
 docs/running-on-mesos.md      | 5 +++++
 docs/running-on-yarn.md       | 5 +++++
 docs/spark-standalone.md      | 5 +++++
 4 files changed, 20 insertions(+)

diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 60c9279f2bce2..49e40a8e56d24 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -12,6 +12,11 @@ Kubernetes scheduler that has been added to Spark.
 In future versions, there may be behavioral changes around configuration,
 container images and entrypoints.**
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # Prerequisites
 
 * A runnable distribution of Spark 2.3 or above.
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index b473e654563d6..2502cd4ca86f4 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -13,6 +13,11 @@ The advantages of deploying Spark with Mesos include:
   [frameworks](https://mesos.apache.org/documentation/latest/frameworks/)
 - scalable partitioning between multiple instances of Spark
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # How it Works
 
 In a standalone cluster deployment, the cluster manager in the below diagram is a Spark master
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 3b725cf295537..a7a448fbeb65e 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -9,6 +9,11 @@ Support for running on [YARN (Hadoop
 NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
 was added to Spark in version 0.6.0, and improved in subsequent releases.
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # Launching Spark on YARN
 
 Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster.
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 7975b0c8b11ca..49ef2e1ce2a1b 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -8,6 +8,11 @@ title: Spark Standalone Mode
 
 In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or use our provided [launch scripts](#cluster-launch-scripts). It is also possible to run these daemons on a single machine for testing.
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # Installing Spark Standalone to a Cluster
 
 To install Spark Standalone mode, you simply place a compiled version of Spark on each node on the cluster. You can obtain pre-built versions of Spark with each release or [build it yourself](building-spark.html).
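The deployment-specific sections patch 5 points readers to differ mainly in how the RPC authentication secret is handled: on YARN, Spark generates and distributes the secret itself, while standalone and Mesos deployments need the same pre-shared `spark.authenticate.secret` on every node. A hedged sketch of that split — the helper, the `SPARK_AUTH_SECRET` variable, and the fallback value are invented for illustration, not Spark APIs:

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: apply authentication settings appropriate to the
// resource manager in use. Only the two property names are real Spark
// configuration; the rest is illustrative scaffolding.
def withAuth(conf: SparkConf, resourceManager: String): SparkConf = {
  conf.set("spark.authenticate", "true")
  resourceManager match {
    case "yarn" =>
      conf // YARN generates and distributes the secret automatically
    case "standalone" | "mesos" =>
      // the same secret must be configured on every node in these modes
      conf.set("spark.authenticate.secret",
        sys.env.getOrElse("SPARK_AUTH_SECRET", "change-me"))
    case other =>
      sys.error(s"consult the $other security section before enabling authentication")
  }
}
```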