---
post_title: Kerberos
nav_title: Kerberos
menu_order: 120
enterprise: 'no'
---

# HDFS Kerberos

Kerberos is an authentication system that allows Spark to retrieve and write data securely to a Kerberos-enabled HDFS cluster. As of Mesosphere Spark `2.2.0-2`, long-running jobs renew their delegation tokens (authentication credentials). This section assumes you have previously set up a Kerberos-enabled HDFS cluster. **Note:** Depending on your OS, Spark may need to run as `root` in order to authenticate with your Kerberos-enabled service. You can do this by setting `--conf spark.mesos.driverEnv.SPARK_USER=root` when submitting your job.

## Spark Installation

Spark (and all Kerberos-enabled components) needs a valid `krb5.conf` file. You can set up the Spark service to use a single `krb5.conf` file for all of its drivers.

1. A `krb5.conf` file tells Spark how to connect to your KDC. Base64 encode this file:

        cat krb5.conf | base64 -w 0

1. Put the encoded file (as a string) into your JSON configuration file:

        {
          "security": {
            "kerberos": {
              "krb5conf": "<base64 encoding>"
            }
          }
        }

    Your configuration will probably also have the `hdfs` parameters from above:

        {
          "service": {
            "name": "kerberized-spark",
            "user": "nobody"
          },
          "hdfs": {
            "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
          },
          "security": {
            "kerberos": {
              "krb5conf": "<base64 encoding>"
            }
          }
        }

1. Install Spark with your custom configuration, here called `options.json`:

        dcos package install --options=/path/to/options.json spark

1. Make sure your keytab is accessible from the DC/OS [Secret Store](https://docs.mesosphere.com/latest/security/secrets/).

1. If you've enabled the history server via `history-server.enabled`, you must also configure the principal and keytab for the history server. **WARNING:** The keytab contains secrets, and in the current history server package the keytab is not stored securely. See [Limitations][9].

    Base64 encode your keytab:

        cat spark.keytab | base64 -w 0

    And add the following to your configuration file:

        {
          "history-server": {
            "kerberos": {
              "principal": "spark@REALM",
              "keytab": "<base64 encoding>"
            }
          }
        }

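The encoding and configuration steps above can be tied together in a small shell sketch that generates `options.json` with the encoded `krb5.conf` inlined. The realm and file contents here are illustrative stand-ins, and `base64 -w 0` assumes GNU coreutils (macOS's `base64` does not wrap output and takes no `-w` flag):

```shell
# Illustrative only: create a stand-in krb5.conf (EXAMPLE.COM is a placeholder realm).
cat > krb5.conf <<'EOF'
[libdefaults]
  default_realm = EXAMPLE.COM
EOF

# Base64 encode without line wrapping, as the JSON configuration expects a single-line string.
KRB5CONF_B64=$(base64 -w 0 < krb5.conf)

# Inline the encoded string into the installation options file.
cat > options.json <<EOF
{
  "service": {
    "name": "kerberized-spark",
    "user": "nobody"
  },
  "hdfs": {
    "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
  },
  "security": {
    "kerberos": {
      "krb5conf": "${KRB5CONF_B64}"
    }
  }
}
EOF
```

You can then pass the generated file to `dcos package install --options=options.json spark` as in the step above.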
## Job Submission

To authenticate to a Kerberos KDC, Spark on Mesos supports keytab files as well as ticket-granting tickets (TGTs). Keytabs do not expire, while tickets can. Keytabs are recommended, especially for long-running streaming jobs.

### Keytab Authentication

Submit the job with the keytab:

    dcos spark run --submit-args="\
    --kerberos-principal user@REALM \
    --keytab-secret-path /__dcos_base64__hdfs-keytab \
    --conf ... --class MySparkJob <url> <args>"

### TGT Authentication

Submit the job with the ticket:

    dcos spark run --submit-args="\
    --kerberos-principal hdfs/name-0-node.hdfs.autoip.dcos.thisdcos.directory@LOCAL \
    --tgt-secret-path /__dcos_base64__tgt \
    --conf ... --class MySparkJob <url> <args>"

**Note:** You can access external (i.e., non-DC/OS) Kerberos-secured HDFS clusters from Spark on Mesos.

**Note:** These credentials are security-critical. The DC/OS Secret Store requires you to base64 encode binary secrets (such as the Kerberos keytab) before adding them. If they are uploaded with the `__dcos_base64__` prefix, they are automatically decoded when the secret is made available to your Spark job. If the secret name **doesn't** have this prefix, the keytab will be decoded and written to a file in the sandbox. This leaves the secret exposed and is not recommended.

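Before uploading a binary keytab, it can be worth checking that your encoding round-trips cleanly. A minimal sketch, using a stand-in binary file in place of a real keytab and assuming GNU `base64`:

```shell
# Stand-in for a real keytab: 64 random bytes of binary data.
head -c 64 /dev/urandom > spark.keytab

# Encode without line wrapping, as required when pasting into JSON or the Secret Store.
base64 -w 0 < spark.keytab > spark.keytab.b64

# Decode and compare byte-for-byte with the original.
base64 -d < spark.keytab.b64 | cmp -s - spark.keytab && echo "round-trip OK"
```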
# Kafka Kerberos

Spark can consume data from a Kerberos-enabled Kafka cluster. Connecting Spark to secure Kafka does not require special installation parameters, but it does require that the Spark driver _and_ the Spark executors can access the following files:

* The client JAAS (Java Authentication and Authorization Service) file. This is provided using Mesos URIs with `--conf spark.mesos.uris=<location_of_jaas>`.
* The `krb5.conf` for your Kerberos setup. As with HDFS, this is provided using a base64 encoding of the file:

        cat krb5.conf | base64 -w 0

    Then assign this value to the environment variable `KRB5_CONFIG_BASE64` for the driver and the executors:

        --conf spark.mesos.driverEnv.KRB5_CONFIG_BASE64=<base64_encoded_string>
        --conf spark.executorEnv.KRB5_CONFIG_BASE64=<base64_encoded_string>

* The keytab containing the credentials for accessing the Kafka cluster:

        --conf spark.mesos.driver.secret.names=<secret_name>              # e.g. __dcos_base64__kafka_keytab
        --conf spark.mesos.driver.secret.filenames=<keytab_file_name>     # e.g. kafka.keytab
        --conf spark.mesos.executor.secret.names=<secret_name>            # e.g. __dcos_base64__kafka_keytab
        --conf spark.mesos.executor.secret.filenames=<keytab_file_name>   # e.g. kafka.keytab

Finally, you'll likely need to tell Spark to use the JAAS file:

    --conf spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/mnt/mesos/sandbox/<jaas_file>
    --conf spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/mnt/mesos/sandbox/<jaas_file>

It is important that the keytab filename (`<keytab_file_name>` above) is the same for the driver and the executors, and that this file is properly referenced in your JAAS file. For a worked example of a Spark consumer reading from secure Kafka, see the [advanced examples](https://docs.mesosphere.com/service-docs/spark/2.1.1-2.2.0-2/usage-examples/).
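For reference, a minimal client JAAS file for this setup might look like the following sketch. The `KafkaClient` section name and the `Krb5LoginModule` options are standard Java/Kafka conventions, but the principal is hypothetical, and the keytab path assumes the `kafka.keytab` filename placed in the sandbox by the secret configuration above; consult your Kafka client documentation for the exact options your version expects:

```
// Hypothetical JAAS file for a Kerberized Kafka client.
// The keytab path matches <keytab_file_name> from the secret configuration;
// replace the principal with one from your own realm.
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/mnt/mesos/sandbox/kafka.keytab"
  principal="client@REALM";
};
```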