Staging server for receiving application dependencies. #212
Conversation
@foxish @ash211 @erikerlandson @kimoonkim This isn't quite done yet, but I wanted to get the proof of concept out. There's still some work left to do here on the unit testing side.
 *
 * @param driverPodName Name of the driver pod.
 * @param driverPodNamespace Namespace for the driver pod.
 * @param jars Application jars to upload, compacted together in tar + gzip format. The tarball
The APIs here are worded in a Spark-opinionated way, but there could also be a world where we just take one tarball as input "files" and just have spark-submit call uploadFiles() twice - once for jars and once for local files. If we were to do this, we would need a different unique key for the application other than name and namespace indicating the pod(s) to watch - probably a label(s).
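A minimal sketch of that generalized shape, with hypothetical names (`GenericResourceStagingService`, `uploadResources`) that are not part of this PR: the server accepts one opaque tarball per call plus the labels of the pods to watch, and spark-submit would simply call it twice.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import scala.collection.concurrent.TrieMap

// Hypothetical generalized interface: one opaque "resources" tarball per call,
// keyed for cleanup by pod labels rather than a driver pod name + namespace.
trait GenericResourceStagingService {
  // Returns a token the driver later presents to download the bundle.
  def uploadResources(podLabels: Map[String, String], resources: InputStream): String
}

// Toy in-memory implementation, for illustration only.
class InMemoryStagingService extends GenericResourceStagingService {
  private val stored = TrieMap.empty[String, Array[Byte]]
  override def uploadResources(
      podLabels: Map[String, String], resources: InputStream): String = {
    val token = java.util.UUID.randomUUID().toString
    val out = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = resources.read(chunk)
    while (n != -1) { out.write(chunk, 0, n); n = resources.read(chunk) }
    stored.put(token, out.toByteArray)
    token
  }
}

val service = new InMemoryStagingService
// spark-submit would call this twice: once for jars, once for local files.
val jarsToken = service.uploadResources(
  Map("spark-app-id" -> "app-1"), new ByteArrayInputStream(Array[Byte](1, 2, 3)))
val filesToken = service.uploadResources(
  Map("spark-app-id" -> "app-1"), new ByteArrayInputStream(Array[Byte](4, 5)))
```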
rerun unit tests please

This is pretty much done now as a workable unit. Of course there is still the rest of the submission process to build around this, but this is the first step.
@ssuchter @erikerlandson @foxish @ash211 Can you guys please take a look at this PR? Essentially I'm looking to prototype what a second version of submission would look like end to end, but will be trying to build it out in increments. We could generalize this service later on, but I would like to get a proof of concept of how Spark could benefit from such a file-staging API. Thus it would be good to implement and review a quick and naive version of the staging-server based submission process, and after completing a prototype think about how we can generalize the concept for Kubernetes applications in general. The desired end result is this: #167 (comment)
@Consumes(Array(MediaType.APPLICATION_JSON))
@Produces(Array(MediaType.APPLICATION_JSON))
@Path("/dependencies/credentials")
def getKubernetesCredentials(
The more I think about this the more I would prefer this to not be part of the API. The submitter can mount the credentials themselves as secrets or else expect the pod to have them in the local disk.
We still want to post them because in the future we can use these credentials to monitor the API server and handle cleaning up the data accordingly.
@QueryParam("driverPodName") driverPodName: String,
@QueryParam("driverPodNamespace") driverPodNamespace: String,
@FormDataParam("jars") jars: InputStream,
@FormDataParam("files") files: InputStream,
love that these are InputStreams now -- no more OOMs!
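As a sketch of why the InputStreams help: the upload can be spooled straight to disk in fixed-size chunks, so memory use stays constant regardless of upload size. This example uses the JDK's `Files.copy` for brevity (the PR itself uses Guava's `ByteStreams.copy`):

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.file.{Files, Path, StandardCopyOption}

// Spool an upload straight to disk: only a small internal buffer is ever
// held in memory, unlike reading the whole request body into a byte array.
def writeStreamToFile(in: InputStream, target: Path): Long =
  Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)

val target = Files.createTempFile("jars", ".tgz")
val bytesWritten = writeStreamToFile(
  new ByteArrayInputStream(Array.fill[Byte](10000)(1)), target)
```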
private val SPARK_APPLICATION_DEPENDENCIES_LOCK = new Object
private val SECURE_RANDOM = new SecureRandom()
// TODO clean up these resources based on the driver's lifecycle
private val registeredDrivers = mutable.Set.empty[PodNameAndNamespace]
for now, using an LRU cache with maximum size and expireAfterWrite of about 7 days and expireAfterRead of about 24 hours at least prevents OOMs in this service
We'll need to rethink this for streaming and other long-running applications though
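The cache described above presumably refers to something like Guava's `CacheBuilder` (with `maximumSize` and `expireAfterWrite`/`expireAfterAccess`); as a pure-JDK illustration of the size-bounding half, here is a minimal LRU registry sketch. `LruRegistry` and its parameters are illustrative, not part of the PR:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Minimal LRU sketch: a LinkedHashMap constructed with accessOrder = true
// evicts the least-recently-accessed entry once maxEntries is exceeded,
// which bounds the staging server's memory even if cleanup lags behind.
class LruRegistry[K, V](maxEntries: Int) {
  private val underlying = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > maxEntries
  }
  def put(k: K, v: V): Unit = synchronized { underlying.put(k, v) }
  def get(k: K): Option[V] = synchronized {
    if (underlying.containsKey(k)) Some(underlying.get(k)) else None
  }
  def size: Int = synchronized { underlying.size() }
}

val registry = new LruRegistry[String, Int](2)
registry.put("a", 1)
registry.put("b", 2)
registry.put("c", 3) // evicts "a", the least recently used entry
```

Time-based expiry would still need timestamps per entry (or a real cache library), which is part of why streaming and other long-running applications need a different story.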
val namespaceDir = new File(dependenciesRootDir, podNameAndNamespace.namespace)
val applicationDir = new File(namespaceDir, podNameAndNamespace.name)
DIRECTORIES_LOCK.synchronized {
  if (!applicationDir.exists()) {
do we want to throw here if the directory already exists? that would mean we're about to overwrite files from another upload
In a second iteration of this I'm making the directory have a unique identifier.
  }
}
val jarsTgz = new File(applicationDir, "jars.tgz")
// TODO encrypt the written data with the secret.
move this TODO up a line
 * when it runs.
 *
 * @param driverPodName Name of the driver pod.
 * @param driverPodNamespace Namespace for the driver pod.
why does this take a driver pod name and a driver pod namespace? I'd imagine there would be times when you want to upload resources to this service before you know what the name of your driver pod will be, or even what namespace it will run in.
Later on the service should watch the API server for the pod with the given name and namespace, and will clean up the resources being used by that pod once the pod either exits cleanly or fails without a retry for a specified amount of time. This would be more robust than an LRU cache for long-running jobs.
We probably want to be more general than just a pod or a pod namespace, however. Names are dynamic if a replication controller were to be used in the general case, for example. Perhaps monitoring pods with a given label would be better.
 * Retrofit-compatible variant of {@link KubernetesSparkDependencyService}. For documentation on
 * how to use this service, see the aforementioned JAX-RS based interface.
 */
private[spark] trait KubernetesSparkDependencyServiceRetrofit {
Does it make sense to move these annotations onto the other trait definition? I'm worried this could get out of sync with the other trait if either changes.
We can't have both Retrofit and JAX-RS on the same trait because they take different types of parameters. Retrofit expects Retrofit-specific types everywhere while JAX-RS expects POJOs.
What do you think of the terminology
// TODO encrypt the written data with the secret.
val resourcesTgz = new File(resourcesDir, "resources.tgz")
Utils.tryWithResource(new FileOutputStream(resourcesTgz)) { ByteStreams.copy(resources, _) }
SPARK_APPLICATION_DEPENDENCIES_LOCK.synchronized {
Does this lock serialize all dependency accesses (read/write) using the staging server, or just for a particular dependency?
We can allow for both - provide it as an optional field in the API and default to the server's config otherwise.
Edit: The above comment is for the wrong feedback...
Regarding the lock - it serializes for all but only blocks on making the directories. Would be great to have something that doesn't have to serialize on file system operations, however.
not sure we need this lock here -- isn't it highly highly unlikely that we'd have an applicationSecret conflict from multiple uploads, given it contains a UUID as well as an additional 1024 bytes of entropy? This is the only writer
If one thread runs mkdir -p a/b and another thread runs mkdir -p a/c, do the two threads attempting to make the parent directory collide?
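For what it's worth, `java.io.File.mkdirs()` can report a spurious failure when two threads race to create the same parent, while `java.nio.file.Files.createDirectories` is documented not to fail when an intermediate directory already exists. A small sketch of the race in question, under those assumptions:

```scala
import java.nio.file.{Files, Paths}
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Two threads concurrently create sibling directories "a/b" and "a/c" under
// a parent "a" that does not yet exist. Files.createDirectories tolerates
// another thread winning the race to create the shared parent.
val root = Files.createTempDirectory("staging").resolve("a")
val pool = Executors.newFixedThreadPool(2)
val start = new CountDownLatch(1)
val done = new CountDownLatch(2)
for (child <- Seq("b", "c")) {
  pool.execute(new Runnable {
    override def run(): Unit = {
      start.await() // line both threads up to maximize the chance of racing
      Files.createDirectories(root.resolve(child))
      done.countDown()
    }
  })
}
start.countDown()
done.await(10, TimeUnit.SECONDS)
pool.shutdown()
```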
 * any directories. We take a stream here to avoid holding these entirely in
 * memory.
 * @param podLabels Labels of pods to monitor. When no more pods are running with the given label,
 * after some period of time, these dependencies will be cleared.
> after some period of time

How is this period of time defined? Do you see it being a per-file decision, or part of the configuration when the server is started?
We can allow for both - provide it as an optional field in the API and default to the server's config otherwise.
let's add the cleanup in a followup rather than block this PR on it -- filed #237 to track
rerun unit tests please
@mccheah a couple of minor changes, but this looks good. I'm likely good to merge after your response commit
  }
}
// TODO encrypt the written data with the secret.
val resourcesTgz = new File(resourcesDir, "resources.tgz")
we plan to upload tgz files, but from the staging server's perspective it's just a byte stream. Perhaps just `resources.data` would be better here (though certain CLI tools used in debugging may work better with a `.tgz` extension)
}

override def downloadResources(applicationSecret: String): StreamingOutput = {
val applicationDependencies = SPARK_APPLICATION_DEPENDENCIES_LOCK.synchronized {
I don't think we want this lock here -- it looks like it would prevent concurrent download of the resources
It only locks on accessing the map, not the underlying streams.
We need to do this because we don't use a ConcurrentMap here. I don't think we can use a ConcurrentMap to get the atomicity we need here either.
Never mind, a concurrent map would probably work. We can use scala.collection.concurrent.TrieMap.
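A quick sketch of how `TrieMap` gives the needed atomicity without the coarse lock (`PodNameAndNamespace` here is a stand-in for the key type in the PR, and the secret values are illustrative):

```scala
import scala.collection.concurrent.TrieMap

// putIfAbsent gives the atomic register-if-new check without an external
// lock, and lookups never block behind writers.
case class PodNameAndNamespace(name: String, namespace: String)
val registeredDrivers = TrieMap.empty[PodNameAndNamespace, String]

val key = PodNameAndNamespace("driver-1", "default")
// Returns None if we won the race and inserted, or Some(existing) otherwise.
val firstInsert = registeredDrivers.putIfAbsent(key, "secret-1")
val secondInsert = registeredDrivers.putIfAbsent(key, "secret-2")
```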
server.addConnector(connector)
server.setHandler(contextHandler)
server.start()
jettyServer = Some(server)
need ssl in here, though I see that's coming in a followup so doesn't block this PR
import org.apache.spark.util.Utils

/**
 * Tests for KubernetesSparkDependencyServer and its APIs. Note that this is not an end-to-end
KubernetesSparkDependencyServer has been renamed
 * implementation methods directly as opposed to over HTTP, as well as check the
 * data written to the underlying disk.
 */
class ResourceStagingServiceImplSuite extends SparkFunSuite with BeforeAndAfter {
do you still need `with BeforeAndAfter` if you don't override `before` or `after`?
@foxish @ash211 @erikerlandson Are there any more comments that I should address for this?
LGTM -- will merge today unless I hear otherwise from anyone.
Will do around ~5pm Pacific
private[spark] trait ResourceStagingServiceRetrofit {

  @Multipart
  @retrofit2.http.PUT("/api/resources/upload")
Is there a reason why the API has /upload and /download here?
I think it is breaking some rules associated with RESTful APIs. I'd have thought more along the lines of:
- `PUT /api/resources/:resourceID` -> (as PUT needs to be idempotent)
- `GET /api/resources` -> returns a list of resources (later to be used by some metrics/dashboard/...)
- `GET /api/resources/:resourceID` -> same as `/download` here
The problem is that the resource ID is sensitive information in this context. So we have to pass the identifier which is a secret through a request body. We can return an identifier token as well as the secret - would that be better?
Partly, the endpoints /upload and /download are expressing things that the HTTP verbs GET/PUT also express. I'm not sure if we want to also have versioning somewhere in that string, to make /api/v2/resources/.... My first thought is that having versioning is probably a bit much for this use case.
Can we make both of these paths the same and just with different HTTP actions and method signatures in the Java interface? I also think having versioning is a little much for now.
Maybe then we should use a POST instead, which isn't expected to be idempotent. POST /api/resources would return the secret that needs to be used when fetching it.
If/when we want to later list things, it would help to identify each set of files submitted by some token other than the secret.
We can cross that bridge if/when we get there, but I anticipate this service to be very lightweight with only put/get operations on individual bundles.
At least in the context of Spark - a more generic version of this in upstream Kubernetes could have more features.
It seems like it would be hard to change this later, after we've established the API. Unless there is a strong reason not to separate the token and the secret, I feel that we should do it. @ash211, do you have thoughts on this?
this is just an early API, so I'd expect it to change somewhat going forward. But we should also try at least somewhat to not break it unnecessarily.

In that spirit, I'd support:

- putting `v0/` in the URLs in the (probable) case this changes down the road
- splitting the overloaded purpose of the secret into a separate resource ID and resource secret. That way in the future if we need an ID for a highly-available version of this service, or want to log in various places the resource that's being downloaded/uploaded, or want to collect metrics on this service for upload/download counts on a per-resource basis, we don't have to use the sensitive secret as the resource identifier everywhere.
It's going to take some work to plumb the ID+secret everywhere vs just the secret that we use now, but I'm optimistic that it will be ultimately worth it for future flexibility.
as for ID generation, I think it should be the responsibility of the service to create the ID rather than the client. Otherwise clients need to coordinate so that they don't have ID conflicts. The server is much better able to do this.
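A sketch of what a server-minted ID + secret pair could look like. Names and sizes are illustrative, not the merged API; the 1024 bytes of entropy echoes the figure mentioned earlier in the review:

```scala
import java.security.SecureRandom
import java.util.{Base64, UUID}

// The server (not the client) mints both values: a non-sensitive resource ID
// safe to appear in URLs, logs, and metrics, plus a high-entropy secret that
// must be presented to download the bundle.
case class StagedResourceIdentifier(resourceId: String, resourceSecret: String)

def mintIdentifier(random: SecureRandom = new SecureRandom()): StagedResourceIdentifier = {
  val id = UUID.randomUUID().toString
  val secretBytes = new Array[Byte](1024)
  random.nextBytes(secretBytes)
  StagedResourceIdentifier(
    id, Base64.getUrlEncoder.withoutPadding().encodeToString(secretBytes))
}

val minted = mintIdentifier()
// e.g. POST /api/v0/resources could return this pair, and a later
// GET /api/v0/resources/:resourceId would require the secret in the request.
```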
Thanks for making the change @mccheah. Sorry about the last minute flurry of review comments. LGTM after tests pass.
* Staging server for receiving application dependencies.
* Add unit test for file writing
* Minor fixes
* Remove getting credentials from the API

  We still want to post them because in the future we can use these credentials to monitor the API server and handle cleaning up the data accordingly.

* Generalize to resource staging server outside of Spark
* Update code documentation
* Val instead of var
* Fix naming, remove unused import
* Move suites from integration test package to core
* Use TrieMap instead of locks
* Address comments
* Fix imports
* Change paths, use POST instead of PUT
* Use a resource identifier as well as a resource secret