Download remotely-located resources on driver startup. #240

mccheah · 2017-04-22T01:24:59Z

Augments the init-container so that we don't need to use a separate image, but on submission two containers are bootstrapped for a cleaner architecture.

mccheah · 2017-04-22T01:25:18Z

Will work on fixing the unit tests.

Augments the init-container so that we don't need to use a separate image, but on submission two containers are bootstrapped instead for a cleaner architecture.

mccheah · 2017-04-25T17:15:12Z

This change is ready for review.

ash211

Some minor suggestions, nothing major. Looks good!

ash211 · 2017-04-27T20:29:54Z

...ain/scala/org/apache/spark/deploy/kubernetes/submit/v2/DownloadRemoteDependencyManager.scala

+    initContainerImage: String) extends DownloadRemoteDependencyManager {
+
+  private val jarsToDownload = KubernetesFileUtils.getOnlyRemoteFiles(sparkJars)
+  private val filesToDownload = KubernetesFileUtils.getOnlyRemoteFiles(sparkFiles)


Move these into buildInitContainerConfigMap, since that's the only place they're used. Otherwise they run in the object constructor, which I don't think we want (constructors should be cheap).
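
A minimal sketch of what this suggestion could look like, assuming a simplified stand-in for KubernetesFileUtils.getOnlyRemoteFiles and a plain Map in place of the real ConfigMap. RemoteFilesSketch, DownloadManagerSketch, and the map keys are hypothetical names for illustration, not the PR's code:

```scala
import java.net.URI

// Hypothetical stand-in for KubernetesFileUtils.getOnlyRemoteFiles. It keeps URIs
// whose scheme is neither "file" nor "local" (an assumption about its behavior).
object RemoteFilesSketch {
  def getOnlyRemoteFiles(uris: Seq[String]): Seq[String] =
    uris.filter { uri =>
      val scheme = Option(URI.create(uri).getScheme).getOrElse("file")
      scheme != "file" && scheme != "local"
    }
}

class DownloadManagerSketch(sparkJars: Seq[String], sparkFiles: Seq[String]) {
  // No filtering happens at construction time; the constructor only stores its arguments.
  def buildInitContainerConfigMap(): Map[String, String] = {
    // The filtering is scoped to the only method that needs the results.
    val jarsToDownload = RemoteFilesSketch.getOnlyRemoteFiles(sparkJars)
    val filesToDownload = RemoteFilesSketch.getOnlyRemoteFiles(sparkFiles)
    Map(
      "remoteJars" -> jarsToDownload.mkString(","),
      "remoteFiles" -> filesToDownload.mkString(","))
  }
}
```

With this shape, constructing the manager does no work, and the filtering runs only when the config map is actually built.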

ash211 · 2017-04-27T20:37:10Z

...la/org/apache/spark/deploy/kubernetes/submit/v2/PropertiesConfigMapFromScalaMapBuilder.scala

+import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder}
+
+/**
+ * Creates a config map from a map object, with a single given key


Creates a Kubernetes ConfigMap

ash211 · 2017-04-27T20:39:30Z

.../apache/spark/deploy/rest/kubernetes/v2/KubernetesSparkDependencyDownloadInitContainer.scala

+}
+
+/**
+ * Process that fetches files from a resource staging server and/or arbi trary remote locations.


nit: arbitrary

ash211 · 2017-04-27T20:43:44Z

.../apache/spark/deploy/rest/kubernetes/v2/KubernetesSparkDependencyDownloadInitContainer.scala

+        downloadJarsSecretLocation,
+        stagingServerJarsDownloadDir,
+        "Starting to download jars from resource staging server...",
+        "Finished downloading jars from resource staging server.",


Change these two starting/finished message params to just "jars" or "files" and do the logging in the method. This also lets us drop the last two failure message parameters, right?
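
A rough sketch of the shape being suggested, assuming the caller passes only a resource-type label and the method derives every message from it. DownloadLoggingSketch, downloadFromStagingServer, the fetch callback, and the in-memory logged buffer (a stand-in for Spark's logging) are all hypothetical:

```scala
import java.io.File
import scala.collection.mutable.ArrayBuffer

object DownloadLoggingSketch {
  // Stand-in for a real logger so the sketch is self-contained.
  val logged: ArrayBuffer[String] = ArrayBuffer.empty[String]
  private def logInfo(msg: String): Unit = logged += msg

  // resourceType is "jars" or "files"; start, finish, and failure messages
  // are all built here instead of being passed in by each caller.
  def downloadFromStagingServer(resourceType: String, downloadDir: File)(
      fetch: File => Unit): Unit = {
    require(downloadDir.isDirectory,
      s"Download directory for $resourceType specified at $downloadDir" +
        " does not exist or is not a directory.")
    logInfo(s"Starting to download $resourceType from resource staging server...")
    fetch(downloadDir)
    logInfo(s"Finished downloading $resourceType from resource staging server.")
  }
}
```

The call sites then shrink to something like downloadFromStagingServer("jars", jarsDir)(doFetch), with no message strings at the call site.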

ash211 · 2017-04-27T20:44:49Z

.../apache/spark/deploy/rest/kubernetes/v2/KubernetesSparkDependencyDownloadInitContainer.scala

+    val remoteJarsDownload = Future[Unit] {
+      downloadFiles(remoteJars,
+        remoteJarsDownloadDir,
+        s"Remote jars download directory specified at $remoteJarsDownloadDir does not exist" +


change this to just "jars" or "files" too

ash211 · 2017-04-27T20:48:16Z

...ernetes/core/src/test/scala/org/apache/spark/deploy/kubernetes/submit/v2/ClientV2Suite.scala

+    remoteDependencyManagerProvider = mock[DownloadRemoteDependencyManagerProvider]
+    remoteDependencyManager = mock[DownloadRemoteDependencyManager]
+    when(remoteDependencyManagerProvider.getDownloadRemoteDependencyManager(any(), any(), any()))
+      .thenAnswer(new Answer[DownloadRemoteDependencyManager] {


can you use ArgumentCaptor.forClass instead of any() to eliminate this custom Answer implementation?

ash211 · 2017-04-27T20:52:59Z

...ts/src/test/scala/org/apache/spark/deploy/kubernetes/integrationtest/KubernetesV2Suite.scala

+      s"$assetServerUri/${KubernetesSuite.EXAMPLES_JAR_FILE.getName}",
+      s"$assetServerUri/${KubernetesSuite.HELPER_JAR_FILE.getName}"
+    ))
+    runSparkAppAndVerifyCompletion(SparkLauncher.NO_RESOURCE)


does this PR fix #213 as a side effect?

ash211 · 2017-04-27T20:54:36Z

...rc/test/scala/org/apache/spark/deploy/kubernetes/integrationtest/SparkReadinessWatcher.scala

+import io.fabric8.kubernetes.client.Watcher.Action
+import io.fabric8.kubernetes.client.internal.readiness.Readiness
+
+private[spark] class SparkReadinessWatcher[T <: HasMetadata] extends Watcher[T] {


This doesn't seem Spark-specific. Can we call it ResourceReadinessWatcher instead?

ash211 · 2017-04-27T20:56:05Z

...est/scala/org/apache/spark/deploy/kubernetes/integrationtest/StaticAssetServerLauncher.scala

+      val podIP = kubernetesClient.pods().withName(pod.getMetadata.getName).get()
+        .getStatus
+        .getPodIP
+      s"http://$podIP:8080"


pull port 8080 out to a val
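
For illustration only, one way this could look. StaticAssetServerSketch, AssetServerPort, and assetServerUri are hypothetical names, and the pod IP lookup is left out:

```scala
object StaticAssetServerSketch {
  // Single named definition of the port the asset server is assumed to listen on.
  val AssetServerPort: Int = 8080

  def assetServerUri(podIP: String): String = s"http://$podIP:$AssetServerPort"
}
```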

ash211 · 2017-04-27T20:56:33Z

...urce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/config.scala

+
+  private[spark] val INIT_CONTAINER_REMOTE_JARS =
+    ConfigBuilder("spark.kubernetes.driver.initcontainer.remoteJars")
+      .doc("Comma-separated list of jar URIs to download in the init-container. This is inferred" +


nit: inferred -> calculated

foxish · 2017-04-27T21:06:08Z

...urce-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/config.scala

-  private[spark] val DRIVER_LOCAL_JARS_DOWNLOAD_LOCATION =
-    ConfigBuilder("spark.kubernetes.driver.mountdependencies.jarsDownloadDir")
+  private[spark] val DRIVER_SUBMITTED_JARS_DOWNLOAD_LOCATION =
+    ConfigBuilder("spark.kubernetes.driver.mountdependencies.submittedJars.downloadDir")


The config string is getting long and unwieldy. spark.kubernetes.driver.jars and spark.kubernetes.driver.files?

We probably want to indicate that this is where the jars are being downloaded to. spark.kubernetes.driver.jars is ambiguous in the sense that it could be jars that need to be uploaded or downloaded or added to its classpath, etc.

foxish · 2017-04-27T21:22:01Z

...e-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/kubernetes/constants.scala

+    "download-submitted-files"
+  private[spark] val INIT_CONTAINER_SUBMITTED_FILES_PROPERTIES_FILE_VOLUME =
+    "download-submitted-files-properties"
+  private[spark] val INIT_CONTAINER_SUBMITTED_FILES_PROPERTIES_FILE_MOUNT_PATH =


There's a lot of growth in complexity here in terms of mount paths and other parameters. Is it possible for us to use fewer init-containers, or group these better?

We could download all of the jars from all locations to the same directory (resource staging server and remote) as well as the files, but I'm concerned about file name conflicts and how to deduplicate those. Perhaps we should explicitly forbid that multiple URIs end with the same file name. @aash for thoughts.

Having identically named jars coming from different places sounds like an anti-pattern. I don't think prohibiting that is all that bad.

There's already the contract that file names must be unique as they're all downloaded into the cwd
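
A sketch of how the "no two URIs may end with the same file name" rule could be enforced at submission time, assuming the file name is simply the last path segment of each URI. UniqueFileNamesSketch and its methods are hypothetical and not part of this PR:

```scala
import java.net.URI

object UniqueFileNamesSketch {
  // Last path segment of the URI; falls back to the raw string for opaque URIs.
  def fileName(uri: String): String = {
    val path = Option(URI.create(uri).getPath).getOrElse(uri)
    path.substring(path.lastIndexOf('/') + 1)
  }

  // Maps each conflicting file name to every URI that would be written under it.
  def findNameConflicts(uris: Seq[String]): Map[String, Seq[String]] =
    uris.groupBy(fileName).filter { case (_, sources) => sources.size > 1 }

  // Fails fast with a message listing every conflict.
  def requireUniqueFileNames(uris: Seq[String]): Unit = {
    val conflicts = findNameConflicts(uris)
    val details = conflicts
      .map { case (name, sources) => s"$name <- ${sources.mkString(", ")}" }
      .mkString("; ")
    require(conflicts.isEmpty, s"Multiple URIs resolve to the same file name: $details")
  }
}
```

Running the check over the combined list of submitted and remote URIs would make it safe to download everything into one directory.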

foxish · 2017-04-27T21:27:08Z

> two containers are bootstrapped for a cleaner architecture.

Why do we need two separate init containers? If we group these in terms of intent, we would have one init-container to download dependencies and supply them to the main container. I think it would help if you could explain what the separate init containers do and what problem this solves.

mccheah · 2017-04-27T21:39:14Z

> Why do we need two separate init containers? If we group these in terms of intent, we would have one init-container to download dependencies and supply them to the main container. I think it would help if you could explain what the separate init containers do and what problem this solves.

It's primarily a code-level decision, but it's arguable either way. With the two init-containers we only use an instance of SubmittedDependencyManager and its associated configuration if the resource staging server is being used. It's a little harder to reason about having a single init container that has a "base" of downloading remote dependencies but then is optionally modified for handling locally-submitted resources as well.
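
To make the single-container alternative concrete, here is a hypothetical sketch of the "base plus optional modification" shape described above. InitContainerSpec, StagingServerDownload, and the property keys are invented for illustration and do not come from the PR:

```scala
// Optional part: only present when a resource staging server is in use.
final case class StagingServerDownload(jarsResourceId: String, filesResourceId: String)

// One spec for a single init-container: remote downloads are the base,
// staging-server downloads are layered on only when configured.
final case class InitContainerSpec(
    remoteJars: Seq[String],
    remoteFiles: Seq[String],
    stagingServer: Option[StagingServerDownload]) {

  def toProperties: Map[String, String] = {
    val base = Map(
      "remoteJars" -> remoteJars.mkString(","),
      "remoteFiles" -> remoteFiles.mkString(","))
    val submitted = stagingServer match {
      case Some(s) => Map(
        "stagingServer.jarsResourceId" -> s.jarsResourceId,
        "stagingServer.filesResourceId" -> s.filesResourceId)
      case None => Map.empty[String, String]
    }
    base ++ submitted
  }
}
```

The Option is where the extra reasoning cost shows up: every consumer of the spec has to handle both the "remote only" and the "remote plus submitted" case.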

ash211 · 2017-04-27T21:51:29Z

Thanks good catch @foxish -- I didn't see that we were using two init containers.

In my experience, init containers add several seconds before pod readiness, something like 3-10 seconds for a Go-based container we've been using internally. So if we want to make job startup time a priority (this is valuable to us at least), I think I'd want to try pretty hard to have only one init container to download resources from both sources (submission server + remote resources).

foxish · 2017-04-27T21:57:30Z

In my experience, idiomatic applications I've seen so far have a single init container, and it gets harder to debug when there are multiple init containers.

mccheah · 2017-04-27T21:57:57Z

@aash @foxish I'm ok with moving to a one-container design.

The problem is #249, which also has some major refactors based on the assumption that we use multiple containers. Given that, I want to combine #249 with this PR, and then start the redesign from there.

mccheah · 2017-04-27T21:58:37Z

The resultant PR will be rather large, so I apologize for that. But it's the easiest way to make sure we don't create large merge conflicts for ourselves.

ash211 · 2017-04-27T22:00:56Z

Yeah sorry for not jumping on this right away -- we might've been able to catch the 1 vs 2 container design choice earlier.

Adding the init container to driver and executor seems semantically close enough that it should go well in one PR (though a large one).

Let's do that I think.

mccheah · 2017-04-27T22:16:42Z

There's also #246, but I just did some rebasing locally and found that resolving the merge conflicts there by rearranging the commits wasn't too bad.

foxish · 2017-04-27T22:35:56Z

@mccheah Should #249 and #246 be reviewed first then?

mccheah · 2017-04-27T22:36:37Z

Hold off on reviewing anything until I've pushed everything fresh.

mccheah · 2017-04-28T04:32:36Z

Superseded by #251

mccheah force-pushed the init-container-downloads-remote-files branch from 7216b86 to 7bfb342 on April 22, 2017 01:26

mccheah changed the base branch from submit-v2-end-to-end to branch-2.1-kubernetes April 25, 2017 17:06

Download remotely-located resources on driver startup.

e84737d

mccheah force-pushed the init-container-downloads-remote-files branch from 1849ff3 to e84737d on April 25, 2017 17:09

mccheah changed the title from "[WIP] Download remotely-located resources on driver startup." to "Download remotely-located resources on driver startup." on Apr 25, 2017

ash211 reviewed Apr 27, 2017

foxish reviewed Apr 27, 2017

mccheah mentioned this pull request Apr 28, 2017

Download remotely-located resources on driver and executor startup via init-container #251

Merged

mccheah closed this Apr 28, 2017

ifilonenko pushed a commit to ifilonenko/spark that referenced this pull request Feb 26, 2019

PDS-55029 Bump netty to 4.1.13 to pick up bugfix (apache-spark-on-k8s…

2460bb8

…#240) netty/netty@c37267d
