[SPARK-22757][Kubernetes] Enable use of remote dependencies (http, s3, gcs, etc.) in Kubernetes mode #19954

Closed · wants to merge 14 commits

Conversation

@liyinan926 (Contributor) commented Dec 12, 2017

What changes were proposed in this pull request?

This PR expands the Kubernetes mode to support remote dependencies on http/https endpoints, GCS, S3, etc. It adds steps for configuring and appending a Kubernetes init-container to the driver and executor pods that downloads remote dependencies before the driver/executors start.

Init-containers, as the name suggests, are containers that run to completion before a pod's main containers start, and are commonly used for initialization tasks. We use them here to localize remote application dependencies before the driver/executors start running; the code the init-container runs is also included in this PR. The PR additionally adds a step that mounts user-specified secrets, which may hold credentials for accessing data stores such as S3 and Google Cloud Storage (GCS), into the driver and executor pods.
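
To make the flow concrete, here is a rough sketch of what the init-container bootstrap does to a pod, using the fabric8 builders that also appear in the review diffs below. The volume name, mount path, and method shape are illustrative assumptions, not the PR's exact code:

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Sketch: attach a "spark-init" init-container that downloads remote jars into
// an emptyDir volume; the main container mounts the same volume and reads the
// localized files from there once the init-container completes.
def bootstrapInitContainer(pod: Pod, initContainerImage: String): Pod =
  new PodBuilder(pod)
    .editOrNewSpec()
      .addNewVolume()
        .withName("download-jars-volume") // hypothetical volume name
        .withNewEmptyDir()
        .endEmptyDir()
      .endVolume()
      .addNewInitContainer()
        .withName("spark-init")
        .withImage(initContainerImage) // e.g. spark.kubernetes.initContainer.image
        .addNewVolumeMount()
          .withName("download-jars-volume")
          .withMountPath("/var/spark-data/spark-jars") // assumed jars download dir
        .endVolumeMount()
      .endInitContainer()
    .endSpec()
    .build()
```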

How was this patch tested?

  • The patch contains unit tests, which are passing.
  • Manual testing: ./build/mvn -Pkubernetes clean package succeeded.
  • Manual testing of the following cases:
    • Running SparkPi using a container-local spark-examples jar.
    • Running SparkPi using a container-local spark-examples jar with a user-specified secret mounted.
    • Running SparkPi using a spark-examples jar hosted remotely on an https endpoint (see the submission sketch after this list).
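
For illustration, here is a submission sketch using the SparkLauncher API. The master URL, image name, and jar location are placeholders; only the init-container image config (renamed to spark.kubernetes.initContainer.image later in this review) is specific to this PR:

```scala
import org.apache.spark.launcher.SparkLauncher

// Sketch: submit SparkPi with a jar hosted on an HTTPS endpoint; the
// init-container added by this PR downloads it before the driver starts.
val process = new SparkLauncher()
  .setMaster("k8s://https://my-apiserver:6443")             // placeholder API server
  .setDeployMode("cluster")
  .setMainClass("org.apache.spark.examples.SparkPi")
  .setAppResource("https://example.com/spark-examples.jar") // remote dependency
  .setConf("spark.kubernetes.initContainer.image", "my-repo/spark-init:latest")
  .launch()
process.waitFor()
```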

cc @rxin @felixcheung @mateiz (shepherd)
k8s-big-data SIG members & contributors: @mccheah @foxish @ash211 @ssuchter @varunkatta @kimoonkim @erikerlandson @tnachen @ifilonenko @liyinan926
reviewers: @vanzin @felixcheung @jiangxb1987 @mridulm

@SparkQA commented Dec 12, 2017

Test build #84789 has finished for PR 19954 at commit f9dc86d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 12, 2017

Test build #84793 has finished for PR 19954 at commit cd5e832.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 12, 2017

Test build #84795 has finished for PR 19954 at commit b9a0090.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 13, 2017

Test build #84798 has finished for PR 19954 at commit 5512d80.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 13, 2017

Test build #84838 has finished for PR 19954 at commit 1a74521.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@foxish (Contributor) commented Dec 13, 2017

@liyinan926, can you update the PR title please?
Maybe "Enable submission of GCS (or S3) and HTTP dependencies to Kubernetes Scheduler Backend"

@liyinan926 changed the title from "[SPARK-22757][Kubernetes] add init-container bootstrapping and secret mounting steps" to "[SPARK-22757][Kubernetes] Enable use of remote dependencies (http, s3, gcs, etc.) in Kubernetes mode" on Dec 13, 2017
@foxish (Contributor) commented Dec 14, 2017

@rxin @mateiz @vanzin @mridulm @jiangxb1987 @felixcheung, if you can help take a look at this, it would add a lot of value to Kubernetes mode in 2.3. Granted, we already have a way to supply dependencies within Docker images, and that works as expected (with the minor fix in #19972); this would additionally enable submitting dependencies over http and from object storage. That's a big value-add: it lets people use standard images rather than baking a new one for each Spark application.

If we can get this review going before everyone leaves for vacation (and even if it gets picked up after we all return), that's a big step for us in the community.

@erikerlandson (Contributor) commented:

+1 to @foxish's comment above. For accessing data from http, s3, etc., this will be a huge reduction in barrier to entry: the difference between having to spin a custom Docker image and just using one out of the box.

@liyinan926 (Author) commented:

+1. This PR closes a big feature gap between the Kubernetes scheduler backend and the other backends, and makes the Kubernetes backend much more usable.

@mridulm (Contributor) commented Dec 14, 2017

This is promising! I will take a look at it over the weekend; thanks for the great work :-)
My only concern is whether we can minimize fetches when multiple containers run on the same host, sort of like what YARN does; but that is a minor nit compared to the basic feature.

@foxish (Contributor) commented Dec 14, 2017

@mridulm We did have some discussions on resource localization at some point. This is a powerful mechanism when coupled with a resource staging server (in the future). There is a per-pod cost when localizing dependencies this way, but there is an alternative: using a Docker image baked with the dependencies. Kubernetes caches downloaded images at the node level, so one would pay the cost only once per node.

@@ -209,9 +214,33 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
.build()
}.getOrElse(executorContainer)

new PodBuilder(executorPod)
val (withMaybeSecretsMountedPod, withMaybeSecretsMountedContainer) =
mountSecretsBootstrap.map {bootstrap =>

Member: nit: add a space before bootstrap.

liyinan926 (Author): Done.

initContainer,
mainContainerWithMountedFiles)
}

Member: nit: remove an extra line.

liyinan926 (Author): Done.

.build())

val initContainer = new ContainerBuilder(podWithDetachedInitContainer.initContainer)
.withName(s"spark-init")

Member: nit: remove s.

liyinan926 (Author): Done.

remoteJars,
jarsDownloadDir,
s"Remote jars download directory specified at $jarsDownloadDir does not exist " +
s"or is not a directory.")

Member: nit: remove s.

liyinan926 (Author): Done.

// dependencies. The executor's main container and its init-container share the secrets
// because the init-container is sort of an implementation detail and this sharing
// avoids introducing a dedicated configuration property just for the init-container.
val mayBeInitContainerMountSecretsBootstrap = if (maybeInitContainerBootstrap.nonEmpty &&

Member: What's the difference between this and mayBeMountSecretBootstrap above? Looks like mounting the same paths twice?

liyinan926 (Author): mayBeMountSecretBootstrap is for the executor's main container, whereas mayBeInitContainerMountSecretsBootstrap is for the executor's init-container, which runs before the main container starts.

Member: Oh, I see. Thanks!
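
To illustrate the answer, here is a condensed sketch (following the mountSecrets(pod, container) signature quoted later in this review) of mounting the same user-specified secrets into a pod and one of its containers; the names are illustrative, and a real implementation must avoid adding duplicate volumes when applied twice to the same pod:

```scala
import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod, PodBuilder}

// Adds one secret volume per secret to the pod, plus a matching volume mount
// to whichever container it is given (main container or init-container).
class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, String]) {
  def mountSecrets(pod: Pod, container: Container): (Pod, Container) = {
    var podBuilder = new PodBuilder(pod)
    secretNamesToMountPaths.keys.foreach { name =>
      podBuilder = podBuilder.editOrNewSpec()
        .addNewVolume()
          .withName(s"$name-volume")
          .withNewSecret().withSecretName(name).endSecret()
        .endVolume()
        .endSpec()
    }
    var containerBuilder = new ContainerBuilder(container)
    secretNamesToMountPaths.foreach { case (name, path) =>
      containerBuilder = containerBuilder
        .addNewVolumeMount().withName(s"$name-volume").withMountPath(path).endVolumeMount()
    }
    (podBuilder.build(), containerBuilder.build())
  }
}

// Applied once per container, so the main container and the init-container
// both see the same credentials:
// val (podWithSecrets, mainC) = bootstrap.mountSecrets(basePod, mainContainer)
// val (_, initC)             = bootstrap.mountSecrets(podWithSecrets, initContainer)
```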

@SparkQA commented Dec 15, 2017

Test build #84966 has finished for PR 19954 at commit 38b850f.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) left a comment:

This is a pretty large change, and I need more time to look at everything. I left some comments, mostly stylistic, which also should apply to the code I haven't commented on yet.

In general, reading the code becomes a little tiring because it's hard to mentally keep track of all the long names being used.

.stringConf
.createWithDefault("/var/spark-data/spark-files")

val INIT_CONTAINER_DOCKER_IMAGE =
ConfigBuilder("spark.kubernetes.initContainer.docker.image")

Contributor: Mirroring the discussion in the other PR, are these really restricted to docker? Is it a required config?

Contributor: None of it is docker-specific; it can be "container" everywhere.

Contributor:

> Is it a required config?

No, one may forgo the init-container if they're building the deps into the Docker image itself and supplying them via local:/// paths.

liyinan926 (Author): Renamed to spark.kubernetes.initContainer.image.

}
val initContainerCustomEnvVars = sparkConf.getAllWithPrefix(initContainerCustomEnvVarKeyPrefix)
.toSeq
.map(env =>

Contributor: .map { env =>


override def mountSecrets(pod: Pod, container: Container): (Pod, Container) = {
var podBuilder = new PodBuilder(pod)
secretNamesToMountPaths.keys.foreach(name =>

Contributor: .foreach { name =>

.endSpec())

var containerBuilder = new ContainerBuilder(container)
secretNamesToMountPaths.foreach(namePath =>

Contributor: .foreach { namePath =>. You get the idea; please fix all of these.

@@ -116,10 +122,53 @@ private[spark] class DriverConfigurationStepsOrchestrator(
None
}

val mayBeInitContainerBootstrapStep =
if (areAnyFilesNonContainerLocal(sparkJars ++ sparkFiles)) {
val initContainerConfigurationStepsOrchestrator =

Contributor: The variable names in this code are very long in general. Verbosity can both help and harm readability, and in this case I don't think it's helping much. For example, orchestrator is just as good a name for this variable, since there's no other orchestrator being used here.

* Returns the complete ordered list of steps required to configure the init-container. This is
* only used when there are remote application dependencies to localize.
*/
private[spark] class InitContainerConfigurationStepsOrchestrator(

Contributor: Similarly, some type names are also really long. InitContainerOrchestrator sounds just as good to me.

liyinan926 (Author): Renamed to InitContainerConfigOrchestrator, and similarly DriverConfigurationStepsOrchestrator to DriverConfigOrchestrator.

/**
* Utility for fetching remote file dependencies.
*/
private[spark] trait FileFetcher {

Contributor: Why do you need a trait for this? If it's for the tests, you can mock classes; you don't need an interface for that.

liyinan926 (Author): Yeah, removed the trait.

* with different configurations for different download sources, or using the same container to
* download everything at once.
*/
private[spark] class KubernetesSparkDependencyDownloadInitContainer(

Contributor: This name is so long it becomes confusing. How about KubernetesInitContainer? Or do you plan to have multiple different init-containers for different things (ugh)?

liyinan926 (Author): Renamed to SparkPodInitContainer.

s"Remote files download directory specified at $filesDownloadDir does not exist " +
"or is not a directory.")
}
waitForFutures(

Contributor: You have a thread pool, but you're really just submitting two tasks. Why not one task for each file/jar?

liyinan926 (Author): This class actually handles more tasks in our fork. For example, it is also responsible for downloading from the resource staging server that hosts submission-client dependencies. The resource staging server will come in a future PR.

Contributor: Sure, but that's not my point. If you have 10 jars and 10 files to download, the current code will only download 2 at a time. If you submit each jar/file separately, you'll download as many as your thread pool allows, and you can make that configurable.

liyinan926 (Author): Got it, will address this.

liyinan926 (Author): Updated to create one task per file/jar to download. Regarding the type of thread pool, we are using a CachedThreadPool, which I think makes sense given that the tasks are expected to be short-lived.
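
A minimal sketch of the one-task-per-file scheme under discussion; the helper names are illustrative, not the PR's exact code:

```scala
import java.io.File
import java.net.{URI, URL}
import java.nio.file.Files
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object DownloadSketch {
  // Cached pool: grows with the number of concurrent downloads, reuses idle threads.
  private implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newCachedThreadPool())

  // Hypothetical fetcher: copies one remote file into the download directory.
  private def fetchFile(uri: String, downloadDir: File): Unit = {
    val fileName = new File(new URI(uri).getPath).getName
    val in = new URL(uri).openStream()
    try Files.copy(in, new File(downloadDir, fileName).toPath) finally in.close()
  }

  // One Future per file, so the pool (not a hard-coded 2) bounds the parallelism.
  def downloadFiles(uris: Seq[String], downloadDir: File, timeout: Duration): Unit = {
    require(downloadDir.isDirectory,
      s"Download directory $downloadDir does not exist or is not a directory.")
    val futures = uris.map(uri => Future(fetchFile(uri, downloadDir)))
    futures.foreach(f => Await.result(f, timeout))
  }
}
```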

}
}

private def waitForFutures(futures: Future[_]*) {

Contributor: use : Unit = { instead of procedure syntax. But really, there's a single caller, so just inline it.

@liyinan926 (Author): @vanzin Addressed your comments so far. PTAL. Thanks!

@SparkQA commented Dec 16, 2017

Test build #84985 has finished for PR 19954 at commit 46a8c99.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 17, 2017

Test build #85013 has finished for PR 19954 at commit 197882d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 17, 2017

Test build #85028 has finished for PR 19954 at commit e20e212.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 17, 2017

Test build #85029 has finished for PR 19954 at commit 340fa41.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin (Member) commented Dec 18, 2017

Jenkins, retest this please.

orchestrator: DriverConfigurationStepsOrchestrator,
types: Class[_ <: DriverConfigurationStep]*): Unit = {
orchestrator: DriverConfigOrchestrator,
types: Class[_ <: DriverConfigurationStep]*): Unit = {

Member: nit: indent.

}
}

private object FirstTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {

Member: Do we need $ at the end of the object name?

liyinan926 (Author): No, it shouldn't be there. Removed.

}
}

private object SecondTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {

Member: ditto.

val sparkConf = new SparkConf(true)

if (!propertiesFile.isFile) {
throw new IllegalArgumentException(s"Server properties file given at" +

Member: nit: remove s. Also move the space at the beginning of the string on the next line to the end of the string on this line.

@SparkQA commented Dec 18, 2017

Test build #85056 has finished for PR 19954 at commit 340fa41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liyinan926 (Author): @ueshin Addressed your comments. PTAL. Thanks!

@SparkQA commented Dec 18, 2017

Test build #85066 has finished for PR 19954 at commit 9ebfc73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 18, 2017

Test build #85069 has finished for PR 19954 at commit ddcb0f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) left a comment:

I started looking at this again, but sorry, the long variable names make it really hard to read and, especially, understand this code. I'm always getting lost because all variable names look the same aside from some small difference at the end.

I also don't fully understand all the abstractions being created here. It seems there are three separate concepts (a "bootstrap", a "configuration step", and an "orchestrator") and they're not used consistently.

It seems for example that "orchestrators" are used for the drivers, but for executors there's different code that does some similar things but is instead baked into ExecutorPodFactory (which is another trait with a single implementation).

It would be nice to take a look at the abstraction here and make sure it makes sense and is being used consistently.

At the very least, a "README.md" file explaining how these things are tied together would help a lot those who did not write this code.

* This is separated out from the init-container steps API because this component can be reused to
* set up the init-container for executors as well.
*/
private[spark] trait InitContainerBootstrap {

Contributor: What's the purpose of all these traits that have a single implementation? That seems unnecessary.

Contributor: It's more idiomatic to mock a trait than a class, and our unit tests always create mocks for every component that isn't the class under test.

Contributor:

> It's more idiomatic to mock a trait than a class

Why? You can mock classes just fine.

Contributor:
It's probably fine to just use the class here, but some classes can't be mocked, such as final classes or classes with final methods. Having traits everywhere ensures that even if we change the classes down the road to have such characteristics, our tests won't break.

This also is not entirely without precedent. TaskScheduler is only implemented by TaskSchedulerImpl in the main scheduler code, as is TaskContext being extended only by TaskContextImpl. Putting a trait in front of an implementation communicates that it's expected for tests that dependency-inject instances of this to create stub implementations.

But we might be splitting hairs at this point, so using only the class could suffice until we run into problems from having done so.
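
As a toy illustration of the point under debate (the class and values are invented for this example, not from the PR): Mockito can stub a concrete class as readily as a trait, provided neither the class nor the method is final.

```scala
import org.mockito.Mockito.{mock, when}

// A plain class with no trait in front of it.
class DependencyFetcher {
  def fetch(uri: String): String = s"downloaded $uri"
}

// Stubbing the class directly; this fails only for final classes/methods.
val fetcher = mock(classOf[DependencyFetcher])
when(fetcher.fetch("https://example.com/app.jar")).thenReturn("stubbed")
assert(fetcher.fetch("https://example.com/app.jar") == "stubbed")
```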

extends InitContainerBootstrap {

override def bootstrapInitContainer(
podWithDetachedInitContainer: PodWithDetachedInitContainer): PodWithDetachedInitContainer = {

Contributor:
I commented about this in my previous review, but could you try to use shorter variable names throughout the PR?

For example, here, just repeating the name of the already long type to name the variable doesn't really help with readability. Imagine if you have two of those, are you going to start adding counters to the already long name?

* configuration properties for the init-container.
*/
private[spark] class DriverInitContainerBootstrapStep(
initContainerConfigurationSteps: Seq[InitContainerConfigurationStep],

Contributor: Another place where variable names are unnecessarily long and harm readability. initContainer is already implied by the class name; I have to parse the variable name every time and ignore the prefix to know which one it is.

@mccheah (Contributor) commented Dec 19, 2017

> I also don't fully understand all the abstractions being created here. It seems there are three separate concepts (a "bootstrap", a "configuration step", and an "orchestrator") and they're not used consistently.

A configuration step is a logical unit that applies an additive transformation to the input. A steps orchestrator selects which configuration steps to apply based on the configuration of the application. A bootstrap is a component that can be shared by one or more configuration steps and the driver, since the submission client and the driver often share code for configuring the driver and executor pods, respectively. We discuss this a bit more here: https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/architecture-docs/submission-client.md, though that document doesn't cover the bootstrap abstraction. We're open to different representations and architectures as well.
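
A condensed sketch of how those three pieces compose (simplified, hypothetical signatures; the real classes carry far more state):

```scala
import org.apache.spark.SparkConf

// Stand-in for the driver pod plus resolved Spark properties, to keep this self-contained.
case class DriverSpec(podDescription: String, properties: Map[String, String])

// A configuration step: one additive transformation of the driver spec.
trait DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec
}

class BasicDriverStep extends DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(properties = spec.properties + ("spark.app.name" -> "demo"))
}

class InitContainerBootstrapStep extends DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(podDescription = spec.podDescription + " + spark-init init-container")
}

// The orchestrator selects which steps apply, based on the submitted conf...
class DriverConfigOrchestrator(conf: SparkConf) {
  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
    val initContainerStep =
      if (conf.contains("spark.jars")) Seq(new InitContainerBootstrapStep) else Nil
    Seq(new BasicDriverStep) ++ initContainerStep
  }
}

// ...and the submission client folds the selected steps over an initial spec.
def buildDriverSpec(conf: SparkConf): DriverSpec =
  new DriverConfigOrchestrator(conf).getAllConfigurationSteps()
    .foldLeft(DriverSpec("base pod", Map.empty))((spec, step) => step.configure(spec))
```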

@liyinan926 (Author): @mccheah @foxish I gave you push access to the fork I used for this PR, so feel free to push commits if you want. Please let me know if you plan to address @vanzin's comments; otherwise, I will address them later tonight.

@vanzin (Contributor) commented Dec 19, 2017

> We discuss this a bit more in here

It's nice that there is some documentation somewhere, but that documentation doesn't really seem to address my comments. For one example, it only explicitly talks about the driver, which sort of makes sense because the document is about submission. But why aren't orchestrators used when starting executors too? It seems there's similar code baked into another class instead.

What I'm asking is for this to be documented properly so that someone who didn't write the code has enough information to know that it's working as it should. Right now I don't see what some of these abstractions are for at all; for example, as far as I can see, the orchestrator can be replaced by a method call instead of being a completely separate type, since it's not really abstracting anything. Look at where it's used:

    val configurationStepsOrchestrator = new DriverConfigOrchestrator(/* long list of arguments */)

    Utils.tryWithResource(SparkKubernetesClientFactory.createKubernetesClient(
        /* another long list of arguments */)) { kubernetesClient =>
      val client = new Client(
        configurationStepsOrchestrator.getAllConfigurationSteps(),
        /* another long list of arguments */

So aside from not really being able to infer the structure of how these things work, the current abstraction seems to be creating a lot of constructors and methods with long lists of arguments, which is another thing that hurts the readability of the code.

@SparkQA commented Dec 23, 2017

Test build #85324 has finished for PR 19954 at commit 785b90e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 23, 2017

Test build #85327 has finished for PR 19954 at commit f82c568.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 24, 2017

Test build #85352 has finished for PR 19954 at commit 9d9c841.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987 (Contributor) left a comment:

Mostly LGTM, only some nits. Also cc @ueshin to take another look.

.createOptional

val INIT_CONTAINER_MOUNT_TIMEOUT =
ConfigBuilder("spark.kubernetes.mountDependencies.timeout")

Contributor: nit: spark.kubernetes.initContainer.mountDependencies.timeout?

liyinan926 (Author): Please see the response regarding spark.kubernetes.mountDependencies.maxSimultaneousDownloads.

.createWithDefault(5)

val INIT_CONTAINER_MAX_THREAD_POOL_SIZE =
ConfigBuilder("spark.kubernetes.mountDependencies.maxSimultaneousDownloads")

Contributor: nit: spark.kubernetes.initContainer.mountDependencies.maxSimultaneousDownloads?

liyinan926 (Author): I think the current name is already pretty long. Adding initContainer makes it even longer without much added value.

ConfigBuilder("spark.kubernetes.mountDependencies.timeout")
.doc("Timeout before aborting the attempt to download and unpack dependencies from remote " +
"locations into the driver and executor pods.")
.timeConf(TimeUnit.MINUTES)

Contributor: Why not TimeUnit.SECONDS?

liyinan926 (Author): Done.

mountSecretsStep
}

private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {

Contributor: nit: areAnyFilesNonContainerLocal -> existNonContainerLocalFiles.

liyinan926 (Author): Done.
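
For readers, a plausible shape for that check (a sketch, not the PR's exact code; local:// is the scheme used for container-local paths, as noted earlier in this thread):

```scala
import java.net.URI

// True if any of the given URIs must be downloaded, i.e. is not a
// container-local (local://) path; schemeless URIs are treated as file://.
def existNonContainerLocalFiles(files: Seq[String]): Boolean =
  files.exists { uri =>
    Option(URI.create(uri).getScheme).getOrElse("file") != "local"
  }
```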

private def downloadFiles(
filesCommaSeparated: Option[String],
downloadDir: File,
errMessageOnDestinationNotADirectory: String): Unit = {

Contributor: nit: errMessageOnDestinationNotADirectory -> errMessage?

liyinan926 (Author): Done.

val initContainerConfigMapKey = sparkConf.get(INIT_CONTAINER_CONFIG_MAP_KEY_CONF)

if (initContainerConfigMap.isEmpty) {
logWarning("The executor's init-container config map was not specified. Executors will " +

Contributor: nit: was not -> is not.

liyinan926 (Author): Done.

}

if (initContainerConfigMapKey.isEmpty) {
logWarning("The executor's init-container config map key was not specified. Executors will " +

Contributor: nit: was not -> is not.

liyinan926 (Author): Done.

@SparkQA commented Dec 25, 2017

Test build #85384 has finished for PR 19954 at commit c51bc56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin (Member) left a comment:

LGTM except for one comment.


# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-init:latest -f dockerfiles/init-container/Dockerfile .

Member: kubernetes/dockerfiles/... instead of dockerfiles/...

Btw, only nits, but the paths in the Dockerfiles for the driver/executor look wrong too; they should be kubernetes/dockerfiles/driver/Dockerfile and kubernetes/dockerfiles/executor/Dockerfile, respectively.

liyinan926 (Author): Done.

@SparkQA commented Dec 26, 2017

Test build #85398 has finished for PR 19954 at commit 28343fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liyinan926 (Author): @vanzin @jiangxb1987 @ueshin Can this PR be merged now?

@ueshin (Member) commented Dec 28, 2017

Thanks! Merging to master.

@asfgit closed this in 171f6dd on Dec 28, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 28, 2017
…rets

## What changes were proposed in this pull request?

This PR updates the Kubernetes documentation corresponding to the following features/changes in apache#19954.
* Ability to use remote dependencies through the init-container.
* Ability to mount user-specified secrets into the driver and executor pods.

vanzin jiangxb1987 foxish

Author: Yinan Li <[email protected]>

Closes apache#20059 from liyinan926/doc-update.
asfgit pushed a commit that referenced this pull request Jan 5, 2018
## What changes were proposed in this pull request?

We missed enabling `spark.files` and `spark.jars` in #19954. The result is that remote dependencies specified through `spark.files` or `spark.jars` are not included in the list of remote dependencies to be downloaded by the init-container. This PR fixes it.

## How was this patch tested?

Manual tests.

vanzin This replaces #20157.

foxish

Author: Yinan Li <[email protected]>

Closes #20160 from liyinan926/SPARK-22757.

(cherry picked from commit 6cff7d1)
Signed-off-by: Felix Cheung <[email protected]>
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jan 5, 2018