[SPARK-22757][Kubernetes] Enable use of remote dependencies (http, s3, gcs, etc.) in Kubernetes mode #19954

Closed · wants to merge 14 commits

Conversation

@liyinan926 (Contributor) commented Dec 12, 2017

What changes were proposed in this pull request?

This PR expands the Kubernetes mode to support remote dependencies on http/https endpoints, GCS, S3, etc. It adds steps for configuring and appending a Kubernetes init-container to the driver and executor pods that downloads remote dependencies before the driver/executors start.

Init-containers, as the name suggests, are containers that run to completion before a pod's main containers start, and are commonly used for initialization tasks. We use them here to localize remote application dependencies before the driver/executors start running; the code the init-container runs is also included in this PR. The PR additionally adds a step that mounts user-specified secrets, which may hold credentials for accessing data stores such as S3 and Google Cloud Storage (GCS), into the driver and executor pods.
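
To make the flow concrete, here is a rough sketch of what the init-container bootstrap does to a pod, using the fabric8 builders that also appear in the review diffs below. The volume name, mount path, and method shape are illustrative assumptions, not the PR's exact code:

```scala
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Sketch: attach a "spark-init" init-container that downloads remote jars into
// an emptyDir volume; the main container mounts the same volume and reads the
// localized files from there once the init-container completes.
def bootstrapInitContainer(pod: Pod, initContainerImage: String): Pod =
  new PodBuilder(pod)
    .editOrNewSpec()
      .addNewVolume()
        .withName("download-jars-volume") // hypothetical volume name
        .withNewEmptyDir()
        .endEmptyDir()
      .endVolume()
      .addNewInitContainer()
        .withName("spark-init")
        .withImage(initContainerImage) // e.g. spark.kubernetes.initContainer.image
        .addNewVolumeMount()
          .withName("download-jars-volume")
          .withMountPath("/var/spark-data/spark-jars") // assumed jars download dir
        .endVolumeMount()
      .endInitContainer()
    .endSpec()
    .build()
```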

How was this patch tested?

  • The patch contains unit tests, which are passing.
  • Manual testing: ./build/mvn -Pkubernetes clean package succeeded.
  • Manual testing of the following cases:
    • Running SparkPi using a container-local spark-examples jar.
    • Running SparkPi using a container-local spark-examples jar with a user-specified secret mounted.
    • Running SparkPi using a spark-examples jar hosted remotely on an https endpoint (see the submission sketch after this list).
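
For illustration, here is a submission sketch using the SparkLauncher API. The master URL, image name, and jar location are placeholders; only the init-container image config (renamed to spark.kubernetes.initContainer.image later in this review) is specific to this PR:

```scala
import org.apache.spark.launcher.SparkLauncher

// Sketch: submit SparkPi with a jar hosted on an HTTPS endpoint; the
// init-container added by this PR downloads it before the driver starts.
val process = new SparkLauncher()
  .setMaster("k8s://https://my-apiserver:6443")             // placeholder API server
  .setDeployMode("cluster")
  .setMainClass("org.apache.spark.examples.SparkPi")
  .setAppResource("https://example.com/spark-examples.jar") // remote dependency
  .setConf("spark.kubernetes.initContainer.image", "my-repo/spark-init:latest")
  .launch()
process.waitFor()
```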

cc @rxin @felixcheung @mateiz (shepherd)
k8s-big-data SIG members & contributors: @mccheah @foxish @ash211 @ssuchter @varunkatta @kimoonkim @erikerlandson @tnachen @ifilonenko @liyinan926
reviewers: @vanzin @felixcheung @jiangxb1987 @mridulm

@SparkQA commented Dec 12, 2017

Test build #84789 has finished for PR 19954 at commit f9dc86d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 12, 2017

Test build #84793 has finished for PR 19954 at commit cd5e832.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 12, 2017

Test build #84795 has finished for PR 19954 at commit b9a0090.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 13, 2017

Test build #84798 has finished for PR 19954 at commit 5512d80.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 13, 2017

Test build #84838 has finished for PR 19954 at commit 1a74521.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@foxish (Contributor) commented Dec 13, 2017

@liyinan926, can you update the PR title please?
Maybe "Enable submission of GCS (or S3) and HTTP dependencies to Kubernetes Scheduler Backend"

@liyinan926 changed the title from "[SPARK-22757][Kubernetes] add init-container bootstrapping and secret mounting steps" to "[SPARK-22757][Kubernetes] Enable use of remote dependencies (http, s3, gcs, etc.) in Kubernetes mode" on Dec 13, 2017
@foxish (Contributor) commented Dec 14, 2017

@rxin @mateiz @vanzin @mridulm @jiangxb1987 @felixcheung, if you can help take a look at this, it would add a lot of value to Kubernetes mode in 2.3. Granted, we already have a way to supply dependencies within Docker images, and that works as expected (with the minor fix in #19972); this would additionally enable submitting dependencies over http and from object storage. That's a big value-add: it lets people use standard images rather than baking a new one for each Spark application.

If we can get this review going before everyone leaves for vacation (and even if it gets picked up after we all return), that's a big step for us in the community.

@erikerlandson (Contributor) commented:

+1 to @foxish's comment above. For accessing data from http, s3, etc., this will be a huge reduction in barrier to entry: the difference between having to spin a custom Docker image and just using one out of the box.

@liyinan926 (Author) commented:

+1. This PR closes a big feature gap between the Kubernetes scheduler backend and the other backends, and makes the Kubernetes backend much more usable.

@mridulm (Contributor) commented Dec 14, 2017

This is promising! I will take a look at it over the weekend; thanks for the great work :-)
My only concern is whether we can minimize fetches when multiple containers run on the same host, sort of like what YARN does; but that is a minor nit compared to the basic feature.

@foxish (Contributor) commented Dec 14, 2017

@mridulm We did have some discussions on resource localization at some point. This is a powerful mechanism when coupled with a resource staging server (in the future). There is a per-pod cost when localizing dependencies this way, but there is an alternative: using a Docker image baked with the dependencies. Kubernetes caches downloaded images at the node level, so one would pay the cost only once per node.

@@ -209,9 +214,33 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
.build()
}.getOrElse(executorContainer)

new PodBuilder(executorPod)
val (withMaybeSecretsMountedPod, withMaybeSecretsMountedContainer) =
mountSecretsBootstrap.map {bootstrap =>

Member: nit: add a space before bootstrap.

liyinan926 (Author): Done.

initContainer,
mainContainerWithMountedFiles)
}

Member: nit: remove an extra line.

liyinan926 (Author): Done.

.build())

val initContainer = new ContainerBuilder(podWithDetachedInitContainer.initContainer)
.withName(s"spark-init")

Member: nit: remove s.

liyinan926 (Author): Done.

remoteJars,
jarsDownloadDir,
s"Remote jars download directory specified at $jarsDownloadDir does not exist " +
s"or is not a directory.")

Member: nit: remove s.

liyinan926 (Author): Done.

// dependencies. The executor's main container and its init-container share the secrets
// because the init-container is sort of an implementation detail and this sharing
// avoids introducing a dedicated configuration property just for the init-container.
val mayBeInitContainerMountSecretsBootstrap = if (maybeInitContainerBootstrap.nonEmpty &&

Member: What's the difference between this and mayBeMountSecretBootstrap above? Looks like mounting the same paths twice?

liyinan926 (Author): mayBeMountSecretBootstrap is for the executor's main container, whereas mayBeInitContainerMountSecretsBootstrap is for the executor's init-container, which runs before the main container starts.

Member: Oh, I see. Thanks!
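
To illustrate the answer, here is a condensed sketch (following the mountSecrets(pod, container) signature quoted later in this review) of mounting the same user-specified secrets into a pod and one of its containers; the names are illustrative, and a real implementation must avoid adding duplicate volumes when applied twice to the same pod:

```scala
import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod, PodBuilder}

// Adds one secret volume per secret to the pod, plus a matching volume mount
// to whichever container it is given (main container or init-container).
class MountSecretsBootstrap(secretNamesToMountPaths: Map[String, String]) {
  def mountSecrets(pod: Pod, container: Container): (Pod, Container) = {
    var podBuilder = new PodBuilder(pod)
    secretNamesToMountPaths.keys.foreach { name =>
      podBuilder = podBuilder.editOrNewSpec()
        .addNewVolume()
          .withName(s"$name-volume")
          .withNewSecret().withSecretName(name).endSecret()
        .endVolume()
        .endSpec()
    }
    var containerBuilder = new ContainerBuilder(container)
    secretNamesToMountPaths.foreach { case (name, path) =>
      containerBuilder = containerBuilder
        .addNewVolumeMount().withName(s"$name-volume").withMountPath(path).endVolumeMount()
    }
    (podBuilder.build(), containerBuilder.build())
  }
}

// Applied once per container, so the main container and the init-container
// both see the same credentials:
// val (podWithSecrets, mainC) = bootstrap.mountSecrets(basePod, mainContainer)
// val (_, initC)             = bootstrap.mountSecrets(podWithSecrets, initContainer)
```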

@SparkQA commented Dec 15, 2017

Test build #84966 has finished for PR 19954 at commit 38b850f.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) left a comment:

This is a pretty large change, and I need more time to look at everything. I left some comments, mostly stylistic, which also should apply to the code I haven't commented on yet.

In general, reading the code becomes a little tiring because it's hard to mentally keep track of all the long names being used.

.stringConf
.createWithDefault("/var/spark-data/spark-files")

val INIT_CONTAINER_DOCKER_IMAGE =
ConfigBuilder("spark.kubernetes.initContainer.docker.image")

Contributor: Mirroring the discussion in the other PR, are these really restricted to docker? Is it a required config?

Contributor: None of it is docker-specific; it can be "container" everywhere.

Contributor:

> Is it a required config?

No, one may forgo the init-container if they're building the deps into the Docker image itself and supplying them via local:/// paths.

liyinan926 (Author): Renamed to spark.kubernetes.initContainer.image.

}
val initContainerCustomEnvVars = sparkConf.getAllWithPrefix(initContainerCustomEnvVarKeyPrefix)
.toSeq
.map(env =>

Contributor: .map { env =>


override def mountSecrets(pod: Pod, container: Container): (Pod, Container) = {
var podBuilder = new PodBuilder(pod)
secretNamesToMountPaths.keys.foreach(name =>

Contributor: .foreach { name =>

.endSpec())

var containerBuilder = new ContainerBuilder(container)
secretNamesToMountPaths.foreach(namePath =>

Contributor: .foreach { namePath =>. You get the idea; please fix all of these.

@@ -116,10 +122,53 @@ private[spark] class DriverConfigurationStepsOrchestrator(
None
}

val mayBeInitContainerBootstrapStep =
if (areAnyFilesNonContainerLocal(sparkJars ++ sparkFiles)) {
val initContainerConfigurationStepsOrchestrator =

Contributor: The variable names in this code are very long in general. Verbosity can both help and harm readability, and in this case I don't think it's helping much. For example, orchestrator is just as good a name for this variable, since there's no other orchestrator being used here.

* Returns the complete ordered list of steps required to configure the init-container. This is
* only used when there are remote application dependencies to localize.
*/
private[spark] class InitContainerConfigurationStepsOrchestrator(

Contributor: Similarly, some type names are also really long. InitContainerOrchestrator sounds just as good to me.

liyinan926 (Author): Renamed to InitContainerConfigOrchestrator, and similarly DriverConfigurationStepsOrchestrator to DriverConfigOrchestrator.

/**
* Utility for fetching remote file dependencies.
*/
private[spark] trait FileFetcher {

Contributor: Why do you need a trait for this? If it's for the tests, you can mock classes; you don't need an interface for that.

liyinan926 (Author): Yeah, removed the trait.

* with different configurations for different download sources, or using the same container to
* download everything at once.
*/
private[spark] class KubernetesSparkDependencyDownloadInitContainer(

Contributor: This name is so long it becomes confusing. How about KubernetesInitContainer? Or do you plan to have multiple different init-containers for different things (ugh)?

liyinan926 (Author): Renamed to SparkPodInitContainer.

s"Remote files download directory specified at $filesDownloadDir does not exist " +
"or is not a directory.")
}
waitForFutures(

Contributor: You have a thread pool, but you're really just submitting two tasks. Why not one task for each file/jar?

liyinan926 (Author): This class actually handles more tasks in our fork. For example, it is also responsible for downloading from the resource staging server that hosts submission-client dependencies. The resource staging server will come in a future PR.

Contributor: Sure, but that's not my point. If you have 10 jars and 10 files to download, the current code will only download 2 at a time. If you submit each jar/file separately, you'll download as many as your thread pool allows, and you can make that configurable.

liyinan926 (Author): Got it, will address this.

liyinan926 (Author): Updated to create one task per file/jar to download. Regarding the type of thread pool, we are using a CachedThreadPool, which I think makes sense given that the tasks are expected to be short-lived.
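
A minimal sketch of the one-task-per-file scheme under discussion; the helper names are illustrative, not the PR's exact code:

```scala
import java.io.File
import java.net.{URI, URL}
import java.nio.file.Files
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object DownloadSketch {
  // Cached pool: grows with the number of concurrent downloads, reuses idle threads.
  private implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newCachedThreadPool())

  // Hypothetical fetcher: copies one remote file into the download directory.
  private def fetchFile(uri: String, downloadDir: File): Unit = {
    val fileName = new File(new URI(uri).getPath).getName
    val in = new URL(uri).openStream()
    try Files.copy(in, new File(downloadDir, fileName).toPath) finally in.close()
  }

  // One Future per file, so the pool (not a hard-coded 2) bounds the parallelism.
  def downloadFiles(uris: Seq[String], downloadDir: File, timeout: Duration): Unit = {
    require(downloadDir.isDirectory,
      s"Download directory $downloadDir does not exist or is not a directory.")
    val futures = uris.map(uri => Future(fetchFile(uri, downloadDir)))
    futures.foreach(f => Await.result(f, timeout))
  }
}
```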

}
}

private def waitForFutures(futures: Future[_]*) {

Contributor: use : Unit = { instead of procedure syntax. But really, there's a single caller, so just inline it.

@liyinan926 (Author): @vanzin Addressed your comments so far. PTAL. Thanks!

@SparkQA commented Dec 16, 2017

Test build #84985 has finished for PR 19954 at commit 46a8c99.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 17, 2017

Test build #85013 has finished for PR 19954 at commit 197882d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 17, 2017

Test build #85028 has finished for PR 19954 at commit e20e212.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 17, 2017

Test build #85029 has finished for PR 19954 at commit 340fa41.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin (Member) commented Dec 18, 2017

Jenkins, retest this please.

orchestrator: DriverConfigurationStepsOrchestrator,
types: Class[_ <: DriverConfigurationStep]*): Unit = {
orchestrator: DriverConfigOrchestrator,
types: Class[_ <: DriverConfigurationStep]*): Unit = {

Member: nit: indent.

}
}

private object FirstTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {

Member: Do we need $ at the end of the object name?

liyinan926 (Author): No, it shouldn't be there. Removed.

}
}

private object SecondTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {

Member: ditto.

val sparkConf = new SparkConf(true)

if (!propertiesFile.isFile) {
throw new IllegalArgumentException(s"Server properties file given at" +

Member: nit: remove s. Also move the space at the beginning of the string on the next line to the end of the string on this line.

@SparkQA commented Dec 18, 2017

Test build #85056 has finished for PR 19954 at commit 340fa41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liyinan926 (Author): @ueshin Addressed your comments. PTAL. Thanks!

@SparkQA commented Dec 18, 2017

Test build #85066 has finished for PR 19954 at commit 9ebfc73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 18, 2017

Test build #85069 has finished for PR 19954 at commit ddcb0f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor) left a comment:

I started looking at this again, but sorry, the long variable names make it really hard to read and, especially, understand this code. I'm always getting lost because all variable names look the same aside from some small difference at the end.

I also don't fully understand all the abstractions being created here. It seems there are three separate concepts (a "bootstrap", a "configuration step", and an "orchestrator") and they're not used consistently.

It seems for example that "orchestrators" are used for the drivers, but for executors there's different code that does some similar things but is instead baked into ExecutorPodFactory (which is another trait with a single implementation).

It would be nice to take a look at the abstraction here and make sure it makes sense and is being used consistently.

At the very least, a "README.md" file explaining how these things are tied together would help a lot those who did not write this code.

* This is separated out from the init-container steps API because this component can be reused to
* set up the init-container for executors as well.
*/
private[spark] trait InitContainerBootstrap {

Contributor: What's the purpose of all these traits that have a single implementation? That seems unnecessary.

Contributor: It's more idiomatic to mock a trait than a class, and our unit tests always create mocks for every component that isn't the class under test.

Contributor:

> It's more idiomatic to mock a trait than a class

Why? You can mock classes just fine.

Contributor:
It's probably fine to just use the class here, but some classes can't be mocked, such as final classes or classes with final methods. Having traits everywhere ensures that even if we change the classes down the road to have such characteristics, our tests won't break.

This also is not entirely without precedent. TaskScheduler is only implemented by TaskSchedulerImpl in the main scheduler code, as is TaskContext being extended only by TaskContextImpl. Putting a trait in front of an implementation communicates that it's expected for tests that dependency-inject instances of this to create stub implementations.

But we might be splitting hairs at this point, so using only the class could suffice until we run into problems from having done so.
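
As a toy illustration of the point under debate (the class and values are invented for this example, not from the PR): Mockito can stub a concrete class as readily as a trait, provided neither the class nor the method is final.

```scala
import org.mockito.Mockito.{mock, when}

// A plain class with no trait in front of it.
class DependencyFetcher {
  def fetch(uri: String): String = s"downloaded $uri"
}

// Stubbing the class directly; this fails only for final classes/methods.
val fetcher = mock(classOf[DependencyFetcher])
when(fetcher.fetch("https://example.com/app.jar")).thenReturn("stubbed")
assert(fetcher.fetch("https://example.com/app.jar") == "stubbed")
```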

extends InitContainerBootstrap {

override def bootstrapInitContainer(
podWithDetachedInitContainer: PodWithDetachedInitContainer): PodWithDetachedInitContainer = {

Contributor:
I commented about this in my previous review, but could you try to use shorter variable names throughout the PR?

For example, here, just repeating the name of the already long type to name the variable doesn't really help with readability. Imagine if you have two of those, are you going to start adding counters to the already long name?

* configuration properties for the init-container.
*/
private[spark] class DriverInitContainerBootstrapStep(
initContainerConfigurationSteps: Seq[InitContainerConfigurationStep],

Contributor: Another place where variable names are unnecessarily long and harm readability. initContainer is already implied by the class name; I have to parse the variable name every time and ignore the prefix to know which one it is.

@mccheah (Contributor) commented Dec 19, 2017

> I also don't fully understand all the abstractions being created here. It seems there are three separate concepts (a "bootstrap", a "configuration step", and an "orchestrator") and they're not used consistently.

A configuration step is a logical unit that applies an additive transformation to the input. A steps orchestrator selects which configuration steps to apply based on the configuration of the application. A bootstrap is a component that can be shared by one or more configuration steps and the driver, since the submission client and the driver often share code for configuring the driver and executor pods, respectively. We discuss this a bit more here: https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/architecture-docs/submission-client.md, though that document doesn't cover the bootstrap abstraction. We're open to different representations and architectures as well.
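
A condensed sketch of how those three pieces compose (simplified, hypothetical signatures; the real classes carry far more state):

```scala
import org.apache.spark.SparkConf

// Stand-in for the driver pod plus resolved Spark properties, to keep this self-contained.
case class DriverSpec(podDescription: String, properties: Map[String, String])

// A configuration step: one additive transformation of the driver spec.
trait DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec
}

class BasicDriverStep extends DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(properties = spec.properties + ("spark.app.name" -> "demo"))
}

class InitContainerBootstrapStep extends DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec =
    spec.copy(podDescription = spec.podDescription + " + spark-init init-container")
}

// The orchestrator selects which steps apply, based on the submitted conf...
class DriverConfigOrchestrator(conf: SparkConf) {
  def getAllConfigurationSteps(): Seq[DriverConfigurationStep] = {
    val initContainerStep =
      if (conf.contains("spark.jars")) Seq(new InitContainerBootstrapStep) else Nil
    Seq(new BasicDriverStep) ++ initContainerStep
  }
}

// ...and the submission client folds the selected steps over an initial spec.
def buildDriverSpec(conf: SparkConf): DriverSpec =
  new DriverConfigOrchestrator(conf).getAllConfigurationSteps()
    .foldLeft(DriverSpec("base pod", Map.empty))((spec, step) => step.configure(spec))
```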

@liyinan926 (Author): @mccheah @foxish I gave you push access to the fork I used for this PR, so feel free to push commits if you want. Please let me know if you plan to address @vanzin's comments; otherwise, I will address them later tonight.

@vanzin (Contributor) commented Dec 19, 2017

> We discuss this a bit more in here

It's nice that there is some documentation somewhere, but that documentation doesn't really seem to address my comments. For one example, it only explicitly talks about the driver, which sort of makes sense because the document is about submission. But why aren't orchestrators used when starting executors too? It seems there's similar code baked into another class instead.

What I'm asking is for this to be documented properly so that someone who didn't write the code has enough information to know that it's working as it should. Right now I don't see what some of these abstractions are for at all; for example, as far as I can see, the orchestrator can be replaced by a method call instead of being a completely separate type, since it's not really abstracting anything. Look at where it's used:

    val configurationStepsOrchestrator = new DriverConfigOrchestrator(/* long list of arguments */)

    Utils.tryWithResource(SparkKubernetesClientFactory.createKubernetesClient(
        /* another long list of arguments */)) { kubernetesClient =>
      val client = new Client(
        configurationStepsOrchestrator.getAllConfigurationSteps(),
        /* another long list of arguments */

So aside from not really being able to infer the structure of how these things work, the current abstraction seems to be creating a lot of constructors and methods with long lists of arguments, which is another thing that hurts the readability of the code.

@SparkQA commented Dec 23, 2017

Test build #85324 has finished for PR 19954 at commit 785b90e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 23, 2017

Test build #85327 has finished for PR 19954 at commit f82c568.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 24, 2017

Test build #85352 has finished for PR 19954 at commit 9d9c841.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987 (Contributor) left a comment:

Mostly LGTM, only some nits. Also cc @ueshin to take another look.

.createOptional

val INIT_CONTAINER_MOUNT_TIMEOUT =
ConfigBuilder("spark.kubernetes.mountDependencies.timeout")

Contributor: nit: spark.kubernetes.initContainer.mountDependencies.timeout?

liyinan926 (Author): Please see the response regarding spark.kubernetes.mountDependencies.maxSimultaneousDownloads.

.createWithDefault(5)

val INIT_CONTAINER_MAX_THREAD_POOL_SIZE =
ConfigBuilder("spark.kubernetes.mountDependencies.maxSimultaneousDownloads")

Contributor: nit: spark.kubernetes.initContainer.mountDependencies.maxSimultaneousDownloads?

liyinan926 (Author): I think the current name is already pretty long. Adding initContainer makes it even longer without much added value.

ConfigBuilder("spark.kubernetes.mountDependencies.timeout")
.doc("Timeout before aborting the attempt to download and unpack dependencies from remote " +
"locations into the driver and executor pods.")
.timeConf(TimeUnit.MINUTES)

Contributor: Why not TimeUnit.SECONDS?

liyinan926 (Author): Done.

mountSecretsStep
}

private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {

Contributor: nit: areAnyFilesNonContainerLocal -> existNonContainerLocalFiles.

liyinan926 (Author): Done.
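
For readers, a plausible shape for that check (a sketch, not the PR's exact code; local:// is the scheme used for container-local paths, as noted earlier in this thread):

```scala
import java.net.URI

// True if any of the given URIs must be downloaded, i.e. is not a
// container-local (local://) path; schemeless URIs are treated as file://.
def existNonContainerLocalFiles(files: Seq[String]): Boolean =
  files.exists { uri =>
    Option(URI.create(uri).getScheme).getOrElse("file") != "local"
  }
```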

private def downloadFiles(
filesCommaSeparated: Option[String],
downloadDir: File,
errMessageOnDestinationNotADirectory: String): Unit = {

Contributor: nit: errMessageOnDestinationNotADirectory -> errMessage?

liyinan926 (Author): Done.

val initContainerConfigMapKey = sparkConf.get(INIT_CONTAINER_CONFIG_MAP_KEY_CONF)

if (initContainerConfigMap.isEmpty) {
logWarning("The executor's init-container config map was not specified. Executors will " +

Contributor: nit: was not -> is not.

liyinan926 (Author): Done.

}

if (initContainerConfigMapKey.isEmpty) {
logWarning("The executor's init-container config map key was not specified. Executors will " +

Contributor: nit: was not -> is not.

liyinan926 (Author): Done.

@SparkQA commented Dec 25, 2017

Test build #85384 has finished for PR 19954 at commit c51bc56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ueshin (Member) left a comment:

LGTM except for one comment.


# If this docker file is being used in the context of building your images from a Spark distribution, the docker build
# command should be invoked from the top level directory of the Spark distribution. E.g.:
# docker build -t spark-init:latest -f dockerfiles/init-container/Dockerfile .

Member: kubernetes/dockerfiles/... instead of dockerfiles/...

Btw, only nits, but the paths in the Dockerfiles for the driver/executor look wrong too; they should be kubernetes/dockerfiles/driver/Dockerfile and kubernetes/dockerfiles/executor/Dockerfile, respectively.

liyinan926 (Author): Done.

@SparkQA commented Dec 26, 2017

Test build #85398 has finished for PR 19954 at commit 28343fb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liyinan926 (Author): @vanzin @jiangxb1987 @ueshin Can this PR be merged now?

@ueshin (Member) commented Dec 28, 2017

Thanks! Merging to master.

@asfgit closed this in 171f6dd on Dec 28, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 28, 2017
…rets

## What changes were proposed in this pull request?

This PR updates the Kubernetes documentation corresponding to the following features/changes in apache#19954.
* Ability to use remote dependencies through the init-container.
* Ability to mount user-specified secrets into the driver and executor pods.

vanzin jiangxb1987 foxish

Author: Yinan Li <[email protected]>

Closes apache#20059 from liyinan926/doc-update.
asfgit pushed a commit that referenced this pull request Jan 5, 2018
## What changes were proposed in this pull request?

We missed enabling `spark.files` and `spark.jars` in #19954. The result is that remote dependencies specified through `spark.files` or `spark.jars` are not included in the list of remote dependencies to be downloaded by the init-container. This PR fixes it.

## How was this patch tested?

Manual tests.

vanzin This replaces #20157.

foxish

Author: Yinan Li <[email protected]>

Closes #20160 from liyinan926/SPARK-22757.

(cherry picked from commit 6cff7d1)
Signed-off-by: Felix Cheung <[email protected]>
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Jan 5, 2018