[SPARK-22757][Kubernetes] Enable use of remote dependencies (http, s3, gcs, etc.) in Kubernetes mode #19954
Conversation
Test build #84789 has finished for PR 19954 at commit

Test build #84793 has finished for PR 19954 at commit

Test build #84795 has finished for PR 19954 at commit

Test build #84798 has finished for PR 19954 at commit

Test build #84838 has finished for PR 19954 at commit
@liyinan926, can you update the PR title please?

@rxin @mateiz @vanzin @mridulm @jiangxb1987 @felixcheung, if you can help take a look at this, it would add a lot of value to Kubernetes mode in 2.3. Granted, we already have a way to supply dependencies within docker images and that works as expected (with the minor fix in #19972), but this would enable submitting from http and from object storage. That's a big value-add for people and lets them use standard images instead of baking a new one for each Spark application. If we can get this review going before everyone leaves for vacation (even if it gets picked up after we all return), that's a big step for us in the community.

+1 to @foxish's comment above. For accessing data from http, s3, etc., this will be a huge reduction in the barrier to entry: the difference between having to spin up a custom docker image and just using one out of the box.

+1. This PR closes a big feature gap of the Kubernetes scheduler backend compared with other backends, and makes the Kubernetes backend much more usable.

This is promising! I will take a look at it over the weekend. Thanks for the great work :-)

@mridulm We did have some discussions on resource localization at some point. This is a powerful mechanism when coupled with a resource staging server (in the future). There is a per-pod cost when using localization in this manner, but there is an alternative: using a docker image baked with dependencies. Kubernetes does node-level caching of downloaded images, so one would pay the cost only once per node.
    @@ -209,9 +214,33 @@ private[spark] class ExecutorPodFactoryImpl(sparkConf: SparkConf)
          .build()
      }.getOrElse(executorContainer)

      new PodBuilder(executorPod)
      val (withMaybeSecretsMountedPod, withMaybeSecretsMountedContainer) =
        mountSecretsBootstrap.map {bootstrap =>
nit: add a space before `bootstrap`.

Done.
        initContainer,
        mainContainerWithMountedFiles)
    }
nit: remove an extra line.

Done.
        .build())

    val initContainer = new ContainerBuilder(podWithDetachedInitContainer.initContainer)
      .withName(s"spark-init")
nit: remove `s`.

Done.
      remoteJars,
      jarsDownloadDir,
      s"Remote jars download directory specified at $jarsDownloadDir does not exist " +
        s"or is not a directory.")
nit: remove `s`.

Done.
    // dependencies. The executor's main container and its init-container share the secrets
    // because the init-container is sort of an implementation detail and this sharing
    // avoids introducing a dedicated configuration property just for the init-container.
    val mayBeInitContainerMountSecretsBootstrap = if (maybeInitContainerBootstrap.nonEmpty &&
What's the difference between this and `mayBeMountSecretBootstrap` above? Looks like mounting the same paths twice?

`mayBeMountSecretBootstrap` is for the executor main container, whereas `mayBeInitContainerMountSecretsBootstrap` is for the executor init-container, which is run before the main container starts.

Oh, I see. Thanks!
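To make the sharing concrete, here is a minimal sketch in the fabric8 builder style used in the diffs above. The helper names and the `-volume` suffix are illustrative, not the PR's actual code; the point is that both containers get the same volume mount backed by a single pod-level secret volume.

```scala
import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder, Pod, PodBuilder}

// Hypothetical helpers; the real PR applies a secrets bootstrap with a
// mountSecrets(pod, container) method to each container.
def addSecretVolume(pod: Pod, secretName: String): Pod =
  new PodBuilder(pod)
    .editOrNewSpec()
      .addNewVolume()
        .withName(s"$secretName-volume")
        .withNewSecret()
          .withSecretName(secretName)
          .endSecret()
        .endVolume()
      .endSpec()
    .build()

def mountSecret(container: Container, secretName: String, mountPath: String): Container =
  new ContainerBuilder(container)
    .addNewVolumeMount()
      .withName(s"$secretName-volume")
      .withMountPath(mountPath)
      .endVolumeMount()
    .build()

// The mount is applied once per container (main and init), but against the
// same pod-level volume, so nothing is duplicated in the pod spec itself:
// val mainWithSecret = mountSecret(mainContainer, "aws-creds", "/etc/secrets")
// val initWithSecret = mountSecret(initContainer, "aws-creds", "/etc/secrets")
```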
Test build #84966 has finished for PR 19954 at commit
This is a pretty large change, and I need more time to look at everything. I left some comments, mostly stylistic, which should also apply to the code I haven't commented on yet.

In general, reading the code becomes a little tiring because it's hard to mentally keep track of all the long names being used.
      .stringConf
      .createWithDefault("/var/spark-data/spark-files")

    val INIT_CONTAINER_DOCKER_IMAGE =
      ConfigBuilder("spark.kubernetes.initContainer.docker.image")
Mirroring the discussion in the other PR, are these really restricted to docker? Is it a required config?

None of it is docker specific; it can be `container` everywhere.

> Is it a required config?

No, as one may forgo the init-container if they're building the deps into the docker image itself and supplying them via `local:///` paths.

Renamed to `spark.kubernetes.initContainer.image`.
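To illustrate the two submission modes just described, a hedged sketch; the paths and image name are made up:

```scala
import org.apache.spark.SparkConf

// Dependencies baked into the image: local:/// paths, no init-container
// image required.
val bakedDepsConf = new SparkConf()
  .set("spark.jars", "local:///opt/spark/jars/app-dep.jar")

// Remote dependencies: the init-container downloads them before the main
// container starts, so an init-container image must be supplied.
val remoteDepsConf = new SparkConf()
  .set("spark.kubernetes.initContainer.image", "my-registry/spark-init:latest")
  .set("spark.jars", "https://example.com/deps/app-dep.jar")
```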
    }
    val initContainerCustomEnvVars = sparkConf.getAllWithPrefix(initContainerCustomEnvVarKeyPrefix)
      .toSeq
      .map(env =>
`.map { env =>`
    override def mountSecrets(pod: Pod, container: Container): (Pod, Container) = {
      var podBuilder = new PodBuilder(pod)
      secretNamesToMountPaths.keys.foreach(name =>
`.foreach { name =>`
          .endSpec())

      var containerBuilder = new ContainerBuilder(container)
      secretNamesToMountPaths.foreach(namePath =>
`.foreach { namePath =>`. You get the idea. Please fix all of these.
    @@ -116,10 +122,53 @@ private[spark] class DriverConfigurationStepsOrchestrator(
        None
      }

      val mayBeInitContainerBootstrapStep =
        if (areAnyFilesNonContainerLocal(sparkJars ++ sparkFiles)) {
          val initContainerConfigurationStepsOrchestrator =
The variable names in this code are very long in general; verbosity can both help and harm readability, and in this case I don't think it's helping much. For example, `orchestrator` is just as good a name for this variable, since there's no other orchestrator being used here.
     * Returns the complete ordered list of steps required to configure the init-container. This is
     * only used when there are remote application dependencies to localize.
     */
    private[spark] class InitContainerConfigurationStepsOrchestrator(
Similarly, some type names are also really long. `InitContainerOrchestrator` sounds just as good to me.

Renamed to `InitContainerConfigOrchestrator`, and similarly `DriverConfigurationStepsOrchestrator` to `DriverConfigOrchestrator`.
    /**
     * Utility for fetching remote file dependencies.
     */
    private[spark] trait FileFetcher {
Why do you need a trait for this? If it's for the tests, you can mock classes; you don't need an interface for that.

Yeah, removed the trait.
     * with different configurations for different download sources, or using the same container to
     * download everything at once.
     */
    private[spark] class KubernetesSparkDependencyDownloadInitContainer(
This name is so long it becomes confusing. How about `KubernetesInitContainer`? Or do you plan to have multiple different init containers for different things (ugh)?

Renamed to `SparkPodInitContainer`.
s"Remote files download directory specified at $filesDownloadDir does not exist " + | ||
"or is not a directory.") | ||
} | ||
waitForFutures( |
You have a thread pool, but really you're just submitting two tasks. Why not one task for each file / jar?

This class actually handles more tasks in our fork. For example, it is also responsible for downloading from the resource staging server that hosts submission client dependencies. The resource staging server will be in a future PR.

Sure, but that's not my point. If you have 10 jars and 10 files to download, the current code will only download 2 at a time. If you submit each jar / file separately, you'll download as many as your thread pool allows, and you can make that configurable.

Got it, will address this.

Updated to create one task per file/jar to download. Regarding the type of thread pool, we are using a `CachedThreadPool`, which I think makes sense as the tasks are expected not to be long-lived.
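A sketch of the one-task-per-file shape agreed on here, assuming a cached pool; `downloadFile` is a stand-in for the real fetch logic (e.g. Spark's file-fetching utilities), not the PR's actual method:

```scala
import java.io.File
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutorService, Future}
import scala.concurrent.duration.Duration

object DependencyDownloads {
  // A cached pool suits many short-lived tasks; a fixed-size pool would cap
  // concurrency instead, which is what a setting like
  // spark.kubernetes.mountDependencies.maxSimultaneousDownloads controls.
  implicit val downloadExecutor: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newCachedThreadPool())

  // Stand-in for the real per-file fetch logic.
  def downloadFile(uri: String, targetDir: File): Unit = { /* fetch one file */ }

  def downloadAll(uris: Seq[String], targetDir: File, timeout: Duration): Unit = {
    // One future per file/jar, rather than one future for the whole list,
    // so the pool decides how many downloads run simultaneously.
    val tasks = uris.map(uri => Future(downloadFile(uri, targetDir)))
    Await.result(Future.sequence(tasks), timeout)
  }
}
```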
      }
    }

    private def waitForFutures(futures: Future[_]*) {
`: Unit = {`

But really, there's a single caller, so just inline.
@vanzin Addressed your comments so far. PTAL. Thanks!

Test build #84985 has finished for PR 19954 at commit

Test build #85013 has finished for PR 19954 at commit

Test build #85028 has finished for PR 19954 at commit

Test build #85029 has finished for PR 19954 at commit

Jenkins, retest this please.
        orchestrator: DriverConfigurationStepsOrchestrator,
        types: Class[_ <: DriverConfigurationStep]*): Unit = {
        orchestrator: DriverConfigOrchestrator,
      types: Class[_ <: DriverConfigurationStep]*): Unit = {
nit: indent
      }
    }

    private object FirstTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {
Do we need `$` at the end of the object name?

No, it shouldn't be there. Removed.
      }
    }

    private object SecondTestInitContainerConfigurationStep$ extends InitContainerConfigurationStep {
ditto.
    val sparkConf = new SparkConf(true)

    if (!propertiesFile.isFile) {
      throw new IllegalArgumentException(s"Server properties file given at" +
nit: remove `s`.

nit: move the space at the beginning of the next line's string to the end of this line's string.
Test build #85056 has finished for PR 19954 at commit

@ueshin addressed your comments. PTAL. Thanks!

Test build #85066 has finished for PR 19954 at commit

Test build #85069 has finished for PR 19954 at commit
I started looking at this again, but sorry, the long variable names make it really hard to read and, especially, understand this code. I'm always getting lost because all the variable names look the same aside from some small difference at the end.

I also don't fully understand all the abstractions being created here. It seems there are three separate concepts (a "bootstrap", a "configuration step", and an "orchestrator"), and they're not used consistently.

It seems, for example, that "orchestrators" are used for the drivers, but for executors there's different code that does some similar things and is instead baked into `ExecutorPodFactory` (which is another trait with a single implementation).

It would be nice to take a look at the abstractions here and make sure they make sense and are being used consistently. At the very least, a "README.md" file explaining how these things are tied together would help a lot for those who did not write this code.
     * This is separated out from the init-container steps API because this component can be reused to
     * set up the init-container for executors as well.
     */
    private[spark] trait InitContainerBootstrap {
What's the purpose of all these traits that have a single implementation? That seems unnecessary.

It's more idiomatic to mock a `trait` than a `class`, and our unit tests always create mocks for every component that isn't the class under test.

> It's more idiomatic to mock a trait than a class

Why? You can mock classes just fine.

It's probably fine to just use the class here, but some classes can't be mocked, such as final classes or classes with final methods. Having traits everywhere ensures that even if we change the classes down the road to have such characteristics, our tests won't break.

This also is not entirely without precedent. `TaskScheduler` is only implemented by `TaskSchedulerImpl` in the main scheduler code, as is `TaskContext` being extended only by `TaskContextImpl`. Putting a trait in front of an implementation communicates that tests that dependency-inject instances of it are expected to create stub implementations.

But we might be splitting hairs at this point, so using only the class could suffice until we run into problems from having done so.
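For illustration only, a small Mockito sketch of both sides of this thread; the `FileFetcher` types here are made up, not the PR's code:

```scala
import org.mockito.Mockito.mock

trait FileFetcher { def fetchFile(uri: String): Unit }
class HttpFileFetcher extends FileFetcher {
  override def fetchFile(uri: String): Unit = { /* real download */ }
}

object MockingExample {
  val mockedTrait: FileFetcher = mock(classOf[FileFetcher])         // always mockable
  val mockedClass: HttpFileFetcher = mock(classOf[HttpFileFetcher]) // also fine

  // The caveat raised above: Mockito's default mock maker cannot mock a final
  // class or stub a final method, which is the breakage the trait guards against.
  // final class SealedFetcher ...  -> mock(classOf[SealedFetcher]) would fail
}
```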
        extends InitContainerBootstrap {

      override def bootstrapInitContainer(
          podWithDetachedInitContainer: PodWithDetachedInitContainer): PodWithDetachedInitContainer = {
I commented about this in my previous review, but could you try to use shorter variable names throughout the PR? For example, here, just repeating the name of the already-long type to name the variable doesn't really help with readability. Imagine if you had two of those; are you going to start adding counters to the already long name?
     * configuration properties for the init-container.
     */
    private[spark] class DriverInitContainerBootstrapStep(
        initContainerConfigurationSteps: Seq[InitContainerConfigurationStep],
Another place where variable names are unnecessarily long and harm readability. `initContainer` is already implicit given the name of the class. I have to parse the variable name every time and ignore the prefix to know which one it is.
A configuration step is a logical unit that applies an additive transformation to the input. A steps orchestrator selects which configuration steps to apply based on the configuration of the application. A bootstrap is a component that can be shared by one or more configuration steps and the driver, since a lot of the time the submission client and the driver share code for configuring the driver and executor pods, respectively. We discuss this a bit more here: https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/resource-managers/kubernetes/architecture-docs/submission-client.md - but we don't cover the bootstrap abstraction. We're open to different representations and architectures as well.
It's nice that there is some documentation somewhere, but that documentation doesn't really seem to address my comments. For one example, it only explicitly talks about the driver, which sort of makes sense because the document is about submission. But why aren't orchestrators used when starting executors too? It seems there's similar code baked into another class instead. What I'm asking is for this to be documented properly, so that someone who didn't write the code has enough information to know that it's working as it should. Right now I don't see what some of these abstractions are for at all; for example, as far as I can see, the orchestrator could be replaced by a method call instead of being a completely separate type; it's not really abstracting anything. Look at where it's used:

So aside from not really being able to infer the structure of how these things work, the current abstraction seems to be creating a lot of constructors and methods with long lists of arguments, which is another thing that hurts the readability of the code.
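To make the "step", "orchestrator", and "bootstrap" concepts above concrete, here is a minimal, self-contained sketch of the pattern; all type names and properties are stubs, not the PR's actual classes:

```scala
import org.apache.spark.SparkConf

// Stub spec type; the real code threads a much richer driver spec through.
case class DriverSpec(properties: Map[String, String] = Map.empty)

// A "configuration step" applies one additive transformation to the spec.
trait DriverConfigurationStep {
  def configureDriver(spec: DriverSpec): DriverSpec
}

class BaseDriverStep extends DriverConfigurationStep {
  override def configureDriver(spec: DriverSpec): DriverSpec =
    spec.copy(properties = spec.properties + ("base" -> "configured"))
}

class InitContainerStep extends DriverConfigurationStep {
  override def configureDriver(spec: DriverSpec): DriverSpec =
    spec.copy(properties = spec.properties + ("init-container" -> "configured"))
}

// The "orchestrator" only selects which steps apply for a given submission;
// a shared "bootstrap" would be a helper used both by a step here and by the
// analogous executor-side code.
class DriverConfigOrchestrator(conf: SparkConf) {
  def getAllConfigurationSteps: Seq[DriverConfigurationStep] = {
    val initContainerStep =
      if (conf.contains("spark.jars")) Seq(new InitContainerStep) else Nil
    Seq(new BaseDriverStep) ++ initContainerStep
  }
}

// The submission client then folds the selected steps over an initial spec:
object Example {
  def buildDriverSpec(conf: SparkConf): DriverSpec =
    new DriverConfigOrchestrator(conf).getAllConfigurationSteps
      .foldLeft(DriverSpec())((spec, step) => step.configureDriver(spec))
}
```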
Test build #85324 has finished for PR 19954 at commit

Test build #85327 has finished for PR 19954 at commit

Test build #85352 has finished for PR 19954 at commit
Mostly LGTM, only some nits. Also cc @ueshin to take another look.
      .createOptional

    val INIT_CONTAINER_MOUNT_TIMEOUT =
      ConfigBuilder("spark.kubernetes.mountDependencies.timeout")
nit: `spark.kubernetes.initContainer.mountDependencies.timeout`?

Please see the response regarding `spark.kubernetes.mountDependencies.maxSimultaneousDownloads`.
      .createWithDefault(5)

    val INIT_CONTAINER_MAX_THREAD_POOL_SIZE =
      ConfigBuilder("spark.kubernetes.mountDependencies.maxSimultaneousDownloads")
nit: `spark.kubernetes.initContainer.mountDependencies.maxSimultaneousDownloads`?

I think the current name is already pretty long. Adding `initContainer` makes it even longer without much added value.
ConfigBuilder("spark.kubernetes.mountDependencies.timeout") | ||
.doc("Timeout before aborting the attempt to download and unpack dependencies from remote " + | ||
"locations into the driver and executor pods.") | ||
.timeConf(TimeUnit.MINUTES) |
Why not `TimeUnit.SECONDS`?

Done.
      mountSecretsStep
    }

    private def areAnyFilesNonContainerLocal(files: Seq[String]): Boolean = {
nit: `areAnyFilesNonContainerLocal` -> `existNonContainerLocalFiles`

Done.
    private def downloadFiles(
        filesCommaSeparated: Option[String],
        downloadDir: File,
        errMessageOnDestinationNotADirectory: String): Unit = {
nit: `errMessageOnDestinationNotADirectory` -> `errMessage`?

Done.
    val initContainerConfigMapKey = sparkConf.get(INIT_CONTAINER_CONFIG_MAP_KEY_CONF)

    if (initContainerConfigMap.isEmpty) {
      logWarning("The executor's init-container config map was not specified. Executors will " +
nit: `was not` -> `is not`

Done.
    }

    if (initContainerConfigMapKey.isEmpty) {
      logWarning("The executor's init-container config map key was not specified. Executors will " +
nit: `was not` -> `is not`

Done.
Test build #85384 has finished for PR 19954 at commit
LGTM except for one comment.
    # If this docker file is being used in the context of building your images from a Spark distribution, the docker build
    # command should be invoked from the top level directory of the Spark distribution. E.g.:
    # docker build -t spark-init:latest -f dockerfiles/init-container/Dockerfile .
`kubernetes/dockerfiles/..` instead of `dockerfiles/..`. Btw, only nits, but it seems like the paths here in the `Dockerfile`s for the driver/executor are wrong: should they be `kubernetes/dockerfiles/driver/Dockerfile` and `kubernetes/dockerfiles/executor/Dockerfile` respectively?

Done.
Test build #85398 has finished for PR 19954 at commit

@vanzin @jiangxb1987 @ueshin Can this PR be merged now?

Thanks! merging to master.
…rets

## What changes were proposed in this pull request?

This PR updates the Kubernetes documentation corresponding to the following features/changes in apache#19954.

* Ability to use remote dependencies through the init-container.
* Ability to mount user-specified secrets into the driver and executor pods.

vanzin jiangxb1987 foxish

Author: Yinan Li <[email protected]>

Closes apache#20059 from liyinan926/doc-update.
## What changes were proposed in this pull request?

We missed enabling `spark.files` and `spark.jars` in #19954. The result is that remote dependencies specified through `spark.files` or `spark.jars` are not included in the list of remote dependencies to be downloaded by the init-container. This PR fixes it.

## How was this patch tested?

Manual tests.

vanzin This replaces #20157. foxish

Author: Yinan Li <[email protected]>

Closes #20160 from liyinan926/SPARK-22757.

(cherry picked from commit 6cff7d1) Signed-off-by: Felix Cheung <[email protected]>
What changes were proposed in this pull request?
This PR expands the Kubernetes mode to be able to use remote dependencies on http/https endpoints, GCS, S3, etc. It adds steps for configuring and appending the Kubernetes init-container into the driver and executor pods for downloading remote dependencies.
Init-containers, as the name suggests, are containers that are run to completion before the main containers start, and are often used to perform initialization tasks prior to starting the main containers. We use init-containers to localize remote application dependencies before the driver/executors start running. The code that the init-container runs is also included. This PR also adds a step to the driver and executors for mounting user-specified secrets that may store credentials for accessing data storage, e.g., S3 and Google Cloud Storage (GCS), into the driver and executors.
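For context, a hedged sketch of how a user might exercise both features, using the configuration keys discussed in this review; the values, bucket names, and image name are illustrative, and the secret-mount keys follow the pattern in the accompanying Spark Kubernetes docs:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Remote dependencies, localized by the init-container before the
  // driver/executor main containers start:
  .set("spark.jars", "https://example.com/deps/app-dep.jar,s3a://my-bucket/other-dep.jar")
  .set("spark.files", "gs://my-bucket/app-config.json")
  .set("spark.kubernetes.initContainer.image", "my-registry/spark-init:latest")
  .set("spark.kubernetes.mountDependencies.timeout", "300s")
  .set("spark.kubernetes.mountDependencies.maxSimultaneousDownloads", "5")
  // User-specified secret mounted into both driver and executor pods,
  // e.g. to hold S3/GCS credentials:
  .set("spark.kubernetes.driver.secrets.aws-creds", "/etc/secrets")
  .set("spark.kubernetes.executor.secrets.aws-creds", "/etc/secrets")
```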
How was this patch tested?
`./build/mvn -Pkubernetes clean package` succeeded.

cc @rxin @felixcheung @mateiz (shepherd)
k8s-big-data SIG members & contributors: @mccheah @foxish @ash211 @ssuchter @varunkatta @kimoonkim @erikerlandson @tnachen @ifilonenko @liyinan926
reviewers: @vanzin @felixcheung @jiangxb1987 @mridulm