
[SPARK-22648] [K8S] Spark on Kubernetes - Documentation #19946

Closed
wants to merge 13 commits into from

Conversation

foxish
Contributor

@foxish foxish commented Dec 12, 2017

What changes were proposed in this pull request?

This PR contains documentation on the usage of Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build docker images required to use the integration. The changes detailed here are covered by #19717 and #19468 which have merged already.

How was this patch tested?
The script has been in use for releases on our fork. The rest is documentation.

cc @rxin @mateiz (shepherd)
k8s-big-data SIG members & contributors: @foxish @ash211 @mccheah @liyinan926 @erikerlandson @ssuchter @varunkatta @kimoonkim @tnachen @ifilonenko
reviewers: @vanzin @felixcheung @jiangxb1987 @mridulm

TODO:

@holdenk
Contributor

holdenk commented Dec 12, 2017

Jenkins OK to test

done
}

function usage {
Contributor

So likely not for this PR, but I'm wondering for the future what you think about the idea of extending this to package up a usable Python env so we can solve the dependency management issue as well?

Really excited to see the progress :)

Contributor Author

We already do that in our fork - and when we get PySpark and R submitted (hopefully in Spark 2.4), we will extend this script.

Contributor

@jiangxb1987 jiangxb1987 left a comment

Mostly LGTM, only some nits.

push) push;;
*) usage;;
esac
fi
Contributor

nit: add extra empty line

Contributor Author

Done

* This will become a table of contents (this text will be scraped).
{:toc}

Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This features makes use of the new experimental native
Contributor

"This features" -> "This feature"

Contributor Author

Done

spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. The mechanism by which spark-submit happens is as follows:

* Spark creates a spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
* The driver creates executors which are also Kubernetes pods and connects to them, and executes application code.
Contributor

"which are also Kubernetes pods" -> "which are also running within Kubernetes pods"

Contributor Author

Done


To launch Spark Pi in cluster mode,

bin/spark-submit \
Contributor

nit: add {% highlight bash %} over the command.

Contributor Author

Done

master string with `k8s://` will cause the Spark application to launch on the Kubernetes cluster, with the API server
being contacted at `api_server_url`. If no HTTP protocol is specified in the URL, it defaults to `https`. For example,
setting the master to `k8s://example.com:443` is equivalent to setting it to `k8s://https://example.com:443`, but to
connect without TLS on a different port, the master would be set to `k8s://http://example.com:8080`.
Contributor

Maybe I missed something but where is the logic that handles connect without TLS you mentioned here?

Contributor Author

It should just be handled by the fabric8 client. We added this in fabric8io/kubernetes-client#652
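For illustration, the three master URL forms described above, side by side (hostnames and ports are placeholders):

```bash
# TLS is the default scheme, so these two are equivalent:
bin/spark-submit --master k8s://example.com:443 ...
bin/spark-submit --master k8s://https://example.com:443 ...
# Plain HTTP on a different port must be spelled out explicitly:
bin/spark-submit --master k8s://http://example.com:8080 ...
```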

### Namespaces

Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
Namespaces are a way to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
Contributor

"a way" -> "ways"

Contributor Author

Done


# Configuration

See the [configuration page](configuration.html) for information on Spark configurations. The following configuration is
Contributor

" The following configuration is" -> " The following configurations are"

Contributor Author

Done

@foxish
Contributor Author

foxish commented Dec 12, 2017

@jiangxb1987, thanks for the round of review - I updated some links and pointers to the docs as well. PTAL.

[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster,
you may setup a test cluster on your local machine using
[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
* We recommend using the latest releases of minikube be updated to the most recent version with the DNS addon enabled.
Contributor

This sentence doesn't read right.

<img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
</p>

spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
Contributor

<code>spark-submit</code>

"The submission mechanism works as follows:"

Contributor Author

Done

* Spark creates a spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
* The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code.
* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
logs and remains in "completed" state in the Kubernetes API till it's eventually garbage collected or manually cleaned up.
Contributor

s/till/until

Contributor Author

Done


Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
Namespaces are ways to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
use namespaces to launch spark applications. This is through the `--conf spark.kubernetes.namespace` argument to spark-submit.
Contributor

Instead of mentioning it as a spark-submit argument, mention it as a configuration. Users have more than one way to set configuration in their apps.

Contributor Author

Done
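Following that suggestion, a minimal sketch of setting the namespace as an ordinary configuration property (the namespace name spark-jobs is a placeholder):

```bash
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=spark-jobs \
  ...
```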

service account that has the right role granted. Spark on Kubernetes supports specifying a custom service account to
be used by the driver pod through the configuration property
`spark.kubernetes.authenticate.driver.serviceAccountName=<service account name>`. For example to make the driver pod
to use the `spark` service account, a user simply adds the following option to the `spark-submit` command:
Contributor

"...to make the driver pod use..."

Contributor Author

Done
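As described above, the option boils down to a single line in the spark-submit invocation (assuming a service account named spark already exists in the target namespace):

```bash
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
```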

</td>
</tr>
<tr>
<td><code>spark.kubernetes.executor.docker.image</code></td>
Contributor

Same question.

Contributor Author

Yes, it's required.

</tr>
<tr>
<td><code>spark.kubernetes.allocation.batch.delay</code></td>
<td><code>1</code></td>
Contributor

Hmm, isn't this a time conf? If so, specify the value with the unit and don't mention units in the description.

Contributor Author

This is an int conf at the moment - but good point about making it a time config. I think it makes sense to change that. Will draft up a PR.

Contributor Author

Fix in #20032
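For reference, once that follow-up lands the property takes a time string rather than a bare int; a sketch, with an illustrative value:

```bash
# With #20032, sub-second values become possible, e.g.:
--conf spark.kubernetes.allocation.batch.delay=500ms
```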

</td>
</tr>
<tr>
<td><code>spark.kubernetes.authenticate.submission.oauthToken</code></td>
Contributor

If not yet supported, I recommend adding support for reading this from an env variable; command line args are visible by anyone using ps.

Contributor Author

@foxish foxish Dec 13, 2017

That makes sense. I think the fabric8 client has a way of consuming that from env-vars, but we need to check that it works.
cc @mccheah

Contributor Author

Filing an issue to unblock. I think we can work around this for now - but it will be important to document for the future.

@@ -155,6 +165,12 @@ The master URL passed to Spark can be in one of the following formats:
<code>client</code> or <code>cluster</code> mode depending on the value of <code>--deploy-mode</code>.
The cluster location will be found based on the <code>HADOOP_CONF_DIR</code> or <code>YARN_CONF_DIR</code> variable.
</td></tr>
<tr><td> <code>k8s://HOST:PORT</code> </td><td> Connect to a <a href="running-on-kubernetes.html"> Kubernetes </a> cluster in
Contributor

Remove spaces around "Kubernetes".

Contributor Author

Done

<td><code>spark.kubernetes.driver.secrets.[SecretName]</code></td>
<td>(none)</td>
<td>
Mounts the Kubernetes secret named <code>SecretName</code> onto the path specified by the value
Contributor

What is a "Kubernetes secret"? Do you need a link to explain it somewhere? When would you use this?

Contributor Author

Done
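A sketch of the property's shape per the description above (secret name and mount path are placeholders):

```bash
# Mount the Kubernetes secret named "spark-secret" at /etc/secrets in the driver pod:
--conf spark.kubernetes.driver.secrets.spark-secret=/etc/secrets
```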

@ueshin
Member

ueshin commented Dec 13, 2017

ok to test

docs/index.md Outdated
@@ -112,7 +113,7 @@ options for deployment:
* [Mesos](running-on-mesos.html): deploy a private cluster using
[Apache Mesos](http://mesos.apache.org)
* [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen (YARN)
* [Kubernetes (experimental)](https://github.com/apache-spark-on-k8s/spark): deploy Spark on top of Kubernetes
* [Kubernetes (experimental)](running-on-kubernetes.html): deploy Spark on top of Kubernetes
Member

We can remove (experimental) here as well?

Contributor Author

Done

```
Kubernetes master is running at http://127.0.0.1:6443
```

In the above example, the specific Kubernetes cluster can be used with spark submit by specifying
Member

spark-submit instead of spark submit?

Contributor Author

Done
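Putting the two together - a sketch assuming the kubectl cluster-info output shown above:

```bash
kubectl cluster-info
# Kubernetes master is running at http://127.0.0.1:6443
bin/spark-submit --master k8s://http://127.0.0.1:6443 ...
```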

The local proxy can be started by:

```bash
kubectl proxy
```
Member

nit: we can remove extra space at the beginning of this line.

Contributor Author

Done
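End to end, the proxy flow looks roughly like this (the proxy's default port 8001 is an assumption of this sketch):

```bash
# Terminal 1: start a local proxy to the API server
kubectl proxy
# Terminal 2: submit against the proxied endpoint
bin/spark-submit --master k8s://http://127.0.0.1:8001 ...
```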


### Accessing Logs

Logs can be accessed using the kubernetes API and the `kubectl` CLI. When a Spark application is running, it's possible
Member

nit: Kubernetes API instead of kubernetes API (using a capital letter K)?

Contributor Author

done
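A minimal sketch of the kubectl commands implied here (the driver pod name spark-pi-driver is a placeholder):

```bash
# Stream logs while the application runs:
kubectl logs -f spark-pi-driver
# The same command works after completion, as long as the pod object persists:
kubectl logs spark-pi-driver
```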


There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
are errors during the running of the application, often, the best way to investigate may be through the kubernetes CLI.
Member

nit: Kubernetes CLI instead of kubernetes CLI?

Contributor Author

Done


Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
the spark application.
Member

nit: Spark application instead of spark application?

Contributor Author

Done
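For illustration, the kind of Kubernetes CLI investigation and cleanup described above (the pod name is a placeholder):

```bash
# Inspect status and recent events of a failed or completed pod:
kubectl describe pod spark-pi-driver
# Deleting the driver pod tears down the whole application (executors, service, etc.):
kubectl delete pod spark-pi-driver
```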


Kubernetes has the concept of [namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
Namespaces are ways to divide cluster resources between multiple users (via resource quota). Spark on Kubernetes can
use namespaces to launch spark applications. This is through the `--conf spark.kubernetes.namespace` argument to spark-submit.
Member

ditto.

<td><code>spark.kubernetes.authenticate.driver.oauthToken</code></td>
<td>(none)</td>
<td>
OAuth token to use when authenticating against the against the Kubernetes API server from the driver pod when
Member

against the against -> against?

Contributor Author

Done

@SparkQA

SparkQA commented Dec 13, 2017

Test build #84850 has finished for PR 19946 at commit 5f24de1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@mridulm mridulm left a comment

I left a few minor comments, overall looks pretty good - thanks for the PR !

{:toc}

Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the new experimental native
Kubernetes scheduler that has been added to Spark.
Contributor

Remove experimental ? I think there are other references (a grep should catch them all) and we can remove them all now ?

Contributor Author

Done. The other references have been removed. Couldn't find any others.

* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
logs and remains in "completed" state in the Kubernetes API till it's eventually garbage collected or manually cleaned up.

Note that in the completed state, the driver pod does *not* use any computational or memory resources.
Contributor

Just curious - what about disk usage ? What is it proportional to ?

Contributor

Some edit probably buried this query ... it would be great if you could opine, @foxish - thanks.

decisions for driver and executor pods using advanced primitives like
[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
in a future release.
Contributor

As an aside - I should have asked about this in the scheduler review - how does Kubernetes handle prioritization of resources? In particular, if it is pre-empting containers, is there a relative ordering?
Will the driver always be the last to go?
If this has to be user specified, do we need to add a doc link to it here?

Contributor Author

As of today, preemption is random.
Priority and preemption are in alpha as of now. As soon as they go to beta (in the Spark 2.4 timeframe), we'll add the required pieces to honor the rule you describe - the driver is the last to go, etc.

--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--conf spark.kubernetes.namespace=default \
Contributor

@mridulm mridulm Dec 13, 2017

Remove this default value from example ? (or set to non-default value if we want to illustrate its use)

Contributor Author

Removed

--class org.apache.spark.examples.SparkPi \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--conf spark.kubernetes.namespace=default \
--conf spark.executor.instances=5 \
Contributor

--num-executors instead ?

Contributor Author

That is specified currently as a YARN-only option in spark-submit. Does it make sense to move it out to suit both K8s and YARN?

Contributor

You are right, I did not realize Mesos did not honour it and that it was a YARN-only option!
It might make sense for k8s to support it too (since it does not right now, my proposed doc change would be incorrect) - what do you think @foxish?

$ bin/spark-submit \
--deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
Contributor

Reorder so that master is above deploy-mode ?

Contributor Author

Done

--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--conf spark.kubernetes.namespace=default \
--conf spark.executor.instances=5 \
--conf spark.app.name=spark-pi \
Contributor

--name instead ?

Contributor Author

Done

--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.docker.image=<driver-image> \
--conf spark.kubernetes.executor.docker.image=<executor-image> \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
Contributor

Just the filename?

I want to make sure it is intuitive and easy for users looking at examples - and that it looks familiar next to how Spark is typically invoked. (Though the scheme is referenced here https://github.com/apache/spark/pull/19946/files#diff-b5527f236b253e0d9f5db5164bdb43e9R108 : I am assuming omitting it defaults to local.)

Contributor Author

Fair enough, I just wanted to show that it had to be a path. The local:// scheme is necessary to point at container-local files, as opposed to file paths on the submitting user's machine; the two are handled separately. Changed it to local:///path/to/examples.jar. Is that better?

Contributor

Just to clarify - local:///my/path/jar implies local to the docker container, and /my/path/jar points to the submitter's machine?
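To summarize the distinction under discussion - a sketch, with both paths purely illustrative:

```bash
# Resolved inside the driver/executor container image:
bin/spark-submit ... local:///opt/spark/examples/jars/examples.jar
# Resolved on the machine running spark-submit (handled separately):
bin/spark-submit ... /home/user/examples.jar
```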

<td>
In cluster mode, whether to wait for the application to finish before exiting the launcher process. When changed to
false, the launcher has a "fire-and-forget" behavior when launching the Spark job.
</td>
Contributor

When false, will terminating spark-submit cause the Spark application to also terminate?

Contributor Author

No, it wouldn't. It would simply stop spark-submit from continuing to watch for status from the cluster. The application runs the same in either case.
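A sketch of the fire-and-forget setting being discussed (assuming the property spelling used by this backend):

```bash
# spark-submit returns once the job is submitted instead of watching it;
# the application itself runs the same either way.
--conf spark.kubernetes.submission.waitAppCompletion=false
```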

@SparkQA

SparkQA commented Dec 13, 2017

Test build #84865 has finished for PR 19946 at commit 109069f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 13, 2017

Test build #84861 has finished for PR 19946 at commit f618e8b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 13, 2017

Test build #84864 has finished for PR 19946 at commit 8c656c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by the
to be runnable, use `./dev/make-distribution.sh` in the project root directory. It can be configured
with Maven profile settings and so on like the direct Maven build. Example:

./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
./dev/make-distribution.sh --name custom-spark --pip --r --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
Contributor

should we use k8s? I kept bringing this up and that's because I can never spell Kubernetes properly.

Contributor Author

It's k8s:// in the URL scheme we use and the package names are also k8s - so users should never have to type the full name.
This is one of the last few holdouts. I'd say it's consistent here with the other cluster manager names, which are spelled out in full in their maven projects. I can change it here if you feel strongly about it.

Contributor

I actually second @rxin, I do get the spelling wrong at times :-)

Contributor Author

@foxish foxish Dec 15, 2017

Are you both also referring to the config options etc., which are still spark.kubernetes.*, or just to the maven build target? If it's everything, it would be a fairly large change, doable certainly - but I'm confirming how far the rename should go.

Contributor Author

Ping @mridulm, @rxin - can you please confirm the scope of the renaming you were referring to here? Is it just the maven target? Changing all the config options etc. would be a considerably large change at this point. Also, a point that was brought up today: while k8s is common shorthand, it's not as universal as the full name.

Contributor Author

I've filed https://issues.apache.org/jira/browse/SPARK-22853 to discuss this and unblock this PR. We should be able to reach consensus by release time. :)

Contributor

Yea I don't think you need to block this pr with this.

@vanzin
Contributor

vanzin commented Dec 14, 2017

Seems like the PR builders lost their state?

@vanzin
Contributor

vanzin commented Dec 14, 2017

ok to test

## Docker Images

Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
Contributor

a container runtime environment that Kubernetes supports

Does the current code support these other container runtimes, or just docker? From my memory (and the property names) it seems that only docker is supported?

Contributor

The images should be runnable by other container runtimes (cri-o, rkt, etc.), although I'm not aware of anybody testing that.

Contributor Author

@vanzin, we don't make any runtime-level assumptions in Spark code. The K8s abstraction layer (CRI) should in theory allow using a different runtime.

Contributor

My comment is more about the properties being called "docker" and whether that means only docker images are supported. If you can use any image supported by the k8s cluster, then perhaps the properties should be renamed.

Contributor Author

I see your point - although there is flexibility in theory, as of now it's safe to assume that most people are running docker containers when using k8s, making the name docker much more intuitive. If the other runtimes see traction in the future (and we do some testing around them), we can rename to container.image instead of docker.image. For now, I can make the documentation clearer that Spark on k8s only supports docker images. Sound like a reasonable thing to do here?

Member

I wonder if k8s can tell the container runtime from the image name? If it can, we can use container.image here, but otherwise, I guess we need another config like xxx.container to tell the container runtime and xxx.${container}.image to specify the image, e.g. xxx.container=docker and xxx.docker.image=something.

Contributor

+1 to @ueshin's comment; if the same property supports more than docker images, it should be renamed to use a more generic name.

Contributor Author

Makes sense. I think container.image should be fine then - will send a PR changing that.
I don't think there are plans to support choosing the runtime on the fly, or letting the CRI abstraction leak into the application layer.

Contributor

I'd expect the container run-time to be transparent to the particular image, and very definitely transparent to spark. Using container.image makes sense from either standpoint.

Contributor Author

#19995 PTAL

./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
./sbin/build-push-docker-images.sh -r <repo> -t my-tag push

Docker files are under the `dockerfiles/` and can be customized further before
Contributor

"the dockerfiles/ directory in the Spark distribution archive"?

(Assuming that's true. Need to clarify where to find dockerfiles/.)

Contributor Author

It is not yet true - it's a TODO item on this PR. Will clarify after we get that change in.

Contributor Author

This has been addressed and the script has been fixed now.


## Dependency Management

If your application's dependencies are all hosted in remote locations like HDFS or http servers, they may be referred to
Contributor

"HTTP servers"

Contributor Author

Done
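For illustration, a sketch of referencing remotely hosted dependencies at submit time (both URLs are placeholders):

```bash
bin/spark-submit \
  ...
  --jars https://repo.example.com/libs/extra.jar \
  --files hdfs://namenode:9000/conf/app.properties \
  ...
```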

<td>
Docker image to use for the executors. Specify this using the standard
<a href="https://docs.docker.com/engine/reference/commandline/tag/">Docker tag</a> format.
This configuration is required and must be provided by the user.
Contributor

Would it make sense to have this default to the same image used for the driver? Or are they fundamentally different in some way?

Contributor

They use different commands to run driver vs executor, and also have some differing environment expectations, so defaulting them to the same would not work

@@ -18,7 +18,8 @@ Spark application's configuration (driver, executors, and the AM when running in

There are two deploy modes that can be used to launch Spark applications on YARN. In `cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike [Spark standalone](spark-standalone.html) and [Mesos](running-on-mesos.html) modes, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`.
Unlike [Spark standalone](spark-standalone.html), [Mesos](running-on-mesos.html) and [Kubernetes](running-on-kubernetes.html) modes,
Contributor

Should just say "other cluster managers supported by Spark" at this point.

Contributor Author

Done

@SparkQA

SparkQA commented Dec 15, 2017

Test build #84929 has finished for PR 19946 at commit 109069f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 15, 2017

Test build #84934 has finished for PR 19946 at commit 873f04d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

pending updates from #19995 and #20007

asfgit pushed a commit that referenced this pull request Dec 18, 2017
…ith container

## What changes were proposed in this pull request?
Changes discussed in #19946 (comment)
docker -> container, since with CRI, we are not limited to running only docker images.

## How was this patch tested?
Manual testing

Author: foxish <[email protected]>

Closes #19995 from foxish/make-docker-container.
@SparkQA

SparkQA commented Dec 20, 2017

Test build #85167 has finished for PR 19946 at commit 74ac5c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85178 has finished for PR 19946 at commit d235847.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 20, 2017

Test build #85179 has finished for PR 19946 at commit 702162b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

$ bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
Contributor

We need to explicitly call out that app names in k8s mode cannot have spaces and special characters.

Contributor Author

Good point, will update with caveat.

Contributor Author

Done
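With the caveat applied, a fuller sketch of this submission (host, port, and images are placeholders; note the app name avoids spaces and special characters, and the property names reflect the docker-to-container rename from #19995):

```bash
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.driver.container.image=<driver-image> \
  --conf spark.kubernetes.executor.container.image=<executor-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```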

@vanzin
Contributor

vanzin commented Dec 21, 2017

BTW can I ask you guys to start using "K8S" in the PR titles? These are not really scheduler changes.

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 21, 2017
…ay to take time instead of int

## What changes were proposed in this pull request?

Fixing configuration that was taking an int which should take time. Discussion in apache#19946 (comment)
Made the granularity milliseconds as opposed to seconds since there's a use-case for sub-second reactions to scale-up rapidly especially with dynamic allocation.

## How was this patch tested?

TODO: manual run of integration tests against this PR.
PTAL

cc/ mccheah liyinan926 kimoonkim vanzin mridulm jiangxb1987 ueshin

Author: foxish <[email protected]>

Closes apache#20032 from foxish/fix-time-conf.
@foxish
Contributor Author

foxish commented Dec 21, 2017 via email

@foxish foxish changed the title [SPARK-22648] [Scheduler] Spark on Kubernetes - Documentation [SPARK-22648] [K8S] Spark on Kubernetes - Documentation Dec 21, 2017
@foxish
Contributor Author

foxish commented Dec 21, 2017

Addressed comments. This PR should be ready to go, with one separate issue of renaming to be discussed in https://issues.apache.org/jira/browse/SPARK-22853.

PTAL @vanzin @mridulm @ueshin @jiangxb1987 @felixcheung @rxin
This covers 90% of our documentation for Spark 2.3 (pending the documentation needed for #19954).

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85222 has finished for PR 19946 at commit 8726154.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -18,7 +18,8 @@ Spark application's configuration (driver, executors, and the AM when running in

There are two deploy modes that can be used to launch Spark applications on YARN. In `cluster` mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In `client` mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike [Spark standalone](spark-standalone.html) and [Mesos](running-on-mesos.html) modes, in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`.
Unlike other cluster managers supported by Spark
in which the master's address is specified in the `--master` parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the `--master` parameter is `yarn`.
Contributor

nit: why start a new line here?

Contributor Author

Done

@SparkQA

SparkQA commented Dec 21, 2017

Test build #85278 has finished for PR 19946 at commit 374ddc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@foxish
Contributor Author

foxish commented Dec 21, 2017

Ping - any other comments?

@jiangxb1987
Contributor

LGTM

@rxin
Contributor

rxin commented Dec 22, 2017

Merging in master.

@asfgit asfgit closed this in 7ab165b Dec 22, 2017
foxish added a commit to foxish/spark that referenced this pull request Dec 22, 2017
The path was recently changed in apache#19946, but the dockerfile was not updated.
asfgit pushed a commit that referenced this pull request Dec 22, 2017
## What changes were proposed in this pull request?

The path was recently changed in #19946, but the dockerfile was not updated.
This is a trivial 1 line fix.

## How was this patch tested?

`./sbin/build-push-docker-images.sh -r spark-repo -t latest build`

cc/ vanzin mridulm rxin jiangxb1987 liyinan926

Author: Anirudh Ramanathan <[email protected]>
Author: foxish <[email protected]>

Closes #20051 from foxish/patch-1.