run examples on k8s readme #2163

Merged 5 commits on Apr 13, 2020

@Le-Zheng Le-Zheng commented Apr 8, 2020

No description provided.

@Le-Zheng Le-Zheng requested a review from glorysdj April 8, 2020 07:31
Pull an Analytics Zoo k8s image:

```bash
sudo docker pull intelanalytics/hyper-zoo:latest

We also need to update the hyperzoo:latest image.

-e http_proxy=http://your-proxy-host:your-proxy-port \
-e https_proxy=https://your-proxy-host:your-proxy-port \
intelanalytics/hyper-zoo:latest bash
```

I think we should also list all the runtime parameters that we support:

ENV RUNTIME_SPARK_MASTER local[4]
ENV RUNTIME_K8S_SERVICE_ACCOUNT spark
ENV RUNTIME_K8S_SPARK_IMAGE intelanalytics/hyper-zoo:0.8.0-SNAPSHOT-2.4.3
ENV RUNTIME_DRIVER_HOST localhost
ENV RUNTIME_DRIVER_PORT 54321
ENV RUNTIME_EXECUTOR_CORES 4
ENV RUNTIME_EXECUTOR_MEMORY 20g
ENV RUNTIME_EXECUTOR_INSTANCES 1
ENV RUNTIME_TOTAL_EXECUTOR_CORES 4
ENV RUNTIME_DRIVER_CORES 4
ENV RUNTIME_DRIVER_MEMORY 10g
ENV RUNTIME_PERSISTENT_VOLUME_CLAIM myvolumeclaim

and also tell users to mount the kube configs with -v:
-v /etc/kubernetes:/etc/kubernetes \
-v /root/.kube:/root/.kube \
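
As an illustration of the suggestion above, the following is a minimal dry-run sketch that assembles a `docker run` command from the listed runtime parameters and the two kube config mounts. It only prints the command and never calls Docker. The function name `build_docker_cmd`, the `IMAGE` variable, and the `-it` flag are assumptions made for this example; the defaults are copied from the ENV lines in the comment.

```shell
#!/bin/sh
# Dry-run sketch: print a `docker run` command for the client container.
# Nothing is executed; the command is only printed for inspection.

build_docker_cmd() {
  # Image name as it appears in the README diff; override with IMAGE=...
  image="${IMAGE:-intelanalytics/hyper-zoo:latest}"

  # `-it` is an assumption; the two -v mounts expose the host's kube configs.
  set -- sudo docker run -it \
    -v /etc/kubernetes:/etc/kubernetes \
    -v /root/.kube:/root/.kube

  # Each parameter falls back to the default from the ENV list above
  # unless the caller has already set a value.
  set -- "$@" \
    -e "RUNTIME_SPARK_MASTER=${RUNTIME_SPARK_MASTER:-local[4]}" \
    -e "RUNTIME_K8S_SERVICE_ACCOUNT=${RUNTIME_K8S_SERVICE_ACCOUNT:-spark}" \
    -e "RUNTIME_K8S_SPARK_IMAGE=${RUNTIME_K8S_SPARK_IMAGE:-intelanalytics/hyper-zoo:0.8.0-SNAPSHOT-2.4.3}" \
    -e "RUNTIME_DRIVER_HOST=${RUNTIME_DRIVER_HOST:-localhost}" \
    -e "RUNTIME_DRIVER_PORT=${RUNTIME_DRIVER_PORT:-54321}" \
    -e "RUNTIME_EXECUTOR_CORES=${RUNTIME_EXECUTOR_CORES:-4}" \
    -e "RUNTIME_EXECUTOR_MEMORY=${RUNTIME_EXECUTOR_MEMORY:-20g}" \
    -e "RUNTIME_EXECUTOR_INSTANCES=${RUNTIME_EXECUTOR_INSTANCES:-1}" \
    -e "RUNTIME_TOTAL_EXECUTOR_CORES=${RUNTIME_TOTAL_EXECUTOR_CORES:-4}" \
    -e "RUNTIME_DRIVER_CORES=${RUNTIME_DRIVER_CORES:-4}" \
    -e "RUNTIME_DRIVER_MEMORY=${RUNTIME_DRIVER_MEMORY:-10g}" \
    -e "RUNTIME_PERSISTENT_VOLUME_CLAIM=${RUNTIME_PERSISTENT_VOLUME_CLAIM:-myvolumeclaim}"

  # The image and the command to run inside it come last.
  set -- "$@" "$image" bash
  printf '%s\n' "$*"
}

build_docker_cmd
```

Setting a variable before the call, for example `RUNTIME_EXECUTOR_MEMORY=8g`, replaces only that default and leaves the others untouched.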


glorysdj commented Apr 9, 2020

We should tell users that two different kinds of containers can be used:

  1. client container: users can submit zoo jobs from here, since it contains all the required env and libs except the hadoop/k8s configs
  2. executor container: scheduled by k8s at runtime


glorysdj commented Apr 9, 2020

We should also tell users to prepare a PERSISTENT_VOLUME_CLAIM on k8s to hold all the data.
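
For readers unfamiliar with the step mentioned above, here is a minimal sketch of what preparing a claim could look like. The function `pvc_manifest` is a hypothetical helper that only prints a PersistentVolumeClaim manifest. The `ReadWriteMany` access mode and the `10Gi` size are assumptions, not values from this PR, and a real cluster also needs a matching PersistentVolume or storage class.

```shell
#!/bin/sh
# Sketch: print a PersistentVolumeClaim manifest for the claim name that
# RUNTIME_PERSISTENT_VOLUME_CLAIM refers to. Nothing is sent to the cluster.

pvc_manifest() {
  claim_name="${1:-myvolumeclaim}"   # default name from the ENV list in this thread
  storage="${2:-10Gi}"               # assumed size; adjust to the data set
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${claim_name}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: ${storage}
EOF
}

pvc_manifest "${RUNTIME_PERSISTENT_VOLUME_CLAIM:-myvolumeclaim}"
```

To create the claim, the output can be piped to `kubectl apply -f -`, for example `pvc_manifest myvolumeclaim 20Gi | kubectl apply -f -`.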

-e https_proxy=https://your-proxy-host:your-proxy-port \
-e RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
-e RUNTIME_K8S_SERVICE_ACCOUNT=account \
-e RUNTIME_K8S_SPARK_IMAGE=10.239.47.32/intelanalytics/hyper-zoo:0.8.0-SNAPSHOT-2.4.3-0.17 \

10.239.47.32/ is not needed

-e RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
-e RUNTIME_K8S_SERVICE_ACCOUNT=account \
-e RUNTIME_K8S_SPARK_IMAGE=10.239.47.32/intelanalytics/hyper-zoo:0.8.0-SNAPSHOT-2.4.3-0.17 \
-e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \

nfsvolumeclaim --> myvolumeclaim?

-e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
-e RUNTIME_DRIVER_CORES=4 \
-e RUNTIME_DRIVER_MEMORY=10g \
10.239.47.32/intelanalytics/hyper-zoo:0.8.0-SNAPSHOT-2.4.3-0.17 bash

10.239.47.32/ can be deleted

- RUNTIME_K8S_SERVICE_ACCOUNT is service account for driver pod. Please refer to k8s [RBAC](https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac).
- RUNTIME_K8S_SPARK_IMAGE is the k8s image.
- RUNTIME_PERSISTENT_VOLUME_CLAIM is to specify volume mount. We are supposed to use volume mount to store or receive data. Get ready with [Kubernetes Volumes](https://spark.apache.org/docs/latest/running-on-kubernetes.html#volume-mounts).
- RUNTIME_DRIVER_HOST is to specify driver localhost.

Please add (only required when submitting jobs in Kubernetes client mode)

- RUNTIME_K8S_SPARK_IMAGE is the k8s image.
- RUNTIME_PERSISTENT_VOLUME_CLAIM is to specify volume mount. We are supposed to use volume mount to store or receive data. Get ready with [Kubernetes Volumes](https://spark.apache.org/docs/latest/running-on-kubernetes.html#volume-mounts).
- RUNTIME_DRIVER_HOST is to specify driver localhost.
- RUNTIME_DRIVER_PORT is to specify port number.

Please add (only required when submitting jobs in Kubernetes client mode)
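
The distinction raised in the two comments above can be sketched as a dry-run helper that adds the driver host and port settings only in client mode. The function `build_submit_cmd` is hypothetical, and the mapping from the `RUNTIME_*` variables to Spark properties is an assumption about how the image uses them. The property names themselves (`spark.driver.host`, `spark.driver.port`, `spark.kubernetes.container.image`, `spark.kubernetes.authenticate.driver.serviceAccountName`) are standard Spark configuration keys. The application file and its arguments are left out.

```shell
#!/bin/sh
# Dry-run sketch: print a spark-submit command line for the given deploy mode.
# Driver host and port are passed only in client mode, where the driver runs
# in the client container and the executors must be able to connect back to it.

build_submit_cmd() {
  mode="$1"   # "client" or "cluster"

  set -- spark-submit \
    --master "${RUNTIME_SPARK_MASTER:-k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>}" \
    --deploy-mode "$mode" \
    --conf "spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE:-intelanalytics/hyper-zoo:0.8.0-SNAPSHOT-2.4.3}" \
    --conf "spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT:-spark}"

  if [ "$mode" = "client" ]; then
    # In cluster mode the driver is itself a pod, so these are not needed.
    set -- "$@" \
      --conf "spark.driver.host=${RUNTIME_DRIVER_HOST:-localhost}" \
      --conf "spark.driver.port=${RUNTIME_DRIVER_PORT:-54321}"
  fi

  printf '%s\n' "$*"
}

build_submit_cmd client
build_submit_cmd cluster
```

Comparing the two printed lines shows that only the client-mode command carries the driver host and port.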

@glorysdj glorysdj left a comment


LGTM

@Le-Zheng Le-Zheng merged commit 3a5ee76 into intel-analytics:master Apr 13, 2020
@Le-Zheng Le-Zheng deleted the k8stest branch April 13, 2020 03:06
glorysdj added a commit that referenced this pull request Oct 14, 2021
* add hyperzoo for k8s support (#2140)

* add hyperzoo for k8s support

* format

* format

* format

* format

* run examples on k8s readme (#2163)

* k8s  readme

* fix jdk download issue (#2219)

* add doc for submit jupyter notebook and cluster serving to k8s (#2221)

* add hyperzoo doc

* add hyperzoo doc

* add hyperzoo doc

* add hyperzoo doc

* fix jdk download issue (#2223)

* bump to 0.9s (#2227)

* update jdk download url (#2259)

* update some previous docs (#2284)

* K8docsupdate (#2306)

* Update README.md

* Update s3 related links in readme  and documents (#2489)

* Update s3 related links in readme  and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* Update s3 related links in readme and documents

* update

* update

* modify line length limit

* update

* Update mxnet-mkl version in hyper-zoo dockerfile (#2720)

Co-authored-by: gaoping <[email protected]>

* update bigdl version (#2743)

* update bigdl version

* hyperzoo dockerfile add cluster-serving (#2731)

* hyperzoo dockerfile add cluster-serving

* update

* update

* update

* update jdk url

* update jdk url

* update

Co-authored-by: gaoping <[email protected]>

* Support init_spark_on_k8s (#2813)

* initial

* fix

* code refactor

* bug fix

* update docker

* style

* add conda to docker image (#2894)

* add conda to docker image

* Update Dockerfile

* Update Dockerfile

Co-authored-by: glorysdj <[email protected]>

* Fix code blocks indents in .md files (#2978)

* Fix code blocks indents in .md files

Previously a lot of the code blocks in markdown files were horribly indented with bad white spaces in the beginning of lines. Users can't just select, copy, paste, and run (in the case of python). I have fixed all these, so there is no longer any code block with bad white space at the beginning of the lines.
It would be nice if you could try to make sure in future commits that all code blocks are properly indented inside and have the right amount of white space in the beginning!

* Fix small style issue

* Fix indents

* Fix indent and add \ for multiline commands

Change indent from 3 spaces to 4, and add "\" for multiline bash commands

Co-authored-by: Yifan Zhu <[email protected]>

* enable bigdl 0.12 (#3101)

* switch to bigdl 0.12

* Hyperzoo example ref (#3143)

* specify pip version to fix oserror 0 of proxy (#3165)

* Bigdl0.12.1 (#3155)

* bigdl 0.12.1

* bump 0.10.0-Snapshot (#3237)

* update runtime image name (#3250)

* update jdk download url (#3316)

* update jdk8 url (#3411)

Co-authored-by: ardaci <[email protected]>

* update hyperzoo docker image (#3429)

* update hyperzoo image (#3457)

* fix jdk in az docker (#3478)

* fix jdk in az docker

* fix jdk for hyperzoo

* fix jdk in jenkins docker

* fix jdk in cluster serving docker

* fix jdk

* fix readme

* update python dep to fit cnvrg (#3486)

* update ray version doc (#3568)

* fix deploy hyperzoo issue (#3574)

Co-authored-by: gaoping <[email protected]>

* add spark fix and net-tools and status check (#3742)

* intsall netstat and add check status

* add spark fix for graphene

* bigdl 0.12.2 (#3780)

* bump to 0.11-S and fix version issues except ipynb

* add multi-stage build Dockerfile (#3916)

* add multi-stage build Dockerfile

* multi-stage build dockerfile

* multi-stage build dockerfile

* Rename Dockerfile.multi to Dockerfile

* delete Dockerfile.multi

* remove comments, add TINI_VERSION to common arg, remove Dockerfile.multi

* multi-stage add tf_slim

Co-authored-by: shaojie <[email protected]>

* update hyperzoo doc and k8s doc (#3959)

* update userguide of k8s

* update k8s guide

* update hyperzoo doc

* Update k8s.md

add note

* Update k8s.md

add note

* Update k8s.md

update notes

* fix 4087 issue (#4097)

Co-authored-by: shaojie <[email protected]>

* fixed 4086 and 4083 issues (#4098)

Co-authored-by: shaojie <[email protected]>

* Reduce image size (#4132)

* Reduce Dockerfile size
1. del redis stage
2. del flink stage
3. del conda & exclude some python packages
4. add copies layer stage

* update numpy version to 1.18.1

Co-authored-by: zzti-bsj <[email protected]>

* update hyperzoo image (#4250)

Co-authored-by: Adria777 <[email protected]>

* bigdl 0.13 (#4210)

* bigdl 0.13

* update

* print exception

* pyspark2.4.6

* update release PyPI script

* update

* flip snapshot-0.12.0 and spark2.4.6 (#4254)

* s-0.12.0 master

* Update __init__.py

* Update python.md

* fix docker issues due to version update (#4280)

* fix docker issues

* fix docker issues

* update Dockerfile to support spark 3.1.2 && 2.4.6 (#4436)

Co-authored-by: shaojie <[email protected]>

* update hyperzoo, add lib for tf2 (#4614)

* delete tf 1.15.0 (#4719)

Co-authored-by: Le-Zheng <[email protected]>
Co-authored-by: pinggao18 <[email protected]>
Co-authored-by: pinggao187 <[email protected]>
Co-authored-by: gaoping <[email protected]>
Co-authored-by: Kai Huang <[email protected]>
Co-authored-by: GavinGu07 <[email protected]>
Co-authored-by: Yifan Zhu <[email protected]>
Co-authored-by: Yifan Zhu <[email protected]>
Co-authored-by: Song Jiaming <[email protected]>
Co-authored-by: ardaci <[email protected]>
Co-authored-by: Yang Wang <[email protected]>
Co-authored-by: zzti-bsj <[email protected]>
Co-authored-by: shaojie <[email protected]>
Co-authored-by: Lingqi Su <[email protected]>
Co-authored-by: Adria777 <[email protected]>
Co-authored-by: shaojie <[email protected]>