- Step 1. Write Quick Start Configuration
- Step 2. Generate OpenPAI configuration files
- Step 3 (Optional). Customize the OpenPAI configuration
There is an example file at the link.
An example YAML file is shown below. Change the machine IP addresses and the SSH information to match your environment.
```yaml
# quick-start.yaml

# (Required) Fill in the IP addresses of the servers on which you would like to deploy OpenPAI.
machines:
  - 192.168.1.11
  - 192.168.1.12
  - 192.168.1.13

# (Required) Log-in info of all machines. The system administrator should guarantee
# that the username/password pair or username/key-filename is valid and has sudo privilege.
ssh-username: pai
ssh-password: pai-password

# (Optional, default=None) The key file that the ssh client uses. It has higher priority than the password.
#ssh-keyfile-path: <keyfile-path>

# (Optional, default=22) Port number of the ssh service on each machine.
#ssh-port: 22

# (Optional, default=DNS of the first machine) Cluster DNS.
#dns: <ip-of-dns>

# (Optional, default=10.254.0.0/16) IP range used by Kubernetes. Note that
# this IP range should NOT conflict with the current network.
#service-cluster-ip-range: <ip-range-for-k8s>
```
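The comment on `service-cluster-ip-range` asks that the Kubernetes range not conflict with the current network. As a rough pre-flight check, the sketch below uses Python's standard `ipaddress` module to list any machine addresses that fall inside the chosen range. The helper name `find_conflicts` is hypothetical and is not part of `paictl`; it only inspects the machine IPs, so it cannot detect conflicts with other subnets or routes on your network.

```python
import ipaddress


def find_conflicts(machine_ips, service_cidr):
    """Return the machine IPs that fall inside the Kubernetes service range.

    Hypothetical helper for illustration; not part of paictl.
    """
    network = ipaddress.ip_network(service_cidr)
    return [ip for ip in machine_ips if ipaddress.ip_address(ip) in network]


# Values taken from the quick-start.yaml example above.
machines = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]
conflicts = find_conflicts(machines, "10.254.0.0/16")

if conflicts:
    print("Choose another service-cluster-ip-range; conflicting machines:", conflicts)
else:
    print("No machine address falls inside the service range.")
```

An empty result only means that none of the listed machines sits inside the range; it is still worth confirming with your network administrator that the range is unused.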
```bash
cd /pai

# Run this command from the pai directory in the dev-box.
python paictl.py config generate -i /pai/deployment/quick-start/quick-start.yaml -o ~/pai-config -f
```
```bash
vi ~/pai-config/services-configuration.yaml
```

For example, if you are deploying from the v0.x.y branch, change `docker-tag` to `v0.x.y`:

```yaml
docker-tag: v0.x.y
```
Quick start generates every node with one GPU of type `generic`, which may not match your hardware. For example, if you have two types of machines, one with 4 Tesla K80 GPU cards and the other with 2 Tesla P100 cards, modify your `~/pai-config/layout.yaml` as follows:
```yaml
machine-sku:
  k80-node:
    mem: 40G
    gpu:
      type: Tesla K80
      count: 4
    cpu:
      vcore: 24
    os: ubuntu16.04
  p100-node:
    mem: 20G
    gpu:
      type: Tesla P100
      count: 2
    cpu:
      vcore: 24
    os: ubuntu16.04

machine-list:
  - hostname: xxx
    hostip: yyy
    machine-type: k80-node
  - hostname: xxx
    hostip: yyy
    machine-type: p100-node
```
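When editing `layout.yaml` by hand, it is easy to misspell a `machine-type` so that it no longer matches any entry under `machine-sku`. The sketch below shows one way to catch that before deployment. It assumes the file has already been loaded into a Python dictionary (for example with a YAML parser); here the dictionary is written out literally to keep the example self-contained. The function name `undefined_machine_types` and the hostnames are hypothetical.

```python
def undefined_machine_types(layout):
    """Return machine-type values in machine-list that have no machine-sku entry.

    Hypothetical helper for illustration; not part of paictl.
    """
    skus = set(layout.get("machine-sku", {}))
    return [
        machine["machine-type"]
        for machine in layout.get("machine-list", [])
        if machine["machine-type"] not in skus
    ]


# Mirrors the layout.yaml example above; hostnames and IPs are placeholders.
layout = {
    "machine-sku": {
        "k80-node": {"mem": "40G", "gpu": {"type": "Tesla K80", "count": 4}},
        "p100-node": {"mem": "20G", "gpu": {"type": "Tesla P100", "count": 2}},
    },
    "machine-list": [
        {"hostname": "node-a", "hostip": "192.168.1.11", "machine-type": "k80-node"},
        {"hostname": "node-b", "hostip": "192.168.1.12", "machine-type": "p100-node"},
    ],
}

missing = undefined_machine_types(layout)
print("Undefined machine types:", missing)
```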
The `paictl` tool sets the following default values in the 4 configuration files:
| Configuration Property | Default value |
|---|---|
| Master node | The first machine in the machine list will be configured as the master node. |
| SSH port | If not explicitly specified, the SSH port is set to `22`. |
| Cluster DNS | If not explicitly specified, the cluster DNS is set to the value of the `nameserver` field in the `/etc/resolv.conf` file of the master node. |
| IP range used by Kubernetes | If not explicitly specified, the IP range used by Kubernetes is set to `10.254.0.0/16`. |
| Docker registry | The docker registry is set to `docker.io`, and the docker namespace is set to `openpai`. In other words, all PAI service images will be pulled from `docker.io/openpai` (see DockerHub for the details of all images). |
| Cluster id | The cluster id is set to `pai-example`. |
| REST server's admin user | The REST server's admin user is set to `admin`, and its password is set to `admin-password`. |
| VC | There is only one VC in the system, `default`, which has 100% of the resource capacity. |
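To preview which value the cluster DNS default would take, you can look at the `nameserver` field in the master node's `/etc/resolv.conf`. The sketch below extracts the first `nameserver` entry from the text of such a file. How `paictl` itself reads the file is not shown in this guide, so treat the function `first_nameserver` and the choice of the first entry as assumptions made for illustration.

```python
def first_nameserver(resolv_conf_text):
    """Return the first nameserver address in resolv.conf text, or None.

    Hypothetical helper for illustration; not part of paictl.
    """
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never equals "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1]
    return None


# Sample file content; the addresses are placeholders.
sample = """\
# Generated by a network manager
search example.local
nameserver 192.168.1.1
nameserver 8.8.8.8
"""

print(first_nameserver(sample))
```

On the master node itself, the same function can be applied to the content of `/etc/resolv.conf` read with `open("/etc/resolv.conf").read()`.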
This method is for advanced users.
The description of each field in these configuration files can be found in A Guide For Cluster Configuration.
To customize the configuration, see the list below:

- Configure OpenPAI from scenarios
  - placement
  - scheduling
  - account
  - port / data folder etc.
  - component version
  - HA
- Cluster related configuration: configuration of `layout.yaml`
- Kubernetes role related configuration: this will be deprecated
- Kubernetes related configuration: configuration of `kubernetes-configuration.yaml`
- Service related configuration: configuration of `services-configuration.yaml`
- Configure OpenPAI services (Note: this part is for advanced users who want to customize each OpenPAI service.)