Hope to get a more detailed installation description #143

Open
emineh opened this issue Jan 30, 2018 · 11 comments

Comments

@emineh

emineh commented Jan 30, 2018

1. While setting up the development environment, I noticed that the installation description refers to a 'src/ClusterBootstrap/deploy' folder containing important information for accessing the deployed DL Workspace cluster, but I could not find such a folder.
2. After running the following commands, I could not get to the '/home/DLWorkspace/src/ClusterBootstrap' folder.

git clone https://github.com/microsoft/DLWorkspace
docker run -ti -v DLWorkspace:/home/DLWorkspace jinl/dlworkspacedevdocker /bin/bash
cd /home/DLWorkspace/src/ClusterBootstrap

I ran these commands on Ubuntu 16.04.

@emineh
Author

emineh commented Jan 30, 2018

As for the second problem, I changed the first 'DLWorkspace' in the -v option to an absolute path, and that solved it.
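
A likely explanation for this fix: Docker treats a `-v` source that is not an absolute path as the name of a named volume rather than a host directory, so `-v DLWorkspace:/home/DLWorkspace` mounts an empty volume instead of the cloned repository. A minimal sketch of building the command with an absolute path follows. The helper name `docker_run_command` is hypothetical; the image name and container path are taken from the commands above.

```python
import os
import shlex


def docker_run_command(repo_dir, image="jinl/dlworkspacedevdocker"):
    """Build the `docker run` argument list with an absolute bind-mount source.

    `repo_dir` is the cloned repository directory; it is resolved against the
    current working directory so Docker sees a host path, not a volume name.
    """
    host_path = os.path.abspath(repo_dir)
    return [
        "docker", "run", "-ti",
        "-v", "{}:/home/DLWorkspace".format(host_path),
        image, "/bin/bash",
    ]


if __name__ == "__main__":
    # Run from the directory where `git clone` was executed.
    print(" ".join(shlex.quote(arg) for arg in docker_run_command("DLWorkspace")))
```

The printed command can be pasted into a shell in place of the original `docker run` line.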

@resouer
Member

resouer commented Jan 30, 2018

src/ClusterBootstrap/deploy should appear after you finish the installation :)

@emineh
Author

emineh commented Jan 31, 2018

@resouer Thank you for your reply. But I ran into some other problems later in the installation process:
When setting up the cluster, I found the instructions too brief. There is no config.yaml in the '/src/ClusterBootstrap' folder, just three similar files with names ending in the .yaml.template suffix. I copied the config_philly.yaml.template file and renamed it 'config.yaml'. After setting the important cluster information (e.g., cluster name, number of etcd servers), I ran ./deploy.py backup [backup_file_prefix] [password] to back up the cluster configuration, and got this error:

Traceback (most recent call last):
  File "./deploy.py", line 3660, in <module>
    run_command( args, command, nargs, parser)
  File "./deploy.py", line 2933, in run_command
    merge_config(config, yaml.load(f))
  File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 48, in construct_document
    for dummy in generator:
  File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 208, in construct_mapping
    return BaseConstructor.construct_mapping(self, node, deep=deep)
  File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 132, in construct_mapping
    "found unacceptable key (%s)" % exc, key_node.start_mark)
yaml.constructor.ConstructorError: while constructing a mapping
  in "/home/DLWorkspace/src/ClusterBootstrap/config.yaml", line 4, column 22
found unacceptable key (unhashable type: 'dict')
  in "/home/DLWorkspace/src/ClusterBootstrap/config.yaml", line 4, column 23

Then I commented out the following lines in config.yaml and ran ./deploy.py backup [backup_file_prefix] [password] again, but another error occurred.

sqlserver-hostname : {{}}
sqlserver-username : {{}}
sqlserver-password : {{}}
sqlserver-database : {{}}

error:

cp: cannot stat './deploy/clusterID.yml': No such file or directory
./deploy_backup/backup/
./deploy_backup/backup/config.yaml
./deploy_backup/backup/clusterID/
./deploy_backup/backup/ssl/
./deploy_backup/backup/sshkey/

I have never deployed a similar cluster before, so I am really confused by the installation guide. I have looked at the subsequent guidance, which is still very sketchy. I was wondering if there is a more detailed guide for a newbie like me.
Thank you for your time.
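
The ConstructorError in the traceback above is consistent with the unfilled `{{}}` placeholders left over from the template: YAML reads `{{}}` as a mapping whose key is itself an empty mapping, and a Python dict cannot be used as a dictionary key. Below is a small sketch of the same failure in plain Python, with an optional PyYAML reproduction. It assumes nothing about deploy.py beyond the `yaml.load` call visible in the traceback, and both function names are made up for illustration.

```python
def dict_as_key_error():
    """Return the message Python raises when a dict is used as a dict key."""
    mapping = {}
    try:
        mapping[{}] = None  # what the YAML constructor attempts for `{{}}`
    except TypeError as exc:
        return str(exc)
    return None


def yaml_placeholder_error(text="sqlserver-hostname : {{}}"):
    """Parse one template line with PyYAML, if it is installed."""
    try:
        import yaml
    except ImportError:
        return "PyYAML not installed"
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return str(exc)
    return "parsed without error"


if __name__ == "__main__":
    print(dict_as_key_error())
    print(yaml_placeholder_error())
```

Replacing each `{{}}` with a real value, or commenting the line out as done above, avoids the error.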

@resouer
Member

resouer commented Jan 31, 2018

You are expected to copy one of the config_* files as a template. I would suggest using the Azure template, which should be the easiest.

@emineh
Author

emineh commented Feb 4, 2018

@resouer I ran into so many problems during the deployment that I found it hard to finish on Ubuntu 16.04. I wonder whether you have ever successfully deployed this project using the Azure template, and whether the installation description for the Azure template is really helpful for users :)

@resouer
Member

resouer commented Feb 4, 2018

@emineh That's funny ... what makes you think https://github.com/Microsoft/DLWorkspace/blob/alpha.v1.5/docs/deployment/configuration/Readme.md is the deployment guide?

Please refer to the guide for Azure.

It is mentioned in bold in the README ...

Here's exactly the config.yaml I am using; it's designed to be "sketchy":

cluster_name: harrydevbox

azure_cluster:
  harrydevbox:
    infra_node_num: 1
    infra_vm_size : Standard_D3_v2
    worker_node_num: 2
    worker_vm_size: Standard_D3_v2
    azure_location: westus

UserGroups:
  DLWSAdmins:
    Allowed: [ "[email protected]" ]
    uid : "20000"
    gid : "20001"
  DLWSRegister:
    Allowed: [ "@gmail.com" ]
    uid : "20001-29999"
    gid : "20001"

WebUIregisterGroups: [ "DLWSRegister"]
WebUIauthorizedGroups: [ "DLWSAdmins" ]
WebUIadminGroups: ["DLWSAdmins"]

DeployAuthentications : ["Corp", "Gmail", "Live"]

WinbindServers: []

webuiport: 3080

@emineh
Author

emineh commented Feb 5, 2018

@resouer Thanks. I am sorry for my inappropriate question. It seems that I have been trying the wrong way. Again, thank you for your reply.

@hongzhili
Member

hongzhili commented Feb 6, 2018

@emineh It seems that you want to deploy on your own Ubuntu cluster, not on an Azure cluster.

https://github.com/Microsoft/DLWorkspace/blob/alpha.v1.5/docs/deployment/On-Prem/Ubuntu.md is for users who want to deploy DLWorkspace on bare metal, including setting up a PXE server to install the OS on each worker.
In that case, you should follow this instruction to install DLWorkspace on a single node.

You also need to create a SQL server and an NFS file share server yourself.
After that, your config.yaml should look something like this:

cluster_name : <<your cluster Name>>
etcd_node_num : 1

sqlserver-hostname : tcp:<<dns_name_of_server>>
sqlserver-username : <<sql_user_name>>
sqlserver-password : <<sql_password>>
sqlserver-database : DLWorkspaceJobs

network:
  domain: <<current_domain>>

platform-scripts : ubuntu

machines:
  <<machine1>>:
    role: infrastructure

mountpoints:
  <<a_name>>:
    type: nfs
    server: <<nfs_server_address>>
    filesharename: <<nfs_share_path>>

Please let me know if you have any questions.
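
Before running deploy.py, it can help to confirm that no template placeholder is left unfilled. The sketch below scans config text for `<<...>>` and `{{...}}` markers, the two placeholder styles that appear in this thread. `find_placeholders` is a hypothetical helper, not part of DLWorkspace.

```python
import re

# Matches <<...>> and {{...}} markers that sit on a single line.
PLACEHOLDER = re.compile(r"<<[^<>\n]*>>|\{\{[^{}\n]*\}\}")


def find_placeholders(text):
    """Return (line_number, placeholder) pairs, skipping commented-out lines."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue
        for match in PLACEHOLDER.finditer(line):
            hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = (
        "cluster_name : <<your cluster Name>>\n"
        "etcd_node_num : 1\n"
        "sqlserver-password : {{}}\n"
    )
    for number, marker in find_placeholders(sample):
        print("line {}: unfilled placeholder {}".format(number, marker))
```

In practice the text would come from reading config.yaml; an empty result means every placeholder has been replaced or commented out.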

@emineh
Author

emineh commented Feb 6, 2018

@hongzhili Yes, I do want to deploy on my own Ubuntu cluster. Thank you for patiently answering my questions. I will follow the instructions you gave and try again.

@hongzhili
Member

@emineh just wanted to follow up to see if you have been able to deploy it on your cluster.

@emineh
Author

emineh commented Feb 21, 2018

@hongzhili Thanks for following up on the deployment. I have roughly finished it, but when I run the deployment script block:
./deploy.py --verbose scriptblocks ubuntu_uncordon
I get this error:

Activate Master Node(s): 0

Activate ETCD Node(s):0

Activate Worker Node(s):0

Cannot deploy cluster since there are insufficient number of etcd server or master server.
To continue deploy the cluster we need at least 1 etcd server(s)

It seems that there is no active etcd node on my Ubuntu cluster. I wonder whether I made a mistake in the deployment process, or whether I should install the etcd node myself. My configuration file is as follows.

cluster_name : ccs1
clusterId: ccszoro

etcd_node_num : 1

sqlserver-hostname : tcp:192.168.3.4
sqlserver-username : SA
sqlserver-password : SA1234
sqlserver-database : DLWorkspaceJobs

platform-scripts : ubuntu

ssh_cert: ~/.ssh/id_rsa

machines:
  zoro-N551JK:
    role: infrastructure

mountpoints:
  nfsshare:
    type: nfs
    server: 192.168.3.4
    filesharename: /home/nfs-server
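
One quick sanity check on a config like the one above is whether it lists at least `etcd_node_num` machines with the `infrastructure` role. The sketch below does only that, on a plain dict mirroring the relevant keys. Whether this is related to the error is an assumption, since the thread does not show how deploy.py counts active nodes, and both helper names are hypothetical.

```python
def infrastructure_machines(config):
    """Return the sorted names of machines whose role is `infrastructure`."""
    machines = config.get("machines") or {}
    return sorted(
        name
        for name, spec in machines.items()
        if (spec or {}).get("role") == "infrastructure"
    )


def has_enough_infrastructure(config):
    """Compare the infrastructure machine count with `etcd_node_num`."""
    needed = int(config.get("etcd_node_num", 1))
    return len(infrastructure_machines(config)) >= needed


if __name__ == "__main__":
    # Mirrors the relevant keys of the config posted above.
    config = {
        "etcd_node_num": 1,
        "machines": {"zoro-N551JK": {"role": "infrastructure"}},
    }
    print(infrastructure_machines(config))
    print(has_enough_infrastructure(config))
```

If this check passes, the cause may lie elsewhere, for example in whether the listed machine name resolves and is reachable from the machine running deploy.py.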


3 participants