0.7 Openstack Platform Deployment

Piotr Milewski edited this page Nov 17, 2016 · 30 revisions

Prerequisites

TAP recommends using Mirantis OpenStack 7.0 for deployments.

Note: The TAP install requires Internet connectivity.

Hardware recommendations:

  • 1x Controller node: 2 CPUs with 6 cores, 24 GB of RAM, 1 TB RAID1
  • 1x Storage node: 1 CPU with 4 physical cores, 12 GB of RAM, 500 GB RAID1
  • 1x Fuel server: Quad-core CPU, 4 GB RAM, 1 Gbps Ethernet, 128 GB SAS disk, IPMI access through an independent management network
  • 6x Compute nodes (each): Dual-socket CPU with at least 4 physical cores per socket, 64 GB RAM, 256 GB SSD

For VLAN networking, 2 NICs are recommended; when using only VXLAN, 5 NICs are recommended.

Additional prerequisites for Hybrid deployments can be found here: 0.7 Openstack Hybrid Prerequisites

Configuration recommendations:

  • The X-Auth-Token should be valid for 24h:
    • Log in to the controller node as root
    • Edit /etc/keystone/keystone.conf
    • Find the [token] section
    • Change expiration = 3600 to expiration = 86400
    • Restart the apache2 service (if your controller runs on Ubuntu) or the httpd service (if your controller runs on CentOS)
  • Nova should use LVM-type storage for VMs (see Nova configuration).
  • The Cloudera instances flavor must be set to at least m1.large. Deployment with m1.medium instances fails due to lack of RAM.
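
The token lifetime change above can also be scripted. The following is a minimal sketch, not part of the TAP tooling: the helper name set_token_expiration is hypothetical, and it assumes the [token] section already contains an expiration line (either active or commented out with a leading #).

```shell
# Hypothetical helper: set "expiration" inside the [token] section of a
# keystone.conf-style INI file. A backup of the original is kept as <file>.bak.
set_token_expiration() {
    local conf="$1" seconds="$2"
    # The address range /^\[token\]/,/^\[/ limits the substitution to the
    # lines between the [token] header and the next section header.
    sed -i.bak -E "/^\[token\]/,/^\[/ s/^#?[[:space:]]*expiration[[:space:]]*=.*/expiration = ${seconds}/" "$conf"
}

# Usage on the controller node (as root), followed by the web server restart:
#   set_token_expiration /etc/keystone/keystone.conf 86400
#   service apache2 restart   # Ubuntu; use "service httpd restart" on CentOS
```

Restricting the substitution to the [token] section avoids touching expiration options that other sections of the file may define.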

Create a Stack

  1. Download the Heat template for the stack. Use TAP-FullVM.yaml for a Full VM type install, or TAP-Hybrid.yaml for a Hybrid type install.

  2. Log into OpenStack Horizon WebUI as admin.

  3. Create a new OpenStack project (Identity -> Projects -> Create Project) and set the quotas for Volumes, Vol Snapshots, Total size of Vols and Security Groups to "-1".

  4. Create a new OpenStack user (Identity -> Users -> Create User) and grant it admin rights to the project just created.

  5. Log out of Horizon and log in as the user just created.

  6. Switch the UI context to the project just created (Top bar drop-down menu).

  7. Import an SSH key pair (Project -> Compute -> Access & Security -> Key Pairs -> Import Key Pair).

  8. Allocate and note down a Floating IP (Project -> Compute -> Access & Security -> Floating IPs -> Allocate IP To Project). Use it to register a wildcard DNS A record for the TAP Domain.

  9. Note down the API URL (Project -> Compute -> Access & Security -> API Access -> Identity).

  10. Launch a Stack (Project -> Orchestration -> Stacks -> Launch Stack).

  11. Provide the template file as the Template Source.

  12. Increase the timeout to 300 minutes.

  13. Set the OpenStack identity API URL to the API URL you noted down.

  14. Set the Public IP to the Floating IP you noted down.

    If you are behind an HTTP proxy and your Floating IP is accessed directly, add the previously registered TAP Domain to the No Proxy list. Likewise, if your OpenStack Horizon address is accessed directly, add the Horizon IP (as in the API URL) to the No Proxy list.

Deploy the platform

  1. Once the stack is created, log in to a Jump Box instance using SSH with the key you chose: ssh ubuntu@<jumpbox_server_ip> -i <ssh_key.pem>
  2. Run a shell script to finish the installation:
    1. with Kerberos disabled:

      sudo -i curl -Sso tqd.sh https://s3.amazonaws.com/trustedanalytics/tqd.sh && sudo -i bash tqd.sh

    2. with Kerberos enabled:

      sudo -i curl -Sso tqd.sh https://s3.amazonaws.com/trustedanalytics/tqd.sh && sudo -i KERBEROS_ENABLED=True bash tqd.sh

The whole deployment process should take 2 to 5 hours. Once the process is complete (the script finishes without reporting a failure), you can access the TAP console at https://console.DOMAIN_NAME_YOU_CHOSE. Log in with the username admin and the password shown in the Horizon UI for your OpenStack project (Project -> Orchestration -> Stacks -> (choose stack) -> Overview -> Outputs/password).

To access individual VMs, SSH into the Jump Box machine using the procedure from the "Accessing OpenStack installation logs" section below.

Troubleshooting

Accessing OpenStack installation logs

  1. Go to the OpenStack Horizon UI Stacks tab (Project -> Orchestration -> Stacks -> (choose stack) -> Overview) and get the Jump Box IP address.
  2. SSH to the instance as the user ubuntu with the key provided during the installation: ssh ubuntu@<jumpbox_server_ip> -i <ssh_key.pem>
  3. Search /var/log/ansible.log for failed steps.
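
One way to search the log is sketched below. It assumes that failed steps are marked with "fatal: " or "FAILED!", as in the sample log shown under "Debugging failed stack"; the helper name find_failed_steps is hypothetical.

```shell
# Hypothetical helper: print failed Ansible steps from a log, with line numbers.
find_failed_steps() {
    local log="${1:-/var/log/ansible.log}"
    # -B 3 also prints the three lines before each match, which usually
    # include the "TASK [...]" or "RUNNING HANDLER [...]" header.
    grep -n -B 3 -E 'fatal: |FAILED!' "$log"
}

# Usage on the Jump Box:
#   find_failed_steps                 # searches /var/log/ansible.log
#   find_failed_steps /path/to/copy   # searches a copy of the log
```

grep returns a non-zero exit status when nothing matches, so the helper can also be used in a script to test whether a run failed.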

Accessing CDH manager

  1. Log in to a Jump Box instance using SSH with port forwarding set up to the cdh-master-2 machine: ssh ubuntu@<jumpbox_server_ip> -i <ssh_key.pem> -L 7180:cdh-master-2:7180
  2. You should be able to access the CDH Manager web UI via http://localhost:7180

Debugging failed stack

  1. Check that the OpenStack X-Auth-Token is valid for 24h (see: Configuration recommendations).
  2. Check the installation logs.
    1. If the installation failed on TASK [tap : command], the log should look like this:
2016-06-23 14:22:26,688 p=25908 u=root |  TASK [tap : command] ***********************************************************
2016-06-23 14:22:27,302 p=25908 u=root |  changed: [localhost] => (item=cf)
2016-06-23 14:22:27,314 p=25908 u=root |  RUNNING HANDLER [tap : Deploy] *************************************************
2016-06-23 14:40:21,500 p=25908 u=root |  fatal: [localhost]: FAILED! => {"changed": true, "cmd": "bosh --no-color -n deploy", "delta": "0:17:54.0847
02", "end": "2016-06-23 14:40:21.476447", "failed": true, "rc": 1, "start": "2016-06-23 14:22:27.391745", "stderr": "Acting as user 'admin' on deploy
ment 'cf' on 'WHOOP-8040-bosh'", "stdout": "Getting deployment properties from director...\nUnable to get properties list from director, trying witho
ut it...\nCannot get current deployment information from director, possibly a new deployment\n\nDeploying\n---------\n\nDirector task 5\n  Started pr
eparing deployment > Preparing deployment. Done (00:00:00)\n\n  Started preparing package compilation > Finding packages to compile. Done (00:00:00)\
n\n  Started compiling packages\n  Started compiling packages > ruby-2.1.6-intel/c10a92eb4684b9bbcd4b5eaa9b1c485ff10be0fd\n  Started compiling packag
es > rootfs_cflinuxfs2/cbcda034c2bc785743c64ad4bbf689e5c96f09ed\n  Started compiling packages > buildpack_binary/e0c8736b073d83c2459519851b5736c28831
1d92\n  Started compiling packages > common-intel/7c774db615d36d85f4d905736833e7432524d567. Done (00:02:47)\n  Started compiling packages > buildpack
_staticfile/f79dd915e8ee73b297ef3ae1d85b2895d5f5c106. Done (00:00:04)\n  Started compiling packages > buildpack_php/a777948f80
125ad5ea3. Done (00:00:22)", "  Started compiling packages > buildpack_go/9a0f49a47e179202fa04fe2ca39e5cb87110d570", "     Done compiling packages >
buildpack_php/a777948f80667960f6bb8693253e59e888adf3e6 (00:00:40)", "  Started compiling packages > buildpack_nodejs/a55b6669b5138c9d90720dd2dc678de4
8955560d. Done (00:00:37)", "  Started compiling packages > buildpack_ruby/03b4c6236d1e663c05f5985fe964dd9262cdf2db", "     Done compiling packages >
 buildpack_go/9a0f49a47e179202fa04fe2ca39e5cb87110d570 (00:00:55)", "  Started compiling packages > buildpack_java_offline/b13deaa98addc5d157885c8ec3
aad4df6640873f", "     Done compiling packages > buildpack_ruby/03b4c6236d1e663c05f5985fe964dd9262cdf2db (00:00:36)", "  Started compiling packages >
 buildpack_java/b91bbdcc9fbe4d774ab47f4ded312151e741cb2a", "     Done compiling packages > buildpack_java_offline/b13deaa98addc5d157885c8ec3aad4df664
0873f (00:01:12)", "  Started compiling packages > nginx/bf3af6163e13887aacd230bbbc5eff90213ac6af", "     Done compiling packages > buildpack_java/b9
1bbdcc9fbe4d774ab47f4ded312151e741cb2a (00:00:43)", "  Started compiling packages > ruby-2.2.3/b1320e11c7ad997a68103042d3d1c38270309387", "     Done
compiling packages > nginx/bf3af6163e13887aacd230bbbc5eff90213ac6af (00:00:33)", "  Started compiling packages > mysqlclient-5.5/c97be6846302ac67d8ad
54ef08de4f741f8253ea. Done (00:00:04)", "  Started compiling packages > libpq/e9383da451434bed183824a28693268596f7a578. Done (00:00:21)", "  Started
compiling packages > postgres-9.4.2/ac1c8a521594f9459ffede25b9d7e0308811f139. Done (00:03:09)", "  Started compiling packages > postgres/b63fe0176a93
609bd4ba44751ea490a3ee0f646c. Done (00:00:08)", "  Started compiling packages > debian_nfs_server/aac05f22582b2f9faa6840da056084ed15772594. Done (00:
00:05)", "  Started compiling packages > etcd-common/a5492fb0ad41a80d2fa083172c0430073213a296. Done (00:00:04)", "  Started compiling packages > ruby
-2.1.7/c977026b967eab6fad2b03a820dca6f84a900f92", "   Failed compiling packages > rootfs_cflinuxfs2/cbcda034c2bc785743c64ad4bbf689e5c96f09ed: Timed o
ut pinging to db7db92c-5289-4779-ab58-acbf81259b9e after 600 seconds (00:11:48)", "     Done compiling packages > ruby-2.1.6-intel/c10a92eb4684b9bbcd
4b5eaa9b1c485ff10be0fd (00:14:39)", "     Done compiling packages > ruby-2.2.3/b1320e11c7ad997a68103042d3d1c38270309387 (00:10:07)", "     Done compi
ling packages > ruby-2.1.7/c977026b967eab6fad2b03a820dca6f84a900f92 (00:07:52)", "", "Error 450002: Timed out pinging to db7db92c-5289-4779-ab58-acbf
81259b9e after 600 seconds", "", "Task 5 error", "", "For a more detailed error report, run: bosh task 5 --debug"], "warnings": []}
2016-06-23 14:40:21,505 p=25908 u=root |  NO MORE HOSTS LEFT *************************************************************
2016-06-23 14:40:21,509 p=25908 u=root |        to retry, use: --limit @/root/.ansible/pull/jump-box.novalocal/local.retry

2016-06-23 14:40:21,509 p=25908 u=root |  PLAY RECAP *********************************************************************
2016-06-23 14:40:21,510 p=25908 u=root |  localhost                  : ok=80   changed=38   unreachable=0    failed=1

You can recover from this error by running the following commands on the Jump Box:

Note: You need to be logged in to the Jump Box as root (use sudo -i to switch from the ubuntu user to root).

Note: Run these commands under tmux or screen, as they can take a long time.

bosh deployment cf.yml

bosh -n deploy

ℹ️ Information
This task can take a long time; repeat it if it fails.

bosh deployment docker-broker.yml

bosh -n deploy

ℹ️ Information
This task can take a long time; repeat it if it fails.

/tmp/cloudfoundry.sh
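
Because the deploy steps may need to be repeated, a small retry wrapper can save manual re-runs. This is a sketch only: the retry helper is hypothetical and not part of BOSH or TAP, and the attempt count of 3 is an arbitrary choice.

```shell
# Hypothetical helper: run a command up to N times until it succeeds.
retry() {
    local attempts="$1" i=1
    shift
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i/$attempts failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}

# Usage on the Jump Box as root, inside tmux or screen:
#   bosh deployment cf.yml && retry 3 bosh -n deploy
#   bosh deployment docker-broker.yml && retry 3 bosh -n deploy
#   /tmp/cloudfoundry.sh
```

The wrapper stops at the first successful run and returns a non-zero status if every attempt fails, so later steps chained with && are skipped in that case.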

Removing an OpenStack environment

  1. Log in to the Jump Box and switch to the root account

    ssh ubuntu@<jumpbox_server_ip> -i <ssh_key.pem>

    sudo -i

  2. Clear extra routes in the OpenStack router

    router_id=$(awk -F = '/router_id/ { print $2 }' /etc/ansible/hosts)

    neutron --insecure --os-cloud TAP router-update ${router_id} --routes action=clear

  3. Delete docker-broker deployment

    bosh delete deployment docker-broker

  4. Delete cf deployment

    bosh delete deployment cf

    ℹ️ Information
    This task can take a long time; repeat it if it fails.
  5. Delete BOSH director

    cd /root/<deployment-name>-bosh/

    bosh-init delete bosh.yml

  6. Log in to the Horizon UI

    1. Go to the Stacks list (Project -> Orchestration -> Stacks)
    2. Delete your Stack (select it and click the red button in the upper right corner)
    3. After the Stack is deleted, clean up any leftover volumes (Project -> Compute -> Volumes)
    4. Remove the BOSH stemcells: go to Project -> Compute -> Images and delete all images whose names start with BOSH
  7. Optional (skip this step if in doubt)

    1. Release the Floating IP (Project -> Compute -> Access & Security -> Floating IPs)
    2. Delete your SSH key pair (Project -> Compute -> Access & Security -> Key Pairs)

Platform upgrades

Upgrade from version 0.7.0 to version 0.7.1

  1. Log in to a Jump Box instance using SSH: ssh ubuntu@<jumpbox_server_ip> -i <ssh_key.pem>

  2. Run a shell script to perform the upgrade:

    1. with Kerberos disabled:

    sudo -i curl -Sso update_v0.7.0_to_v0.7.1.sh https://s3.amazonaws.com/trustedanalytics/update_v0.7.0_to_v0.7.1.sh && sudo -i bash update_v0.7.0_to_v0.7.1.sh

    2. with Kerberos enabled:

    sudo -i curl -Sso update_v0.7.0_to_v0.7.1.sh https://s3.amazonaws.com/trustedanalytics/update_v0.7.0_to_v0.7.1.sh && sudo -i KERBEROS_ENABLED=True bash update_v0.7.0_to_v0.7.1.sh

For the detailed upgrade contents, please refer to the 0.7.1 Release Notes.

Q&A

Launching Stack window

Q: Environment Source? (File? Direct Input?)

A: Do not use this field.

Q: Stack Name? Are there recommendations for this?

A: Whatever suits you. A good practice is to use the same name as the tenant name.

Q: TAP Domain? Should this be the same as the API URL?

A: No. Register a wildcard domain and point it to the Floating IP allocated in step 8 of "Create a Stack".

Q: Where can I get an Ubuntu VM image?

A: You can use the upstream Ubuntu 14.04 image from here: https://cloud-images.ubuntu.com/trusty/current/ . Ask your OpenStack administrator how to import images.

Q: Where can I get a CentOS VM image?

A: Use: https://s3-us-west-1.amazonaws.com/openstack-images-dp2/centos-6-x86_64.qcow2 . Ask your OpenStack administrator how to import images.

Q: (Hybrid Install Only) Cloudera Servers CIDR? It asks for the subnet in CIDR format; an example would be nice.

A: Ex: 1.1.1.0/24. Recommended reading: https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing

Q: (Hybrid Install Only) Cloudera Masters? This might be intuitive for some, but a bit more info and/or an example would be nice.

A: Ex: 1.1.1.1,1.1.1.2,1.1.1.3 <- a list of IPs for the servers dedicated to Cloudera Masters. The last one will also be the Manager.

Q: (Hybrid Install Only) Cloudera Workers? Same as Master.

A: Ex: 1.1.1.4,1.1.1.5,1.1.1.6 <- same as above, but for Workers.

Q: (Hybrid Install Only) Cloudera Storage Path? Same as Master.

A: If your servers have storage mounted as /tproot/data1,/tproot/data2,/tproot/data3, enter the storage paths like this: tproot/data1,tproot/data2,tproot/data3.
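
The example above drops the leading slash from each mount point. Assuming that is the expected input format, the conversion can be sketched as follows; the helper name to_storage_paths is hypothetical.

```shell
# Hypothetical helper: turn "/a/b,/c/d" into "a/b,c/d" by removing the
# leading slash of every comma-separated path.
to_storage_paths() {
    printf '%s\n' "$1" | sed -e 's#^/##' -e 's#,/#,#g'
}

to_storage_paths "/tproot/data1,/tproot/data2,/tproot/data3"
# prints: tproot/data1,tproot/data2,tproot/data3
```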