Space Shuttle example for Analytics Toolkit
This page describes the steps for deploying the Analytics Toolkit version of the space-shuttle-demo application to TAP. Most of the major steps have sub-steps. The link to the GitHub repository for this example is located here.
Steps | Summary |
---|---|
0 | Prerequisites |
1 | Clone or Download the GitHub repository |
2 | Uploading training data set to HDFS |
3 | Creating Analytics Toolkit instance |
4 | Creating the Jupyter instance and uploading Space Shuttle notebook |
5 | Connecting to TAP server and generating the model |
6 | Creating the scoring engine |
7 | Creating instances of additional services (InfluxDB, Zookeeper, and Gateway) |
8 | Building and uploading the central app |
9 | Sending data to Gateway and Kafka |
## Step 0: Prerequisites
The following prerequisites are needed to successfully complete this example:
- An IDE such as IntelliJ IDEA Community Edition (or Eclipse)
- Oracle JDK 1.8
- Apache Maven
- Cloud Foundry CLI

If you are using Windows, see the Sample environment variables for Windows section at the end of this page.
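Before starting, it can help to confirm that the command-line tools listed above are reachable on your PATH. The following is a minimal Python sketch, not part of the repository. It assumes the executables are named `java`, `mvn`, and `cf`, which are the usual command names for the JDK, Maven, and the Cloud Foundry CLI.

```python
import shutil

# Assumed executable names for the JDK, Apache Maven, and the Cloud Foundry CLI.
REQUIRED_TOOLS = ("java", "mvn", "cf")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All prerequisite tools were found on PATH.")
```

An empty result only means the commands were found; it does not check their versions.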
## Step 1: Clone or download the GitHub repository
Clone or download this GitHub repository to your local system.
New to GitHub? Cloning instructions are located here.
## Step 2: Uploading training data set to HDFS
A. Log in to the TAP console and select the organization and space to work in from the drop-down menus in the upper right of the console.
B. Navigate to Data catalog > Submit Transfer.
C. Select the input type: Local path.
D. Find and select the sample training data file, which can be found here: space-shuttle-demo-master/atkmodelgenerator/train-data.csv
E. Enter a data set title in the Title field.
F. Optionally, choose a Category to help keep things organized. (Other is the default.)
G. Click the Upload button.
H. When the upload is completed, click OK in the dialog box. Then click the Data catalog > Data sets tab. Your data set is listed.
I. Click on the name of the data set to see detailed information. (You will need the URI to the data set later.)
## Step 3: Creating Analytics Toolkit instance
A. In the TAP console, navigate to Data Science > TAP Analytics Toolkit.
B. If there is an instance of TAP Analytics Toolkit already installed, you will see it in the Installed Instances list; in this case, no further action is needed. You can skip to Step 4.
C. If there are no instances, you will be asked whether you want to create one. Enter a toolkit Name and select Yes in the dialog box.
D. Wait until the application is created (this can take a minute or two). The application will appear in the TAP Analytics Toolkit Installed Instances list after creation.
You can watch progress of the instance creation by clicking the Log Event dashboard link.
## Step 4: Creating the Jupyter instance and uploading Space Shuttle notebook
A. In the TAP console, navigate to Data Science > Jupyter.
B. If a Jupyter instance already exists, skip this step. If no Jupyter instance exists, enter a notebook Instance Name and click the Create a new Jupyter instance button. (This takes less than a minute to complete.)
C. For your Jupyter instance, click Show to see the password for the instance. Copy the password for use in the next step.
D. Click the App Url link to log in to your Jupyter instance, using the password you just copied.
E. In Jupyter, click the (white) Upload button, then navigate to the Space Shuttle notebook file, which can be found here: space-shuttle-demo-master/atkmodelgenerator/Space_Shuttle_SVM.ipynb
F. Select the file and click the second (blue) Upload button.
G. Back in Jupyter, click the `Space_Shuttle_SVM.ipynb` file to open it.
## Step 5: Connecting to TAP server and generating the model
In this step, you modify the generic notebook script, then run it to create the model.
A. On the menu bar at the top of the Jupyter window, click TAP Help > Create Credentials. Jupyter inserts a new cell at the top of the script.
B. In the TAP Console, copy the URL for your Analytics Toolkit instance from the Data Science > TAP Analytics Toolkit tab.
C. In Jupyter, paste this URL over the default value of `atk.server.uri` in the first cell:
atk.server.uri = 'your_atk_server_uri_here'
Do not include the `http://` protocol prefix.
D. Now paste the same URL over the default value of `ia.server.uri` in the second cell:
ia.server.uri = "your_atk_server_uri_here"
Again, do not include the `http://` protocol prefix.
E. Replace the default credentials filename `Shuttle.creds` in the second cell with `ATK.creds`:
ia.connect(r'ATK.creds')
F. In the TAP Console, navigate to the Data catalog > Data sets tab and then click on the name of the data set to show detailed information. Copy the URI for the data set for use in the next step.
G. Back in Jupyter, paste this URI over the default value of the data set variable `ds` in the fourth cell:
ds = "your_dataset_uri_here"
H. Select the first cell, then on the Jupyter menu select Cell > Run Cell, Select Below (equivalent to the run icon on the toolbar) to run the script in the first cell. You will be prompted for:
- The URL of the ATK server (again). Paste the URL into the field and press Enter.
- Your TAP user name. Type your user name and press Enter.
- Your TAP password. Type your password and press Enter.
- Whether to connect to the Analytics Toolkit server. Type Y and press Enter.
I. When execution of the first cell is finished, the asterisk in the `In [*]` prompt to the left of the cell is replaced by the cell number. Work through the remaining cells, one at a time, using Cell > Run, Highlight Next. Wait for the current cell to complete before moving to the next one.
J. When finished, the URI for the created model in HDFS is displayed in the output for Cell 7. Copy the URI for use in Step 6.
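Steps C and D ask for the server URI without its protocol prefix. As an illustration only, the hypothetical helper below (its name is not part of the notebook or of the Analytics Toolkit) strips a leading scheme such as `http://` from a pasted URL. The host name in the usage line is a made-up placeholder.

```python
import re

# Matches a leading URI scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def without_protocol_prefix(url):
    """Return `url` with surrounding whitespace, scheme, and trailing '/' removed."""
    return _SCHEME.sub("", url.strip()).rstrip("/")


# Placeholder host name, used only for illustration.
print(without_protocol_prefix("http://atk-demo.example.com"))
```

A value that already lacks a prefix is returned unchanged, so the helper is safe to apply to whatever you copied from the console.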
## Step 6: Creating the scoring engine
To create the Scoring Engine service instance:
A. From the TAP console, navigate to Services > Marketplace. Select the "TAP Scoring Engine" service.
B. Type the instance name `space-shuttle-scoring-engine`.
C. Click + Add an extra parameter and add the URI to your model (from Step 5J) as part of this key/value pair:
key: uri value: hdfs://path_to_your_model_here
D. Click the Create new instance button.
This may take a minute or two. You can monitor the process via the Event Log tab.
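The extra parameter in step C is a single key/value pair whose value must be the model URI copied in Step 5J. A small, hypothetical Python check such as the following (not part of TAP or the repository) can catch a mis-pasted value before you submit the form. The path in the usage line is the placeholder from step C.

```python
def scoring_engine_parameter(model_uri):
    """Build the {'uri': ...} pair, rejecting values that are not HDFS URIs."""
    uri = model_uri.strip()
    if not uri.startswith("hdfs://") or uri == "hdfs://":
        raise ValueError("expected an hdfs:// URI, got: %r" % model_uri)
    return {"uri": uri}


# Placeholder path; substitute the URI shown in the notebook output.
print(scoring_engine_parameter("hdfs://path_to_your_model_here"))
```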
## Step 7: Creating instances of additional services (InfluxDB, Zookeeper, and Gateway)
Create the following required service instances (if they do not exist already):
- InfluxDB (recommended name: `space-shuttle-db`)
- Zookeeper (recommended name: `space-shuttle-zookeeper`, create as Shared plan)
- Gateway (recommended name: `space-shuttle-gateway`)

These services can all be found in Services > Marketplace.
The application will connect to these service instances using Spring Cloud Connectors.
If you use the recommended names for the required service instances, they will be bound automatically with the application when it is pushed to Cloud Foundry. Otherwise, service instance names will need to be either: (1) adjusted in the 'manifest.yml' file manually or (2) removed from 'manifest.yml' and bound manually after the application is pushed to Cloud Foundry. These two options are not covered here.
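To decide whether the manifest needs adjusting, compare the instance names you actually created with the recommended ones. The sketch below is a hypothetical helper, not part of the repository, and the `actual` mapping in the usage example is invented for illustration.

```python
# Recommended instance names from the list above.
RECOMMENDED_NAMES = {
    "InfluxDB": "space-shuttle-db",
    "Zookeeper": "space-shuttle-zookeeper",
    "Gateway": "space-shuttle-gateway",
}


def renamed_instances(actual_names):
    """Return {service: (recommended, actual)} for instances with other names.

    Services missing from `actual_names` are assumed to use the recommended name.
    """
    renamed = {}
    for service, recommended in RECOMMENDED_NAMES.items():
        actual = actual_names.get(service, recommended)
        if actual != recommended:
            renamed[service] = (recommended, actual)
    return renamed


# Invented example: only the InfluxDB instance was given a different name.
actual = {"InfluxDB": "my-shuttle-db", "Gateway": "space-shuttle-gateway"}
for service, (old, new) in renamed_instances(actual).items():
    print("%s: replace %s with %s in manifest.yml" % (service, old, new))
```

An empty result means the recommended names were used throughout and the manifest can stay as generated.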
## Step 8: Building and uploading the central app
A. Open a command line window and change directory to the location where you cloned your GitHub repository.
If you list the contents of this directory (`ls` or `dir`), you should see a `pom.xml` file. Maven uses the contents of the `pom.xml` file to perform the build.
B. At the command line, run Maven to build the Java package:
mvn package
If you created service instances with names other than the recommended ones, edit the auto-generated `manifest.yml` file so that the names in its services section match the instances you created. These files are located at `src/cloudfoundry/`. You can also remove the services section and bind the instances manually later. You may also want to change the application host/name.
The Linux `cp` command will fail in a Windows environment. In this case, manually copy the two manifest (YAML) files from the subdirectories to the current working directory. Locations and files are:
target/classes/manifest.yml
target/classes/manifest-mqtt.yml
C. Log in with the Cloud Foundry CLI using the following command:
cf login -a your_analytics_platform_uri_here
Note: Copy the TAP instance URI, then replace `console` at the beginning with `api`. Examples:
- Before: `console.ourtap07.teamanalytics.com`
- After: `api.ourtap07.teamanalytics.com`
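The substitution described in the note can be expressed as a short Python sketch. The function name is hypothetical, and the host name is the example from the note.

```python
def api_endpoint(console_host):
    """Replace a leading 'console.' label with 'api.' in a TAP host name."""
    prefix = "console."
    if not console_host.startswith(prefix):
        raise ValueError("host does not start with %r: %r" % (prefix, console_host))
    return "api." + console_host[len(prefix):]


print(api_endpoint("console.ourtap07.teamanalytics.com"))
```

The result is the value to pass to `cf login -a`.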
D. When prompted, enter your:
- TAP user name
- TAP password
- Organization number (if there are multiple organizations)
E. Push the application to the platform using the Cloud Foundry (CF) command:
cf push
You can see this login and push command sequence in the TAP Console App Development tab.
CF uses the `manifest.yml` file to upload the application to TAP. If you used the recommended names earlier, or edited the `manifest.yml` file in the previous step, the application will be started, but no data is being sent to it yet.
If you removed the services section from `manifest.yml`, the application will not be started yet. First, bind the required service instances to the application (`cf bind-service`) and then restage it (`cf restage`).
If the default application name (`space-shuttle`) is already in use by any organization on the TAP platform, you will get an error message stating that the host name is taken. In this case, open the `manifest.yml` file and rename the application. Then run `cf push` again.
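If you removed the services section, the manual binding mentioned above amounts to one `cf bind-service` call per service instance followed by a single `cf restage`. The following Python sketch only builds those command lines and does not run them. The helper name is hypothetical, and the instance names in the usage example are the recommended names for the InfluxDB, Zookeeper, and Gateway instances; whether other instances also need binding depends on your manifest.

```python
import shlex


def manual_bind_commands(app_name, service_instances):
    """Return the cf command lines that bind each instance, then restage."""
    commands = [["cf", "bind-service", app_name, name] for name in service_instances]
    commands.append(["cf", "restage", app_name])
    # Quote each argument so the lines can be pasted into a POSIX shell.
    return [" ".join(shlex.quote(part) for part in cmd) for cmd in commands]


services = ["space-shuttle-db", "space-shuttle-zookeeper", "space-shuttle-gateway"]
for line in manual_bind_commands("space-shuttle", services):
    print(line)
```

Restaging once after all bindings avoids restarting the application for every service.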
## Step 9: Sending data to Gateway and Kafka
To send data to the Kafka queue (integrated into TAP) through the Gateway service instance, you can either:
- Change to the `client` directory, then push `space_shuttle_client` from that directory to the TAP space that contains the Gateway instance, using this command:
cf push space-shuttle-client
CF uses the `manifest.yml` file in the client directory.
or
- Run the Python file `space_shuttle_client.py` locally, passing the Gateway URL as a parameter. Data will start to be sent to the application.
In the TAP Console, navigate to Applications and click the link for the Space Shuttle Client. The space shuttle image appears, followed by the anomaly data grid, and, finally, by anomaly data.
It may take several minutes for the data grid to appear, and up to several hours for anomaly data to appear. Good data, however, is being sent immediately. You can check on data flow to the application via the following command in your CLI window:
cf logs space-shuttle
Use your actual application name if you chose something other than the default name.
## Sample environment variables for Windows:
### PATH system environment variable:
- C:\CF\CloudFoundry
- C:\Program Files\Java\jdk1.8.0_112\bin
- C:\maven\apache-maven-3.3.9\bin
### JAVA_HOME system environment variable:
- C:\Program Files\Java\jdk1.8.0_112