This project contains a sample application that reads a dataset from HDFS and presents it in a graphical form to the user. The figure below shows the data flow for this sample.
- The dataset is uploaded through the TAP Data catalog into the platform and stored on HDFS.
- The data scientist performs analysis on it using the Analytics Toolkit. The resulting file is also stored on HDFS.
- The application developer uploads the dataset-reader application into the platform and binds it with the file produced by the data scientist.
- The dataset-reader application presents the dataset visually as a set of charts.
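As a rough illustration of the last step in the flow above, the sketch below turns CSV rows into per-column series, which is the shape most charting libraries expect. It is not the dataset-reader implementation. The column names (`hour`, `bytes`, `packets`) and the inline sample rows are hypothetical and stand in for whatever your analysis produces.

```python
import csv
import io

# Hypothetical sample standing in for the file produced by the analysis step.
SAMPLE_CSV = """hour,bytes,packets
0,1200,10
1,3400,25
2,900,7
"""


def csv_to_series(text):
    """Return {column_name: [float values]} for every column in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    series = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            series[name].append(float(value))
    return series


series = csv_to_series(SAMPLE_CSV)
for name, values in series.items():
    print(name, values)
```

Each key in `series` can then be handed to a charting library as one line or bar series. The sketch assumes every column is numeric.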
The best way to learn about the data analytics capabilities of the platform is to work through Workshop: Performing Analytics on Your Data.
However, if you are mostly interested in using the application to visualize data, you can skip that workshop and follow the steps below to use a pre-built dataset.
- Clone this GitHub repository.
- In the TAP console, navigate to Data catalog > Submit transfer.
- Select the file to upload. The sample dataset can be found here: data/nf-data-application.csv.
- To upload from your machine, choose Local path, then navigate to the local file and select it.
- Alternatively, choose Link and paste the link to the raw nf-data-application.csv file on GitHub.
- Enter a title in the Title field.
- Click the Upload button.
- When the transfer finishes, your new dataset will be visible in Data catalog > Data sets.
- To get the link to the file on HDFS, click the name of your dataset in Data catalog > Data sets and copy the value of the targetUri property.
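Before pasting the copied targetUri value anywhere, it can help to confirm that it really is an HDFS URI. The sketch below is one way to check this with Python's standard library. The function name `describe_hdfs_uri` and the example URI are made up for illustration, and the actual value shown in your Data catalog will differ.

```python
from urllib.parse import urlsplit


def describe_hdfs_uri(uri):
    """Split an HDFS URI into host and path, rejecting other schemes."""
    parts = urlsplit(uri.strip())
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got: {uri!r}")
    if not parts.path:
        raise ValueError(f"URI has no path component: {uri!r}")
    return {"host": parts.netloc, "path": parts.path}


# Hypothetical value; copy the real one from the targetUri property.
example_uri = "hdfs://example-namenode/data/nf-data-application.csv"
info = describe_hdfs_uri(example_uri)
print(info["host"], info["path"])
```

A `ValueError` here usually means that something other than the targetUri value was copied, for example the dataset title or a web link.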
To learn about the data visualization capabilities of the platform, work through Workshop: Visualizing Your Data with an App.
Note: If you used the pre-built dataset instead of building your own, you can still work through the Visualization Workshop. When you reach step 8 of that workshop (Update the environment and set the dataset link), supply the title you chose in step 5 above.
Useful links:
- Training dataset: nf-hour.csv
- Jupyter notebook: Netflow_Demo.pynb