Skip to content
This repository has been archived by the owner on Dec 31, 2020. It is now read-only.

Examples

Eron Wright edited this page Sep 6, 2017 · 13 revisions

WikiExamples

A few examples are provided to familiarize you with using NuPIC in a Flink environment.

Running the Examples

  1. Install Flink 1.3.x, and set FLINK_HOME to the install location.
  2. Start a local Flink environment.
  3. Browse to http://localhost:8081/
[flink]$ bin/start-local.sh
Starting jobmanager daemon on host host1.local

Consider increasing the number of task slots in the local environment, by editing the taskmanager.numberOfTaskSlots property in $FLINK_HOME/conf/flink-conf.yaml.

The provided bin/run-example script launches the example program into the local Flink environment, using flink run.

Note that the examples are also runnable in your IDE. In that situation an embedded Flink environment is automatically used.

Hot Gym

The Hot Gym example is based on the well-known example in NuPIC: One Hot Gym Prediction Tutorial. See also the HTM.java implementation [here] (https://github.com/numenta/htm.java-examples/tree/master/src/main/java/org/numenta/nupic/examples/napi/hotgym).

The flink-htm embodiment is at flink-htm-examples/.../hotgym/HotGym.scala.

Running

Launch the hotgym Flink job:

[flink-htm-examples]$ bin/run-example hotgym.Demo --input file://`pwd`/data/rec-center-hourly-no-header.csv --output file:///tmp/hotgym-output.csv

Examine the output once the job completes. The output consists of four columns - datetime, Consumption (actual), Consumption (predicted), Anomaly Score.

[flink-htm-examples]$ cat /tmp/hotgym-output.csv
07/02/10 00:00,21.2,0.0,1.0
07/02/10 01:00,16.4,21.2,1.0
07/02/10 02:00,4.7,16.4,1.0
07/02/10 03:00,4.7,4.7,1.0
07/02/10 04:00,4.6,4.7,1.0
...
12/31/10 19:00,10.5,21.1,0.65
12/31/10 20:00,5.3,10.5,0.075
12/31/10 21:00,5.1,5.3,0.175
12/31/10 22:00,5.0,5.1,0.025

NYC Traffic Tutorial

Another well-known example in the NuPIC community is the NYC traffic anomalies tutorial. Given a data stream of traffic reports as provided by the NYDOT via the River View service, numerous HTM models are trained (in parallel) to detect anomalous traffic patterns. As seen in the screencast, the models are used to identify highly anomalous routes occurring during a certain time period.

The flink-htm embodiment is at flink-htm-examples/.../traffic/Traffic.scala.

Running

Launch the traffic Flink job:

[flink-htm-examples]$ bin/run-example -p 1 traffic.Demo --output file:///tmp/traffic-output-1.csv

Examine the output once the job completes. The output consists of three columns - datetime, streamId (i.e. a particular route), Anomaly Score.

[flink-htm-examples]$ cat /tmp/traffic-output-1.csv
2015-08-03T00:55:25.000-04:00,204,1.0
2015-08-03T10:07:25.000-04:00,204,1.0
2015-08-03T11:07:25.000-04:00,204,1.0
...
2015-08-03T15:58:26.000-04:00,446,0.9
2015-08-03T20:18:25.000-04:00,446,0.95
2015-08-03T20:38:25.000-04:00,446,0.9

Increasing Parallelism

In all situations, a separate HTM model will be trained for each traffic route, due to the keyBy operation on the inference stream. To maximize performance, numerous task slots may be leveraged to train those models in parallel. To increase the parallelism, adjust the -p flag passed to the bin/run-example script. Be sure to configure your Flink environment to have sufficient task slots (as described at the top).

[flink-htm-examples]$ bin/run-example -p 4 traffic.Demo --output file:///tmp/traffic-output-4.csv

Note that the job code sets the csv sink's parallelism to 1, to ensure that single file is written. If not for that, numerous csv files would be emitted (one per parallel instance of the sink).

Clone this wiki locally