Examples

Wiki ▸ Examples

A few examples are provided to familiarize you with using NuPIC in a Flink environment.

Running the Examples

Install Flink 1.3.x, and set FLINK_HOME to the install location.
Start a local Flink environment.
Browse to http://localhost:8081/

[flink]$ bin/start-local.sh
Starting jobmanager daemon on host host1.local

Consider increasing the number of task slots in the local environment, by editing the taskmanager.numberOfTaskSlots property in $FLINK_HOME/conf/flink-conf.yaml.

The provided bin/run-example script launches the example program into the local Flink environment, using flink run.

Note that the examples are also runnable in your IDE. In that situation an embedded Flink environment is automatically used.

Hot Gym

The Hot Gym example is based on the well-known example in NuPIC: One Hot Gym Prediction Tutorial. See also the HTM.java implementation [here] (https://github.com/numenta/htm.java-examples/tree/master/src/main/java/org/numenta/nupic/examples/napi/hotgym).

The flink-htm embodiment is at flink-htm-examples/.../hotgym/HotGym.scala.

Running

Launch the hotgym Flink job:

[flink-htm-examples]$ bin/run-example hotgym.Demo --input file://`pwd`/data/rec-center-hourly-no-header.csv --output file:///tmp/hotgym-output.csv

Examine the output once the job completes. The output consists of four columns - datetime, Consumption (actual), Consumption (predicted), Anomaly Score.

[flink-htm-examples]$ cat /tmp/hotgym-output.csv
07/02/10 00:00,21.2,0.0,1.0
07/02/10 01:00,16.4,21.2,1.0
07/02/10 02:00,4.7,16.4,1.0
07/02/10 03:00,4.7,4.7,1.0
07/02/10 04:00,4.6,4.7,1.0
...
12/31/10 19:00,10.5,21.1,0.65
12/31/10 20:00,5.3,10.5,0.075
12/31/10 21:00,5.1,5.3,0.175
12/31/10 22:00,5.0,5.1,0.025

NYC Traffic Tutorial

Another well-known example in the NuPIC community is the NYC traffic anomalies tutorial. Given a data stream of traffic reports as provided by the NYDOT via the River View service, numerous HTM models are trained (in parallel) to detect anomalous traffic patterns. As seen in the screencast, the models are used to identify highly anomalous routes occurring during a certain time period.

The flink-htm embodiment is at flink-htm-examples/.../traffic/Traffic.scala.

Running

Launch the traffic Flink job:

[flink-htm-examples]$ bin/run-example -p 1 traffic.Demo --output file:///tmp/traffic-output-1.csv

Examine the output once the job completes. The output consists of three columns - datetime, streamId (i.e. a particular route), Anomaly Score.

[flink-htm-examples]$ cat /tmp/traffic-output-1.csv
2015-08-03T00:55:25.000-04:00,204,1.0
2015-08-03T10:07:25.000-04:00,204,1.0
2015-08-03T11:07:25.000-04:00,204,1.0
...
2015-08-03T15:58:26.000-04:00,446,0.9
2015-08-03T20:18:25.000-04:00,446,0.95
2015-08-03T20:38:25.000-04:00,446,0.9

Increasing Parallelism

In all situations, a separate HTM model will be trained for each traffic route, due to the keyBy operation on the inference stream. To maximize performance, numerous task slots may be leveraged to train those models in parallel. To increase the parallelism, adjust the -p flag passed to the bin/run-example script. Be sure to configure your Flink environment to have sufficient task slots (as described at the top).

[flink-htm-examples]$ bin/run-example -p 4 traffic.Demo --output file:///tmp/traffic-output-4.csv

Note that the job code sets the csv sink's parallelism to 1, to ensure that single file is written. If not for that, numerous csv files would be emitted (one per parallel instance of the sink).

Getting Started

Using

Development

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Examples

Running the Examples

Hot Gym

Running

NYC Traffic Tutorial

Running

Increasing Parallelism

Clone this wiki locally