-
Notifications
You must be signed in to change notification settings - Fork 37
Examples
Wiki ▸ Examples
A few examples are provided to familiarize you with using NuPIC in a Flink environment.
- Install Flink 1.3.x, and set FLINK_HOME to the install location.
- Start a local Flink environment.
- Browse to
http://localhost:8081/
[flink]$ bin/start-local.sh
Starting jobmanager daemon on host host1.local
Consider increasing the number of task slots in the local environment, by editing the taskmanager.numberOfTaskSlots
property in $FLINK_HOME/conf/flink-conf.yaml
.
The provided bin/run-example
script launches the example program into the local Flink environment, using flink run
.
Note that the examples are also runnable in your IDE. In that situation an embedded Flink environment is automatically used.
The Hot Gym example is based on the well-known example in NuPIC: One Hot Gym Prediction Tutorial. See also the HTM.java implementation [here] (https://github.com/numenta/htm.java-examples/tree/master/src/main/java/org/numenta/nupic/examples/napi/hotgym).
The flink-htm embodiment is at flink-htm-examples/.../hotgym/HotGym.scala.
Launch the hotgym Flink job:
[flink-htm-examples]$ bin/run-example hotgym.Demo --input file://`pwd`/data/rec-center-hourly-no-header.csv --output file:///tmp/hotgym-output.csv
Examine the output once the job completes. The output consists of four columns - datetime, Consumption (actual), Consumption (predicted), Anomaly Score.
[flink-htm-examples]$ cat /tmp/hotgym-output.csv
07/02/10 00:00,21.2,0.0,1.0
07/02/10 01:00,16.4,21.2,1.0
07/02/10 02:00,4.7,16.4,1.0
07/02/10 03:00,4.7,4.7,1.0
07/02/10 04:00,4.6,4.7,1.0
...
12/31/10 19:00,10.5,21.1,0.65
12/31/10 20:00,5.3,10.5,0.075
12/31/10 21:00,5.1,5.3,0.175
12/31/10 22:00,5.0,5.1,0.025
Another well-known example in the NuPIC community is the NYC traffic anomalies tutorial. Given a data stream of traffic reports as provided by the NYDOT via the River View service, numerous HTM models are trained (in parallel) to detect anomalous traffic patterns. As seen in the screencast, the models are used to identify highly anomalous routes occurring during a certain time period.
The flink-htm embodiment is at flink-htm-examples/.../traffic/Traffic.scala.
Launch the traffic Flink job:
[flink-htm-examples]$ bin/run-example -p 1 traffic.Demo --output file:///tmp/traffic-output-1.csv
Examine the output once the job completes. The output consists of three columns - datetime, streamId (i.e. a particular route), Anomaly Score.
[flink-htm-examples]$ cat /tmp/traffic-output-1.csv
2015-08-03T00:55:25.000-04:00,204,1.0
2015-08-03T10:07:25.000-04:00,204,1.0
2015-08-03T11:07:25.000-04:00,204,1.0
...
2015-08-03T15:58:26.000-04:00,446,0.9
2015-08-03T20:18:25.000-04:00,446,0.95
2015-08-03T20:38:25.000-04:00,446,0.9
In all situations, a separate HTM model will be trained for each traffic route, due to the keyBy
operation on the inference stream. To maximize performance, numerous task slots may be leveraged to train those models in parallel. To increase the parallelism, adjust the -p
flag passed to the bin/run-example
script. Be sure to configure your Flink environment to have sufficient task slots (as described at the top).
[flink-htm-examples]$ bin/run-example -p 4 traffic.Demo --output file:///tmp/traffic-output-4.csv
Note that the job code sets the csv sink's parallelism to 1, to ensure that single file is written. If not for that, numerous csv files would be emitted (one per parallel instance of the sink).
Getting Started
Using
Development