Example project which combines messages from Pub/Sub and data from BigQuery. There is one message queue: `sensor`. This is a stream of numbers which, for whatever reason, needs to be summed over a window of 5 seconds. Each sum is checked against a lower and upper bound as described in a BigQuery table (see `./data/numbers.csv`). The output is a record in a BigQuery table which looks as follows:
| update_time | number_sum | number_count | number_type | version |
|---|---|---|---|---|
| 1234 | 49 | 5 | tens | run-1 |
| 1235 | 490 | 100 | hundreds | run-1 |
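A minimal sketch of this stage is shown below. It assumes Scio's Pub/Sub and BigQuery connectors (exact method names vary between Scio versions), and the project/dataset identifiers as well as the `name`/`lower_bound`/`upper_bound` column names are illustrative assumptions, not taken from this repository:

```scala
import com.spotify.scio._
import com.spotify.scio.bigquery._
import org.joda.time.Duration

object SourceMergeSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)

    // Bounds from the `numbers` table, made available to every window as a side input.
    val bounds = sc
      .bigQuerySelect(Query(
        "SELECT name, lower_bound, upper_bound FROM `project.dataset.numbers`"))
      .asListSideInput

    sc.pubsubSubscription[String]("projects/project/subscriptions/sensor")
      .map(_.trim.toLong)
      .withFixedWindows(Duration.standardSeconds(5)) // the 5-second window
      .map(n => (n, 1L))
      .reduce((a, b) => (a._1 + b._1, a._2 + b._2)) // (number_sum, number_count)
      .withSideInputs(bounds)
      .map { (sumCount, ctx) =>
        val (sum, count) = sumCount
        // Classify the sum against the bounds; sums outside every range get "unknown".
        val numberType = ctx(bounds)
          .find(r => sum >= r.getLong("lower_bound") && sum <= r.getLong("upper_bound"))
          .map(_.getString("name"))
          .getOrElse("unknown")
        (sum, count, numberType)
      }
      .toSCollection
    // ...map to TableRow and write to the output BigQuery table here.

    sc.run()
  }
}
```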
The flow schema is as follows:
A few things need to be set up on the Google Cloud Platform (a sketch of the corresponding commands follows this list):

- One topic in Cloud Pub/Sub: `sensor`
- One subscription in Cloud Pub/Sub: `sensor`
- Import `./data/numbers.csv` into a BigQuery table called `numbers`.
- Update `application.conf` with your configuration.
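One way to create these resources with the Cloud SDK; `dataset` is a placeholder for your own BigQuery dataset, and `--autodetect` lets `bq` infer the schema from the CSV:

```
$ gcloud pubsub topics create sensor
$ gcloud pubsub subscriptions create sensor --topic sensor
$ bq load --autodetect --source_format=CSV dataset.numbers ./data/numbers.csv
```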
The pipeline is deployed by running:
```
$ sbt "runMain example.SourceMerge"
```
Messages can be sent with the client that is also included in this repository; see its `README.md`.
This project makes use of Protobuf in combination with Scio/Dataflow.
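As an illustration only, here is a hypothetical `SensorReading` message and its ScalaPB round trip; the actual `.proto` schema in this repository may differ:

```scala
// Hypothetical proto definition, compiled to a case class by sbt-protoc/ScalaPB:
//
//   syntax = "proto3";
//   message SensorReading {
//     int64 value = 1;
//   }

object ProtoRoundTrip {
  def main(args: Array[String]): Unit = {
    val reading = SensorReading(value = 42L)

    // Serialize: this byte array is what the client publishes to Pub/Sub.
    val payload: Array[Byte] = reading.toByteArray

    // Deserialize: the pipeline decodes the payload back into the case class.
    val decoded = SensorReading.parseFrom(payload)
    println(decoded.value) // 42
  }
}
```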