This software is a simple (not production-ready) processor for Apache NiFi that uses the Greenplum Stream Server (GPSS).
It is written in Java and uses Apache NiFi, gRPC, and Greenplum GPSS.
At the moment it supports only the JSON format: the processor receives JSON entries from a NiFi relationship and ingests them into a Greenplum table.
The following resources can help you better understand the software:
Apache NiFi:
https://nifi.apache.org/
gRPC:
https://grpc.io/
Greenplum GPSS:
https://gpdb.docs.pivotal.io/5160/greenplum-stream/overview.html
https://gpdb.docs.pivotal.io/5160/greenplum-stream/api/dev_client.html
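A GPSS client, as described in the client development documentation above, connects to the server, opens a target table, writes rows, closes the table, and disconnects. The sketch below only illustrates that call order. `GpssSession` is a hypothetical interface, not the generated gRPC stub, and the real requests carry many more parameters. The user name `gpadmin` is an assumption, and whether this processor reconnects for every batch is not stated here. `RecordingSession` is an in-memory fake so the sketch runs without a GPSS server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the GPSS client call order
// (Connect, Open, Write, Close, Disconnect). GpssSession is an
// illustrative interface, NOT the generated gRPC stub.
public class GpssFlowSketch {

    interface GpssSession {
        void connect(String database, String user);
        void open(String schema, String table);
        void write(List<String> rows);
        void close();
        void disconnect();
    }

    /** Fake session that records each call instead of talking to GPSS. */
    static class RecordingSession implements GpssSession {
        final List<String> calls = new ArrayList<>();
        public void connect(String database, String user) { calls.add("Connect " + database); }
        public void open(String schema, String table) { calls.add("Open " + schema + "." + table); }
        public void write(List<String> rows) { calls.add("Write " + rows.size()); }
        public void close() { calls.add("Close"); }
        public void disconnect() { calls.add("Disconnect"); }
    }

    /** Ingests one batch, always closing the table and disconnecting afterwards. */
    static void ingest(GpssSession session, List<String> batch) {
        session.connect("test", "gpadmin"); // database from this README, user is hypothetical
        try {
            session.open("public", "test");
            try {
                session.write(batch);
            } finally {
                session.close();
            }
        } finally {
            session.disconnect();
        }
    }

    public static void main(String[] args) {
        RecordingSession session = new RecordingSession();
        ingest(session, Arrays.asList("{\"name\": \"John\"}", "{\"name\": \"Jane\"}"));
        System.out.println(session.calls);
    }
}
```

The nested `try`/`finally` blocks make sure the table is closed and the session released even if a write fails.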
These are the steps to run the software:
- Activate the gpss extension on the Greenplum database you want to use (for example, test):
test=# CREATE EXTENSION gpss;
- Create the Greenplum target table
Create a table with a single json column called data:
test=# create table test(data json);
- Run a gpss server with a suitable configuration, for example:
gpss ./gpsscfg1.json --log-dir ./gpsslogs
where gpsscfg1.json contains:
{ "ListenAddress": { "Host": "", "Port": 8085, "SSL": false }, "Gpfdist": { "Host": "", "Port": 8086 } }
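Before configuring NiFi, it can help to confirm that something is accepting connections on the GPSS listen port. The sketch below, assuming the example configuration above (GPSS on port 8085) and `localhost` as the default host, simply attempts a TCP connection. A successful connection only shows that some process is listening on that port, not that it is gpss. `GpssPortCheck` and `isListening` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal sketch: checks whether a TCP connection to the GPSS listen port
// succeeds. Port 8085 comes from the example gpsscfg1.json; the default
// host "localhost" is an assumption.
public class GpssPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unknown host, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("GPSS port 8085 open: " + isListening(host, 8085, 2000));
    }
}
```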
- Download, install, and start NiFi
- Copy the .nar file to the NiFi lib directory
The NiFi processor is written in Java. Building the project with Maven creates a .nar file to be deployed in NiFi. Copy ./nifi-gpss-nar/target/nifi-gpss-nar-1.0-SNAPSHOT.nar into your NiFi lib directory.
- Restart NiFi
Once the file is copied, restart NiFi.
- Add the processor in the NiFi UI
- Set the processor properties
Password can be left unset. All other properties must be specified.
NumberOfItemsToBatch specifies how many items the processor buffers before ingesting. In this example it is set to 5, so the processor must receive at least 5 JSON entries before it ingests them.
For pure streaming, set it to 1.
Also auto-terminate the processor's relationships, since it is the last processor in the flow.
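The NumberOfItemsToBatch behaviour described above can be sketched as a buffer that hands entries to a writer only once the configured count is reached. `JsonBatcher` and the writer callback are hypothetical names, not classes from this repository. What the real processor does with a partial batch when it is stopped is not described here; this sketch simply keeps it pending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal sketch of NumberOfItemsToBatch semantics: entries are buffered
// and passed to the writer only when batchSize of them have arrived.
// JsonBatcher is a hypothetical name, not part of this project.
public class JsonBatcher {
    private final int batchSize;
    private final Consumer<List<String>> writer;
    private final List<String> buffer = new ArrayList<>();

    JsonBatcher(int batchSize, Consumer<List<String>> writer) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.batchSize = batchSize;
        this.writer = writer;
    }

    /** Buffers one JSON entry and flushes when the buffer reaches batchSize. */
    void add(String json) {
        buffer.add(json);
        if (buffer.size() >= batchSize) {
            writer.accept(new ArrayList<>(buffer)); // hand over a copy, then reset
            buffer.clear();
        }
    }

    /** Number of entries still waiting for a full batch. */
    int pending() { return buffer.size(); }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        JsonBatcher batcher = new JsonBatcher(5, written::add);
        for (int i = 0; i < 7; i++) {
            batcher.add("{\"id\": " + i + "}");
        }
        // 7 entries with batchSize 5: one batch written, 2 entries pending
        System.out.println(written.size() + " batch(es) written, " + batcher.pending() + " pending");
    }
}
```

With a batch size of 1 every entry is flushed immediately, which corresponds to the pure streaming setting.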
- Add a GetFile processor for testing
- Create a relationship from the GetFile processor to the GPSS processor
- Start the two processors
You can stop and restart the processors at any time.
- Put a populated JSON file in the input directory you specified in the GetFile processor
You can copy several one-line files, or submit a single file containing multiple JSON entries (one per line).
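A file with one JSON entry per line can be split into individual entries as sketched below. This assumes newline-delimited input as described above; it trims each line and skips blank ones, but does not check that each line is well-formed JSON. `JsonLines.split` is a hypothetical helper, not code from this project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, assuming newline-delimited JSON (one document per line).
// Lines are trimmed and blank lines skipped; no JSON validation is done.
public class JsonLines {

    static List<String> split(String content) {
        List<String> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) { // handles \n, \r and \r\n
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    entries.add(trimmed);
                }
            }
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected to happen
            throw new IllegalStateException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        String file = "{\"name\": \"John\"}\n\n{\"name\": \"Jane\"}\r\n";
        System.out.println(split(file).size() + " entries");
    }
}
```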
- Check the NiFi application logs and verify that the Greenplum table is populated:
test=# select * from test;
data
---------------------------------------------------
{"name": "John", "age": "31", "city": "New York"}
{"name": "John", "age": "31", "city": "New York"}
{"name": "John", "age": "31", "city": "New York"}
{"name": "John", "age": "31", "city": "New York"}
{"name": "John", "age": "31", "city": "New York"}
(5 rows)
The software is built with Maven. To build the project, run:
mvn clean package
from the main directory.
This creates a .nar file inside ./nifi-gpss-nar/target that you can deploy on NiFi.