The Cortex Profiles SDK is a collection of Java/Kotlin libraries, examples, and templates for utilizing Cortex Fabric in a Spark-based environment, either on your local instance or in a Cortex Fabric cluster.
These examples are structured in a step-by-step way to demonstrate the range of features currently available in the Cortex Profiles SDK.
The core of the Profiles SDK is a library that exposes an interface to Cortex for using Spark to perform custom processing of Profile-related data. The entry point to the Profiles SDK is the `CortexSession`, a session-based API around Spark and the `SparkSession`.
The Profiles SDK provides:
- An extensible dependency-injected platform that allows for process-, module-, and environment-specific (local vs. in a Cortex cluster) configuration
- Access to Cortex Catalog
- Access to Cortex Backend Storage (e.g. Managed Content and Profiles)
- Configurable provider for Cortex Secrets
- Stream and batch processing support for Cortex Connections
- Access to Cortex Fabric job flows for ingesting Data Sources and building Profiles
- Spark property-based configuration options
- A Cortex Skill Template with a spark-submit based launcher
The Cortex Profiles SDK consists of:
- The Profiles SDK jar file (`com.c12e.cortex.profiles:profiles-sdk`)
- The platform dependencies jar file (`com.c12e.cortex.profiles:platform-dependencies`)
- Example materials and templates located in this repo
The Profiles SDK jar files can be pulled from CognitiveScale's JFrog Artifactory if access has been shared with you. Follow the JFrog Artifactory Developer Setup.
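As a rough sketch, a `build.gradle.kts` consuming the SDK from Artifactory might look like the following. The repository URL, credential property names, and versions are placeholders, not the actual values — use the details from the JFrog Artifactory Developer Setup and `gradle.properties.template`:

```kotlin
repositories {
    mavenCentral()
    maven {
        // Placeholder URL -- substitute the repository from your Artifactory setup.
        url = uri("https://<artifactory-host>/artifactory/<repo>")
        credentials {
            // Property names are illustrative; see gradle.properties.template
            // for the keys expected by this repo's build.
            username = project.findProperty("artifactoryUser") as String?
            password = project.findProperty("artifactoryPassword") as String?
        }
    }
}

dependencies {
    // Versions are illustrative; match the SDK version shared with you.
    api("com.c12e.cortex.profiles:profiles-sdk:<version>")
    api("com.c12e.cortex.profiles:platform-dependencies:<version>")
}
```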
JVM settings can be set via the `GRADLE_OPTS` environment variable:

```shell
export GRADLE_OPTS="-Dorg.gradle.jvmargs='-Xmx2g -XX:MaxMetaspaceSize=512m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+OptimizeStringConcat'"
```

Alternatively, you can update the `$USER_HOME/.gradle/gradle.properties` file by adding the following line. Create the file if it does not already exist.

```properties
org.gradle.jvmargs=-Xmx2g -XX:MaxMetaspaceSize=512m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+OptimizeStringConcat
```
- Install Java 11 using the Resources section.
- Obtain JFrog Artifactory credentials (shared in LastPass with everyone in the Shared-Engineering folder).
- Install IntelliJ IDEA with the latest Kotlin plugin enabled (IntelliJ IDEA).
- Put the JFrog Artifactory credentials in the `$USER_HOME/.gradle/gradle.properties` file. (See `gradle.properties.template` for instructions.)
To work with a local (developer) installation of the Cortex Profiles SDK, see dev.md.
Examples are structured to build upon one another and grow in complexity. Each provides its own instructions for running it as well as additional context. The top-level main-app is a CLI wrapper around the other examples:
Sequence | Example | Description |
---|---|---|
1 | Using Local Cortex Clients | This example introduces working with the Cortex Profiles SDK in a local development environment. |
2 | Join Two Connections | This example is a CLI application for joining two Cortex Connections and saving the resulting dataset to another Cortex Connection. |
3 | Refresh a DataSource | This example is a CLI application for refreshing a Data Source by reading its Cortex Connection and writing the dataset to the Data Source. |
4 | Build Profiles | This example is a CLI application for building Cortex Profiles. |
5 | Streaming to a Data Source | This example contains a CLI application for refreshing a Data Source via streaming. |
6 | Using a CData Connection | This example is a CLI application for reading data from a JDBC CData Cortex Connection and writing that data to a separate Connection. |
7 | Reading From BigQuery | This example is a CLI application that writes data from a Google BigQuery table to the location of a Cortex Connection. It builds off of the Local Clients example for its initial setup. |
8 | Caching Profiles | This example is a CLI application that writes Profile data from a Delta table to Redis for real-time Profile fetches. It builds off of the Local Clients and Build Profiles examples for its initial setup. |
9 | Profiles Daemon for Realtime Query | This example is a Spring API server application that exposes APIs for real-time Profile fetches. It works in conjunction with the Caching Profiles example for its initial setup. |
10 | KPI Queries | This CLI application enables users to evaluate KPI expressions written in JavaScript through the profiles-sdk, similar to the KPI Dashboard. The goal is to provide an interface over Profiles to evaluate straightforward KPI expressions, or to define cohorts on the Profiles and write complex KPI expressions aggregated over a window duration within a timeframe. This example builds off of the Local Clients and Build Profiles examples for its initial setup. |
11 | Filter and Aggregate Query examples | This CLI application showcases filter and aggregate queries for the member-profile Profile Schema using the profiles-sdk. This example builds off of the Local Clients and Build Profiles examples for its initial setup. |
12 | Catalog Management | This example is a CLI application that uses a secondary configuration file, app-config.json, to define a number of Catalog entities to be managed during execution. The Connections, Data Sources, and Profile Schemas in app-config.json are created with the attributes defined in the configuration and are then available for use through the Profiles SDK. |
picocli is used by each example to create a minimal CLI application for running the example. Refer to the instructions in each example.
The examples are structured as a Gradle multi-project build.
To include a new project in the example Profiles application you will need to:
- Create a new Java module. Ensure the new project is included in the `settings.gradle.kts` file.
- Include `com.c12e.cortex.profiles:profiles-sdk` and `com.c12e.cortex.profiles:platform-dependencies` as `api` dependencies in your configuration. Refer to join-connections/build.gradle.kts for an example setup including the Profiles SDK, picocli, and JUnit dependencies.
- (Optional) Include a main CLI entrypoint in your module using picocli.
- Include your project in the main application:
  - Add your project as a dependency of `main-app`. In `main-app/build.gradle.kts`, add `implementation(project(":<your-project>"))` to `dependencies`.
  - Add your project's source to the `main-app` jar file. In `main-app/build.gradle.kts`, add `from(project(":<your-project>"))` to the `Jar` task (`tasks.withType<Jar>`).
  - (Optional) If you included a CLI entrypoint in your module, you can list it as a subcommand by importing the class in Application.java. Refer to the existing subcommands in Application.java for an example of how to include the class as a subcommand.
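The `main-app` wiring described above can be sketched as the following additions to `main-app/build.gradle.kts`, where `:my-example` is a placeholder for your project's name:

```kotlin
dependencies {
    // Placeholder project name -- replace ":my-example" with your module.
    implementation(project(":my-example"))
}

tasks.withType<Jar> {
    // Bundle the example project's output into the main-app jar.
    from(project(":my-example"))
}
```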
The Skill Template directory contains files for packaging the application as a Cortex Job Skill, where:
- The input to the Skill is a JSON payload with the path to a Spark configuration file.
- The output of the Skill is the Job execution logs.
- The Docker image for the `main-app` uses a spark-submit based wrapper to launch the Spark application. The resources for the `spark-submit` wrapper are in the main-app/src/main/resources/python/ directory and are necessary for packaging the Skill.

NOTE: The `ENTRYPOINT` for the Docker image is scuttle. When running the application in a Docker container locally, you should set the `--entrypoint` option.
- Before creating the Skill, you will need to:
  - Set the private registry URL accessible from Cortex as an environment variable: `export DOCKER_PREGISTRY_URL=...`.
  - Set the name of the Project in which to save the Skill, Action, and Types: `export PROJECT_NAME=xxxx`.
  - Set the Cortex Token for authenticating to Cortex: `export CORTEX_TOKEN=xxxx`.
  - Update the spark-conf.json with the CLI application command and other config options.
  - Verify the Skill's `payload.json` file refers to the above Spark configuration file (in the built container).
- Build the Skill:

  ```shell
  make all
  ```

  The final output should look similar to:

  ```
  docker build --build-arg base_img=c12e/spark-template:profile-jar-base-6.3.2-rc.2 -t profiles-example:latest -f ./main-app/build/resources/main/Dockerfile ./main-app/build
  [+] Building 1.5s (17/17) FINISHED
   => [internal] load build definition from Dockerfile  0.0s
   => => transferring dockerfile: 37B  0.0s
   => [internal] load .dockerignore  0.0s
   => => transferring context: 2B  0.0s
   => [internal] load metadata for docker.io/c12e/spark-template:profile-jar-base-6.3.2-rc.2  1.0s
   => [auth] c12e/spark-template:pull token for registry-1.docker.io  0.0s
   => FROM docker.io/redboxoss/scuttle:latest  0.4s
   => => resolve docker.io/redboxoss/scuttle:latest  0.4s
   => [internal] load build context  0.0s
   => => transferring context: 1.16kB  0.0s
   => [stage-0 1/9] FROM docker.io/c12e/spark-template:profile-jar-base-6.3.2-rc.2@sha256:331f93e1290442934adbd14e904740ef458d2ea012c3288d689608e9202899dd  0.0s
   => [auth] redboxoss/scuttle:pull token for registry-1.docker.io  0.0s
   => CACHED [stage-0 2/9] COPY --from=redboxoss/scuttle:latest /scuttle /bin/scuttle  0.0s
   => CACHED [stage-0 3/9] COPY ./resources/main/python/ .  0.0s
   => CACHED [stage-0 4/9] RUN pip3 install -r requirements.txt  0.0s
   => CACHED [stage-0 5/9] COPY ./libs/main-app-1.0.0-SNAPSHOT.jar /app/libs/app.jar  0.0s
   => CACHED [stage-0 6/9] COPY ./libs/main-app-1.0.0-SNAPSHOT.jar /opt/spark/jars  0.0s
   => CACHED [stage-0 7/9] COPY ./resources/main/spark-conf /opt/spark/conf  0.0s
   => CACHED [stage-0 8/9] COPY ./resources/main/lib/*.jar /opt/spark/jars/  0.0s
   => CACHED [stage-0 9/9] COPY ./resources/main/conf /app/conf  0.0s
   => exporting to image  0.0s
   => => exporting layers  0.0s
   => => writing image sha256:94dd0d11a31abcd3d23d97b003ffceb30e9eafc38c77dba9f3b92d6ea8633526  0.0s
   => => naming to docker.io/library/profiles-example:latest  0.0s
  docker tag profiles-example:latest private-registry.dci-dev.dev-eks.insights.ai/profiles-example:latest
  docker push private-registry.dci-dev.dev-eks.insights.ai/profiles-example:latest
  The push refers to repository [private-registry.dci-dev.dev-eks.insights.ai/profiles-example]
  12898f60c37f: Layer already exists
  ece5a5cb892e: Layer already exists
  4bc783276212: Layer already exists
  e310b2eec001: Layer already exists
  b995497ccd6e: Layer already exists
  4b156e5303b4: Layer already exists
  2012777427ee: Layer already exists
  4fae41f79235: Layer already exists
  ea16b97b6399: Layer already exists
  7628da35a3c9: Layer already exists
  5f70bf18a086: Layer already exists
  7723dc94285f: Layer already exists
  latest: digest: sha256:aa4c3a4dc42a4af55ab0eac6d5bdc3c226828b133013bdea21d297698b43471d size: 2830
  cortex types save -y templates/types.yaml --project testi-69257
  Type definition saved
  cortex actions deploy --actionName profiles-example --actionType job --docker private-registry.dci-dev.dev-eks.insights.ai/profiles-example:latest --project laguirre-testi-69257 --cmd '["scuttle", "python", "submit_job.py"]' --podspec ./templates/podspec.yaml
  {
    "success": true,
    "action": {
      "_isDeleted": false,
      "_projectId": "testi-69257",
      "_createdBy": "[email protected]",
      "name": "profiles-example",
      "description": "",
      "image": "private-registry.dci-dev.dev-eks.insights.ai/profiles-example:latest",
      "type": "job",
      "command": [ "scuttle", "python", "submit_job.py" ],
      "scaleCount": 1,
      "podSpec": "[{\"path\":\"/containers/0/imagePullPolicy\",\"value\":\"Always\"}]",
      "jobTimeout": 0,
      "k8sResources": [],
      "environmentVariables": null,
      "createdAt": "2022-07-15T22:33:46.250Z",
      "updatedAt": "2022-07-15T22:33:46.250Z",
      "_version": 8
    }
  }
  cortex skills save -y templates/skill.yaml --project testi-69257
  Skill saved: {"success":true,"version":8,"message":"Skill definition profiles-example saved."}
  ```
- Invoke the Skill:

  ```shell
  make invoke
  ```

  Example output:

  ```
  cortex skills invoke --params-file templates/payload.json profiles-example params --project laguirre-testi-69257
  {
    "success": true,
    "activationId": "115d1196-408a-47fa-91c4-6f8e8a391641"
  }
  ```
- Run `cortex agents get-activation <activation-id>` or `cortex tasks logs <task-name>` to view the logs of the Skill activation.
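For reference, the payload passed during `make invoke` (templates/payload.json) points the Skill at the Spark configuration file inside the built container. A minimal sketch is shown below; the key name is illustrative and the path assumes the configuration is copied to `/app/conf` as in the Docker build output, so consult the repo's templates for the actual contents:

```json
{
  "payload": {
    "config": "/app/conf/spark-conf.json"
  }
}
```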