mim-deploy is a CLI for deploying a datahub configuration from a git repo to the Mimiro datahub. It creates a manifest, stores it in the datahub under the content-endpoint, and uses it to compare updated files against the previously stored md5 file hashes. Based on that comparison, it builds a list of operations and uses the mim cli client to execute them.
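As a purely illustrative sketch (the actual manifest layout is not documented here and may differ), the stored manifest can be thought of as a mapping from config file paths to md5 hashes; a changed hash for a file signals that an operation is needed for it:

{
  "jobs/import-mysystem-owner.json": "9a0364b9e99bb480dd25e1f0284c8555",
  "contents/mysystem/content-mysystem.json": "2b00042f7481c7b056c4b410d28f33cf"
}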
├── README.md
├── contents
│   ├── S3
│   │   └── content-s3.json
│   └── mysystem
│       └── content-mysystem.json
├── dataset
│   ├── myfolder
│   │   └── my-dataset.json
│   └── my-other-dataset.json
├── environments
│   ├── variables-dev.json
│   ├── variables-prod.json
│   └── variables-test.json
├── jobs
│   ├── import-mysystem-owner.json
│   ├── import-mysystem-order.json
│   └── order-myothersystem.json
└── transforms
    ├── myTransform.js
    └── order-myothersystem.js
- Jobs and content need a "type" property with either "job" or "content" as its value.
- Jobs with a transform need to have a "path" property containing the relative path of the transform file inside the transforms directory.
{
  "id": "import-mysystem-owner",
  "type": "job",
  "triggers": [
    {
      "triggerType": "cron",
      "jobType": "incremental",
      "schedule": "@every 120m"
    }
  ],
  "paused": "{{ myVariable }}",
  "source": {
    "Type": "HttpDatasetSource",
    "Url": "http://localhost:4343/datasets/Owner/changes"
  },
  "sink": {
    "Type": "DatasetSink",
    "Name": "mysystem.Owner"
  },
  "transform": {
    "Path": "myTransform.js",
    "Type": "JavascriptTransform"
  }
}
To use variables in your config files, you can replace a value with {{ myVariable }}. The program will then look up this variable in the JSON file defined by the ENVIRONMENT_FILE environment variable.
Example
ROOT_PATH/environments/variables-dev.json
{
  "myVariable": true
}
ROOT_PATH/jobs/import-mysystem-owner.json
{
  "id": "import-mysystem-owner",
  "type": "job",
  "triggers": [
    {
      "triggerType": "cron",
      "jobType": "incremental",
      "schedule": "@every 120m"
    }
  ],
  "paused": "{{ myVariable }}",
  "source": {
    "Type": "HttpDatasetSource",
    "Url": "http://localhost:4343/datasets/Owner/changes"
  },
  "sink": {
    "Type": "DatasetSink",
    "Name": "mysystem.Owner"
  }
}
If you have a large configuration file you want to split up into multiple files, you can achieve that by using the include syntax:
{
  "baseNameSpace": "http://data.mimiro.io/mysystem/",
  "baseUri": "http://data.mimiro.io/mysystem/",
  "database": "mydb",
  "id": "db1",
  "type": "content",
  "tableMappings": "{% include list('contents/mysystem/tableMappings/*.json') %}"
}
This joins all matching JSON files into a list and pushes the generated JSON config to the datahub. If you only want to include a single object rather than a list, you can write it like this:
{
  "tableMappings": "{% include 'contents/mysystem/tableMappings/owner.json' %}"
}
If a wildcard is used in the file path, and it matches more than one file, it will automatically add the content as a list.
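To illustrate (the file names and placeholder contents below are hypothetical, not part of mim-deploy), if contents/mysystem/tableMappings/ contained owner.json and order.json, the list form of the include would expand to something like:

{
  "tableMappings": [
    { "...": "contents of owner.json" },
    { "...": "contents of order.json" }
  ]
}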
To ignore specific paths from being deployed, add the flag --ignorePath ../datahub-config/<path_to_ignore> to your mim-deploy command.
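For example, combining the flags shown elsewhere in this README (the ignored path below is just an illustration):

mim-deploy http://localhost:8080 --path ../datahub-config --env ../datahub-config/environments/variables-local.json --ignorePath ../datahub-config/contents/S3 --dry-run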
When you define a DatasetSink, the named dataset will be created when the configuration is deployed to the datahub.
Public namespaces
If you need to define public namespaces for the dataset used by the sink, they can be defined in the job like this:
"sink": {
"Type": "DatasetSink",
"Name": "mysystem.Owner",
"publicNamespaces": [
"http://data.mimiro.io/owner/event/",
"http://data.mimiro.io/people/birthdate/"
]
}
If the dataset already exists, it will be updated with the defined public namespaces.
In some cases we need datasets that are created manually and cannot be read from another source. This can be achieved by adding files under the dataset directory. Files in there need to be structured like this:
{
  "type": "dataset",
  "datasetName": "cima.AnimalType",
  "publicNamespaces": [],
  "entities": [
    {
      "id": "@context",
      "namespaces": {
        "ns1": "http://data.mimiro.io/cima/",
        "ns2": "http://data.mimiro.io/sdb/animaltype/",
        "ns3": "http://www.w3.org/2000/01/rdf-schema#"
      }
    },
    {
      "id": "ns2:cow",
      "refs": {
        "ns3:type": "ns1:AnimalType"
      },
      "props": {
        "ns1:name": "Cow"
      }
    },
    {
      "id": "ns2:pig",
      "refs": {
        "ns3:type": "ns1:AnimalType"
      },
      "props": {
        "ns1:name": "Pig"
      }
    }
  ]
}
Configuration properties can be set either through environment variables or by editing the .env file.
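As an illustrative sketch only (the full list of supported properties is not reproduced here, and the values are assumptions), a .env file might set the properties referenced earlier in this README:

# Hypothetical example values; property names taken from elsewhere in this README
ROOT_PATH=../datahub-config
ENVIRONMENT_FILE=../datahub-config/environments/variables-dev.json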
# Build the mim-deploy binary
make mim-deploy

# Deploy to a local datahub with dry-run disabled
mim-deploy http://localhost:8080 --path ../datahub-config --env ../datahub-config/environments/variables-local.json --dry-run=false

# Pipe a token from the mim cli and do a dry run against the dev environment
mim login dev --out | mim-deploy https://dev.api.example.com --token-stdin --path ../datahub-config --env ../datahub-config/environments/variables-dev.json --dry-run

# Build the docker image
make docker