
Datapackage pipelines for The Museum of The Jewish People


Pipelines for data sync of Jewish data sources to the DB of The Museum of The Jewish People

Uses the datapackage pipelines framework

Overview

This project provides pipelines that sync data from multiple external sources to the MoJP Elasticsearch DB.

Running the full pipelines environment using docker

  • Install Docker and Docker Compose (refer to Docker guides for your OS)
  • cp .docker/docker-compose.override.yml.example.full docker-compose.override.yml
  • edit docker-compose.override.yml and modify settings (most likely you will need to set CLEARMASH_CLIENT_TOKEN)
  • bin/docker/build_all.sh
  • bin/docker/start.sh

This will provide:

  • Pipelines dashboard: http://localhost:5000/
  • PostgreSQL server: postgresql://postgres:123456@localhost:15432/postgres
  • Elasticsearch server: localhost:19200
  • Data files under: .docker/.data
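
For reference, the connection settings above can be wrapped in a small Python helper. DPP_DB_ENGINE is the variable the local-run instructions export; MOJP_ELASTICSEARCH_HOST is a hypothetical name used here only for illustration:

```python
import os

# Defaults match the docker environment described above.
# DPP_DB_ENGINE is exported in the local-run instructions;
# MOJP_ELASTICSEARCH_HOST is a hypothetical variable name.
DEFAULT_DB_URL = "postgresql://postgres:123456@localhost:15432/postgres"
DEFAULT_ES_HOST = "localhost:19200"

def get_db_engine_url():
    """Return the PostgreSQL engine URL, preferring the environment."""
    return os.environ.get("DPP_DB_ENGINE", DEFAULT_DB_URL)

def get_elasticsearch_host():
    """Return the Elasticsearch host:port, preferring the environment."""
    return os.environ.get("MOJP_ELASTICSEARCH_HOST", DEFAULT_ES_HOST)
```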

After every change to the code, run bin/docker/build.sh && bin/docker/start.sh

Additional features:

Running the tests using docker

  • Build the tests image
    • bin/docker/build_tests.sh
  • Run the tests
    • bin/docker/run_tests.sh
  • Make changes to the code
  • Re-run the tests (no need to build again in most cases)
    • bin/docker/run_tests.sh

Running the pipelines locally

Make sure you have Python 3.6 in a virtualenv

  • bin/install.sh
  • cp .env.example.full .env
  • modify .env as needed
    • most likely you will need to connect to the db / elasticsearch instances
    • the default file connects to the docker instances, so if you ran bin/docker/start.sh it should work as is
  • source .env
  • export DPP_DB_ENGINE=$DPP_DB_ENGINE
  • bin/test.sh
  • dpp
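
Since the local setup assumes Python 3.6, a small guard at the start of the shell session can fail fast on an older interpreter. This is a sketch; it assumes python3 resolves to the virtualenv's interpreter:

```shell
# Fail fast if the interpreter is older than Python 3.6.
# Assumes "python3" resolves to the virtualenv's interpreter.
check_python_version() {
  major=$(python3 -c 'import sys; print(sys.version_info[0])')
  minor=$(python3 -c 'import sys; print(sys.version_info[1])')
  [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]
}

if check_python_version; then
  echo "python version ok"
else
  echo "need Python 3.6 or newer" >&2
fi
```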

Available Data Sources

Clearmash

Clearmash is a CMS used by MoJP to manage the museum's own data

Clearmash exposes an API for retrieving this data

Relevant links and documentation (the Clearmash support site requires a login)
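
As an illustration of token-based access, a request to the Clearmash API might be assembled as below. The base URL and the Client-Token header name are placeholders, not the real Clearmash API (which is documented behind the support-site login); only the CLEARMASH_CLIENT_TOKEN environment variable comes from the setup instructions above:

```python
import os
import urllib.request

def build_clearmash_request(path, base_url="https://example.clearmash.invalid"):
    """Build an authenticated request; header name and base_url are hypothetical."""
    # CLEARMASH_CLIENT_TOKEN is the variable set in docker-compose.override.yml
    token = os.environ["CLEARMASH_CLIENT_TOKEN"]
    return urllib.request.Request(
        base_url + path,
        headers={"Client-Token": token},  # hypothetical header name
    )
```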
