Open-source Data integration platform
OpenHEXA is an open-source data integration platform developed by Bluesquare.
Its goal is to facilitate data integration and analysis workflows, in particular in the context of public health projects.
OpenHEXA allows you to:
- Create workspaces to group code, data and users
- Upload and read files from a shared filesystem
- Write and read to a PostgreSQL database
- Use Jupyter notebooks to explore and analyze data
- Run and schedule complex data workflows using data pipelines
- Manage your team members
Please note that this repository does not contain any code: it is a starting point for OpenHEXA users and implementers. Please refer to the technical architecture page of our wiki for more information about the different OpenHEXA components, including the links to the relevant GitHub repositories.
The OpenHEXA documentation lives in our wiki.
To get started, you might be interested in the following pages:
You can find the public roadmap here.
Please report bugs in the issues section of this repository: https://github.com/BLSQ/openhexa/main/issues.
Feel free to reach out in the discussions section if you have questions or suggestions!
Requirements:
- at least Docker 26.1
- Debian bookworm
- Debian packages: gettext-base, postgresql (14+), postgresql-<postgresql version>-postgis-3, duplicity (optional, to manage backup and restore)
- yq
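The minimum Docker version can be checked from a script. The following is a minimal bash sketch, not part of the OpenHEXA scripts: the helper name version_ge is hypothetical, and it assumes GNU sort with the -V (version sort) option, which Debian provides.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Relies on GNU `sort -V`, which orders strings as version numbers.
version_ge() {
    # After a version sort, the smaller version comes first. If that is $2,
    # then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage: compare the running Docker server version with the minimum.
# docker_version="$(docker version --format '{{.Server.Version}}')"
# version_ge "$docker_version" "26.1" || echo "Docker 26.1 or later is required" >&2
```

The usage lines are commented out so that the snippet can be sourced on a machine where Docker is not installed yet.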
After cloning this repository and changing your current directory to it, you can check your installation by first running:
./script/setup.sh check
It will tell you that the .env file is missing. That is expected, as creating it is the next step.
Then, you need to set up the environment and the database. To do so, execute the following command:
./script/setup.sh all
This will generate a file in the working directory: .env (see below to know more about the configuration properties).
Then you can prepare the database and environment with
./script/openhexa.sh prepare
Finally, you can run openhexa with
./script/openhexa.sh start
To stop, execute
./script/openhexa.sh stop
If you need to purge the configuration and the database after having stopped it, you can do it by executing the following command
./script/openhexa.sh purge
Once installed, it is worth making sure you have the latest version. You can update openhexa with
./script/openhexa.sh update
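The set-up, prepare and start steps described above can be chained so that a failure stops the sequence. This is a minimal bash sketch under stated assumptions: the function names run_step and install_openhexa_locally and the DRY_RUN variable are hypothetical and not provided by the repository; only the three script invocations come from the steps above.

```shell
# Run a command, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Chain the local installation steps; `&&` stops at the first failure.
install_openhexa_locally() {
    run_step ./script/setup.sh all &&
    run_step ./script/openhexa.sh prepare &&
    run_step ./script/openhexa.sh start
}

# Example usage from the repository root:
# DRY_RUN=1 install_openhexa_locally   # print the commands only
# install_openhexa_locally             # actually run them
```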
To build the Debian package, you need to run on a Debian-like Linux distribution with the following packages installed: devscripts, debhelper, build-essential. To install them, run the following command:
sudo apt install devscripts debhelper build-essential
Note that this requires superuser rights (that is what sudo gives you).
When all the requirements are met, run the following script to build the package:
./script/build.sh
The script will check the requirements. Note that it works with your Git working copy, which must be clean. If you have any uncommitted changes, commit or stash them before running the script.
The resulting package is available in the parent directory: ../openhexa_1.0-1_amd64.deb.
Requirements:
- at least Docker 26.1
- Debian bookworm
- Systemd
- yq
First of all, you need to add our APT repository and GPG public key:
curl -fsSL https://raw.githubusercontent.com/blsq/openhexa/refs/heads/main/pubkey.gpg | sudo gpg --yes --dearmor --output /usr/share/keyrings/openhexa.gpg
echo "deb [signed-by=/usr/share/keyrings/openhexa.gpg] https://viz.bluesquare.org/openhexa/ bookworm main" | sudo tee /etc/apt/sources.list.d/openhexa.list
Make sure your locales are correctly set with locale. A common setup is:
# Set locale (the standard variable names are LC_CTYPE and LC_MESSAGES)
sudo tee -a /etc/default/locale > /dev/null <<EOF
LC_ALL=C.UTF-8
LC_CTYPE=C.UTF-8
LC_MESSAGES=C.UTF-8
LC_COLLATE=C.UTF-8
EOF
source /etc/default/locale
Then, you can update your APT database and install openhexa
sudo apt update
sudo apt install openhexa
If you want to manage backup and restore through our script, you can install it with the recommended packages using sudo apt install --install-recommends openhexa.
If you have Systemd, OpenHexa is run as a Systemd service named openhexa (that you can then manage with systemctl). If you don't use Systemd, you can still run the service by running /usr/share/openhexa/openhexa -g start.
When installed, the Systemd service OpenHexa is started. If you need to get its status, stop it, restart it, or start it, you can do it with systemctl.
A command is also installed to ease the interaction with OpenHexa: /usr/share/openhexa/openhexa.sh. To get its usage documentation, run:
/usr/share/openhexa/openhexa.sh help
If you want to interact with an OpenHexa installed globally on the system, you have to use the option -g. Otherwise, it will try to interact with the version in your current directory. For instance, to get its status, you can execute:
/usr/share/openhexa/openhexa.sh -g status
The installation also sets up the environment, especially the PostgreSQL database. The configuration is stored in the file /etc/openhexa/env.conf (see below for more information about the configuration properties). If you need to change or add a property, you can edit this file directly, then restart OpenHexa with sudo systemctl restart openhexa.
If you need to set it up again, check the installation, or purge the environment (database and configuration), you can use the tool /usr/share/openhexa/setup.sh. To get its usage documentation, run:
/usr/share/openhexa/setup.sh help
During the setup, the following is done on the PostgreSQL side:
- create 2 databases, hexa-app and hexa-hub. The first one is used by the OpenHexa app, the second to manage the notebooks.
- create 1 superuser hexa-app, owner of hexa-app.
- create 1 superuser hexa-hub, owner of hexa-hub.
- make PostgreSQL listen on the Docker gateway IP address.
- authorize all users to connect to hexa-app from the entire Docker subnetwork with encrypted password authentication.
- authorize hexa-hub to connect to hexa-hub from the entire Docker subnetwork with encrypted password authentication.
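The two access rules above correspond to entries in PostgreSQL's pg_hba.conf file. The following bash sketch only illustrates what such entries could look like. It is not the code used by setup.sh: the function name pg_hba_rules is hypothetical, scram-sha-256 is an assumed choice for "encrypted password authentication", and 172.17.0.0/16 is only an example subnetwork.

```shell
# Print pg_hba.conf entries matching the access rules described above.
# $1: Docker subnetwork in CIDR notation (for example 172.17.0.0/16).
# $2: authentication method (assumption: defaults to scram-sha-256).
pg_hba_rules() {
    local subnet="$1" method="${2:-scram-sha-256}"
    # Column order: TYPE, DATABASE, USER, ADDRESS, METHOD (tab-separated).
    printf 'host\t%s\t%s\t%s\t%s\n' hexa-app all "$subnet" "$method"
    printf 'host\t%s\t%s\t%s\t%s\n' hexa-hub hexa-hub "$subnet" "$method"
}

# Example usage:
# pg_hba_rules 172.17.0.0/16
```

The entries actually written during the setup may differ. Check the pg_hba.conf of your PostgreSQL cluster to see them.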
You can manage your backup and restore directly with OpenHexa. It will back up all the workspaces data and all databases. This relies on the tool duplicity. Make sure that it is installed (if you install OpenHexa with apt, do it with the recommended packages).
First, you need to set it up:
/usr/share/openhexa/setup.sh backup /mylocaldirectory/where/to/do/thebackup/ encryption_passkey
Then you can back up the data with:
/usr/share/openhexa/openhexa.sh backup
Depending on the user activities, it might be a good idea to stop the service or simply redirect the website to a maintenance HTML page.
To restore the data, execute the following:
/usr/share/openhexa/openhexa.sh restore
In this case, we advise you to stop the service before performing a full restore.
Locally, we use Minio to manage the storage. It provides an AWS S3-compatible API. To access it, you need to provide a key ID and a secret: WORKSPACE_STORAGE_ENGINE_AWS_ACCESS_KEY_ID and WORKSPACE_STORAGE_ENGINE_AWS_SECRET_ACCESS_KEY.
Finally, we need the port number where the local PostgreSQL cluster listens:
DB_PORT
In order to be able to send mails to users, you have to provide the configuration options:
EMAIL_HOST
EMAIL_PORT
EMAIL_HOST_USER
EMAIL_USE_TLS
EMAIL_HOST_PASSWORD
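Before restarting the service, it can be useful to check that the expected properties are present in the configuration file. This is a minimal bash sketch: the function name missing_env_keys is hypothetical, and it assumes the file uses plain KEY=value lines.

```shell
# Print every key from the argument list that is not set in the given env
# file. Returns non-zero when at least one key is missing.
# $1: path to the env file; remaining arguments: keys to look for.
missing_env_keys() {
    local file="$1" key status=0
    shift
    for key in "$@"; do
        # A key counts as set when a line starts with KEY= followed by at
        # least one character (so KEY="" is still reported as set).
        if ! grep -Eq "^${key}=.+" "$file"; then
            printf '%s\n' "$key"
            status=1
        fi
    done
    return "$status"
}

# Example usage against the packaged configuration file:
# missing_env_keys /etc/openhexa/env.conf \
#     EMAIL_HOST EMAIL_PORT EMAIL_HOST_USER EMAIL_USE_TLS EMAIL_HOST_PASSWORD
```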
To test if OpenHexa has been correctly installed, you can run smoke tests that will check minimum operation. To learn how to do so, please read its dedicated README.
We use GitHub Actions to automate the package building and its tests. If you want to run our workflows locally, you can use act as follows:
act --action-offline-mode push
Warning: Make sure to remove your local .env before running it, as act copies your working copy rather than using the checkout action. When that happens, the copied .env overrides the other environment files provided to the compose project to configure it (/etc/openhexa/env.conf).
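Instead of deleting the .env file, it can be moved aside while act runs and put back afterwards. This is a variant of the advice above, sketched in bash under stated assumptions: the function name run_without_env and the .env.stashed file name are hypothetical.

```shell
# Run a command with the local .env moved out of the way, then put it back,
# whether the command succeeds or fails.
# $1: directory holding the .env file; remaining arguments: command to run.
run_without_env() {
    local dir="$1" status=0
    shift
    if [ -f "$dir/.env" ]; then
        mv "$dir/.env" "$dir/.env.stashed"
    fi
    # Run in a subshell so the caller's working directory is untouched.
    (cd "$dir" && "$@") || status=$?
    if [ -f "$dir/.env.stashed" ]; then
        mv "$dir/.env.stashed" "$dir/.env"
    fi
    return "$status"
}

# Example usage from the repository root:
# run_without_env . act --action-offline-mode push
```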
The following setup requires:
- a machine with a public IP address,
- a domain name for which you manage the zone,
- the NGINX service.
Create a file /etc/nginx/sites-available/openhexa with the following content (replace example.com with your domain name):
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
server {
    listen 80;
    server_name example.com;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    location ~ ^/(?<root_path>hub|user)(?<path>/.*)? {
        rewrite ^ /$root_path$path break;
        proxy_pass http://localhost:8001;
        # websocket headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Scheme $scheme;
        proxy_buffering off;
    }
    location / {
        proxy_pass http://localhost:3000;
    }
}
Enable and check it:
sudo ln -s /etc/nginx/sites-available/openhexa /etc/nginx/sites-enabled/
sudo nginx -t
You need to update the OpenHexa configuration in /etc/openhexa/env.conf:
TRUST_FORWARDED_PROTO="false"
PROXY_HOSTNAME_AND_PORT=example.com
INTERNAL_BASE_URL=http://app:8000
FRONTEND_PORT=3000
JUPYTERHUB_PORT=8001
Finally, restart NGINX and OpenHexa:
sudo systemctl restart openhexa nginx
You can now browse the OpenHexa app at http://example.com.
Additionally, you need a certificate. How you obtain it is up to you. For the rest, follow the same steps as above, except use the following configuration in /etc/nginx/sites-available/openhexa:
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
server {
    listen 80;
    server_name example.com;
    return 301 https://$server_name$request_uri;
}
server {
    listen 443 ssl;
    server_name example.com;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt;
    ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key;
    ssl_protocols TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
    ssl_ecdh_curve secp384r1;
    ssl_session_timeout 10m;
    ssl_session_cache shared:SSL:10m;
    ssl_session_tickets off;
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    location ~ ^/(?<root_path>hub|user)(?<path>/.*)? {
        rewrite ^ /$root_path$path break;
        proxy_pass http://localhost:8001;
        # websocket headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Scheme $scheme;
        proxy_buffering off;
    }
    location / {
        proxy_pass http://localhost:3000;
    }
}
and in /etc/openhexa/env.conf:
TRUST_FORWARDED_PROTO="true"
PROXY_HOSTNAME_AND_PORT=example.com
INTERNAL_BASE_URL=http://app:8000
FRONTEND_PORT=3000
JUPYTERHUB_PORT=8001
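The TLS configuration above references a certificate and key named nginx-selfsigned. For a test setup, such a pair can be generated with openssl. This is a minimal bash sketch, not a recommendation for production: the function name make_selfsigned_cert is hypothetical, the key size and validity period are arbitrary choices, and browsers will warn about a self-signed certificate.

```shell
# Generate a self-signed certificate and an unencrypted private key.
# $1: domain name used as the certificate common name.
# $2: path of the certificate to write; $3: path of the private key to write.
make_selfsigned_cert() {
    local domain="$1" cert="$2" key="$3"
    # -x509 emits a certificate instead of a signing request,
    # -nodes leaves the key unencrypted so NGINX can read it at start-up.
    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -subj "/CN=${domain}" \
        -keyout "$key" -out "$cert"
}

# Example usage matching the paths of the NGINX configuration (requires root):
# make_selfsigned_cert example.com \
#     /etc/ssl/certs/nginx-selfsigned.crt /etc/ssl/private/nginx-selfsigned.key
```

For a public deployment, a certificate issued by a recognized authority is the usual choice. Only the ssl_certificate and ssl_certificate_key paths then need to change.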