
Technical architecture

Quentin Gérôme edited this page Apr 29, 2024 · 8 revisions

OpenHEXA is a data integration platform made up of several components:

  • The OpenHEXA backend, usually called openhexa-app for historical reasons: a Python/Django application providing a GraphQL API, a data pipeline orchestration engine and user management capabilities
  • The OpenHEXA frontend (openhexa-frontend), a TypeScript/React/Next.js application providing the OpenHEXA user interface on top of the backend
  • The OpenHEXA notebooks environment (see openhexa-notebooks), a heavily customized JupyterHub/JupyterLab setup running the same image as the pipelines environment
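The backend exposes its features through a GraphQL API, which the frontend builds on. The snippet below is a minimal sketch of what a client request to that API could look like, using only the Python standard library. The base URL, the `/graphql/` endpoint path, the `Authorization: Bearer` header and the `me { user { email } }` query are all assumptions made for illustration, not a documented contract; check the backend's actual schema and authentication scheme before relying on them.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical base URL of an OpenHEXA backend deployment.
BACKEND_URL = "https://api.openhexa.example.org"


def build_graphql_request(
    query: str, variables: Optional[dict] = None, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request for the backend API."""
    payload = {"query": query, "variables": variables or {}}
    headers = {"Content-Type": "application/json"}
    if token is not None:
        # Assumed token-based auth header; the real scheme may differ.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{BACKEND_URL}/graphql/",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical query: fetch the e-mail address of the authenticated user.
ME_QUERY = "query { me { user { email } } }"

request = build_graphql_request(ME_QUERY, token="my-token")
# Sending it would be: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch testable without a running backend.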

In terms of data storage, we distinguish between:

  • Application data storage, which resides in a PostgreSQL database
  • Workspace storage, or user storage (see the User manual for more information about workspaces), which resides either in PostgreSQL databases or in object storage buckets (Google Cloud Storage, AWS S3 or MinIO)
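To make the workspace storage options concrete, here is a minimal sketch that tells them apart by URI scheme. The URI conventions (`postgresql://`, `gs://`, `s3://`) and the helper name `storage_backend` are illustrative assumptions, not part of OpenHEXA itself. Since MinIO implements the S3 protocol, it shares the `s3` scheme with AWS S3 here.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to workspace storage backend.
BACKENDS = {
    "postgresql": "PostgreSQL database",
    "postgres": "PostgreSQL database",
    "gs": "Google Cloud Storage bucket",
    # MinIO is S3-compatible, so both use the "s3" scheme in this sketch.
    "s3": "S3-compatible bucket (AWS S3 or MinIO)",
}


def storage_backend(uri: str) -> str:
    """Return a human-readable name for the storage backend a URI points to."""
    scheme = urlparse(uri).scheme.lower()
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unsupported storage scheme: {scheme!r}") from None
```

For example, `storage_backend("gs://my-workspace-bucket/data.csv")` returns `"Google Cloud Storage bucket"`. Application data, by contrast, lives in the platform's own PostgreSQL database and is not addressed this way by workspace users.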

When running code using Jupyter notebooks or OpenHEXA data pipelines, technical users can leverage the OpenHEXA Python SDK to interact with the OpenHEXA backend (see openhexa-sdk-python).

Notebooks and data pipelines typically run in containers using one of our Docker images (see openhexa-docker-images) or a custom image configured for the workspace.
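The image selection rule (a custom per-workspace image when one is set, otherwise a default image) can be sketched as follows. The `Workspace` class, its `docker_image` field and the `DEFAULT_IMAGE` value are hypothetical placeholders; the real image names are published in openhexa-docker-images, and the backend's actual data model may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default image name; see openhexa-docker-images for the real ones.
DEFAULT_IMAGE = "example/openhexa-default-environment:latest"


@dataclass
class Workspace:
    """Hypothetical, minimal view of a workspace."""

    slug: str
    docker_image: Optional[str] = None  # custom image set for this workspace, if any


def resolve_image(workspace: Workspace) -> str:
    """Pick the container image used for notebooks and pipeline runs."""
    # Fall back to the default image when no custom image is configured.
    return workspace.docker_image or DEFAULT_IMAGE
```

Because notebooks and pipelines share the same environment image, resolving it in one place keeps code that works in a notebook consistent with how it later runs as a pipeline.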

The whole OpenHEXA stack is meant to be deployed in a Kubernetes cluster, so that notebooks and pipelines run in isolated environments and leverage the auto-scaling capabilities offered by Kubernetes.
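As an illustration of how a pipeline run can be isolated in its own Kubernetes pod, the sketch below builds a plain Pod manifest as a Python dictionary. The manifest fields (`apiVersion`, `kind`, `metadata`, `spec`, `resources`) follow the standard Kubernetes Pod schema, but the function name, labels, command and resource values are hypothetical; OpenHEXA's pipeline runner may build its pods differently.

```python
import json
from typing import Dict, List


def pipeline_run_pod(
    run_id: str, image: str, command: List[str], cpu: str = "500m", memory: str = "1Gi"
) -> Dict:
    """Build a Kubernetes Pod manifest that runs a single pipeline run in isolation."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # One pod per run keeps runs isolated from each other.
            "name": f"pipeline-run-{run_id}",
            "labels": {"app": "openhexa-pipeline", "run-id": run_id},  # hypothetical labels
        },
        "spec": {
            "restartPolicy": "Never",  # a finished run should not be restarted
            "containers": [
                {
                    "name": "pipeline",
                    "image": image,
                    "command": command,
                    # Requests let the scheduler (and a cluster autoscaler) place the pod;
                    # limits cap what a single run can consume.
                    "resources": {
                        "requests": {"cpu": cpu, "memory": memory},
                        "limits": {"cpu": cpu, "memory": memory},
                    },
                }
            ],
        },
    }


manifest = pipeline_run_pod(
    "42", "example/openhexa-default-environment:latest", ["python", "pipeline.py"]
)
print(json.dumps(manifest, indent=2))
```

Serialized to JSON, such a manifest could be submitted with `kubectl apply -f` or through a Kubernetes client library. Declaring resource requests is what allows the cluster to add nodes when many runs are scheduled at once.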

[Figure: architecture diagram]
