etlhelper
is currently at Beta status.
All main features are complete, however contributions and suggestions are
welcome, particularly in the following areas:
- support for more database types (MySQL, IBM Informix)
- additional recipes / case studies
- improved type annotations
- run more integration tests in GitHub Actions
- performance optimisation
- improved documentation
See the issues list for details.
The DbHelper
class provides a uniform interface to different database types.
Implementing a new DbHelper
requires the following:
- New DbHelper class for database
__init__
method imports the driver and defines error typesget_connection_string
andget_sqlalchemy_connection_string
are definedrequired_parameters
defines set of names of parameters required by DbParamsparamstyle
attribute- other specific function overrides are in place (e.g.
executemany
for PostgreSQL,connect
for SQLite.)
- DbHelper class is registered with DB_HELPER_FACTORY
- Tests at
test/integration/db/test_xxx.py
that cover at least theconnect()
andcopy_rows()
functions
See etlhelper/db_helpers/oracle.py and test/integration/db/test_oracle.py for examples.
https://www.github.com/BritishGeologicalSurvey/etlhelper is a the source-of-truth copy of ETL Helper. There is a mirror on the internal BGS GitLab server. It is used to run integration tests against internal Oracle and MS SQL databases. Pull requests and issues should be targeted at the GitHub repository.
- Python 3.9+ virtual environment
- Git
- Docker
Install locally for development by cloning repository and running the following in the root:
pip install -r requirements-dev.txt
pip install -e .
Development files will be linked to the virtual environment/system folder hierarchy allowing changes to take effect directly.
Integration tests require access to a writer account on a PostGIS database. The easiest way to run these is using a Docker container.
A Dockerised PostGIS version can be started with:
export TEST_PG_PORT=5432
export TEST_PG_PASSWORD=etlhelper_pw
docker run -e POSTGRES_USER=etlhelper_user -e POSTGRES_DB=etlhelper \
-e POSTGRES_PASSWORD=$TEST_PG_PASSWORD --name etlhelper_postgis \
-d --rm -p $TEST_PG_PORT:5432 postgis/postgis:15-3.4
Tests are run with:
bash bin/run_tests_for_developer.sh
The test-runner script will run tests within a dedicated container and provide
HTML coverage output. It can be viewed with firefox htmlcov/index.html
.
Additional integration tests can be run against internal BGS Oracle and SQL Server databases. The DbParams for these databases are defined by environment variables stored within the Continuous Integration system. To run these:
- Go to Settings > CI / CD > Variables in GitLab
- Use "Reveal values" and copy the contents of TEST_ENV_VARS
- Paste into the terminal to set environment variables
- Run tests as before
The documentation is created using Sphinx. To locally build the documentation, run the following:
sphinx-build -a docs docs/_build
The documentation can then be viewed at docs/_build/index.html
Releases are created manually from the main branch via tags. This should be done via the GitHub web interface. The GitLab CI system will automatically perform the release following the next repository mirror synchronisation.