Bemi plugs into Django and your existing PostgreSQL database to track data changes automatically. It unlocks robust context-aware audit trails and time travel querying inside your application.
Designed with simplicity and non-invasiveness in mind, Bemi doesn't require any alterations to your existing database structure. It operates in the background, empowering you with data change tracking features.
This library is an optional Django integration, enabling you to pass application-specific context when performing database changes. This can include context such as the 'where' (API endpoint, worker, etc.), 'who' (user, cron job, etc.), and 'how' behind a change, thereby enriching the information captured by Bemi.
- Automatic and secure database change tracking with application-specific context in a structured form
- 100% reliability in capturing data changes, even if executed through direct SQL outside the application
- High performance without affecting code runtime execution and database workload
- Easy to use, with no need to change table structures or rewrite code
- Time travel querying and ability to easily group and filter changes
- Scalability with an automatically provisioned cloud infrastructure
- Full ownership of your data
See an example repo for Django that automatically tracks all changes.
There's a wide range of use cases that Bemi is built for! The tech was initially built as a compliance engineering system for fintech that supported $15B worth of assets under management, but has since been extracted into a general-purpose utility. Some use cases include:
- Audit Trails: Use logs for compliance purposes or surface them to customer support and external customers.
- Change Reversion: Revert changes made by a user or roll back all data changes within an API request.
- Time Travel: Retrieve historical data without implementing event sourcing.
- Troubleshooting: Identify the root cause of application issues.
- Distributed Tracing: Track changes across distributed systems.
- Testing: Roll back or roll forward to different application test states.
- Analyzing Trends: Gain insights into historical trends and changes for informed decision-making.
Install the Python package
pip install bemi-django
Add the bemi app to your Django project's INSTALLED_APPS
INSTALLED_APPS = [
# Your other apps
'bemi',
]
Run the migration to add lightweight PostgreSQL triggers for passing application context with all data changes into PostgreSQL's replication log
python manage.py migrate bemi
Add the Bemi middleware to your Django project and set the path to a get_bemi_context function to automatically pass application context with all tracked database changes made within an HTTP request:
# settings.py
MIDDLEWARE = [
# Your other middlewares
'bemi.BemiMiddleware',
]
BEMI_CONTEXT_FUNCTION = 'your_project.utils.get_bemi_context'
# utils.py
def get_bemi_context(request):
    # Return any custom context dict
    return {
        'user_id': request.user.id,
        'method': request.method,
        'path': request.path,
    }
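If some requests reach the context function without a user attached (for example, when Django's authentication middleware is not enabled for them), a slightly more defensive variant may be useful. This is a minimal sketch, not part of Bemi itself; the stand-in request objects are hypothetical and exist only so the example runs without Django.

from types import SimpleNamespace


def get_bemi_context(request):
    # request.user may be missing entirely, or anonymous (its id is then None)
    user = getattr(request, 'user', None)
    return {
        'user_id': getattr(user, 'id', None),
        'method': request.method,
        'path': request.path,
    }


# Stand-in objects for illustration only; Django passes a real HttpRequest.
anonymous = SimpleNamespace(method='GET', path='/todo')
signed_in = SimpleNamespace(
    method='POST', path='/todo', user=SimpleNamespace(id=187234)
)

print(get_bemi_context(anonymous))
print(get_bemi_context(signed_in))

The returned dict keeps the same keys in both cases, so downstream filtering on user_id does not need to special-case unauthenticated requests.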
Application context:
- Is bound to the current execution thread within an HTTP request.
- Is used only with INSERT, UPDATE, and DELETE SQL queries performed via Django. Otherwise, it is a no-op.
- Is passed directly into PostgreSQL's Write-Ahead Log with data changes without affecting the structure of the database and SQL queries.
Application context will automatically include the original SQL query that performed data changes, which is generally useful for troubleshooting purposes.
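To illustrate what "bound to the current execution thread" means in practice, here is a conceptual sketch using Python's threading.local. It is not Bemi's actual implementation; the helper names set_context, get_context, and handle_request are hypothetical.

import threading

# Each thread sees its own independent attributes on this object.
_local = threading.local()


def set_context(context):
    _local.context = context


def get_context():
    # Threads that never set a context get None.
    return getattr(_local, 'context', None)


def handle_request(context, handler):
    # Bind the context for the duration of one request, then always clear it
    # so it cannot leak into the next request served by the same thread.
    set_context(context)
    try:
        return handler()
    finally:
        set_context(None)


# The handler sees the context that was bound for its own request.
seen_in_request = handle_request({'path': '/todo'}, get_context)

# A different thread does not see a context set on the main thread.
set_context({'path': '/todo'})
seen_in_other_thread = []
worker = threading.Thread(target=lambda: seen_in_other_thread.append(get_context()))
worker.start()
worker.join()
set_context(None)

The same isolation is why database changes made outside an HTTP request, such as in a background thread, carry no request context unless one is set there explicitly.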
To test data change tracking and the Django integration with a locally connected PostgreSQL, you need to set up your local PostgreSQL.
First, make sure that running SHOW wal_level; against your database returns logical. Otherwise, you need to run the following SQL command:
-- Don't forget to restart your PostgreSQL server after running this command
ALTER SYSTEM SET wal_level = logical;
To track both the "before" and "after" states on data changes, please run the following SQL command:
ALTER TABLE [YOUR_TABLE_NAME] REPLICA IDENTITY FULL;
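When several tables need the "before" state tracked, the statement above can be generated for each of them. The following is a small sketch; the helper names and the example table names are hypothetical, and the output is plain SQL text to review and run yourself.

def quote_ident(name):
    # Quote a PostgreSQL identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def replica_identity_statements(table_names):
    # Build one ALTER TABLE statement per table name.
    return [
        f'ALTER TABLE {quote_ident(name)} REPLICA IDENTITY FULL;'
        for name in table_names
    ]


for statement in replica_identity_statements(['todo', 'auth_user']):
    print(statement)

Note that quoted identifiers are case-sensitive in PostgreSQL, so pass table names exactly as they are stored.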
Then, run a Docker container that connects to your local PostgreSQL database and starts tracking all data changes:
docker run \
-e DB_HOST=host.docker.internal \
-e DB_PORT=5432 \
-e DB_NAME=[YOUR_DATABASE] \
-e DB_USER=postgres \
-e DB_PASSWORD=postgres \
public.ecr.aws/bemi/dev:latest
Replace DB_NAME with your local database name. Note that DB_HOST pointing to host.docker.internal allows accessing 127.0.0.1 on your host machine if you run PostgreSQL outside Docker. Customize DB_USER and DB_PASSWORD with your PostgreSQL credentials if needed.
Now try making some database changes. This will add a new record in the changes table within the same local database after a few seconds:
psql postgres://postgres:postgres@127.0.0.1:5432/[YOUR_DATABASE] -c \
'SELECT "primary_key", "table", "operation", "before", "after", "context", "committed_at" FROM changes;'
primary_key | table | operation | before | after | context | committed_at
-------------+-------+-----------+----------------------------------------------------+-----------------------------------------------------+-------------------------------------------------------------------------------------------+------------------------
26 | todo | CREATE | {} | {"id": 26, "task": "Sleep", "is_completed": false} | {"user_id": 187234, "endpoint": "/todo", "method": "POST", "SQL": "INSERT INTO ..."} | 2023-12-11 17:09:09+00
27 | todo | CREATE | {} | {"id": 27, "task": "Eat", "is_completed": false} | {"user_id": 187234, "endpoint": "/todo", "method": "POST", "SQL": "INSERT INTO ..."} | 2023-12-11 17:09:11+00
28 | todo | CREATE | {} | {"id": 28, "task": "Repeat", "is_completed": false} | {"user_id": 187234, "endpoint": "/todo", "method": "POST", "SQL": "INSERT INTO ..."} | 2023-12-11 17:09:13+00
26 | todo | UPDATE | {"id": 26, "task": "Sleep", "is_completed": false} | {"id": 26, "task": "Sleep", "is_completed": true} | {"user_id": 187234, "endpoint": "/todo/complete", "method": "PUT", "SQL": "UPDATE ..."} | 2023-12-11 17:09:15+00
27 | todo | DELETE | {"id": 27, "task": "Eat", "is_completed": false} | {} | {"user_id": 187234, "endpoint": "/todo/27", "method": "DELETE", "SQL": "DELETE FROM ..."} | 2023-12-11 17:09:18+00
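Rows in this shape are enough to answer simple audit and time travel questions in application code. The sketch below works on plain dicts copied from the sample output above (only the columns it needs); the helper names changed_fields and state_at are hypothetical, and the integer primary keys and parsed timestamps are assumptions made for illustration.

from datetime import datetime, timezone


def at(second):
    # All sample rows were committed on 2023-12-11 between 17:09:09 and 17:09:18 UTC.
    return datetime(2023, 12, 11, 17, 9, second, tzinfo=timezone.utc)


CHANGES = [
    {'primary_key': 26, 'operation': 'CREATE', 'before': {},
     'after': {'id': 26, 'task': 'Sleep', 'is_completed': False}, 'committed_at': at(9)},
    {'primary_key': 27, 'operation': 'CREATE', 'before': {},
     'after': {'id': 27, 'task': 'Eat', 'is_completed': False}, 'committed_at': at(11)},
    {'primary_key': 28, 'operation': 'CREATE', 'before': {},
     'after': {'id': 28, 'task': 'Repeat', 'is_completed': False}, 'committed_at': at(13)},
    {'primary_key': 26, 'operation': 'UPDATE',
     'before': {'id': 26, 'task': 'Sleep', 'is_completed': False},
     'after': {'id': 26, 'task': 'Sleep', 'is_completed': True}, 'committed_at': at(15)},
    {'primary_key': 27, 'operation': 'DELETE',
     'before': {'id': 27, 'task': 'Eat', 'is_completed': False},
     'after': {}, 'committed_at': at(18)},
]


def changed_fields(before, after):
    # Map each field whose value differs to an (old, new) pair.
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


def state_at(changes, primary_key, as_of):
    # Replay one record's changes in commit order up to `as_of`.
    # Returns None if the record did not exist at that moment.
    state = None
    for change in sorted(changes, key=lambda c: c['committed_at']):
        if change['primary_key'] != primary_key or change['committed_at'] > as_of:
            continue
        state = None if change['operation'] == 'DELETE' else change['after']
    return state


print(changed_fields(CHANGES[3]['before'], CHANGES[3]['after']))
print(state_at(CHANGES, 26, at(10)))
print(state_at(CHANGES, 27, at(20)))

For real data, the same questions are usually better expressed as SQL against the changes table, filtering on "table", "primary_key", and "committed_at".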
Check out our Django docs for more details.
Bemi is designed to be lightweight and secure. It takes a practical approach to achieving the benefits of event sourcing without requiring rearchitecting existing code, switching to highly specialized databases, or using unnecessary git-like abstractions on top of databases. We want your system to work the way it already does with your existing database to allow keeping things as simple as possible.
Bemi plugs into both the database and application levels, ensuring 100% reliability and a comprehensive understanding of every change.
On the database level, Bemi securely connects to PostgreSQL's Write-Ahead Log and implements Change Data Capture. This allows tracking even the changes that get triggered via direct SQL.
On the application level, this Python package automatically passes application context to the replication logs to enhance the low-level database changes. For example, information about a user who made a change, an API endpoint where the change was triggered, a worker name that automatically triggered database changes, etc.
Bemi workers then stitch the low-level data with the application context and store this information in a structured, easily queryable format, as depicted below:
The cloud solution includes worker ingesters, queues for fault tolerance, and a serverless PostgreSQL. If you are interested in running a self-hosted version, see our self-hosting docs.
Distributed under the terms of the LGPL-3.0 License. If you need to modify and distribute the code, please release it to contribute back to the open-source community.