Releases: alephdata/servicelayer
v1.23.0
The custom messaging queue used by Aleph has been replaced with RabbitMQ. As of this version of servicelayer, Aleph will use a persistent messaging queue. We have seen an increase in stability, predictability and also in the clarity of debugging since making these changes.
The implementation uses a Default, direct Exchange. RabbitMQ allows users to monitor the activity of the messaging queues using a management interface that one can access from the browser, if the proper port is exposed.
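As a rough illustration of publishing through the default exchange: the default exchange is addressed by an empty exchange name and routes each message to the queue whose name equals the routing key. The sketch below assumes a pika-style channel object that offers queue_declare() and basic_publish(). The publish_task helper, the queue name and the payload layout are hypothetical and not taken from servicelayer.

```python
import json


def publish_task(channel, queue, task_id, payload):
    """Publish one task to `queue` via the default (direct) exchange.

    `channel` is assumed to behave like a pika channel, i.e. to offer
    queue_declare() and basic_publish(). This helper is illustrative only.
    """
    # A durable queue definition survives a broker restart.
    channel.queue_declare(queue=queue, durable=True)
    body = json.dumps({"task_id": task_id, "payload": payload}).encode("utf-8")
    # exchange="" selects the default exchange, which routes by queue name.
    channel.basic_publish(exchange="", routing_key=queue, body=body)
    return body
```

Passing the channel in, rather than opening a connection inside the helper, keeps the sketch independent of any particular broker setup.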
In order to populate the System Status view in Aleph, Redis is used to independently track the state of tasks. It no longer tracks job_ids, instead tracking tasks (task_ids). The structure of Redis keys has also changed as follows:
Redis keys used by the Dataset object:
- tq:qdatasets: set of all collection_ids of active datasets (a dataset is considered active when it has either running or pending tasks)
- tq:qdj:<dataset>:taskretry:<task_id>: the number of times task_id was retried
All of the following keys refer to task_ids or statistics about tasks per a certain dataset (collection_id):
- tq:qdj:<dataset>:finished: number of tasks that have been marked as "Done" and for which an acknowledgement is also sent by the Worker over RabbitMQ.
- tq:qdj:<dataset>:running: set of all task_ids of tasks currently running. A "Running" task is a task which has been checked out, and is being processed by a worker.
- tq:qdj:<dataset>:pending: set of all task_ids of tasks currently pending. A "Pending" task has been added to a RabbitMQ queue (via a basic_publish call) by a producer (an API call, a UI action etc.).
- tq:qdj:<dataset>:start: the UTC timestamp when either the first task_id has been added to a RabbitMQ queue (so, we have our first Pending task) or the timestamp when the first task_id has been checked out (so, we have our first Running task). The start key is updated when the first task is handed to a Worker.
- tq:qdj:<dataset>:last_update: the UTC timestamp from the latest change to the state of tasks running for a certain collection_id. This is set when: a new task is Pending, a new task is Running, a new task is Done, a new task is canceled.
- tq:qds:<dataset>:<stage>: a set of all task_ids that are either running or pending, for a certain stage.
- tq:qds:<dataset>:<stage>:finished: number of tasks that have been marked as "Done" for a certain stage.
- tq:qds:<dataset>:<stage>:running: set of all task_ids of tasks currently running for a certain stage.
- tq:qds:<dataset>:<stage>:pending: set of all task_ids of tasks currently pending for a certain stage.
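To make the key layout easier to follow, here is a minimal sketch of how the keys could be built and how a task might move from Pending to Running to Done. It is not the servicelayer implementation: plain Python dicts stand in for Redis, the names dataset_key, stage_key and StatusModel are hypothetical, and the start, last_update and taskretry keys are left out for brevity. Removing a dataset from tq:qdatasets once nothing is pending or running is an assumption based on the definition of an active dataset.

```python
PREFIX = "tq"


def dataset_key(dataset, *parts):
    """Build a per-dataset key such as tq:qdj:<dataset>:pending."""
    return ":".join([PREFIX, "qdj", str(dataset), *parts])


def stage_key(dataset, stage, *parts):
    """Build a per-stage key such as tq:qds:<dataset>:<stage>:running."""
    return ":".join([PREFIX, "qds", str(dataset), stage, *parts])


class StatusModel:
    """In-memory stand-in for Redis: sets hold task ids, ints hold counters."""

    def __init__(self):
        self.sets = {}
        self.counters = {}

    def _set(self, key):
        return self.sets.setdefault(key, set())

    def add_pending(self, dataset, stage, task_id):
        # A producer has published the task to a queue.
        self._set("tq:qdatasets").add(dataset)
        self._set(dataset_key(dataset, "pending")).add(task_id)
        self._set(stage_key(dataset, stage, "pending")).add(task_id)
        self._set(stage_key(dataset, stage)).add(task_id)

    def checkout(self, dataset, stage, task_id):
        # A worker has picked the task up: Pending -> Running.
        self._set(dataset_key(dataset, "pending")).discard(task_id)
        self._set(stage_key(dataset, stage, "pending")).discard(task_id)
        self._set(dataset_key(dataset, "running")).add(task_id)
        self._set(stage_key(dataset, stage, "running")).add(task_id)

    def mark_done(self, dataset, stage, task_id):
        # The worker has finished the task: Running -> Done.
        self._set(dataset_key(dataset, "running")).discard(task_id)
        self._set(stage_key(dataset, stage, "running")).discard(task_id)
        self._set(stage_key(dataset, stage)).discard(task_id)
        for key in (dataset_key(dataset, "finished"),
                    stage_key(dataset, stage, "finished")):
            self.counters[key] = self.counters.get(key, 0) + 1
        # Assumption: a dataset with no pending or running tasks is inactive.
        if not (self._set(dataset_key(dataset, "pending"))
                | self._set(dataset_key(dataset, "running"))):
            self._set("tq:qdatasets").discard(dataset)
```

For example, dataset_key(42, "taskretry", "t1") yields tq:qdj:42:taskretry:t1, matching the retry counter key listed above.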
Tasks are assigned a random priority before being added to the appropriate queues to ensure a fair distribution of execution. The current implementation also allows admin users of Aleph to choose to assign a task either a global minimum priority or a global maximum priority.
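The priority scheme described above might look roughly like the following sketch. The bounds MIN_PRIORITY and MAX_PRIORITY, the pick_priority helper and its override argument are all hypothetical; the actual values and interface used by servicelayer are not stated in these notes. The sketch also assumes that random priorities fall strictly between the two bounds, so that an explicit minimum or maximum always sorts below or above every randomly prioritised task.

```python
import random

# Hypothetical bounds; the real range used by servicelayer is not given here.
MIN_PRIORITY = 1
MAX_PRIORITY = 9


def pick_priority(override=None, rng=random):
    """Return a priority for a new task.

    override="min" or override="max" pins the task to a global bound;
    otherwise a random value strictly between the two bounds is drawn.
    """
    if override == "min":
        return MIN_PRIORITY
    if override == "max":
        return MAX_PRIORITY
    return rng.randint(MIN_PRIORITY + 1, MAX_PRIORITY - 1)
```

Drawing the priority at random means no single dataset can monopolise the workers simply by enqueueing its tasks first.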
What's Changed
- Adds a last_updated timestamp to the dataset status by @stchris in #136
- Pin moto because of breaking changes in version 5.0 + by @stchris in #155
- Remove unused GitHub Actions workflow by @tillprochaska in #154
- Standardize development dependencies / refactor GHA workflow by @tillprochaska in #153
Dependency upgrades
- Bump black from 23.9.1 to 23.11.0 by @dependabot in #135
- Bump wheel from 0.41.2 to 0.42.0 by @dependabot in #134
- Bump prometheus-client from 0.17.1 to 0.19.0 by @dependabot in #133
- Bump ruff from 0.0.292 to 0.1.8 by @dependabot in #138
- Bump pytest from 7.4.2 to 7.4.3 by @dependabot in #121
- Bump pytest-env from 1.0.1 to 1.1.3 by @dependabot in #132
- Bump pytest-mock from 3.11.1 to 3.12.0 by @dependabot in #126
- Update development dependencies in groups by @stchris in #139
- Bump the dev-dependencies group with 1 update by @dependabot in #140
- Bump fakeredis from 2.19.0 to 2.20.1 by @dependabot in #141
- Release 1.22.2 by @tillprochaska in #167
- Bump the dev-dependencies group with 6 updates by @dependabot in #170
- Bump fakeredis from 2.20.1 to 2.22.0 by @dependabot in #168
- Bump prometheus-client from 0.19.0 to 0.20.0 by @dependabot in #159
- Bump structlog from 23.2.0 to 24.1.0 by @dependabot in #151
- Release/1.23.0 by @stchris in #143
Full Changelog: v1.22.1...v1.23.0
v1.22.2
This release includes a fix for the archive functionality in servicelayer. Previously, the generate_url methods of the Google Cloud Storage archive adapter and the AWS S3 archive adapter were generating URLs instructing AWS S3 and Google Cloud Storage to send a Content-Disposition: inline header in the response.
When sending this header, most browsers will automatically open the file if the file’s MIME type is supported by the browser. This may not be desired in some cases, for example when downloading files from untrustworthy sources.
Starting with this version of servicelayer, the generated URLs will instead instruct AWS S3 and Google Cloud Storage to send a Content-Disposition: attachment header. Browsers won’t open files without user interaction if this header is set.
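As an illustration of the idea, and not of the actual adapter code, the sketch below builds a Content-Disposition: attachment value and the parameter dictionary that a boto3-style presigned URL request for S3 would carry. The helper names attachment_disposition and s3_presign_params are hypothetical. The file name is percent-encoded in the filename* form so that non-ASCII names stay valid.

```python
from urllib.parse import quote


def attachment_disposition(file_name=None):
    """Build a Content-Disposition value that forces a download."""
    if not file_name:
        return "attachment"
    # Extended filename* form; percent-encoding keeps non-ASCII names safe.
    return "attachment; filename*=UTF-8''" + quote(file_name, safe="")


def s3_presign_params(bucket, key, file_name=None):
    """Params for a boto3-style generate_presigned_url("get_object", ...)."""
    return {
        "Bucket": bucket,
        "Key": key,
        # S3 sends this value back as the Content-Disposition response header.
        "ResponseContentDisposition": attachment_disposition(file_name),
    }
```

With boto3, the returned dictionary would be passed as the Params argument of client.generate_presigned_url("get_object", ...).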
v1.22.1
What's Changed
- Change default port for Prometheus metrics endpoint to 9100 by @tillprochaska in #129
- Misc Prometheus changes by @tillprochaska in #130
Full Changelog: v1.22.0...v1.22.1
v1.22.0
What's Changed
- Add basic Prometheus instrumentation for workers by @tillprochaska in #111
- Log worker retry count and retry count exhaustion by @stchris in #113
Dependency upgrades
- Bump pytest-env from 0.8.1 to 1.0.1 by @dependabot in #110
- Bump wheel from 0.40.0 to 0.41.2 by @dependabot in #108
- Bump ruff from 0.0.270 to 0.0.292 by @dependabot in #119
- Bump fakeredis from 2.13.0 to 2.19.0 by @dependabot in #118
- Bump black from 23.3.0 to 23.9.1 by @dependabot in #117
- Bump structlog from 23.1.0 to 23.2.0 by @dependabot in #116
- Bump pytest from 7.3.1 to 7.4.2 by @dependabot in #115
- Bump normality from 2.4.0 to 2.5.0 by @dependabot in #114
- Bump pytest-mock from 3.10.0 to 3.11.1 by @dependabot in #98
New Contributors
- @tillprochaska made their first contribution in #111
Full Changelog: v1.21.2...v1.22.0
v1.21.0
What's Changed
- Add Sentry support to servicelayer workers by @stchris in #88
  This release adds support for sending error tracebacks to sentry.io (or a self-hosted instance). This is controlled by two environment variables: SENTRY_DSN and SENTRY_ENVIRONMENT. Note that you also have to take care of installing the sentry_sdk package.
- Add and enforce linter (ruff) and code formatter (black) by @stchris in #89
  This updates the development environment and CI configuration to be closer to what we have in Aleph.
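For the Sentry support mentioned above, a worker might wire up the two environment variables roughly as follows. This is a sketch under assumptions and not the code from #88: the helpers sentry_settings and init_sentry are hypothetical, and Sentry is simply skipped when SENTRY_DSN is unset or the sentry_sdk package is not installed.

```python
import os


def sentry_settings(environ=os.environ):
    """Return keyword arguments for sentry_sdk.init(), or None if disabled."""
    dsn = environ.get("SENTRY_DSN")
    if not dsn:
        return None
    settings = {"dsn": dsn}
    environment = environ.get("SENTRY_ENVIRONMENT")
    if environment:
        settings["environment"] = environment
    return settings


def init_sentry(environ=os.environ):
    """Initialise Sentry only when configured and sentry_sdk is installed."""
    settings = sentry_settings(environ)
    if settings is None:
        return False
    try:
        import sentry_sdk  # optional dependency, installed separately
    except ImportError:
        return False
    sentry_sdk.init(**settings)
    return True
```

Separating the environment lookup from the init call keeps the configuration logic testable without the optional dependency.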
Full Changelog: v1.20.7...v1.21.0
v1.20.7
v1.20.6
What's Changed
- Bump pika from 1.3.0 to 1.3.1 by @dependabot in #76
- Bump fakeredis from 1.9.1 to 1.10.0 by @dependabot in #77
- Bump fakeredis from 1.10.0 to 1.10.1 by @dependabot in #79
- refactor to support SQLAlchemy 2.0 migration by @catileptic in #82
New Contributors
- @catileptic made their first contribution in #82
Full Changelog: v1.20.5...v1.20.6
v1.20.5
What's Changed
- Bump fakeredis from 1.8.1 to 1.9.0 by @dependabot in #71
- Bump fakeredis from 1.9.0 to 1.9.1 by @dependabot in #73
- Bump pika from 1.2.0 to 1.3.0 by @dependabot in #68
- Update structlog requirement from <22.0.0,>=20.2.0 to >=20.2.0,<23.0.0 by @dependabot in #70
Full Changelog: v1.20.4...v1.20.5
1.19.1
What's Changed
- Bump fakeredis from 1.7.1 to 1.8 by @dependabot in #66
- Bump fakeredis from 1.8 to 1.8.1 by @dependabot in #67
Full Changelog: v1.19.0...v1.19.1