Background workers #86
[Review comment] Celery may not be able to be a straightforward backend as-is, and I'm not sure that's a problem we want to fix. Celery, afaik, requires tasks to be registered in the runner. But the more I think about the interface, the more I like that this doesn't require the callables to be registered somewhere, and I hope we can keep that.

[Review reply] However, this means that moving a task function breaks all in-flight tasks, plus all the lovely security issues that come from a remote source giving us arbitrary functions, by path, to call. Registration is definitely the way, unfortunately.

[Review comment] Just to elaborate on this, what if a worker receives a malicious message like so:

    {"function": "subprocess.check_call", "args": ["rm", "-rf", "/"]}

There has to be a form of task registration, so that there is an allowlist of specific functions to run when a task is received. This also helps decouple the function's import path from the actual message, which is a good thing™. You could perhaps restrict this to functions that live in …

    {"function": "some_app_name.tasks.subprocess.check_call", "args": ["rm", "-rf", "/"]}

[Review reply] I don't think that we should attempt to make this interface resilient to malicious messages, any more than Django generally makes things resilient to malicious database queries. Sending messages is dangerous, and it is intended to do dangerous things, and you need to make sure you trust the code that can do that. It's true that you can't change the location of a callable all at once without the possible need for downtime, but that's pretty easy to deal with by creating a wrapper function in one place that calls the other.

[Review comment] There is a vast difference between this and "anyone who can add messages into the queue now has full remote-code execution capabilities, no questions asked". Invoking completely arbitrary, user-supplied functions with user-supplied arguments from outside sources has always ended rather poorly. There are also unresolved issues around how (and where!) to handle …

[Review comment] I'm certain it is not the only one. Dramatiq also requires actors (analogous to Celery tasks) to be registered.

[Review comment] I agree. Security needs to be on by default. We already receive emails to security@ that boil down to "When I ignore the security warning in the docs, there is a security gap". Any opt-out would need to be documented with a very clear "Security warning" to scare away some who don't actually need to use an opt-out, and provide an easy link for us to reply to those security report emails.

[Review comment] Django is opinionated and already applies the auto-discovery pattern for several types of resources; the most relevant examples being models and management commands. Adding task auto-discovery to AppConfig with a … This AppConfig auto-discovery opens the opportunity for a built-in management command to inspect and run/enqueue tasks. I've seen this exact "cron calling a management command that enqueues a celery task" pattern way too many times in my career. It would be nice to remove that boilerplate code, because people will repeat that pattern if cron is not included.

[Review comment] FWIW, DLQ takes this approach -- it auto-loads …

[Review comment] Since this discussion, I've made quite a few changes to the API, including adding an explicit decorator to create a … I'm not super familiar with how Celery "marks" a task behind the scenes, and then validates it. If it just needs to store a function reference somewhere, that's easy enough to implement with the current calling pattern. If it needs the globally importable function to be of a given type, that might also be doable with a little more work. @ryanhiebert, as you've clearly used Celery more than I, I'm interested in your thoughts!

[Review reply] It needs it to be of a specific type (a subclass of ``celery.Task``), and registered with the (usually global) celery app, I believe.
=============================
DEP XXXX: Background workers
=============================

:DEP: XXXX
:Author: Jake Howard
:Implementation Team: Jake Howard
:Shepherd: Carlton Gibson
:Status: Draft
:Type: Feature
:Created: 2024-02-07
:Last-Modified: 2024-02-09

.. contents:: Table of Contents
   :depth: 3
   :local:

Abstract
========

Django doesn't have a first-party solution for long-running tasks; however, the ecosystem is filled with incredibly popular frameworks, all of which interact with Django in slightly different ways. Other frameworks such as Laravel have background workers built in, allowing them to push tasks into the background to be processed at a later date, without requiring the end user to wait for them to occur.

Library maintainers must implement support for any possible task backend separately, should they wish to offload functionality to the background. This includes smaller libraries, but also larger meta-frameworks with their own package ecosystem, such as `Wagtail <https://wagtail.org>`_.

Specification
=============

The proposed implementation will be in the form of an application-wide "task backend" interface. This backend will be what connects Django to the task runners with a single pattern. The task backend will provide an interface for either third-party libraries or application developers to specify how tasks should be created and pushed into the background.

Backends
--------

A backend will be a class which extends a Django-defined base class, and provides the common interface between Django and the underlying task runner.
.. code:: python

    from datetime import datetime
    from typing import Callable, Dict, List

    from django.tasks import BaseTask
    from django.tasks.backends.base import BaseTaskBackend


    class MyBackend(BaseTaskBackend):
        def __init__(self, options: Dict):
            """
            Any connections which need to be set up can be done here
            """
            super().__init__(options)

        def is_valid_task_function(self, func: Callable) -> bool:
            """
            Determine whether the provided callable is valid as a task function.
            """
            ...

        def enqueue(self, func: Callable, priority: int | None, args: List, kwargs: Dict) -> BaseTask:
            """
            Queue up a task function (or coroutine) to be executed
            """
            ...

        def defer(self, func: Callable, priority: int | None, when: datetime, args: List, kwargs: Dict) -> BaseTask:
            """
            Add a task function (or coroutine) to be completed at a specific (timezone-aware) time
            """
            ...

        async def aenqueue(self, func: Callable, priority: int | None, args: List, kwargs: Dict) -> BaseTask:
            """
            Queue up a task function (or coroutine) to be executed
            """
            ...

        async def adefer(self, func: Callable, priority: int | None, when: datetime, args: List, kwargs: Dict) -> BaseTask:
            """
            Add a task function (or coroutine) to be completed at a specific (timezone-aware) time
            """
            ...

        def get_task(self, task_id: str) -> BaseTask:
            """
            Retrieve a task by its id (if one exists).
            If one doesn't, raises self.TaskDoesNotExist.
            """
            ...

        async def aget_task(self, task_id: str) -> BaseTask:
            """
            Retrieve a task by its id (if one exists).
            If one doesn't, raises self.TaskDoesNotExist.
            """
            ...

        def close(self) -> None:
            """
            Close any connections opened as part of the constructor
            """
            ...

If a backend doesn't support a particular scheduling mode, it simply does not define the method. Convenience methods ``supports_enqueue`` and ``supports_defer`` will be implemented by ``BaseTaskBackend``. Similarly, ``BaseTaskBackend`` will provide ``a``-prefixed stubs for ``enqueue``, ``defer`` and ``get_task`` wrapped with ``asgiref.sync.sync_to_async``.
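Since unsupported scheduling modes are simply absent from the subclass, the convenience methods could plausibly be implemented with a presence check. A minimal sketch, assuming that behaviour; ``EnqueueOnlyBackend`` is illustrative and not part of the DEP:

```python
class BaseTaskBackend:
    """Sketch: scheduling-mode methods are absent on the base class,
    so support is detected by checking whether the subclass defines them."""

    def supports_enqueue(self) -> bool:
        return hasattr(self, "enqueue")

    def supports_defer(self) -> bool:
        return hasattr(self, "defer")


class EnqueueOnlyBackend(BaseTaskBackend):
    """Hypothetical backend which supports enqueueing but not deferring."""

    def enqueue(self, func, priority=None, args=None, kwargs=None):
        raise NotImplementedError


backend = EnqueueOnlyBackend()
assert backend.supports_enqueue()
assert not backend.supports_defer()
```

Callers (including library authors) can then check ``supports_defer()`` before attempting to defer, rather than catching an ``AttributeError``.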

``is_valid_task_function`` determines whether the provided function (or possibly coroutine) is valid for the backend. This can be used to prevent coroutines from being executed, or to otherwise validate the callable.
Django will ship with three implementations:

ImmediateBackend
    This backend runs the tasks immediately, rather than offloading them to a background process. This is useful for a graceful transition towards background workers, without impacting existing functionality.
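The idea behind ``ImmediateBackend`` can be sketched in a few lines: "enqueueing" just calls the function synchronously and records the outcome. This is an illustration only; the dict here stands in for the ``Task`` handle described later, and none of the names are the proposed implementation:

```python
class ImmediateBackend:
    """Illustrative sketch: "enqueueing" runs the task inline."""

    def enqueue(self, func, priority=None, args=None, kwargs=None):
        task = {"status": "RUNNING", "result": None}
        try:
            task["result"] = func(*(args or []), **(kwargs or {}))
            task["status"] = "COMPLETE"
        except Exception as exc:  # the result carries the exception, as below
            task["result"] = exc
            task["status"] = "FAILED"
        return task


backend = ImmediateBackend()
task = backend.enqueue(lambda a, b: a + b, args=[1, 2])
assert task["status"] == "COMPLETE" and task["result"] == 3
```

Because the call happens in-process, code written against the task API keeps working before any real worker infrastructure exists.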

DatabaseBackend
    This backend uses the Django ORM as a task store. This backend will support all features, and should be considered production-grade.

[Review comment] As you explained, this will suit most users. Could you describe just a little bit more how you'd see that implemented (as I didn't see it yet in your POC implementation)?

[Review reply] I've intentionally not put implementation details like that in here, as they're fairly external. But yes, the idea would be a management command to run the worker, a DB model to store the data itself, and likely an admin interface for debugging.

DummyBackend
    This backend doesn't execute tasks at all, and instead stores the ``Task`` objects in memory. This backend is mostly useful in tests.
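A backend that only records tasks makes test assertions straightforward. A self-contained sketch of that idea; ``RecordingBackend`` and its attributes are stand-ins, not the proposed API:

```python
class RecordingBackend:
    """Stand-in for a DummyBackend: stores tasks instead of executing them."""

    def __init__(self):
        self.enqueued = []

    def enqueue(self, func, priority=None, args=None, kwargs=None):
        task = {"func": func, "args": args or [], "kwargs": kwargs or {}}
        self.enqueued.append(task)
        return task


def send_welcome_email(user_id):
    pass  # would send an email in production


backend = RecordingBackend()
backend.enqueue(send_welcome_email, args=[42])

# A test can now assert on what was queued, without any email being sent.
assert len(backend.enqueued) == 1
assert backend.enqueued[0]["func"] is send_welcome_email
assert backend.enqueued[0]["args"] == [42]
```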

Tasks
-----

A ``Task`` is used as a handle to the running task, and contains useful information the application may need when referencing the task.

.. code:: python

    from datetime import datetime
    from typing import Any, Callable

    from django.tasks import BaseTask, TaskStatus


    class MyBackendTask(BaseTask):
        id: str
        """A unique identifier for the task"""

        status: TaskStatus
        """The status of the task"""

        queued_at: datetime
        """When the task was added to the queue"""

        completed_at: datetime | None
        """When the task was completed"""

        priority: int | None
        """The priority of the task"""

        func: Callable
        """The task function"""

        args: list
        """The arguments to pass to the task function"""

        kwargs: dict
        """The keyword arguments to pass to the task function"""

        def __init__(self, **kwargs):
            """
            Unpack the raw response from the backend and store it here for future use
            """
            super().__init__(**kwargs)

        def refresh(self) -> None:
            """
            Reload the cached task data from the task store
            """
            ...

        async def arefresh(self) -> None:
            """
            Reload the cached task data from the task store
            """
            ...

        @property
        def result(self) -> Any:
            """
            The return value from the task function.
            If the task raised an exception, the result will contain that exception.
            If the task has not completed, a ``ValueError`` is raised when accessing.
            """
            ...

A ``Task`` is obtained either when scheduling a task function, or by calling ``get_task`` on the backend. If called with a ``task_id`` which doesn't exist, a ``TaskDoesNotExist`` exception is raised.

A ``Task`` will cache its values, relying on the user calling ``refresh`` / ``arefresh`` to reload the values from the task store.

A ``Task``'s ``status`` must be one of the following values (as defined by an ``enum``):

:NEW: The task has been created, but hasn't started running yet
:RUNNING: The task is currently running
:FAILED: The task failed
:COMPLETE: The task is complete, and the result is accessible

If a backend supports more than these statuses, it should compress them into one of these.
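For example, a backend wrapping a runner with richer native states might compress them with a simple mapping. The Celery-like native state names below are an assumption for illustration, not part of the DEP:

```python
from enum import Enum


class TaskStatus(Enum):
    NEW = "NEW"
    RUNNING = "RUNNING"
    FAILED = "FAILED"
    COMPLETE = "COMPLETE"


# Hypothetical mapping from a runner's richer native states to the DEP's four.
NATIVE_TO_TASK_STATUS = {
    "PENDING": TaskStatus.NEW,
    "RECEIVED": TaskStatus.NEW,
    "STARTED": TaskStatus.RUNNING,
    "RETRYING": TaskStatus.RUNNING,
    "FAILURE": TaskStatus.FAILED,
    "SUCCESS": TaskStatus.COMPLETE,
}

assert NATIVE_TO_TASK_STATUS["RETRYING"] is TaskStatus.RUNNING
```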

Task functions
--------------

A task function is any globally-importable callable which can be used as the function for a task (i.e. passed into ``enqueue``).

Before a task can be run, it must be marked:

.. code:: python

    from django.tasks import task


    @task
    def do_a_task(*args, **kwargs):
        pass

The decorator "marks" the task as being a valid function to be executed. This prevents arbitrary callables from being queued, which could otherwise result in a security vulnerability (e.g. ``subprocess.run``).

Task functions will be validated against the backend's ``is_valid_task_function`` before queueing. The default implementation will validate all generic assumptions:

- Is the task function globally importable?
- Has the task function been marked?
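One plausible way to implement the marking and the two default checks above; the ``is_task`` attribute name is an assumption for illustration, not the DEP's mechanism:

```python
import importlib
import subprocess


def task(func):
    """Mark a callable as a valid task function."""
    func.is_task = True
    return func


def is_valid_task_function(func) -> bool:
    # Has the task function been marked?
    if not getattr(func, "is_task", False):
        return False
    # Is the task function globally importable under its own name?
    try:
        module = importlib.import_module(func.__module__)
    except ImportError:
        return False
    return getattr(module, func.__qualname__, None) is func


@task
def do_a_task(*args, **kwargs):
    pass


assert is_valid_task_function(do_a_task)
assert not is_valid_task_function(subprocess.run)  # unmarked, so rejected
```

Under this scheme, a malicious message naming ``subprocess.run`` is rejected before anything is queued, because the callable was never marked.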
Queueing tasks
--------------

Tasks can be queued using ``enqueue``, a proxy method which calls ``enqueue`` on the default task backend:

.. code:: python

    from django.tasks import enqueue, task


    @task
    def do_a_task(*args, **kwargs):
        pass


    # Submit the task function to be run
    task = enqueue(do_a_task)

    # Optionally, provide arguments
    task = enqueue(do_a_task, args=[], kwargs={})

Similar methods are also available for ``defer``, ``aenqueue`` and ``adefer``. When multiple task backends are configured, each can be obtained from a global ``tasks`` connection handler:

.. code:: python

    from django.tasks import tasks, task


    @task
    def do_a_task(*args, **kwargs):
        pass


    # Submit the task function to be run
    task = tasks["special"].enqueue(do_a_task)

    # Optionally, provide arguments
    task = tasks["special"].enqueue(do_a_task, args=[], kwargs={})

When enqueueing tasks, ``args`` and ``kwargs`` are intentionally their own dedicated arguments, to keep the API simpler and backwards-compatible should other attributes be added in future.

Here, ``do_a_task`` can be either a regular function or a coroutine. It will be up to the backend implementor to determine whether coroutines are supported. In either case, the function must be globally importable.

Deferring tasks
---------------

Tasks may also be "deferred" to run at a specific time in the future:

.. code:: python

    from datetime import timedelta

    from django.utils import timezone
    from django.tasks import defer


    task = defer(do_a_task, when=timezone.now() + timedelta(minutes=5))

When scheduling a task, it may not be executed at **exactly** that time; however, it should be accurate to within a few seconds. This will depend on the current state of the queue and task runners, and is out of the control of Django.

[Review comment] ❓ How important do you feel it is to prescribe the precision of the deferred running? Could this be left up to implementers to define? The latter would allow for configurable implementations if e.g. a user needs more, or is happy with far less, precision. 👍 to calling out that it's out of Django's control and there may be some inaccuracy here.

[Review reply] I think this may be a holdover from a previous version of this DEP, where it was more important to define this. It's absolutely backend dependent, but I still wanted to flag both that there's little Django can do, and that there's a risk anyway (a risk that is probably the case anyway).

[Review comment] ❓ Given that running tasks on a cron is excluded from this proposal, how important do you feel it is to include support for running at an arbitrary time in the future? Thinking about the implementation side, cron support and delayed running support feel fairly similar. For some context here, the queueing system I'm most familiar with (https://github.com/thread/django-lightweight-queue) supports cron but not delayed running of tasks.

[Review reply] Implementation-wise, they're very similar, sure. Definition-wise, however, they're quite different.

[Review comment] There are two broadly different uses for an interface like this, and the implementations should be very different depending on what target is in view. A task retry use-case, the automatic version of which we're avoiding here, is suitable for relatively small numbers of tasks scheduled to happen in the near future, which I would define as roughly within the general expected lifetime of an individual runner. This has often been implemented by using the queue to deliver the task to a runner, which will then hang onto the task in memory until it is time to enqueue it properly. A more generic scheduled-job interface would be resilient to higher volumes of delayed tasks and further-in-the-future scheduling. Only this would be generally applicable for things like detailed scheduling of a large email campaign, including follow-ups. This would need to be implemented in a more permanent mode of datastore, such as a database. I suspect that having this scheduling is intended as a foundational step toward the retry functionality. If I'm right about that, I suggest we should note the expected limitations of the interface described here. Because the two are API code-compatible, but with importantly different semantics, we should be documenting which semantic range we're targeting with this API.

Sending emails
--------------

One of the easiest and most common pieces of work to offload to the background is sending emails. Sending an email requires communicating with an external, potentially third-party service, which adds additional latency and risk to web requests. These sends can easily be offloaded to the background.

Django will ship with an additional task-based SMTP email backend, configured identically to the existing SMTP backend. The other email backends included with Django don't benefit from being moved to the background.
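The shape of such a backend can be sketched generically: ``send_messages`` enqueues the real delivery rather than performing SMTP I/O during the request. Everything here (the class name, the injected ``enqueue`` callable, ``deliver``) is illustrative, not the proposed implementation:

```python
def deliver(messages):
    """Stand-in for the real SMTP delivery, run later by a worker."""
    return len(messages)


class TaskEmailBackend:
    """Sketch: an email backend that queues sends instead of performing them."""

    def __init__(self, enqueue):
        # `enqueue` stands in for the task backend's enqueue method.
        self._enqueue = enqueue

    def send_messages(self, messages):
        # Returns immediately; actual delivery happens in the background.
        self._enqueue(deliver, args=[list(messages)])


queued = []
backend = TaskEmailBackend(lambda func, args: queued.append((func, args)))
backend.send_messages(["an email message"])

assert queued == [(deliver, [["an email message"]])]
```

The web request only pays the cost of serialising and queueing the messages, not of the SMTP conversation itself.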

Async tasks
-----------

Where the underlying task runner supports it, backends may also provide an ``async``-compatible interface for task queueing, using ``a``-prefixed methods:

.. code:: python

    from django.tasks import aenqueue


    await aenqueue(do_a_task)

Similarly, a backend may support queueing an async task function:

.. code:: python

    from django.tasks import aenqueue, enqueue, task


    @task
    async def do_an_async_task():
        pass


    await aenqueue(do_an_async_task)

    # Also works
    enqueue(do_an_async_task)
Settings
--------

.. code:: python

    TASKS = {
        "default": {
            "BACKEND": "django.tasks.backends.ImmediateBackend",
            "OPTIONS": {}
        }
    }

``OPTIONS`` is passed as-is to the backend's constructor.
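When multiple backends are configured (as with the ``tasks["special"]`` handler above), each gets its own alias. A hypothetical example; the ``"emails"`` alias, the ``myproject.tasks.MyBackend`` path and the ``queue_name`` option are all illustrative, not part of the DEP:

```python
TASKS = {
    "default": {
        "BACKEND": "django.tasks.backends.DatabaseBackend",
        "OPTIONS": {},
    },
    # A second, hypothetical backend alias with backend-specific OPTIONS,
    # which would be accessed as tasks["emails"].
    "emails": {
        "BACKEND": "myproject.tasks.MyBackend",
        "OPTIONS": {"queue_name": "emails"},
    },
}
```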

Motivation
==========

Having a first-party interface for background workers poses two main benefits:

Firstly, it lowers the barrier to entry for offloading computation to the background. Currently, a user needs to research different worker technologies, follow their integration tutorial, and modify how their tasks are called. Instead, a developer simply needs to install the dependencies and work out how to *run* the background worker. Similarly, a developer can start determining which actions should run in the background before implementing a true background worker, and avoid refactoring should the backend change over time.

Secondly, it allows third-party libraries to offload some of their execution. Currently, library maintainers need to either accept that their code will run inside the request-response lifecycle, or provide hooks for application developers to offload actions themselves. This can be particularly helpful when offloading certain expensive signals.

One of the key benefits of background workers is removing the requirement for the user to wait for tasks they don't need to, moving computation and complexity out of the request-response cycle towards dedicated background worker processes. Moving certain actions to run in the background not only improves the performance of web requests, but also allows those actions to run on specialised hardware, potentially scaled differently to the web servers. This presents an opportunity to greatly decrease the perceived execution time of certain common actions performed by Django projects.

The target audience for ``DatabaseBackend`` and a SQL-based queue is likely fairly well aligned with those who may choose something like PostgreSQL FTS over something like Elasticsearch. Elasticsearch is probably better for the 10% of users who really need it, but that doesn't mean the other 90% won't be perfectly happy with PostgreSQL, and they probably wouldn't benefit from Elasticsearch anyway.

But what about *X*?
-------------------

The most obvious alternative to this DEP would be to standardise on an existing task implementation and vendor it into Django. The Django ecosystem is already full of background worker libraries, e.g. Celery and RQ. Writing a production-ready task runner is a complex and nuanced undertaking, and discarding the work already done would be a waste.

This proposal doesn't seek to replace existing tools, nor add yet another option for developers to consider. The primary motivation is creating a shared API contract between worker libraries and developers. It does, however, provide a simple way to get started, with a solution suitable for most sizes of project (``DatabaseBackend``). Slowly increasing features, adding more built-in storage backends and a first-party task runner aren't out of the question for the future, but must be done with careful planning and consideration.
Rationale
=========

This proposed implementation intentionally doesn't assume anything about the user's setup. This not only reduces the chances of Django conflicting with existing task systems a user may be using (e.g. Celery, RQ), but also allows it to work with almost any hosting environment a user might be using.

This proposal started out as `Wagtail RFC 72 <https://github.com/wagtail/rfcs/pull/72>`_, as it was becoming clear a unified interface for background tasks was required, without imposing on a developer's decisions for how the tasks are executed. Wagtail is run in many different forms at many different scales, so it needed to be possible to allow developers to choose the backend they're comfortable with, in a way which lets Wagtail and its associated packages execute tasks without assuming anything of the environment they're running in.

The global task connection handler ``tasks`` is used to access the configured backends, with global versions of those methods available for the default backend. This contradicts the pattern already used for storage and caches, which also expose a default connection (such as ``cache``): "task" is already used in a number of places to refer to an executed task, so using it to refer to the default backend would be confusing, and may lead to it being shadowed in the current scope:

.. code:: python

    from django.tasks import task

    # Later...
    task = task.enqueue(do_a_thing)

    # Clearer
    thing_task = task.enqueue(do_a_thing)
Backwards Compatibility
=======================

So that library maintainers can use this integration without concern as to whether a Django project has configured background workers, the default configuration will use the ``ImmediateBackend``. Developers on older versions of Django who need libraries which assume tasks are available can use the reference implementation.
Reference Implementation
========================

The reference implementation will be developed alongside this DEP process. It will serve as an "early-access" demo to get initial feedback and start using the interface, as the basis for the integration with Django core, and as a backport for users of supported Django versions prior to this work being released.

A more complete implementation picture can be found at https://github.com/RealOrangeOne/django-core-tasks, however it should not be considered final.
Future iterations
=================

The field of background tasks is vast, and attempting to implement everything supported by existing tools in the first iteration is futile. The following functionality has been considered, and deemed explicitly out of scope for the first pass, but still worthy of future development:

- Completion hooks, to run subsequent tasks automatically
- Bulk queueing
- Automated task retrying
- A generic way of executing task runners. This will remain the responsibility of the underlying implementation, and of the user to execute correctly.
- Observability into task queues, including monitoring and reporting
- Cron-based scheduling

Copyright
=========

This document has been placed in the public domain per the Creative Commons
CC0 1.0 Universal license (http://creativecommons.org/publicdomain/zero/1.0/deed).
[Review comment] We should declare the expected acknowledgement semantics. While I very heavily use Celery's ``acks_late=True`` in combination with a reliable message broker like RabbitMQ, I think that if we want to keep a simpler interface, we should define that tasks are acknowledged before work on them begins. Thus, we should document that backends are expected to run tasks at most once in order to be compatible with this interface. Senders of tasks (including library authors) that perform dangerous actions such as sending out emails need to be confident that they aren't going to be sent out multiple times. At-least-once execution is also a very helpful semantic for a broad variety of tasks, and I hope that we can standardize an interface for that in the future. What I think is unwise would be to leave this semantic distinction undefined in the specification.

[Review reply] "At most once" semantics are definitely the easiest to reason about, although in some cases people may want the latter. I'm not exactly sure how we'd encode that in the current API. But I think I agree that we can assume "at most once" for all backends, and if we want it to be configurable over time, it'll probably end up being another argument to ``enqueue`` (or perhaps something defined at config time, if it's not sensibly / easily configured per-task).

[Review comment] If we document it as the semantic we expect, then the API doesn't need to encode it. It's along the same lines as the scheduling API, where we want to be clear about the intended semantics of the API we're exposing.