Background workers #86
Conversation
draft/0000-background-workers.rst (Outdated)
The global task connection ``tasks`` is used to access the configured backends, with global versions of those methods available for the default backend. This contradicts the pattern already used for storage and caches. A "task" is already used in a number of places to refer to an executed task, so using it to refer to the default backend is confusing and may lead to it being overridden in the current scope:

.. code:: python

    from django.contrib.tasks import task

    # Later...
    task = task.enqueue(do_a_thing)

    # Clearer
    thing_task = task.enqueue(do_a_thing)
Naming things is hard - I'd love some alternative proposals for how to handle this naming. Even having the connection handler called `tasks` irks me a little.
Observation: in the case of `from django.core.cache import cache` or `from django.db import connection`, "cache" & "connection" are the name of the thing which hosts the action to be done or provides access to said host. In the case of "tasks" though, the naming thus far has come from the name of the thing to be worked on. Perhaps there's a name for the container/host analogue which could be used instead?
For example: "Runner", "Worker", "Queue" come to mind, though I'm not sure they're great alternatives as they sound a bit too close to implementations. "Background" maybe? (perhaps too abstract?)
🥘 🤔
The original version used `background`, but I changed it for exactly that reason.
"Runner", "Worker" and "Queue" are all fairly specific components in the task execution pipeline, and that's not what this really is. I guess a task backend could be considered adding a task to a "Queue", but when developing a backend, having 2 very different things called "queues" is quite annoying.
Taking inspiration from the Rails ActiveJob API, I quite like this API:
from django.tasks import perform_later
perform_later(send_email)
perform_later(send_email, wait_until=timezone.now() + timedelta(minutes=5))
Only exposing one function (and its async counterpart) avoids having to come up with different names.
Leaving `run_later`, `call_later`, `execute_later` as alternatives.
It looks like Laravel uses a similarly unified dispatch API, so unifying the call API could be nice. There's still a discovery API for whether scheduling is supported, so we can just reuse.
`perform_later` doesn't quite solve what to call the overarching connection manager, though.
A separate thought on naming, which I think someone mentioned in a different thread but I can't immediately find -- I think it would be useful to separate the ideas of:
- a function which can be enqueued & called later
- the representation of such an enqueueing which is pending being run
Currently these seem to both be called "tasks", which potentially leads to confusion. One spelling for splitting this out might be tasks (definitions) & jobs (instances) respectively¹.
Footnotes
In some of the latest updates he's referred to the task definition as a "task function", as opposed to the task which is a "job" in your definition above. I'm not sure we'll find terminology everyone is happy with, so I've lived with the ambiguity, although I did make an oblique suggestion that we need a glossary at the end of this comment.
Thanks so much for getting to this so quickly @RealOrangeOne! 🎁 I'll have a proper read over the weekend but, one initial remark: I don't think this should be in the …
Hello! Thanks for putting this proposal up, IMO this would be great to have. Dropping some comments here for completeness (as we discussed in DMs), though I'm not sure this is the right place.

Task/transaction race condition

In a backend which queues tasks onto an external queue (let's say, Redis), there's a race condition between a task runner starting the task and the transaction being committed. If I as a caller create a … The problem is, beyond being a bit of a pain to do manually and being easily forgotten, it's not clear what is best to do in the … Transactions in …
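For illustration, the manual workaround described above might look like the following sketch. The `enqueue` helper and `send_welcome_email` task are hypothetical stand-ins for the DEP's draft API; `transaction.on_commit` is existing Django.

from django.contrib.auth.models import User
from django.db import transaction

from django.tasks import enqueue  # assumed import path from the proposal


def send_welcome_email(user_pk):
    ...  # hypothetical task function


def signup(email):
    with transaction.atomic():
        user = User.objects.create(username=email, email=email)
        # Enqueueing here directly would let a fast worker pick the task up
        # before this transaction commits, and fail to find the user row.
        # Deferring the enqueue until commit avoids the race, at the cost of
        # having to remember to do this at every call site.
        transaction.on_commit(lambda: enqueue(send_welcome_email, user.pk))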
Thank you so much for putting this together, @RealOrangeOne! This has long been on my personal Django wishlist, so I'm absolutely stoked to see it moving forward. I've just had a quick read, so don't have super-detailed comments, but I do have a couple of big-picture questions that I don't see addressed, and that I think need to have some answers:
Nowhere near Celery, at least not for a while. The vast majority of this proposal is about writing the glue which sits between Django and tools like Celery, so developers and library maintainers have a single interface to using queues, without needing to worry about the implementation details.

The biggest risk of scope-creep for this is definitely the …

I think the …
I think there are 2 things which would need to transition:

Code wise, I'd like there to be as close to 0 changes needed to swap out the backend (up to a certain point). If implementation details are encoded into how tasks are queued (eg the need for …)

As for the queue storage itself, I think that's a little more complex. Exactly how to, say, move a bunch of tasks from RQ to Celery and RabbitMQ is probably quite a difficult story, which doesn't have an obvious answer. However, I don't think that needs to be on Django to solve, since it's only providing the glue between the runner and Django, as opposed to a library for other task runners to integrate with (other than implementing this interface, I'd like there to be no changes Celery would need to make to support this). The easiest transition path might be running both queues in parallel for a while until the first queue is empty, and then stopping the runners.
I love this, thank you @RealOrangeOne. A couple of thoughts:

Passing a plain callable to …

Should we add some concept of ownership/queue/routing to the task model itself? Given a task ID how do I know what queue it was sent to, or how would I send a task to a specific queue? Defining lots of duplicate backends (1 per queue) seems redundant and not very performant.

An execution of a task is a separate thing to the task itself - a task might have 0 or more executions (i.e. retries, timeouts, etc). The current model seems to mix both of these together - is it worth making a distinction?

Can we/should we tie enqueuing into "transaction.on_commit" by default? Seems like a big footgun people will run into almost immediately.

I personally hate …
This is fantastic framing, and something that should probably end up in the eventual docs for this. Super-well-said!
Thanks for putting this together 👏. It's really cool to see something like this coming to Django.
I've added some comments & questions inline, though I also have some thoughts towards an alternative API which alleviates some of the things I and others have raised. I suspect this is in the realm of bigger thoughts so I'll save that for another venue. (The short version is that it's somewhere between DLQ's API and Django's ORM.) I'll DM you somewhere to chat.

For context: my experience with queueing systems is almost entirely with a somewhat home-grown and less well-known one (https://github.com/thread/django-lightweight-queue, DLQ), which I suspect has fewer features than the libraries already mentioned. As a consequence I've had some hands-on experience extending such a system. Many of the comments I've added are built on learnings from that.
Just a thought -- would it be worth inviting the maintainers of Celery/RQ/etc. to have a look at this PR? Especially if the goal is that transitioning from Django's internal system to one of those is relatively easy I suspect they could have a lot of wisdom to share.
draft/0000-background-workers.rst (Outdated)
- Bulk queueing
- Automated task retrying
- A generic way of executing task runners. This will remain the responsibility of the underlying implementation, and the user to execute correctly.
- Observability into task queues, including monitoring and reporting
- Cron-based scheduling. For now, this can be achieved by triggering the same task once it finishes.
I appreciate the desire to keep things simple, however (speaking from experience adding some of these features to a queueing implementation) many of the things pushed out here are likely to have significant impacts on the API design. I would encourage considering these upfront if possible, even if they are not included in the implementation for now.
In particular the ones I'm expecting could impact the API are:
- Bulk enqueueing
- Automated task retrying (see also @prophile's comments around reliability and races in the database backend, which are kinda the other half of this)
- (from @prophile's comments) dedicated capacity/priority semantics
- (not mentioned anywhere) job timeouts and other shutdown interactions (whether to wait for or kill an in-flight task)
I'd also encourage thinking about the observability aspects. While a little less likely to impact the enqueuing API, I expect this will impact the design of the backends and may impact what plumbing APIs we want.
draft/0000-background-workers.rst (Outdated)
    task = defer(do_a_task, when=timezone.now() + timedelta(minutes=5))

When scheduling a task, it may not be executed at **exactly** that time, however it should be accurate to within a few seconds. This will depend on the current state of the queue and task runners, and is out of the control of Django.
❓ How important do you feel it is to prescribe the precision of the deferred running? Could this be left up to implementers to define? The latter would allow for configurable implementations if e.g: a user needs more or is happy with far less precision.
👍 to calling out that it's out of Django's control and there may be some inaccuracy here.
I think this may be a holdover from a previous version of this DEP, where it was more important to define this. It's absolutely backend-dependent, but I still wanted to flag both that there's little Django can do, and that some inaccuracy is a risk regardless.
draft/0000-background-workers.rst (Outdated)
Deferring tasks
---------------

Tasks may also be "deferred" to run at a specific time in the future:

.. code:: python

    from django.utils import timezone
    from datetime import timedelta
    from django.tasks import defer

    task = defer(do_a_task, when=timezone.now() + timedelta(minutes=5))

When scheduling a task, it may not be executed at **exactly** that time, however it should be accurate to within a few seconds. This will depend on the current state of the queue and task runners, and is out of the control of Django.
❓ Given that running tasks on a cron is excluded from this proposal, how important do you feel it is to include support for running at an arbitrary time in the future? Thinking about the implementation side, cron support and delayed running support feel fairly similar.
For some context here, the queueing system I'm most familiar with (https://github.com/thread/django-lightweight-queue) supports cron but not delayed running of tasks.
Implementation wise, they're very similar, sure. Definition wise however they're quite different. `when` is just a "do not execute before X" field in the queue. cron however not only requires the list of tasks to be defined statically (to avoid a bootstrapping issue), but also some kind of lock to ensure 2 workers don't try to run it at once.
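For what it's worth, the "trigger the same task once it finishes" pattern the DEP suggests in place of cron could be sketched like this. `task` and `defer` follow the draft API's names, but the exact signatures are illustrative:

from datetime import timedelta

from django.tasks import defer, task  # draft API names from the DEP
from django.utils import timezone


@task()
def cleanup_expired_sessions():
    ...  # the actual work

    # Re-schedule the next run before returning, giving cron-like behaviour
    # without a static task list or a cross-worker lock.
    defer(cleanup_expired_sessions, when=timezone.now() + timedelta(hours=1))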
This suffered from a bootstrapping problem. Instead, people can use whatever techniques they're using at the moment, and task-ify them as needed.
I don't think I can encode this in the type signature, but yes the function will need to be global. I suspect we may need to validate this just before triggering. Whilst certain backends may support it, most won't.
Yes, absolutely we should. A …
Currently, these are one and the same. The current implementation doesn't support retries directly, so would likely be separate tasks.
I don't think so. It's a foot-gun in some cases, but might end up being confusing in others. In some cases, you might not care about having objects committed to the DB, so it just delays the task unnecessarily. I think in the first-pass, we can solve this with documentation.
Naming things is hard. I'm not attached to them by any means.
Thank you so much for bringing this forward. I fully support it. I've spent the past two weeks battling myself on what things I think are critical features for this interface.
draft/0000-background-workers.rst (Outdated)
We should declare the expected acknowledgement semantics.
While I very heavily use Celery's `acks_late=True` in combination with a reliable message broker like RabbitMQ, I think that if we want to keep a simpler interface we should define that they are acknowledged before work on the task begins. Thus, we should document that backends are expected to run tasks at most once in order to be compatible with this interface. Senders of tasks (including library authors) that perform such dangerous actions as sending out emails need to be confident that they aren't going to be sent out multiple times.
At least once execution is also a very helpful semantic for a broad variety of tasks, and I hope that we can standardize an interface for that in the future. What I think is unwise would be to leave this semantic distinction undefined in the specification.
"at most once" semantics are definitely the easiest to reason about, although in some cases people may want the latter. I'm not exactly how we encode that in the current API. But I think I agree that we can assume "at most once" for all backends, and if we want it to be configurable over time, it'll probably end up being another argument to enqueue
(or perhaps something defined at config time if it's not sensibly / easily configured per-task).
If we document it as the semantic we expect, then the API doesn't need to encode it. It's along the same lines as the scheduling API, where we want to be clear about the intended semantics of the API we're exposing.
draft/0000-background-workers.rst (Outdated)
Celery may not be able to be a straightforward backend as-is. And I'm not sure that's a problem we want to fix. Celery, afaik, requires tasks to be registered in the runner. But the more I think about the interface, the more I like that this doesn't require the callables to be registered somewhere, and I hope we can keep that.
However this means that moving a task function breaks all in-flight tasks, plus all the lovely security issues that come from a remote source giving us arbitrary functions, by path, to call.
Registration is definitely the way, unfortunately.
Just to elaborate on this, what if a worker receives a malicious message like so:
{"function": "subprocess.check_call", "args": ["rm", "-rf", "/"]}
There has to be a form of task registration, so that there is an allowlist of specific functions to run when a task is received. This also helps decouple the function's import path from the actual message, which is a good thing ™️.
You could perhaps restrict this to functions that live in `[app_name].tasks`, but it still suffers from the same issue if `subprocess` is imported:

{"function": "some_app_name.tasks.subprocess.check_call", "args": ["rm", "-rf", "/"]}
I don't think that we should attempt to make this interface resilient to malicious messages any more than Django generally makes things resilient to malicious database queries. Sending messages is dangerous, and it is intended to do dangerous things, and you need to make sure you trust the code that can do that.

It's true that you can't change the location of a callable all at once without the possible need for downtime, but that's pretty easy to deal with by creating a wrapper function in one place that calls the other.
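That wrapper trick, sketched with hypothetical module names:

# myapp/tasks.py -- the old location, kept as a thin shim so in-flight
# messages that still reference myapp.tasks.send_email keep resolving.
from myapp.emails import send_email as _send_email  # the new location


def send_email(*args, **kwargs):
    return _send_email(*args, **kwargs)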
Sending messages is dangerous, and it is intended to do dangerous things, and you need to make sure you trust the code that can do that.
There is a vast difference between this and "anyone who can add messages into the queue now has full remote-code execution capabilities, no questions asked". Invoking completely arbitrary, user-supplied functions with user-supplied arguments from outside sources has always ended rather poorly.
There are also unresolved issues around how (and where!) to handle `task.enqueue(some_model_instance.send_email)`. Some things can't, and shouldn't, be done with this kind of pass-an-arbitrary-callable interface.
If Celery requires task functions to be marked, then it's unlikely they're the only library
I'm certain it is not the only one. Dramatiq also requires actors (analog to Celery tasks) to be registered.
personally I'd prefer Django be forcing good practice on a user, rather than require they opt-in to security, or allowing them to create a foot-gun and potential CVE for Django later on.
I agree.
We could make this opt-in functionality, or even opt-out, but personally I'd prefer Django be forcing good practice on a user, rather than require they opt-in to security, or allowing them to create a foot-gun and potential CVE for Django later on.
Security needs to be on by default. We already receive emails to security@ that boil down to "When I ignore the security warning in the docs, there is a security gap". Any opt-out would need to be documented with very clear "Security warning" to scare away some who don't actually need to use an opt-out, and provide an easy link for us to reply to those security report emails.
While it follows common patterns, a decorator for task registration does affect usability compared to just sending the callable, because it requires that the modules be imported in order for the tasks to be found. This can be done explicitly or by some auto-discovery magical convention, and while you may prefer those patterns they aren't without trade-offs (ones that I would prefer to avoid).
Django is opinionated and already applies the auto-discovery pattern for several types of resources; the most relevant examples being models and management commands. Adding task auto-discovery to AppConfig with a `def import_tasks(self)` would be a natural fit.

This AppConfig auto-discovery opens the opportunity for built-in management commands to inspect and run/enqueue tasks. I've seen this exact "cron calling a management command that enqueues a Celery task" pattern way too many times in my career. It would be nice to remove that boilerplate code, because people will repeat that pattern if cron is not included.
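For reference, Django already ships the helper that admin autodiscovery uses, so the suggested hook could plausibly be built on it. A sketch (the app config shown and its wiring are hypothetical; the `import_tasks` name above is too):

from django.apps import AppConfig
from django.utils.module_loading import autodiscover_modules


class TasksConfig(AppConfig):
    name = "django.tasks"  # hypothetical app label for the tasks framework

    def ready(self):
        # Imports <app>/tasks.py for every installed app, so decorator-based
        # registration runs before any worker starts consuming.
        autodiscover_modules("tasks")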
Django is opinionated and already applies the auto-discovery pattern for several types of resources
FWIW, DLQ takes this approach -- it auto-loads `tasks.py` files in apps to discover tasks, which register themselves via a decorator.
Since this discussion, I've made quite a few changes to the API, including adding an explicit decorator to create a `Task` object based on a function (which in turn both validates the function is globally importable, and ensures only registered functions can be used as tasks).
I'm not super familiar with how Celery "marks" a task behind the scenes, and then validates it. If it just needs to store a function reference somewhere, that's easy enough to implement with the current calling pattern. If it needs the globally importable function to be of a given type, that might also be doable with a little more work.
@ryanhiebert as you've clearly used Celery more than I, I'm interested in your thoughts!
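A rough sketch of what that definition-time validation could look like; this is illustrative only, not the in-progress implementation:

REGISTERED_TASKS = {}


def task():
    def decorator(func):
        # A module-level function's __qualname__ equals its __name__; nested
        # functions and methods have dotted qualnames, and a worker couldn't
        # re-import those by path later.
        if func.__name__ != func.__qualname__:
            raise ValueError(f"{func!r} must be defined at module level")
        REGISTERED_TASKS[f"{func.__module__}.{func.__name__}"] = func
        return func
    return decorator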
It needs it to be of a specific type (a subclass of `celery.Task`), and registered with the (usually global) Celery app, I believe.

I think that queuing under the backend interface described would rightly be an implementation detail of the backend, and can actually be reasonably capable. The backend, for example, can separate callables into queues based on the import path. That would get a lot of the projects I've worked on a good deal of mileage, to support the common case of separating long-running callables from short-lived callables. If ownership/routing is anything like exchanges and routing keys in Celery / AMQP, I think those are advanced use-cases that we don't want to get into.
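Sketched, that import-path routing idea might look like this; the prefix map and function are purely illustrative:

# Map module prefixes to queue names; anything unmatched uses the default.
QUEUE_PREFIXES = {
    "myapp.reports": "long_running",
    "myapp.emails": "short_lived",
}


def queue_for(func, default="default"):
    path = f"{func.__module__}.{func.__qualname__}"
    for prefix, queue in QUEUE_PREFIXES.items():
        if path.startswith(prefix):
            return queue
    return default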
This allows it to handle raising the exception itself, which makes errors more descriptive
It's the `TaskResult` which has the status
As an author of one of these too-many packages dealing with background tasks (django-toosimple-q), I can't wait to see this land in Django :-) Thanks so much for working on this!
draft/0000-background-workers.rst (Outdated)
- Completion / failed hooks, to run subsequent tasks automatically
- Bulk queueing
- Automated task retrying
- A generic way of executing task runners. This will remain the responsibility of the underlying implementation, and the user to execute correctly.
- Observability into task queues, including monitoring and reporting
- Cron-based scheduling
- Task timeouts
- Swappable argument serialization (eg `pickle`)
Maybe we can include these in this list (which I see as potentially important and which I don't see discussed in the DEP)
- Logging (besides the result, the task results could capture logs, stdout and stderr)
- Realtime output (for long running tasks, having the ability to get some kind of feedback before it finishes would for instance allow to implement progress bars)
- Non-Django tasks (not 100% clear if the proposed approach already supports enqueuing non-Django tasks, but if not, having a worker-agnostic background task API would open quite some doors for distributed architectures)
- Capturing stdout / stderr is probably an antipattern anyway - collect metrics like that manually
- As part of the initial interface, that's definitely not happening without manual implementation. Depending on how granular the progress is, it could end up a performance bottleneck
- This one has bugged me for a while - there's not a huge amount in here that's Django specific, so it could be extracted out to Python itself. However, I'd rather let the backends deal with that themselves. If someone wants to maintain a package which can let arbitrary Python apps schedule tasks using say the built-in DB queue, I'm not opposed to that at all.
draft/0000-background-workers.rst (Outdated)
DatabaseBackend
    This backend uses the Django ORM as a task store. This backend will support all features, and should be considered production-grade.
As you explained, this will suit most users. Could you describe just a little bit more how you'd see that implemented (as I didn't see it yet in your POC implementation)? I guess it would be an optional contrib module providing a management command to run the workers, and probably some admin integration to see task results?
I've intentionally not put implementation details like that in here, as they're fairly external. But yes, the idea would be a management command to run the worker, a DB model to store the data itself, and likely an admin interface for debugging.
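To make the shape concrete, a database-backed task store along those lines might look roughly like this; the field names are invented for illustration, and the COMPLETE status is an assumed terminal success state:

import uuid

from django.db import models


class DBTaskResult(models.Model):
    # Sketch of a task-store model; not the actual implementation.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    status = models.CharField(
        max_length=10,
        choices=[(s, s) for s in ("NEW", "RUNNING", "FAILED", "COMPLETE")],
        default="NEW",
    )
    task_path = models.TextField()  # dotted path of the registered task function
    args_kwargs = models.JSONField()  # serialized call arguments
    run_after = models.DateTimeField(null=True)  # for deferred tasks
    enqueued_at = models.DateTimeField(auto_now_add=True)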
Thanks for the discussion so far all. I've asked for @django/steering-council review on the Forum here: https://forum.djangoproject.com/t/steering-council-vote-on-background-tasks-dep-14/31131
Overall the proposal looks good, but there is one thing that I can't wrap my head around yet. Let's assume you are writing a third-party application and want to use this task framework. Which backend/queue names are you putting into the … Two approaches come to mind (there might be more, and even saner, approaches):

TASKS = {
    "high_prio": {
        "BACKEND": "django.tasks.backends.ImmediateBackend",
        "QUEUES": [],
        "OPTIONS": {},
        "ALIASES": ["default", "third_party_app1"]  # See here
    },
    "low_prio": {
        "BACKEND": "django.tasks.backends.ImmediateBackend",
        "QUEUES": [],
        "OPTIONS": {},
        "ALIASES": ["third_party_app2"]  # and here…
    }
}
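If something like ALIASES were adopted, resolution in the connection handler could be as simple as this hypothetical sketch:

def resolve_backend_name(tasks_setting, alias):
    # Match either a backend's own name or one of its declared aliases.
    for name, config in tasks_setting.items():
        if alias == name or alias in config.get("ALIASES", []):
            return name
    raise KeyError(f"No task backend configured for alias {alias!r}")


# With the settings above:
# resolve_backend_name(TASKS, "third_party_app1") -> "high_prio"
# resolve_backend_name(TASKS, "default") -> "high_prio"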
I think the points raised by @prophile in …

Additionally, we should probably add a flag to the database backend so it can operate in two ways: one where it joins the existing transaction on …

The database backend part doesn't have to be part of the initial implementation, but we really need to get the semantics of the immediate backend right.
Unless you have a good reason, just omit it. If you want users to be able to configure the attribute themselves, you can define your own setting and let the user configure it there.

I do like the …

I imagine …
These are the semantics which concern me about an initial implementation. I suspect configuration is the way to go here, as there isn't one obvious answer (…

I've intentionally not started much work on the …
I didn't consider a setting here, but I agree that it is a good workaround for now and we can add …
I guess the immediate answer would be "behave like other backends would" :D But once one has tested that the backends behave properly, it might make sense to switch to "I know this code works, now run fast so my tests are quicker".
@RealOrangeOne As per the Forum thread, I think you're good to move this to the …
Done - Over to you @carltongibson! 🥳
TASKS = {
    "default": {
        "BACKEND": "django.tasks.backends.ImmediateBackend",
        "QUEUES": []
nit:

-        "QUEUES": []
+        "QUEUES": [],
:NEW: The task has been created, but hasn't started running yet
:RUNNING: The task is currently running
:FAILED: The task failed
Please could we clarify what's meant by "failed" here? Is this timed out? Threw a particular exception? Threw any exception? Returned an unexpected result? Returned no result where one was expected?
Is this something the backends are expected to create rules for themselves?
I suspect that "threw any exception" is probably the simplest and may be what's intended, though it would be good to say as much IMO.
If a backend supports more than these statuses, it should compress them into one of these.

For convenience, calling a ``TaskResult`` will execute the task's function directly, which allows for graceful transitioning towards background tasks:
-For convenience, calling a ``TaskResult`` will execute the task's function directly, which allows for graceful transitioning towards background tasks:
+For convenience, calling a ``Task`` will execute the task's function directly, which allows for graceful transitioning towards background tasks:
I assume this is what's intended here?
Yes, yes it is 🤦
Done in #90.
For convenience, calling a ``TaskResult`` will execute the task's function directly, which allows for graceful transitioning towards background tasks:

.. code:: python

    from django.tasks import task

    @task()
    def do_a_task(*args, **kwargs):
        pass

    # Calls `do_a_task` as if it weren't a task
    do_a_task()
I appreciate the intent, however I am concerned that this will become a significant foot-gun, especially for new-to-Django developers entering a codebase which already has tasks present within it. Their editor is almost certainly going to show them that the symbol is callable, and they could make assumptions about the relative safety of just calling the function. For a task which doesn't take too much time locally, this could also really easily slip into production.

Here are some ideas for alternatives:
- Omit this and let people use `task.func(...)` for direct calls (as I assume we'd expect in tests etc.)
- Offer a derived decorator which offers this functionality for transitioning, but is expected to be removed over time in favour of `task`, which doesn't

Given the ease of creating the latter, I'd really suggest that we not offer this at all in at least the first version, and perhaps not ever. Documenting how to create such a decorator to use for transitions might be a good idea though (see the sketch below).
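A sketch of what documenting that transitional decorator could look like, building on the DEP's `task` decorator (the `transitional_task` name and its `.task` attribute are invented for illustration):

import functools

from django.tasks import task


def transitional_task(func):
    wrapped = task()(func)

    @functools.wraps(func)
    def inner(*args, **kwargs):
        # Direct calls behave like the original function for now; call sites
        # migrate to inner.task.enqueue(...) over time, then this shim goes.
        return func(*args, **kwargs)

    inner.task = wrapped
    return inner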
During implementation, I hit this foot-gun too. My proposed solution is a `.call` method (and `.acall`) instead of implementing `__call__`. I do know some task libraries use `__call__` to enqueue a task, so I agree this could be confusing.

I'd also thought of additionally having `__call__` enqueue a task, but that felt like the wrong way forward. "There should be one-- and preferably only one --obvious way to do it."
What benefit does a `Task.call`/`Task.acall` method offer when `Task.func` already exists? (Assuming I'm reading the DEP right about that member.)
In the context of beginners, `.func(…)` doesn't really convey much intent. `.call()` is a bit clearer. `.func` also gives you the underlying function object; it's possible we want some extra logic in `enqueue` and `call`.
You are. The main benefit is shimming over whether the `.func` is async or not. `.call` and `.acall` transparently shim the function based on the caller's context, rather than its own definition.
I'm yet to battle-test whether this is the right approach, or whether it introduces other bugs, but I think it's still a useful API, even if it ends up undocumented and only used for debugging.
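The kind of caller-context shim being described could be sketched with asgiref's helpers; the real implementation may well differ:

import inspect

from asgiref.sync import async_to_sync, sync_to_async


def call(func, *args, **kwargs):
    # Synchronous caller: run a coroutine function to completion if needed.
    if inspect.iscoroutinefunction(func):
        return async_to_sync(func)(*args, **kwargs)
    return func(*args, **kwargs)


async def acall(func, *args, **kwargs):
    # Asynchronous caller: push a blocking function to a thread instead.
    if inspect.iscoroutinefunction(func):
        return await func(*args, **kwargs)
    return await sync_to_async(func)(*args, **kwargs)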
Sorry to add a bunch of comments right after this has merged. I've been following this PR with interest, though hadn't realised that it had progressed to a near-final state -- I would have tried to re-review it more completely prior to now if I had.
I've now opened up discussions on the in-progress implementation, and pre-populated it with the "Future iterations" items. I suggest we move further discussions on the DEP there. After the initial influx of comments and feedback there, I'll backport the relevant changes to the DEP in a separate PR (hence the distinction between "accepted" and "final"). Please do raise any and all comments / questions there (and PRs too)!
This is a proposal for adding a background workers interface to Django, to better facilitate developers (both working with Django and library maintainers) moving certain tasks from their web processes to separate background runners.
This work originally started in Wagtail as RFC 72, but was found to be generic enough to also apply to Django, so @carltongibson suggested I adapt it to a DEP. See also my more complete code examples.
Got feedback? Feel free to leave a comment! Got bigger feedback? I'm more than happy to chat on Twitter, Mastodon, or another channel if you already know me (so this PR doesn't get too messy).