-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC 72: Background workers #72
RFC 72: Background workers #72
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting, nice work! I have put through some misc feedback, sorry if you already addressed some items and I missed it.
Similarly to Django's caching framework, a global "background" object can be imported, which is used to add tasks. This will be a "Huey" instance, potentially subclassed with some Wagtail sugar. This global will be pre-configured based on the application's settings such as backend, immediate mode, and connection details. | ||
|
||
```python | ||
from wagtail.contrib.tasks import background |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be a bit early in the process to think about naming, but maybe we should add a section for proposed names.
I think task
is already taken up in the context of Workflow/Tasks.
Maybe; job
or worker
But I realise Huey uses Task in its terminology so maybe unavoidable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the name here is definitely just a WIP one to fill a gap.
"worker" feels like it's the thing which performs the task, as opposed to the task itself.
"job" is definitely a common enough term that it could be fine. Agree the conflict with workflows isn't ideal. We could be more explicit and call them "background tasks" or alike?
text/072-background-workers.md
Outdated
- Is this _contrib_? | ||
- Should Huey be an optional dependency (thus requiring us to implement "immediate mode" ourselves) or a required dependency. | ||
- The background runner should probably have a name of some sort (and be consistent around terminology eg tasks) | ||
- Should Django ORM support be a day-1 feature? (Wider support base, but delays release) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional questions
- Should we integrate with audit logs or reporting
- Should there be some kind of UI built to review status/logs from these tasks
- What other Wagtail (or Django) packages exist
- Should a more generic problem be looked at (message queue/publish / subscribe etc), for example being able to request a webhook (or set of webhooks) when certain actions are made, probably out of scope but just a thought
- is it worth preparing a Wagtail package that implements this RFC (as much as possible), in a similar process to https://github.com/wagtail/wagtail-generic-chooser (although, that is not an RFC), this way it can get some practical usage and feedback before going into core... or maybe an isolated package will be good enough
Some relevant links
- No longer use celery for sending notifications wagtail#1120 (looks like Django Celery was used a long time ago for sending emails)
- https://github.com/neon-jungle/wagtail-linkchecker/ package using Celery to check links across the site
- https://www.accordbox.com/blog/how-to-setup-celery-django/ tutorial for Django (not Wagtail)
- https://github.com/rq/django-rq & https://github.com/rq/rq (however, this is redis specific)
General feedback
- I actually thought this week about a Workflow Task that does a background job (e.g. cliche SEO check or maybe runs Google lighthouse on the draft published page), this RFC would be a great candidate for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be some kind of UI built to review status/logs from these tasks?
I suspect not. These tasks are really just an implementation detail, as opposed to anything someone may want to monitor. With that said, I could definitely see a use for a list of scheduled tasks, giving admins the ability to trigger them manually if they wish?
What other Wagtail (or Django) packages exist
See "Why Huey?" section
Should a more generic problem be looked at (message queue/publish / subscribe etc), for example being able to request a webhook (or set of webhooks) when certain actions are made, probably out of scope but just a thought
I think they can be the same thing. Doing external network requests in the request/response cycle can hurt performance in many cases, so if the webhook doesn't respond immediately it'll slow everything down. Given that, webhooks could definitely be an interesting (future) extension, and probably warrants having background workers regardless.
is it worth preparing a Wagtail package that implements this RFC
In an ideal world, yes, although some of the benefits of background workers are in moving parts of wagtail-core into the background, which can't be done with an external package. With that said, integration with things like signals and hooks could definitely be done externally.
One of the main benefits of this solution though is that it's built in. If it's an external package, it just becomes another competing standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @RealOrangeOne!
text/072-background-workers.md
Outdated
|
||
By default, Huey has no integrations with Django, allowing it to work in isolation, reducing any potential impact on people's existing sites. Huey does ship with an [optional Django integration](https://huey.readthedocs.io/en/latest/contrib.html#django), but it makes several assumptions about how Huey is used, and would conflict with users already using Huey or with any extended customization we may need. | ||
|
||
Whilst this proposal also covers scheduled tasks for Wagtail, enabling those should be opted in separately to task scheduling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little unclear on the scheduled tasks part of this. Would we replace crons (such as publish_scheduled_pages
) as Huey periodic tasks, or would they remain as crons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My imagining was both. They'd remain as management commands exactly as they are now, but there'd be the option of triggering them as scheduled tasks through a setting in settings.py
. I've just realised I've not fleshed out what that API would look like yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Is the main benefit for the user that they wouldn't need to set up separate scheduled tasks if they're already running a worker, or are there other benefits you're considering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partly that - may save on infrastructure costs. But it also adds the ability for installed packages to "register" scheduled tasks which just get run automatically, without the user having to do anything, or even know / remember they're there.
text/072-background-workers.md
Outdated
|
||
Using this object, tasks are submitted using the existing Huey constructs and patterns. | ||
|
||
For Wagtail [hooks](https://docs.wagtail.io/en/stable/reference/hooks.html), there will be an additional property passed when registering the hook, which will transparently convert the hook to a task, and ensure it's submitted as a task when the hook should be called. Only certain hooks will support background tasks. Others, such as those for registering URLs or menu items must be run synchronously, and so will ignore the background argument. This will only be applicable for certain hooks, and will do nothing when passed to these hooks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it is the responsibility of the caller of the hook to decide whether it should be run asynchronously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be cases with some hooks where the user wants it to run synchronously, which I've intentionally not ruled out. It will have to be down to the caller of the hook as to whether it's possible for it to be run async, though.
Great Work Guys, KEEP GOING PLEASE |
Completely agree. Wagtail can't be assuming anything on how people are setup. Some hosting environments don't have the ability to run background tasks like this, and we don't want to limit where we can run.
This may be true for the things which are currently run in the background, but as this integration develops, things which currently block web requests will be moved over too, possibly allowing more features if we know they're not time sensitive. It's entirely possible some people won't need the background workers, but I highly doubt that all of those people wouldn't see a benefit from deploying them. Whether that deployment is worthwhile to them is obviously 100% subjective and not something we can assume, again hence making it both optional and opt-in. |
…lementing a full runner ourselves
I've re-written the implementation chunk of this RFC, specifically to change to more of a backend-based approach rather than having Wagtail implement a lot of it itself or depend fully on Huey. This makes the implementation far more generic and powerful, and reduces the amount of complexity added into Wagtail itself. Thanks to @tomusher for starting the thoughts around this new direction and initial sketches. |
Great work writing this up... and I agree this would be very beneficial to have in Wagtail. While the lowest friction option is to use an existing task scheduler system (Huey, Celery, etc.) those are going to have a couple design challenges that will make this feature harder to adopt. Data storage: Huey supports SQLite and Redis, celery supports Redis. It would be really nice if the datastore could be the same as the wagtail database to avoid adding another layer of complexity. That means task queuing would ideally be backed by Django models. The database is already the source of truth... there is no reason to require yet another service in the mix. Task runner process: The hosting provider will most likely have to run a second process (aside from WSGI/server) for the task scheduler. It would be really beneficial if whatever task scheduler backends provided by wagtail have a uniform way of spawning this process from Python, similar to how WSGI works (e.g. Django has a Management of this process becomes more complicated with distributed setups (e.g. people running wagtail via docker images, multi-server setups, etc.). Because no one backend would be in charge of running the process, so you'd want a dedicated process server. I'm not saying we need to solve for that scenario, just provide official guidance on how to handle it. |
The ORM backend will yes use the same database as Wagtail is using to avoid complexity.
The original Huey implementation of this RFC allowed both a long running process, and a cron-based "run everything then terminate" style executor. Whilst it's down to whatever task runner a user runs as to how they want to run the actual workers, I think yes the ORM backend in Wagtail should have the ability to run under the cron-style runner so it doesn't depend on a dedicated process. In theory it'd be possible to do things in the same process, but that'd depend strictly on ASGI.
|
Not to today... Just forgot to update it before.
Discussion thread -> wagtail/wagtail#7750 (Specific comments on PR still more than welcome!) |
Cron-style tasks are much more complex if the job queue doesn't natively support them (which many don't). It's also often much more efficient to handle this scheduling externally.
Saw this Python library pop up and thought it would be a good link to go here https://rocketry.readthedocs.io/en/stable/ Not saying we should use it but it has a nice API and may serve as some inspiration for parts of this RFC. |
I discovered this one as well that is based on Postgres https://procrastinate.readthedocs.io/en/stable/discussions.html#how-does-this-all-work No additionnel dependencies needed, only Postgres |
Hello, everyone! Just chiming in with a package suggestion. There is django-q2, which integrates seamlessly with Django and supports multiple backends, including the Django ORM, Redis, etc. |
Hi all. There's a massive appetite for a solution like this in Django itself. As per a fedi-thread, I'd like to suggest making this RFC into a DEP and pushing on getting it into Django. From the thread:
To which...
My view is that Django should have had a story here a while back. It's something that should be part of the core framework. That you're putting in the effort here, it's time. We should point that at core, and then make it happen. I'm happy to help with a DEP and play the shepherd role there if that's of help. (Also, ref @Tobi-De's comment, Django-Q(2) has a lot of good bits to take inspiration from.) |
@carltongibson thanks! ❤️ Expect a draft DEP from me in the coming days. We discussed this in a Wagtail Core Team meeting this morning, and agree it makes sense to create the initial draft package externally to both Wagtail and Django, to allow Wagtail to use it sooner, with the intention to upstream it into Django in parallel. |
@RealOrangeOne AWESOME! 🤩 |
Simply magnificent, the best thing I've read this morning. |
Hey, do you know if this is fully implemented? Ig this was one of the past GSOC ideas. If it's not, can it be done in GSOC 2024? |
This idea has moved to django itself so this framework is going to be developed and shipped as part of django once it's rfc planning is complete and merged |
The core implementation of this is now a DEP: django/deps#86. It's likely Wagtail will be one of the first deployed uses of it, but for now the proposal process and discussion has moved there. |
👋 I’m going to close this now, as the RFC has been replaced by a DEP. There’ll be follow-up work needed in Wagtail to leverage those new capabilities but this either won’t require an RFC, or would require one that’s more specific to the use case. |
Just to add, a draft PR has been made for a POC of turning Wagtail's mechanisms into tasks for django-tasks: wagtail/wagtail#12040 |
View rendered version