Create data_fetch and set_metadata celery tasks #13655
Force-pushed from 18a9faf to 1715a64.
That didn't seem very hard at first glance, but it looks like this is going to be more complicated than I had anticipated. Once a task is actually running, it is pretty hard to cancel reliably (that's true for both celery and dask). We could:
I don't really love that the state callback approach means we'd have to identify potentially long-running operations and pass the callback around. The asyncio approach sounds really cool but might also be a bigger project. I think I'm going to try using a ProcessPool for the worker and see how well that works.
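For reference, a minimal sketch of why a separate process makes cancellation tractable: the parent can simply terminate the child. This is not the code from this PR. It uses a bare `multiprocessing.Process` with the `fork` start method (an assumption, POSIX only) rather than a pool, so arguments are inherited instead of pickled. `long_running_task` and `run_cancellable` are hypothetical names.

```python
import multiprocessing as mp
import time


def long_running_task(seconds: float) -> None:
    """Hypothetical stand-in for a task that blocks for a long time."""
    time.sleep(seconds)


def run_cancellable(target, args=(), cancel_after=0.2) -> bool:
    """Run target in a child process and cancel it if it takes too long.

    Returns True if the task finished on its own with exit code 0,
    False if it had to be terminated.
    """
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=target, args=args)
    proc.start()
    # Wait up to cancel_after seconds for the task to finish by itself.
    proc.join(timeout=cancel_after)
    if proc.is_alive():
        # Unlike a thread, a process can be stopped from the outside.
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0
```

Calling `run_cancellable(long_running_task, (30,))` should return `False` after roughly 0.2 seconds instead of blocking for 30. A real worker would also need to clean up partial outputs left behind by the terminated task.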
Failed attempt in mvdbeek@cd8da77. This fails because we can't (easily) pickle the complex arguments that our tasks receive. I tried cloudpickle, which did a little better with the magic_partial functions, but we still hold on to a lot of unpicklable things, so I don't think this is going to be a good solution. I'll see if we can use asgiref's sync_to_async and async_to_sync to simulate canceling a worker thread.
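To illustrate the kind of failure involved, here is a small sketch using only the standard library `pickle` module. The objects below are generic stand-ins (a lock, a lambda, a partial holding a lock), not the actual task arguments from this PR, and `is_picklable` is a hypothetical helper.

```python
import functools
import pickle
import threading


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type depends on the object and Python version.
        return False
    return True


# Plain data can be sent to another process without trouble.
plain_args = {"dataset_id": 1, "path": "/tmp/example"}

# Anything holding OS-level state (locks, sockets, open connections) cannot.
lock = threading.Lock()

# A partial is only as picklable as the arguments bound into it.
bound = functools.partial(print, lock)
```

Here `is_picklable(plain_args)` is true, while `is_picklable(lock)`, `is_picklable(bound)` and `is_picklable(lambda: 1)` are all false. cloudpickle can serialize lambdas and closures by value, but it cannot help with objects like the lock.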
Hmm, converting sync functions to coroutines and then canceling the coroutine also fails as soon as we're blocking on a synchronous function. Failed attempt in 2cd208e :(. This could be made to work if we converted the download and file handling to use async libraries, but it would break down at the latest when we get to the file sources, which don't support async operations at all, and there's currently no plan to add this upstream.
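The limitation can be reproduced with the standard library alone. The sketch below uses `asyncio.to_thread` (Python 3.9+) as a stand-in for asgiref's `sync_to_async`, which is an assumption about equivalence for this purpose and not the code from the failed attempt. `blocking_work` and `try_cancel` are hypothetical names.

```python
import asyncio
import threading
import time


def blocking_work(done: threading.Event, seconds: float) -> None:
    """Hypothetical synchronous function that cannot be interrupted."""
    time.sleep(seconds)
    done.set()  # Only reached if the thread was allowed to run to the end.


async def try_cancel(seconds: float = 0.3, timeout: float = 0.05):
    """Cancel a coroutine that wraps blocking work.

    Returns (coroutine_was_cancelled, thread_still_finished).
    """
    done = threading.Event()
    cancelled = False
    try:
        # wait_for cancels the wrapping coroutine once the timeout expires.
        await asyncio.wait_for(
            asyncio.to_thread(blocking_work, done, seconds), timeout
        )
    except asyncio.TimeoutError:
        cancelled = True
    # Give the worker thread time to finish, without blocking the event loop.
    await asyncio.to_thread(done.wait, seconds * 5)
    return cancelled, done.is_set()
```

`asyncio.run(try_cancel())` should return `(True, True)`: the coroutine is cancelled on time, but the thread underneath keeps running to completion, so nothing was actually stopped.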
Force-pushed from 6a7a385 to 67c5438.
Another idea worth trying, to bring down the pickling / process startup overhead, is to spawn processes in forkserver mode. https://gist.github.com/mvdbeek/caa94edff5776ad078826376e9400c99 shows what that could look like. Celery workers could keep a pool of processes, sized to match their concurrency setting, each with a datatype registry and an initialized model "ready to go".
Force-pushed from c97e37f to a389fe6.
Should be handled by autouse fixture
This PR was merged without a "kind/" label, please correct.