-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Channel is already publishing-- revert late acknowledgment #3968
Comments
@bjester sir, I would like to work on this. I'll need some background understanding of this. Let us connect on Slack about this sir. |
My specific question is: When the task is under processing, celery should know that the task is under processing since the worker has not acknowledged its completion/failure...? so why does it redeliver to worker for duplicate processing? |
Celery supports many different backends, both for queuing and result storage, so its underlying behavior is somewhat more generic and follows a 'message queue' architecture. In simple terms, the mechanics of this 'message queue' fall into a few categories: a message, a producer, a queue, and a consumer. This structure decouples message consumption from the the queue handling and the producer logic. Since a task's status is a part of the consumer's role, that can't be relied upon in the mechanics of the message queue (and out of the box, celery supports saving task results, not pending tasks-- the existing of pending tasks in the database is something we added on ourselves). Message acknowledgement is the primary mechanism for interacting between the different parts, which removes the message from the queue. This allows celery to support services like RabbitMQ, whose docs have some additional info.
Technically, we configured celery this way, @vkWeb! It's currently configured to acknowledge messages after completion or failure. That allows us to take advantage of features (without extra development from ourselves) to ensure tasks are processed in the event of unexpected situations like the worker crashes, gets killed for some reason, or support delivery to another worker if another worker becomes available when all others are busy. That in itself is implementation specific to what the tasks actually do as well. The downside is that tasks need to be performant in order to prevent re-delivery if they were to take too long. So because we're making acknowledgement early, that means we'll need to do that 'extra development' in order to get that protection from worker issues. Making acknowledgement early is as simple as flipping a boolean: https://github.com/learningequality/studio/blob/unstable/contentcuration/contentcuration/utils/celery/tasks.py#L108 |
Summary
If async tasks aren't acknowledged within a certain amount of time, celery will redeliver them to a worker, which means more than one worker could be attempting to process channel changes (like publishing) at a time, resulting in the error below. Currently, tasks are configured to acknowledge 'late' meaning they're acknowledged once completed. This is a helpful feature if workers crash, go offline, or are interrupted by a release, since it means those tasks will be redelivered. But if a task takes a long time, then those could cause inadvertent problems from the concurrency.
Tasks
Sentry Issue: STUDIO-FC5
The text was updated successfully, but these errors were encountered: