Channel is already publishing-- revert late acknowledgment #3968

sentry-io · 2023-03-03T19:41:58Z

Summary

If async tasks aren't acknowledged within a certain amount of time, celery will redeliver them to a worker, which means more than one worker could be attempting to process channel changes (like publishing) at the same time, resulting in the error below. Currently, tasks are configured to acknowledge 'late', meaning they're acknowledged only once completed. This is a helpful feature if workers crash, go offline, or are interrupted by a release, since it means those tasks will be redelivered. But if a task takes a long time, it can be redelivered while the first worker is still running it, and the resulting concurrency causes inadvertent problems.

Tasks

Change back to early acknowledgment
Add a new management command reconcile_publishing_status, similar to reconcile_change_tasks, which compares a channel's publishing status against any unapplied publishing change events and any queued or running apply_channel_changes_tasks for the same channel, and resets the publishing status if there are none
Add a Makefile target reconcile which calls both reconcile_* commands, for use in a k8s cron job
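
The decision rule described for reconcile_publishing_status can be sketched in plain Python. This is only an illustration of the logic, not the actual management command: the `ChannelState` class, the function signature, and the two sets of channel IDs are hypothetical stand-ins for whatever the real models and task queries provide.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class ChannelState:
    """Hypothetical stand-in for a channel record."""
    channel_id: str
    publishing: bool


def reconcile_publishing_status(
    channels: Iterable[ChannelState],
    unapplied_publish_change_ids: Set[str],  # channels with unapplied publishing change events
    active_task_ids: Set[str],               # channels with a queued or running apply-changes task
) -> List[str]:
    """Reset 'publishing' on channels that nothing is actually publishing.

    Returns the IDs of the channels that were reset.
    """
    reset = []
    for channel in channels:
        if not channel.publishing:
            continue  # nothing to reconcile
        if channel.channel_id in unapplied_publish_change_ids:
            continue  # a publish change is still waiting to be applied
        if channel.channel_id in active_task_ids:
            continue  # a task may still complete the publish
        channel.publishing = False  # stale flag: no work is pending for this channel
        reset.append(channel.channel_id)
    return reset


channels = [
    ChannelState("a", publishing=True),   # stale: should be reset
    ChannelState("b", publishing=True),   # has an active task
    ChannelState("c", publishing=True),   # has an unapplied publish change
    ChannelState("d", publishing=False),  # not publishing at all
]
reset_ids = reconcile_publishing_status(channels, {"c"}, {"b"})
```

In this sketch only channel "a" is reset, since it is flagged as publishing but has neither an unapplied change nor an active task.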
Sentry Issue: STUDIO-FC5

ValidationError: [ErrorDetail(string='Channel is already publishing', code='invalid')]
  File "contentcuration/viewsets/channel.py", line 467, in publish_from_changes
    self.publish(
  File "contentcuration/viewsets/channel.py", line 484, in publish
    raise ValidationError("Channel is already publishing")

vkWeb · 2023-03-06T08:07:55Z

@bjester sir, I would like to work on this, but I'll need some background first. Could we connect on Slack about it, sir?

vkWeb · 2023-03-06T08:14:25Z

My specific question is about this statement: "celery will redeliver them to a worker"

When the task is under processing, celery should know that the task is under processing since the worker has not acknowledged its completion/failure...? So why does it redeliver the task to a worker for duplicate processing?

bjester · 2023-03-06T15:49:42Z

Celery supports many different backends, both for queuing and result storage, so its underlying behavior is somewhat generic and follows a 'message queue' architecture. In simple terms, a message queue has four parts: a message, a producer, a queue, and a consumer. This structure decouples message consumption from the queue handling and the producer logic. Since a task's status belongs to the consumer's role, the mechanics of the message queue can't rely on it (and out of the box, celery supports saving task results, not pending tasks-- the existence of pending tasks in the database is something we added ourselves). Message acknowledgement, which removes the message from the queue, is the primary mechanism for interaction between the different parts. This allows celery to support services like RabbitMQ, whose docs have some additional info.

"When the task is under processing, celery should know that the task is under processing since the worker has not acknowledged its completion/failure...?"

Technically, we configured celery this way, @vkWeb! It's currently configured to acknowledge messages after completion or failure. That lets us take advantage of built-in features (without extra development on our part) that ensure tasks are still processed in unexpected situations, such as a worker crashing or getting killed for some reason, or that support delivery to another worker if one becomes available when all others are busy. How well that works also depends on what the tasks actually do. The downside is that tasks need to be performant, because a task that takes too long will be re-delivered.
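
To make that trade-off concrete, here is a minimal sketch in plain Python. It does not use Celery. The `ToyQueue` class, the `simulate` function, the 10-second redelivery window, and the message name are all hypothetical, chosen only to show how acknowledgement timing interacts with redelivery. Real brokers differ in the details.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyQueue:
    """Toy broker: a delivered message stays 'in flight' until acknowledged."""
    visibility_timeout: float  # hypothetical redelivery window, in seconds
    ready: List[str] = field(default_factory=list)
    in_flight: Dict[str, float] = field(default_factory=dict)  # message -> delivery time

    def publish(self, message: str) -> None:
        self.ready.append(message)

    def deliver(self, now: float) -> Optional[str]:
        # Unacknowledged messages whose window has expired become deliverable again.
        for message, delivered_at in list(self.in_flight.items()):
            if now - delivered_at >= self.visibility_timeout:
                del self.in_flight[message]
                self.ready.append(message)
        if not self.ready:
            return None
        message = self.ready.pop(0)
        self.in_flight[message] = now
        return message

    def ack(self, message: str) -> None:
        # Acknowledging is what removes the message from the broker for good.
        self.in_flight.pop(message, None)


def simulate(acks_late: bool, task_duration: float,
             first_worker_crashes: bool = False, horizon: int = 30) -> Tuple[int, int]:
    """Return (times the task was started, times it ran without crashing)."""
    queue = ToyQueue(visibility_timeout=10.0)
    queue.publish("publish-channel")
    starts = 0
    completed = 0
    pending_acks: List[Tuple[float, str]] = []  # (time the late ack is due, message)
    for now in range(horizon):
        # Late acknowledgements only arrive once the task has finished.
        for due, acked in list(pending_acks):
            if now >= due:
                queue.ack(acked)
                pending_acks.remove((due, acked))
        message = queue.deliver(float(now))  # any free worker picks it up
        if message is None:
            continue
        starts += 1
        crashed = first_worker_crashes and starts == 1
        if not acks_late:
            queue.ack(message)  # early: acknowledged on receipt, before running
        if crashed:
            continue  # the worker died; with late acks, no ack is ever sent
        completed += 1
        if acks_late:
            pending_acks.append((now + task_duration, message))
    return starts, completed


slow_late = simulate(acks_late=True, task_duration=15)     # (2, 2): duplicate processing
slow_early = simulate(acks_late=False, task_duration=15)   # (1, 1): runs once
crash_late = simulate(acks_late=True, task_duration=3, first_worker_crashes=True)    # (2, 1): recovered
crash_early = simulate(acks_late=False, task_duration=3, first_worker_crashes=True)  # (1, 0): task lost
```

In this toy model, late acknowledgement recovers the task after a crash but starts a slow task twice, while early acknowledgement runs the slow task once but loses it when the worker crashes.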

So because we're making acknowledgement early, that means we'll need to do that 'extra development' in order to get that protection from worker issues. Making acknowledgement early is as simple as flipping a boolean: https://github.com/learningequality/studio/blob/unstable/contentcuration/contentcuration/utils/celery/tasks.py#L108
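
For reference, Celery exposes this switch both as the global setting `task_acks_late` and as a per-task `acks_late` option, and early acknowledgement is Celery's default. Below is a small sketch of the global form, assuming the settings are collected in a dict. The `CELERY_SETTINGS` name is hypothetical, and the linked line may well use the per-task form instead.

```python
# Hypothetical settings dict using Celery's lowercase setting names.
CELERY_SETTINGS = {
    # False = acknowledge on receipt (early); True = acknowledge after the task finishes (late).
    "task_acks_late": False,
}

# With a Celery app object available, these could be applied with:
#   app.conf.update(CELERY_SETTINGS)
```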

bjester added DEV: backend P0 - critical Priority: Release blocker or regression labels Mar 3, 2023

bjester added this to the Studio: next major release milestone Mar 3, 2023

vkWeb self-assigned this Mar 12, 2023

vkWeb mentioned this issue Mar 14, 2023

Revert Celery Late Acknowledgement #3984

Merged

24 tasks

bjester closed this as completed in #3984 Apr 12, 2023

bjester mentioned this issue May 3, 2023

Hotfixes release: Kolibri 0.16 support #4002

Merged

bjester mentioned this issue Jul 5, 2023

Release v2023.07.05 - Kolibri 0.16 support #4187

Merged
