perf: keep jobs in waiting list when queue is paused #2769
Some initial comments, more will come.
@@ -16,13 +16,13 @@ const originalTree = await flow.add({
name,
data: { idx: 0, foo: 'bar' },
queueName: 'childrenQueueName',
opts: { failParentOnFailure: true },
opts: { onChildFailure: 'fail' },
This is a nice change.
python/bullmq/scripts.py
Outdated
@@ -451,6 +462,15 @@ async def retryJobs(self, state: str, count: int, timestamp: int):
result = await self.commands["moveJobsToWait"](keys=keys, args=args)
return result

async def repairDeprecatedPausedKey(self, maxCount: int):
We need proper documentation on how and when a call to this method is needed.
@@ -39,10 +39,8 @@ const logger = debuglog('bull');

const optsDecodeMap = {
de: 'debounce',
fpof: 'failParentOnFailure',
Won't we need to keep these old mappings to avoid a breaking change for existing data?
Maybe we should introduce a "migration" mechanism here. Something like migration steps for going between versions that are run if required in Lua scripts for atomicity. But we would need to keep a version number in the meta key, adding complexity.
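A minimal sketch of the versioned-migration idea, assuming a hypothetical `version` field in the queue meta hash and a plain object standing in for Redis. The names `MigrationStep`, `steps` and `runPendingMigrations` are illustrative and not part of BullMQ; in a real implementation each step would presumably run inside a Lua script for atomicity.

```typescript
// Plain object standing in for the queue's meta hash (assumption, not BullMQ code).
type Meta = Record<string, string>;

interface MigrationStep {
  toVersion: number; // schema version reached once this step has run
  run: (meta: Meta) => void; // would be a Lua script in a real setup
}

// Illustrative steps only; the real set would be defined by the library.
const steps: MigrationStep[] = [
  { toVersion: 1, run: meta => { meta['step1'] = 'done'; } },
  { toVersion: 2, run: meta => { meta['step2'] = 'done'; } },
];

// Runs every step newer than the stored version, in order, recording progress
// after each one so an interrupted run can resume where it stopped.
function runPendingMigrations(meta: Meta, all: MigrationStep[]): number[] {
  const current = Number(meta['version'] ?? '0');
  const applied: number[] = [];
  const ordered = [...all].sort((a, b) => a.toVersion - b.toVersion);
  for (const step of ordered) {
    if (step.toVersion <= current) continue; // already applied
    step.run(meta);
    meta['version'] = String(step.toVersion);
    applied.push(step.toVersion);
  }
  return applied;
}
```

Calling `runPendingMigrations` a second time on the same meta object applies nothing, which is the property that would make it safe to run as a repeated deployment step.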
We can keep these mappings; they will only be used to format the options of jobs that still carry the old values. No migration is needed, though, since we evaluate both the old and the new option in the Lua scripts, and new jobs will use the new option.
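A rough sketch of what keeping the legacy mapping for decoding only could look like. The two map entries are the ones visible in the diff above; the `decodeOpts` helper and its pass-through behaviour are assumptions made for illustration, not the library's actual implementation.

```typescript
// Short keys as stored with the job. `fpof` is kept only so that jobs written
// by older versions can still be decoded; new jobs would not write it.
const optsDecodeMap: Record<string, string> = {
  de: 'debounce',
  fpof: 'failParentOnFailure', // legacy entry, decode-only
};

// Hypothetical helper: expands short keys to their long names and passes
// unknown keys through unchanged.
function decodeOpts(raw: Record<string, unknown>): Record<string, unknown> {
  const decoded: Record<string, unknown> = {};
  for (const key of Object.keys(raw)) {
    const longName = optsDecodeMap[key] ?? key;
    decoded[longName] = raw[key];
  }
  return decoded;
}
```

With this shape, a job stored by an older version with `fpof` set still surfaces `failParentOnFailure` when read back, while nothing in the stored data has to be rewritten.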
src/classes/job.ts
Outdated
@@ -1210,27 +1208,6 @@ export class Job<
throw new Error(`Delay and repeat options could not be used together`);
}

if (this.opts.removeDependencyOnFailure && this.opts.failParentOnFailure) {
nice to be able to remove this
src/classes/queue.ts
Outdated
/**
* Remove legacy markers before v5
*/
removeLegacyMarkers(): Promise<void> {
Documentation about this is also needed. Most importantly, practical information on how to apply something like this in the context of a production deployment.
I added a migrations section for it
src/classes/queue.ts
Outdated
*
* @param maxCount - Max quantity of jobs to be moved to wait per iteration.
*/
async repairDeprecatedPausedKey(maxCount: number = 1000): Promise<void> {
Is "repair" the right name, or would something like "migrate" be more accurate?
I can change it, let me do it
docs/gitbook/guide/migrations/v6.md
Outdated
If you have paused queues after upgrading to this version. These jobs will be moved to wait state when initializing any of our instances (Worker, Queue, QueueEvents or FlowProducer).

Paused key is not longer needed as this state is already represented by queue meta key. It also improve the process of pausing or resuming a queue as we don't need to rename any key.
by meta key -> inside the meta key
"It also improves..."
Paused key is not longer needed as this state is already represented by queue meta key. It also improve the process of pausing or resuming a queue as we don't need to rename any key.

## Remove legacy markers
I think we are going to need more detailed information on how to safely perform the migration.
docs/gitbook/guide/migrations/v6.md
Outdated
# Migration to v6

Make sure to pass **skipMigrationsExecution** option in any of our instances as false in order to execute all necessary changes when coming from an older version
Do you think the default should be to perform the migration? I think the reverse makes more sense: by default the migration is not run and you must trigger it explicitly. In fact, it would be better to throw an error if the queue has not been migrated yet and require you to migrate manually first, as part of your deployment steps.
So basically we should have only one way to migrate which would be using the run migrations utility, and not implicitly when the queue is instantiated.
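The behaviour proposed in these two comments could be sketched roughly as follows. `QueueState` and both helper functions are hypothetical; `runMigrations` here only shares its name with the utility in the diff, and its signature is invented for the example. The error message is the one that appears later in this PR.

```typescript
// Hypothetical snapshot of what a pre-v6 queue might still carry in Redis.
interface QueueState {
  hasLegacyPausedKey: boolean;
  hasLegacyMarkers: boolean;
}

function hasPendingMigrations(state: QueueState): boolean {
  return state.hasLegacyPausedKey || state.hasLegacyMarkers;
}

// Called when an instance is created. It never migrates anything;
// it only refuses to start against a queue that still needs migrating.
function assertMigrated(state: QueueState): void {
  if (hasPendingMigrations(state)) {
    throw new Error(
      'Queue has pending migrations. See https://docs.bullmq.io/guide/migrations',
    );
  }
}

// The single explicit entry point, meant to be run as a deployment step.
function runMigrations(state: QueueState): QueueState {
  return { ...state, hasLegacyPausedKey: false, hasLegacyMarkers: false };
}
```

The point of the split is that instantiating a `Queue` or `Worker` has no side effects on stored data: migrating is something an operator does deliberately, once, before the new version starts.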
docs/gitbook/guide/migrations/v6.md
Outdated
1. Pause your queues.
2. Upgrade to v6.
3. Instantiate any instance passing skipMigrationsExecution option as false where migrations will be executed.
It feels a bit awkward to require passing false to trigger a one-time behaviour, and then to require true from then on.
src/classes/queue-base.ts
Outdated
'Queue has pending migrations. See https://docs.bullmq.io/guide/migrations',
);
} else {
return runMigrations(client, {
I do not think we should silently run migrations here, this just opens a window to shoot ourselves in the foot.
src/classes/queue.ts
Outdated
*
* @param maxCount - Max quantity of jobs to be moved to wait per iteration.
*/
async migrateDeprecatedPausedKey(maxCount = 1000): Promise<void> {
This should be moved to a migration step.
ARGV[1] count
]]

local maxCount = tonumber(ARGV[1])
I do not understand this script. It should not be possible to have both wait and paused keys at the same time: it is either wait or paused, never both. So renaming the key from paused to wait and setting the queue status to paused should be enough as a migration step.
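To make the suggested step concrete, here is a small in-memory model of it. The `FakeRedis` shape, the `prefix:paused` / `prefix:wait` / `prefix:meta` key layout and the `paused` meta field are assumptions for illustration; against a real server the two mutations would correspond to `RENAME` and `HSET` and would need to run atomically, for example inside a Lua script.

```typescript
// In-memory stand-in for Redis lists and hashes (illustrative only).
interface FakeRedis {
  lists: Record<string, string[]>;
  hashes: Record<string, Record<string, string>>;
}

// Models RENAME paused -> wait followed by HSET meta paused 1.
// Returns true if a legacy paused list was found and migrated.
function migratePausedKey(db: FakeRedis, prefix: string): boolean {
  const pausedKey = `${prefix}:paused`;
  const waitKey = `${prefix}:wait`;
  const metaKey = `${prefix}:meta`;

  const paused = db.lists[pausedKey];
  if (!paused) return false; // nothing to migrate

  if (db.lists[waitKey]) {
    // Should not happen if wait and paused are mutually exclusive;
    // refuse rather than overwrite waiting jobs.
    throw new Error('Both wait and paused lists exist');
  }

  db.lists[waitKey] = paused;
  delete db.lists[pausedKey];
  db.hashes[metaKey] = { ...(db.hashes[metaKey] ?? {}), paused: '1' };
  return true;
}
```

Job order is preserved because the list is moved as a whole, and running the function again is a no-op once the paused key is gone.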
Just a small comment in the python version.
python/bullmq/scripts.py
Outdated
@@ -465,6 +463,15 @@ async def retryJobs(self, state: str, count: int, timestamp: int):
result = await self.commands["moveJobsToWait"](keys=keys, args=args)
return result

async def migrateDeprecatedPausedKey(self, maxCount: int):
Do we really need this? I think that, for now, users should run the migration script available in the Node.js version.
But what I think we need in the Python version is to throw an exception if you try to run this version against queues from an older BullMQ version that have not yet been migrated.
I'll add this logic in a follow-up PR
This PR removes the legacy code regarding the old paused queue functionality, which was based on renaming the wait list key to paused. As paused keys are now handled as a queue state, we do not need the old mechanism anymore and so the code can be removed and simplified. We need some migration steps for old queues that may be in paused status at the time of the upgrade.