-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(core): Ensure job processor does not reprocess amended executions #11438
fix(core): Ensure job processor does not reprocess amended executions #11438
Conversation
* cause a crashed execution to be enqueued. We refrain from processing it, | ||
* until we have reworked both mechanisms to prevent this scenario. | ||
*/ | ||
if (execution.status === 'crashed') return { success: false }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using the return value for anything?
I only found this function being called here, and there we're not using the result:
await this.jobProcessor.processJob(job); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically we never used the result, we've only relied on promise resolution or rejection. The important bit is the early return to resolve the promise without any reprocessing of the crashed execution. I went for { success: false }
to keep the return type simple, aligned with what is already there.
n8n Run #7595
Run Properties:
|
Project |
n8n
|
Branch Review |
pay-2100-cannot-read-properties-of-undefined-reading-node
|
Run status |
Passed #7595
|
Run duration | 04m 36s |
Commit |
3ffc4b7c0c: 🌳 🖥️ browsers:node18.12.0-chrome107 🤖 ivov 🗃️ e2e/*
|
Committer | Iván Ovejero |
View all properties for this run ↗︎ |
Test results | |
---|---|
Failures |
0
|
Flaky |
0
|
Pending |
0
|
Skipped |
0
|
Passing |
460
|
View all changes introduced in this branch ↗︎ |
✅ All Cypress E2E specs passed |
Got released with |
Summary
This PR fixes an edge case caused by an unwanted interaction between Bull's implicit retry mechanism and n8n's execution recovery mechanism.
When a worker crashes mid-execution, it fails to send a heartbeat back to Redis, prompting Bull to re-enqueue the job. When the worker restarts, its event bus looks at the filesystem logs and marks the execution as
crashed
and amends its data. But, the job is still in the queue. When this worker (or another worker, makes no difference) receives the job, the processor tries to run the execution but finds its amended data is lacking some of the data needed for execution, henceCannot read properties of undefined (reading 'node')
whereundefined
should have been the zeroth item in thenodeExecutionStack
.This unwanted interaction comes from two legacy systems that we are planning to rework in future, so this PR ensures that amended executions cannot be reprocessed by workers.
Related Linear tickets, Github issues, and Community forum posts
Review / Merge checklist
release/backport
(if the PR is an urgent fix that needs to be backported)