fix(core): Filter out certain executions from crash recovery #9904
Copied from existing ones. At some point we need to simplify all this.
Only change in this file is adding a dependency.
…-executions-from-crash-recovery
if (config.getEnv('executions.mode') === 'regular') {
	// removal of active executions will no longer release capacity back,
	// so that throttled executions cannot resume during shutdown
	this.concurrencyControl.disable();
}
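The effect described in the comment above can be sketched with a toy limiter. This is only an illustration under stated assumptions: the class `ConcurrencySketch` and its `throttle`/`release` methods are hypothetical and are not n8n's actual concurrency control implementation; only the idea that `disable()` stops released capacity from resuming queued executions comes from the snippet.

```typescript
// Hypothetical, simplified limiter for illustration; not n8n's actual implementation.
class ConcurrencySketch {
	readonly started: string[] = [];
	readonly queue: string[] = [];
	private capacity: number;
	private enabled = true;

	constructor(limit: number) {
		this.capacity = limit;
	}

	// Start the execution if capacity remains, otherwise enqueue it.
	throttle(executionId: string): void {
		if (this.capacity > 0) {
			this.capacity--;
			this.started.push(executionId);
		} else {
			this.queue.push(executionId);
		}
	}

	// Called when an active execution is removed. Once disabled, this is a no-op,
	// so queued (throttled) executions cannot resume.
	release(): void {
		if (!this.enabled) return;
		const next = this.queue.shift();
		if (next !== undefined) {
			this.started.push(next);
		} else {
			this.capacity++;
		}
	}

	disable(): void {
		this.enabled = false;
	}
}
```

With a limit of 1 and two executions, calling `release()` normally starts the queued one; calling `disable()` first leaves it in the queue, which is the behavior wanted during shutdown.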
👍🏽
1 flaky test on run #5743
Details:
e2e/5-ndv.cy.ts • 1 flaky test
Review all test suite changes for PR #9904
✅ All Cypress E2E specs passed
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <[email protected]>
Got released with |
Summary
This PR fixes an unwanted interaction between concurrency control and crash recovery, by logging new events (execution throttled, execution started during bootup) and filtering out those executions from the log writer's check.
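As a rough sketch of the filtering idea, assuming a simplified event shape: the event name constants, the `LoggedEvent` interface, and `findCrashCandidates` below are hypothetical and chosen for illustration, not taken from n8n's log writer. The point is only that executions with a "throttled" or "started during bootup" event are excluded from the set reported as unfinished.

```typescript
// Hypothetical event names and shapes; the real log writer may differ.
const STARTED = 'execution.started';
const FINISHED = 'execution.finished';
const THROTTLED = 'execution.throttled';
const STARTED_DURING_BOOTUP = 'execution.started-during-bootup';

interface LoggedEvent {
	eventName: string;
	executionId: string;
}

// Returns IDs of executions that started but never finished, skipping those
// that were throttled or started during bootup, since they have not crashed.
function findCrashCandidates(events: LoggedEvent[]): string[] {
	const unfinished = new Set<string>();
	const excluded = new Set<string>();

	for (const event of events) {
		switch (event.eventName) {
			case STARTED:
				unfinished.add(event.executionId);
				break;
			case FINISHED:
				unfinished.delete(event.executionId);
				break;
			case THROTTLED:
			case STARTED_DURING_BOOTUP:
				excluded.add(event.executionId);
				break;
		}
	}

	return Array.from(unfinished).filter((id) => !excluded.has(id));
}
```

In this sketch only an execution that started, never finished, and carries neither of the two new events would be handed to the recovery service.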
Context
During startup we start running any executions that had been enqueued at the time of shutdown, running them up to the concurrency limit and enqueuing the excess. This does not block, so that concurrency control can let through or throttle these executions as usual.
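The startup behavior described above can be pictured as a simple split of the previously enqueued executions. This is a minimal sketch, assuming a positive integer limit; `planStartup` is a hypothetical helper, not a function in n8n, and the real flow goes through concurrency control without blocking rather than through a one-shot partition.

```typescript
interface StartupPlan {
	running: string[]; // executions let through immediately
	enqueued: string[]; // excess executions kept with "new" status
}

// Hypothetical helper: run executions up to the limit and enqueue the rest.
// Assumes `limit` is a positive integer.
function planStartup(enqueuedAtShutdown: string[], limit: number): StartupPlan {
	return {
		running: enqueuedAtShutdown.slice(0, limit),
		enqueued: enqueuedAtShutdown.slice(limit),
	};
}
```

For example, with a limit of 2 and five executions left over from shutdown, two would move to running and three would stay enqueued.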
During startup we also have the execution recovery service, which receives event logs about unfinished executions and uses them to amend data in executions truncated by an instance crash. It is the log writer that collects these event logs about unfinished executions, by checking for events written to `n8nEventLog-{n}.log`.
Hence during startup we move executions to `running` status up to the limit and keep the excess with `new` (i.e. enqueued) status, but because all these executions have not finished yet, the log writer reports them to the recovery service, which marks them as crashed.
Testing
Currently the log writer is not easily testable. To test manually:
1. … `new` or `running` executions. Set `N8N_CONCURRENCY_PRODUCTION_LIMIT=2` and start the instance.
2. … `running` executions and a handful of `new` executions to accumulate.
3. … `success` and a handful of `new` executions. Start the instance again.
4. … `new` executions are moved to `running` up to the limit, without being marked as `crashed`.
Related Linear tickets, GitHub issues, and Community forum posts
https://n8nio.slack.com/archives/C069HS026UF/p1719846780406959?thread_ts=1719825397.534179&cid=C069HS026UF
Review / Merge checklist
… `release/backport` (if the PR is an urgent fix that needs to be backported)