fix(core): Filter out certain executions from crash recovery #9904

ivov · 2024-07-01T16:10:10Z

Summary

This PR fixes an unwanted interaction between concurrency control and crash recovery. It logs two new events (execution throttled, execution started during bootup) and filters the executions that emitted them out of the log writer's check for unfinished executions.

Context

During startup we resume any executions that were enqueued at the time of shutdown, running them up to the concurrency limit and enqueuing the excess. This step does not block startup, so concurrency control can let through or throttle these executions as usual.
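
As a rough illustration of the startup behavior described above, the following sketch splits previously enqueued executions into those that start right away and those that stay enqueued. The `QueuedExecution` type and the `partitionByLimit` function are hypothetical names chosen for this example; they are not taken from the n8n codebase.

```typescript
// Hypothetical shape for illustration; not n8n's actual execution entity.
type QueuedExecution = { id: string; status: 'new' | 'running' };

// Split enqueued executions into those allowed to run now (up to the limit)
// and those that keep their `new` (enqueued) status.
function partitionByLimit(
	enqueued: QueuedExecution[],
	limit: number,
): { toRun: QueuedExecution[]; toKeepEnqueued: QueuedExecution[] } {
	const cutoff = Math.max(0, limit); // guard against negative limits
	const toRun = enqueued
		.slice(0, cutoff)
		.map((e) => ({ ...e, status: 'running' as const }));
	const toKeepEnqueued = enqueued.slice(cutoff);
	return { toRun, toKeepEnqueued };
}
```

With a limit of 2 and five enqueued executions, the first two would be marked `running` and the remaining three would stay `new`.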

During startup we also have the execution recovery service, which receives event logs about unfinished executions and uses them to amend data in executions truncated by an instance crash. It is the log writer that collects these event logs about unfinished executions, by checking for events written to n8nEventLog-{n}.log.

Hence during startup we move executions to running status up to the limit and keep the excess with new (i.e. enqueued) status. But because none of these executions have finished yet, the log writer reports them to the recovery service, which marks them as crashed.
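
The filtering idea can be sketched roughly as follows. This is a simplified illustration, not the log writer's actual code: the `ExecutionEvent` shape, the `findUnfinishedExecutions` function, and the event name strings are all assumptions made for the example.

```typescript
// Hypothetical, simplified event shape; real event messages carry more fields.
type ExecutionEvent = { eventName: string; executionId: string };

// Event names below are assumptions for illustration only.
const FINISHED_EVENTS = new Set(['n8n.workflow.success', 'n8n.workflow.failed']);
const EXCLUDED_EVENTS = new Set([
	'n8n.execution.throttled',
	'n8n.execution.started-during-bootup',
]);

// Return IDs of executions that appear in the log without a finish event,
// skipping any execution that was throttled or started during bootup.
function findUnfinishedExecutions(events: ExecutionEvent[]): string[] {
	const seen = new Set<string>();
	const finished = new Set<string>();
	const excluded = new Set<string>();

	for (const { eventName, executionId } of events) {
		seen.add(executionId);
		if (FINISHED_EVENTS.has(eventName)) finished.add(executionId);
		if (EXCLUDED_EVENTS.has(eventName)) excluded.add(executionId);
	}

	return Array.from(seen).filter((id) => !finished.has(id) && !excluded.has(id));
}
```

Under these assumptions, only executions that genuinely stopped mid-run would reach the recovery service; throttled or bootup-started executions would be left for concurrency control to handle.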

Testing

Currently the log writer is not easily testable. To test manually:

1. Ensure you have no new or running executions. Set N8N_CONCURRENCY_PRODUCTION_LIMIT=2 and start the instance.
2. Create a workflow with a 2s cron and a 20s wait, and activate it. Allow 2 running executions and a handful of new executions to accumulate.
3. Deactivate the workflow and stop the instance. After graceful shutdown there should be 2 success executions and a handful of new executions. Start the instance again.
4. Result: new executions are moved to running up to the limit, without being marked as crashed.

Related Linear tickets, GitHub issues, and Community forum posts

https://n8nio.slack.com/archives/C069HS026UF/p1719846780406959?thread_ts=1719825397.534179&cid=C069HS026UF

Review / Merge checklist

PR title and summary are descriptive. (conventions)
Docs updated or follow-up ticket created.
Tests included.
PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

ivov · 2024-07-02T07:54:38Z

packages/cli/src/eventbus/EventMessageClasses/EventMessageExecution.ts

Copied from existing ones. At some point we need to simplify all this.

ivov · 2024-07-02T07:55:04Z

packages/cli/src/concurrency/__tests__/concurrency-control.service.test.ts

Only change in this file is adding a dependency.

netroy · 2024-07-02T13:32:00Z

packages/cli/src/ActiveExecutions.ts

+		if (config.getEnv('executions.mode') === 'regular') {
+			// removal of active executions will no longer release capacity back,
+			// so that throttled executions cannot resume during shutdown
+			this.concurrencyControl.disable();
+		}
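
To illustrate the intent of the comment in this diff, here is a minimal sketch of a limiter whose `disable()` stops released capacity from resuming throttled work. The `SimpleLimiter` class and every member other than the `disable()` name are hypothetical; this does not reflect how n8n's concurrency control is actually implemented.

```typescript
// Hypothetical limiter for illustration only.
class SimpleLimiter {
	private capacity: number;
	private enabled = true;
	private readonly waiting: Array<() => void> = [];

	constructor(limit: number) {
		this.capacity = limit;
	}

	// Resolves immediately if a slot is free; otherwise waits in line.
	async throttle(): Promise<void> {
		if (!this.enabled) return;
		if (this.capacity > 0) {
			this.capacity--;
			return;
		}
		return new Promise<void>((resolve) => this.waiting.push(resolve));
	}

	// Hands the freed slot to the next waiter, or returns it to the pool.
	// Once disabled, this is a no-op, so waiters are never resumed.
	release(): void {
		if (!this.enabled) return;
		const next = this.waiting.shift();
		if (next) next();
		else this.capacity++;
	}

	disable(): void {
		this.enabled = false;
	}

	get waitingCount(): number {
		return this.waiting.length;
	}
}
```

In this sketch, calling `disable()` before shutdown removes active executions means that their `release()` calls no longer wake throttled executions, which stay enqueued for the next startup.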


cypress · 2024-07-02T13:45:12Z

1 flaky test on run #5743

0	399	0	0	1

Details:

🌳 🖥️ browsers:node18.12.0-chrome107 🤖 ivov 🗃️ e2e/*
Project: n8n	Commit: `63b527d2b5`
Status: Passed	Duration: 04:47 💡
Started: Jul 2, 2024 1:40 PM	Ended: Jul 2, 2024 1:45 PM

e2e/5-ndv.cy.ts • 1 flaky test

Test		Artifacts
NDV > should not retrieve remote options when required params throw errors		`Screenshots` `Video`

github-actions · 2024-07-02T13:45:14Z

✅ All Cypress E2E specs passed

Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <[email protected]>

janober · 2024-07-03T09:17:15Z

Got released with [email protected]

fix(core): Filter out enqueued executions from crash recovery

4c3b542

n8n-assistant bot added the core (Enhancement outside /nodes-base and /editor-ui) and n8n team (Authored by the n8n team) labels Jul 1, 2024

Fix graceful shutdown

c9f7d2e

ivov added the release/backport (Changes that need to be backported to older releases) label Jul 2, 2024

Add event execution-started-during-bootup

b8ef71f

ivov marked this pull request as ready for review July 2, 2024 07:52

ivov changed the title ~~fix(core): Filter out enqueued executions from crash recovery~~ fix(core): Filter out certain executions from crash recovery Jul 2, 2024

Merge remote-tracking branch 'origin/master' into filter-out-enqueued…

63b527d

…-executions-from-crash-recovery

netroy approved these changes Jul 2, 2024

ivov merged commit 7044d1c into master Jul 2, 2024
27 checks passed

ivov deleted the filter-out-enqueued-executions-from-crash-recovery branch July 2, 2024 15:07

cstuncsik pushed a commit that referenced this pull request Jul 3, 2024

fix(core): Filter out certain executions from crash recovery (#9904)

2661c24

Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <[email protected]>

github-actions bot mentioned this pull request Jul 3, 2024

🚀 Release 1.48.3 #9920

Merged

janober added the Released label Jul 3, 2024

github-actions bot mentioned this pull request Jul 3, 2024

🚀 Release 1.49.0 #9927

Merged

adrian-martinez-onestic pushed a commit to onesdata/n8n-fork that referenced this pull request Jul 8, 2024

fix(core): Filter out certain executions from crash recovery (n8n-io#…

3c9a256

…9904) Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <[email protected]>
