Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): Filter out certain executions from crash recovery #9904

Merged
merged 4 commits into from
Jul 2, 2024

Conversation

ivov
Copy link
Contributor

@ivov ivov commented Jul 1, 2024

Summary

This PR fixes an unwanted interaction between concurrency control and crash recovery, by logging new events (execution throttled, execution started during bootup) and filtering out those executions from the log writer's check.

Context

During startup we start running any executions that had been enqueued at the time of shutdown, running them up to the concurrency limit and enqueuing the excess. This does not block, so that concurrency control can let through or throttle these executions as usual.

During startup we also have the execution recovery service, which receives event logs about unfinished executions and uses them to amend data in executions truncated by an instance crash. It is the log writer that collects these event logs about unfinished executions, by checking for events written to n8nEventLog-{n}.log.

Hence during startup we move executions to running status up to the limit and keep the excess with new (i.e. enqueued) status, but because all these executions have not finished yet, the log writer reports them to the recovery service, which marks them as crashed.

Testing

Currently the log writer is not easily testable. To test manually:

  • Ensure you have no new or running executions. Set N8N_CONCURRENCY_PRODUCTION_LIMIT=2 and start the instance.
  • Create a workflow with a 2s cron and a 20s wait, activate it. Allow 2 running executions and a handful of new executions to accumulate.
  • Deactivate the workflow, stop the instance. After graceful shutdown there should be 2 success and a handful of new executions. Start the instance again.
  • Result - new executions are moved to running up to the limit, without being marked as crashed.

Related Linear tickets, Github issues, and Community forum posts

https://n8nio.slack.com/archives/C069HS026UF/p1719846780406959?thread_ts=1719825397.534179&cid=C069HS026UF

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

@n8n-assistant n8n-assistant bot added core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team labels Jul 1, 2024
@ivov ivov added the release/backport Changes that need to be backported to older releases. label Jul 2, 2024
@ivov ivov marked this pull request as ready for review July 2, 2024 07:52
@ivov ivov changed the title fix(core): Filter out enqueued executions from crash recovery fix(core): Filter out certain executions from crash recovery Jul 2, 2024
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from existing ones. At some point we need to simplify all this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only change in this file is adding a dependency.

Comment on lines +207 to +211
if (config.getEnv('executions.mode') === 'regular') {
// removal of active executions will no longer release capacity back,
// so that throttled executions cannot resume during shutdown
this.concurrencyControl.disable();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

Copy link

cypress bot commented Jul 2, 2024

1 flaky test on run #5743 ↗︎

0 399 0 0 Flakiness 1

Details:

🌳 🖥️ browsers:node18.12.0-chrome107 🤖 ivov 🗃️ e2e/*
Project: n8n Commit: 63b527d2b5
Status: Passed Duration: 04:47 💡
Started: Jul 2, 2024 1:40 PM Ended: Jul 2, 2024 1:45 PM
Flakiness  e2e/5-ndv.cy.ts • 1 flaky test

View Output Video

Test Artifacts
NDV > should not retrieve remote options when required params throw errors Screenshots Video

Review all test suite changes for PR #9904 ↗︎

Copy link
Contributor

github-actions bot commented Jul 2, 2024

✅ All Cypress E2E specs passed

@ivov ivov merged commit 7044d1c into master Jul 2, 2024
27 checks passed
@ivov ivov deleted the filter-out-enqueued-executions-from-crash-recovery branch July 2, 2024 15:07
cstuncsik pushed a commit that referenced this pull request Jul 3, 2024
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <[email protected]>
@github-actions github-actions bot mentioned this pull request Jul 3, 2024
@janober
Copy link
Member

janober commented Jul 3, 2024

Got released with [email protected]

@github-actions github-actions bot mentioned this pull request Jul 3, 2024
adrian-martinez-onestic pushed a commit to onesdata/n8n-fork that referenced this pull request Jul 8, 2024
…9904)

Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Enhancement outside /nodes-base and /editor-ui n8n team Authored by the n8n team release/backport Changes that need to be backported to older releases. Released
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants