config: enable crash_loop_limit by default #13431
Crash loop limit enforcement is disabled in developer mode, which is also set by rpk's dev-container mode.
The idea is that crash_loop_limit tracking has value only when running at scale, and disabling it prevents inadvertent lockups (and hence gives a better UX) when running in container mode, CI, etc.
Thanks for explaining this context.
Should this be backported? It kinda seems like yes since the same basic problem can happen to older systems as well. cc @piyushredpanda ?
I think so too, but Piyush had a different opinion. Piyush, can you please confirm? I think the ducktape failures are related, so please don't merge. I need to take a closer look.
Yep, I think I'm good to backport actually (tagged y'all in parallel in the Slack thread internally :) )
avoiding force merge accident until failures are triaged
Force-pushed from bb7c730 to dc2b335.
This also disables it for dev-container mode in rpk. The idea is that crash_loop_limit tracking has value only when running at scale, and disabling it prevents inadvertent lockups (and hence gives a better UX) when running in container mode, CI, etc.
Defaults to 5. If a broker shuts down uncleanly 5 times back to back, it is considered to be in a crash loop.
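To make the counting rule concrete, here is a minimal sketch of how such a tracker could behave. The type and member names (`crash_tracker`, `record_shutdown`, `in_crash_loop`) are hypothetical and not taken from the Redpanda source; the sketch only assumes that a clean shutdown resets the streak and that an unset limit disables the check.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical illustration only; not the actual Redpanda implementation.
struct crash_tracker {
    // Number of unclean shutdowns seen back to back.
    uint32_t consecutive_unclean = 0;

    // Record how the previous run ended. A clean shutdown resets the streak
    // (assumption), an unclean one extends it.
    void record_shutdown(bool clean) {
        consecutive_unclean = clean ? 0 : consecutive_unclean + 1;
    }

    // An unset limit disables enforcement. Otherwise the broker is treated as
    // crash looping once the streak reaches the limit.
    bool in_crash_loop(std::optional<uint32_t> limit) const {
        return limit.has_value() && consecutive_unclean >= *limit;
    }
};
```

With a limit of 5, four unclean shutdowns in a row are tolerated and the fifth trips the check; passing `std::nullopt` never trips it.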
Force-pushed from dc2b335 to 12b5afa.
Test failures: known issue #11998
@dotnwat this is ready for review again.
lgtm.
was wondering if the extra checks of the tracker file existing would be needed if we did
auto limit = developer_mode() ? std::nullopt : config::node().crash_loop_limit.value();
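For readers following the suggestion, a standalone sketch of the same idea: resolve one effective limit up front, so that later checks only need to test whether a limit is set. The function name `effective_crash_loop_limit` and its parameters are hypothetical stand-ins for `developer_mode()` and `config::node().crash_loop_limit.value()`; whether this actually removes the need for the tracker-file checks is the open question in the comment, not something this sketch settles.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: developer mode overrides whatever limit is configured.
// An empty optional means "do not enforce a crash loop limit".
std::optional<uint32_t> effective_crash_loop_limit(
  bool developer_mode, std::optional<uint32_t> configured) {
    if (developer_mode) {
        return std::nullopt;
    }
    return configured;
}
```

For example, `effective_crash_loop_limit(true, 5)` is empty, while `effective_crash_loop_limit(false, 5)` holds 5.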
/backport v23.2.x
/backport v23.1.x
Failed to create a backport PR to v23.1.x branch. I tried:
Defaults to 5. If a broker shuts down uncleanly 5 times back to back, it is considered to be in a crash loop.
Crash loop limit enforcement is disabled in developer mode, which is also set by rpk's dev-container mode.
The idea is that crash_loop_limit tracking has value only when running at scale, and disabling it prevents inadvertent lockups (and hence gives a better UX) when running in container mode, CI, etc.
Backports Required
Release Notes
Features
developer mode and rpk's dev-container mode.