-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cluster: destroy on main thread #14954
Conversation
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
ClusterInfo is not really destroyed because mainthread dispatcher stops dispatching posted tasks. SSLContextManager and symbol tables are not happy: these two managers expect all resources destroyed. Before this commit the main thread dispatcher will drain the post queue but though, but the exit flag is fragile. Any advice on how to gracefully shut down dispatcher? @antoniovicente |
The tests are failing because symbol tables is not empty when the IsolatedStoreImpl is destroyed. I am investigating. But it's also good to hear if this is a big issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClusterInfo is not really destroyed because mainthread dispatcher stops dispatching posted tasks. SSLContextManager and symbol tables are not happy: these two managers expect all resources destroyed.
Before this commit the main thread dispatcher will drain the post queue but though, but the exit flag is fragile.
Any advice on how to gracefully shut down dispatcher? @antoniovicente
Clean shutdown of dispatchers is a fairly involved topic. The envoy dispatcher class hasn't really tried to shoot for clean shutdown. There are some things that could be done to move in that direction like:
- introducing a bool tryPost(cb) that rejects additional callbacks after shutdown.
- executing scheduled callbacks as part of the shutdown process
- force close downstream connections
- not sure what's relevant to timers
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Signed-off-by: Yuchen Dai <[email protected]>
Is tsan error a red herring?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
fixing format |
Signed-off-by: Yuchen Dai <[email protected]>
Thank you! |
@cpakulski The buggy behavior caused lots of churns in istio. See the PR description. Can we backport it? |
@lambdai Sure. To which releases? |
Thanks! The newly added SdsCdsIntegrationTest doesn't use any recent features. I am afraid all the stable releases are impacted. |
OK - will port to all 4 latest releases. |
Awesome. Thanks! |
Signed-off-by: Yuchen Dai <[email protected]> Signed-off-by: Christoph Pakulski <[email protected]>
* Dispatcher: keeps a stack of tracked objects. (#14573) Dispatcher will now keep a stack of tracked objects; on crash it'll "unwind" and have those objects dump their state. Moreover, it'll invoke fatal actions with the tracked objects. This allows us to dump more information during crash. See related PR: #14509 Will follow up with another PR dumping information at the codec/parser level. Signed-off-by: Kevin Baichoo <[email protected]> Signed-off-by: Christoph Pakulski <[email protected]> * cluster: destroy on main thread (#14954) Signed-off-by: Yuchen Dai <[email protected]> Signed-off-by: Christoph Pakulski <[email protected]> * Updated release notes. Signed-off-by: Christoph Pakulski <[email protected]> Co-authored-by: Kevin Baichoo <[email protected]> Co-authored-by: Yuchen Dai <[email protected]>
@cpakulski could you please confirm starting with version of envoy is this fixed? |
I do not remember the starting version. It was long long time ago. Please check the code. |
Commit Message:
Always destroy cluster info object on the master thread.
Fixed SDS churn and a rare crash case.
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
Fixes:
#13209
istio/istio#30199
istio/istio#28315
[Optional Deprecated:]
[Optional API Considerations:]