fix: fail fast may cause Serial spec or cleanup Node interrupted #1178
hey there - thanks for taking the time to put together a reproducer and for proposing a fix. this part of the codebase is some of the most complex and I appreciate the effort! I agree that cleanup nodes should not be interrupted when a different process fails during
Also - we'll need to add tests for these in internal_integration. Particularly for the
Hi @onsi , I'm not so sure how to call
no worries - i'll try to work on this in the next couple of days.
hey - sorry, i had another project take priority over this. it's still on my radar, but it's going to be a few weeks before i can get to it
1. inter-process aborts should not interrupt cleanup nodes
2. whenever we fetch interrupt status, check and see if an abort has happened. if it has, ensure we return the latest, correct, abort state. this allows us to avoid accidentally starting the next spec because the ABORT_POLLING_INTERVAL hasn't fired yet
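The two rules in this commit message can be illustrated with a small standalone sketch. It is not ginkgo's actual implementation: the `handler` type, the `remoteAborted` callback (standing in for a query to whatever shares abort state between processes), and the `nodeKind` values are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeKind distinguishes ordinary spec bodies from cleanup nodes.
// Hypothetical type, for illustration only.
type nodeKind int

const (
	specNode nodeKind = iota
	cleanupNode
)

// handler is a hypothetical interrupt handler. remoteAborted stands in
// for asking the shared coordinator whether any process has aborted.
type handler struct {
	remoteAborted func() bool

	mu      sync.Mutex
	aborted bool
}

// Aborted re-checks the shared source on every call (rule 2), so a caller
// never sees a stale "not aborted" answer between two polls.
func (h *handler) Aborted() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if !h.aborted && h.remoteAborted() {
		h.aborted = true
	}
	return h.aborted
}

// ShouldInterrupt reports whether a running node should be interrupted
// because another process aborted. Cleanup nodes are exempt (rule 1).
func (h *handler) ShouldInterrupt(kind nodeKind) bool {
	if kind == cleanupNode {
		return false
	}
	return h.Aborted()
}

func main() {
	remote := false
	h := &handler{remoteAborted: func() bool { return remote }}

	fmt.Println("aborted before failure:", h.Aborted())
	remote = true // another process fails; no poll has fired yet
	fmt.Println("aborted after failure:", h.Aborted())
	fmt.Println("interrupt spec node:", h.ShouldInterrupt(specNode))
	fmt.Println("interrupt cleanup node:", h.ShouldInterrupt(cleanupNode))
}
```

In this sketch the abort is visible on the very next status check, without waiting for a polling tick, while a cleanup node is still allowed to finish.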
2a8be0e to 15b4871
hey - i ended up prioritizing this today to get it done. CI is running now but assuming it succeeds i'll be pulling this in and cutting a release. i added tests to cover the case of abort interrupting cleanup nodes. i also came up with a different approach for closing the gap between when an abort occurs and when the other ginkgo processes notice. now instead of just polling we also check whenever interrupt Status is requested. this ensures that nothing that shouldn't run ever runs (not just serial specs)
Problem
When using fail-fast, ginkgo starts a goroutine that polls the server every 500ms (ABORT_POLLING_INTERVAL) to see whether the current spec should be interrupted by another process. There is a chance that, just before a Serial spec or cleanup node starts to run, the interruption status is still unset, so the node is not skipped and starts to run. It may then be interrupted by another process while it is running.
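The race can be shown with a minimal standalone sketch. It does not use ginkgo's code: `abortServer`, `pollingHandler`, `poll`, and `ShouldSkip` are hypothetical names, and `poll` is called by hand here where a real runner would call it from a ticker goroutine once per polling interval.

```go
package main

import (
	"fmt"
	"sync"
)

// abortServer is a hypothetical stand-in for the state shared between
// parallel processes: any process can mark the run as aborted.
type abortServer struct {
	mu      sync.Mutex
	aborted bool
}

func (s *abortServer) Abort() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.aborted = true
}

func (s *abortServer) Aborted() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.aborted
}

// pollingHandler caches the abort state and refreshes it only when poll
// is called, which models a poll that fires once per interval.
type pollingHandler struct {
	server *abortServer

	mu     sync.Mutex
	cached bool
}

// poll copies the server's abort state into the local cache.
func (h *pollingHandler) poll() {
	aborted := h.server.Aborted()
	h.mu.Lock()
	defer h.mu.Unlock()
	h.cached = aborted
}

// ShouldSkip is what the runner consults before starting the next node.
// It reads only the cache, so it can lag behind the server.
func (h *pollingHandler) ShouldSkip() bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.cached
}

func main() {
	server := &abortServer{}
	h := &pollingHandler{server: server}

	server.Abort() // another process fails
	fmt.Println("skip before next poll:", h.ShouldSkip())
	h.poll() // the polling interval elapses
	fmt.Println("skip after next poll:", h.ShouldSkip())
}
```

Between `Abort` and the next `poll`, `ShouldSkip` still returns false, which is the window in which a Serial spec or cleanup node can start and later be interrupted.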
Please refer to https://github.com/cvvz/go-playground/tree/master/ginkgo/sample for the steps of reproduction.
Resolution
Wait for ABORT_POLLING_INTERVAL before a Serial spec starts to run; this makes sure the interruption status is up to date.