Ensure that cc ng waits to populate routes - second try #132
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/164745558
The labels on this GitHub issue will be updated when the story is started.
✅ Hey rowanjacobs! The commit authors and yourself have already signed the CLA.
A little worried that we don't check that cc has come up at all, we just wait 60 seconds and hope. Maybe we could add a |
Yeah I think we should definitely keep in the I actually think switching it to hitting the internal address will help a lot, since hitting the external might make followup Cloud Controllers unlock too early. I'm also wondering if 60 seconds is too long. For an environment with just a few Cloud Controllers a few extra minutes is nothing, but for an environment with n Cloud Controllers we're saying that Also noodling a bit on how #154448217 plays into all of this... 🤔 It may have inadvertently made it worse.. or maybe not since I believe dependent |
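The concern above (sleeping a fixed 60 seconds without confirming that CC is up) could be addressed with a bounded poll. The following is only a minimal sketch, not the script from this PR: the function name `wait_for_cc` and the idea of passing the probe as a command are assumptions made for illustration.

```shell
# Sketch only: poll a probe command until it succeeds or a deadline passes.
# wait_for_cc is a hypothetical helper, not part of capi_release.
# Usage: wait_for_cc <timeout_secs> <interval_secs> <probe command...>
wait_for_cc() {
  timeout_secs="$1"
  interval_secs="$2"
  shift 2
  waited=0
  # Re-run the probe until it exits 0.
  while ! "$@" >/dev/null 2>&1; do
    # Give up once the time budget is spent.
    if [ "$waited" -ge "$timeout_secs" ]; then
      return 1
    fi
    sleep "$interval_secs"
    waited=$((waited + interval_secs))
  done
  return 0
}
```

A caller might run something like `wait_for_cc 60 2 curl --silent --fail "$CC_INTERNAL_URL"`, where `CC_INTERNAL_URL` is a placeholder for whatever internal address the job exposes. Unlike a fixed sleep, this returns as soon as the probe passes and reports failure if it never does, though it does not by itself account for route propagation.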
[#164409886] **[Feature Improvement]** improve PDRATs reliability by improving the way autoscaling, usage-service and notifications block on CAPI availability in their BBR scripts Signed-off-by: Slawek Ligus <[email protected]>
We discussed this over a call with @oozie and @selzoc on Friday. Plan is to continue calling |
Putting the sleep back in after reviewing this [conversation](cloudfoundry#132). It seems to have been added to allow time for route propagation.
* Update post-backup-unlock.sh.erb

  If the timeout fails then the workers never get started, but monit will eventually restart the web process if the CF install eventually recovers, leaving a VM that is half working (with an unhealthy bosh state) after the script runs. We could also change the exit behavior of the timeout with `set +x` (or is it e? I forget), but it would seem that the only point of timing out is to alert the operator to a possible issue, since the CC API can still restart.

* Update post-backup-unlock.sh.erb

  Putting the sleep back in after reviewing this [conversation](#132). It seems to have been added to allow time for route propagation.

---------

Co-authored-by: MerricdeLauney <[email protected]>
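On the question of which shell option controls the exit behavior: in POSIX shells `-e` (errexit) is the option that aborts a script at the first failing command, while `-x` only turns on command tracing. The sketch below illustrates the difference under simplified assumptions; `false` stands in for a wait step that timed out, the `echo` stands in for starting the workers, and the function names are hypothetical.

```shell
# Sketch only: effect of errexit on the steps that follow a failed wait.
# Each case runs in its own `sh -c` process so the option does not leak.

run_with_errexit() {
  # With -e set, the failing step aborts the script; the echo never runs.
  sh -c 'set -e; false; echo "workers started"'
}

run_without_errexit() {
  # With -e unset (set +e), the failure is ignored and the script continues.
  sh -c 'set +e; false; echo "workers started"'
}
```

With errexit on, a timed-out wait leaves the workers unstarted and the script exits non-zero, which is the half-working state described above. With errexit off, the workers start regardless, and the timeout serves only as a signal to the operator if it is logged separately.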
Thanks for contributing to the `capi_release`. To speed up the process of reviewing your pull request please provide us with:

With this change, the unlock scripts for the CC ng job will wait 60 seconds, by which time CC is probably back up.
This is similar to our previous PR, #131, but in this PR we sleep instead of attempting to check if CC is back up. This skips all the weirdness around maintenance mode.
As a result of this change, the subsequent BBR jobs will not hit a 404/502 error when trying to target the CC API endpoint.
This is an updated version of #131.
* I have viewed, signed, and submitted the Contributor License Agreement
* I have made this pull request to the `develop` branch
* I have run CF Acceptance Tests on bosh lite: we have not done this because CATS does not cover this use case. We have run DRATS on a deployment of PAS 2.4 (HA) and SRT 2.4 (single CC VM) on GCP.

cc @oozie @pivotaljohn / @mcwumbly