PolicyAPIClient.IncomingLink() may incorrectly report Policy validation error if executed during service-controller restart #753
I will improve error handling here to present a meaningful error message, stating it is unable

Is this just happening while validating incoming links?
I can't really tell. As far as I can tell, it happened only once, yesterday. It did not happen again today. In any case, the root cause seems to be on
Another occurrence today, on a different test (`tcp.TestIperf.skupper-iperf3-sites_1-size_1G-clients_1`):
* Handling controller restarts during policy validation. Fixes #753
* Updated retry interval for policy validation and improved error on timeout

Signed-off-by: Danilo Gonzalez Hashimoto <[email protected]>
Description
During a full integration test on a cluster with the Policy CRD and an all-allowing policy (all fields set to `'*'` or `true`), the HipsterShop test failed with the output below:

Looking at the code, `PolicyAPIClient.IncomingLink()` calls `get` on the `service-controller` container. Error `137` indicates a process that was killed with `SIGKILL`.
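As a reminder of the convention behind that number (a minimal, hypothetical sketch, not skupper code): a shell-style exit status above 128 encodes 128 + N, where N is the number of the signal that killed the process, so 137 = 128 + 9 = `SIGKILL`:

```go
package main

import (
	"fmt"
	"syscall"
)

// signalFromExitCode maps the shell convention "exit status = 128 + N"
// back to the signal N that killed the process. Exit code 137 therefore
// means 128 + 9, i.e. SIGKILL.
func signalFromExitCode(code int) (syscall.Signal, bool) {
	if code > 128 && code <= 128+64 { // signal numbers are small positive ints
		return syscall.Signal(code - 128), true
	}
	return 0, false
}

func main() {
	if sig, ok := signalFromExitCode(137); ok {
		fmt.Printf("process was killed by signal %d (%v)\n", sig, sig)
	}
}
```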
How to reproduce
I'll try to create a reproducer that uses the actual code. Getting the `137` return code should not be difficult. However, reproducing the exact situation from HipsterShop that causes the pod to restart and generate the `137` might be impossible.

Meanwhile, I have reproduced only the `137` response with the following process (a driver sketch follows the list):

* Run `get policies -h` on the `service-controller` container, on a loop (not via `podman exec`, as the error will be different)
* `oc delete pod skupper-service-controller-76d867f558-mm2vz`
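For illustration only, a hypothetical Go driver for the loop above. The `kubectl exec` target (`deploy/skupper-service-controller`) and the availability of `kubectl` on the PATH are assumptions; adapt them to your cluster:

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Repeatedly run `get policies -h` inside the service-controller
	// container; delete the pod from another terminal while this loops.
	for i := 0; ; i++ {
		cmd := exec.Command("kubectl", "exec", "deploy/skupper-service-controller",
			"--", "get", "policies", "-h")
		err := cmd.Run()
		if ee, ok := err.(*exec.ExitError); ok && ee.ExitCode() > 128 {
			// 137 (= 128 + SIGKILL) shows up here when the pod is killed.
			fmt.Printf("iteration %d: exit code %d (killed by signal)\n", i, ee.ExitCode())
			return
		}
	}
}
```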
Sample output:
Suggestions
* `PolicyAPIClient.execGet()` calls `get` twice: once with `-h` and once with the actual query. The fix needs to be applied in both places, as either process may be the one killed during the restart.
* `get` already handles return code 1. I'd suggest that any return code above 128 also be handled, with a message saying that the process was killed by a signal.
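A minimal sketch of the suggested handling, assuming a hypothetical helper (`classifyExitCode` is not the actual skupper code, and the messages are illustrative):

```go
package main

import "fmt"

// classifyExitCode turns an exit status from the in-container `get`
// command into a friendlier error. Codes above 128 follow the shell
// convention 128+N for "killed by signal N", which is what we see
// when the service-controller pod is restarted mid-query.
func classifyExitCode(code int, stderr string) error {
	switch {
	case code == 0:
		return nil
	case code == 1:
		// Already handled today: a regular policy validation error.
		return fmt.Errorf("policy validation error: %s", stderr)
	case code > 128:
		sig := code - 128
		return fmt.Errorf("policy query interrupted: process killed by signal %d (exit code %d); the service-controller may be restarting, retry the request", sig, code)
	default:
		return fmt.Errorf("policy query failed with exit code %d: %s", code, stderr)
	}
}
```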
With that said, should HipsterShop have failed in the first place because of a restarting `service-controller` pod? Would it have, if policies were disabled? Perhaps, besides fixing the message, a further check should be done on how activating the policies impacts how pre-existing pieces of the code handle restarting pods.

Other notes
As a side note (as this is probably `assert.Assert`'s fault), the error

`assertion failed: error is not nil: Policy validation error: command terminated with exit code 137: unable to create token to public1 cluster`

looks weird, in that it indicates that `unable to create token to public1 cluster` would be wrapped within `command terminated with exit code 137`, which is not the case. I'm not sure whether that can be fixed, but it surely confuses things.
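Purely to illustrate how such an ordering can arise (a guess, not the actual call sites in the test code): message concatenation can place text after the exec error without that text being the inner, wrapped error:

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	// The exec failure is the inner error here, and the token text is
	// merely appended after it, yet the colon-separated output reads as
	// if "unable to create token..." were the innermost, wrapped error.
	execErr := errors.New("command terminated with exit code 137")
	err := fmt.Errorf("Policy validation error: %w: unable to create token to public1 cluster", execErr)
	fmt.Println(err)
	// Prints:
	// Policy validation error: command terminated with exit code 137: unable to create token to public1 cluster
}
```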