Add error handling for host allocation failures #1149
Add logic to catch allocation errors while moving hosts and fail gracefully.
Related HD ticket: http://rails.spimageworks.com/helpdesk/tickets/423339
Change generally LGTM. Just a couple minor comments, mostly related to the warnings generated by the lint check.
    self.cuebotCall(host.setAllocation,
                    "Set Allocation on %s Failed" % host.data.name,
                    allocation)
except Exception as e:
Is there a more specific exception type you can catch here?
Unfortunately not. The gRPC call raises a Rendezvous exception, so I parse the exception details to check whether it is an allocation error.
I recently stumbled upon a similar gRPC exception-handling issue, and it is widespread: most places in the code catch a general exception to handle gRPC errors.
A better option is to catch grpc.RpcError and call .details() to inspect the message.
try:
    something_that_triggers_rendezvous()
except grpc.RpcError as rpc_error_call:
    code = rpc_error_call.code()  # (grpc.StatusCode.UNAVAILABLE, grpc.StatusCode.CANCELLED, ...)
    details = rpc_error_call.details()
See, for example, connectGrpcWithRetries in rqnetwork.py.
IMO this can be handled in another PR as it's a widespread implementation error.
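Applied to the allocation case, the suggestion might look roughly like the sketch below. It is not the code from this PR: the helper names, the `report` callback, and the "allocation" marker text are all hypothetical, since the thread does not show what the server actually puts in the error details. The guarded import is there only so the sketch runs where grpcio is not installed.

```python
try:
    from grpc import RpcError
except ImportError:
    # Stand-in so this sketch still runs where grpcio is not installed.
    class RpcError(Exception):
        pass


def is_allocation_error(rpc_error, marker="allocation"):
    """Return True if the error's details mention the marker text.

    The marker is an assumption; match it to the real server message.
    """
    # Errors raised by stub calls normally expose .details(), but guard
    # against RpcError instances that do not.
    details_fn = getattr(rpc_error, "details", None)
    details = details_fn() if callable(details_fn) else ""
    return marker.lower() in (details or "").lower()


def set_allocation_or_report(set_allocation, allocation, report):
    """Call set_allocation(allocation) and fail gracefully on allocation errors.

    Returns True on success, False if an allocation error was reported.
    Any other gRPC error is re-raised unchanged.
    """
    try:
        set_allocation(allocation)
        return True
    except RpcError as rpc_error:
        if is_allocation_error(rpc_error):
            report(rpc_error.details())
            return False
        raise
```

Catching only `RpcError` lets unrelated failures (programming errors, for instance) surface normally instead of being swallowed by a bare `except Exception`.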
@bcipriano waiting for your confirmation to merge this
Thank you for the approval. I don't seem to have write permissions, so I am not able to merge the PR.
Add logic to catch allocation errors while moving hosts
and fail gracefully