exception handling for 5xx server errors #334
Codecov Report
@@            Coverage Diff             @@
##             main     #334      +/-   ##
==========================================
- Coverage   61.29%   61.20%   -0.10%
==========================================
  Files         157      157
  Lines        9537     9543       +6
  Branches     1232     1232
==========================================
- Hits         5846     5841       -5
- Misses       3432     3444      +12
+ Partials      259      258       -1
... and 3 files with indirect coverage changes
balsam/site/job_source.py
Outdated
logger.exception("Failed to communicate with server")
logger.exception(str(exec))
continue
jobs = self.session.acquire_jobs(**params)
Perhaps this should still be guarded against a timeout error, because even if the client catches the error, it'll only retry a few times before backoff() gives up and re-raises an error, which would crash the site/launcher.
Hi Misha! We are still catching the timeout. There is probably another round of site stability that we'll have to do, but this one guards against HTTPErrors which were making sites unresponsive immediately.
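For readers following the thread, here is a minimal sketch of the pattern being discussed: a polling loop that logs server errors and timeouts and keeps going, instead of letting them propagate and crash the caller. It is not the code from this PR. `ServerError`, `ServerTimeout`, `poll_jobs`, and `make_flaky_acquire` are hypothetical names invented for illustration, and the `acquire` callable stands in for whatever the real client call is.

```python
import logging
import time

logger = logging.getLogger(__name__)


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP 5xx error raised by an API client."""


class ServerTimeout(Exception):
    """Hypothetical stand-in for a request timeout raised by an API client."""


def poll_jobs(acquire, max_polls, delay=0.0):
    """Call acquire() up to max_polls times and collect the jobs it returns.

    Server errors and timeouts are logged and skipped, so one failed
    request does not end the loop or propagate to the caller.
    """
    jobs = []
    for _ in range(max_polls):
        try:
            jobs.extend(acquire())
        except (ServerError, ServerTimeout) as exc:
            logger.warning("Failed to communicate with server: %s", exc)
        time.sleep(delay)
    return jobs


def make_flaky_acquire():
    """Build a fake acquire() that fails twice, then returns two jobs."""
    outcomes = iter(
        [ServerError("503"), ServerTimeout("timed out"), ["job-1", "job-2"]]
    )

    def acquire():
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    return acquire


if __name__ == "__main__":
    # Two failed polls are logged; the third succeeds.
    print(poll_jobs(make_flaky_acquire(), max_polls=3))
```

Catching both exception types in the loop body is the design choice the review comment asks about: a retry helper that eventually re-raises still needs an outer guard if the loop is meant to outlive a long outage.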
Response to issue #325