exception handling for 5xx server errors #334

Merged 4 commits on Mar 31, 2023
cms21 (Contributor) commented Mar 28, 2023

Response to issue #325

codecov-commenter commented Mar 28, 2023

Codecov Report

Patch coverage: 14.28% and project coverage change: -0.10% ⚠️

Comparison is base (ba03a36) 61.29% compared to head (115725d) 61.20%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #334      +/-   ##
==========================================
- Coverage   61.29%   61.20%   -0.10%     
==========================================
  Files         157      157              
  Lines        9537     9543       +6     
  Branches     1232     1232              
==========================================
- Hits         5846     5841       -5     
- Misses       3432     3444      +12     
+ Partials      259      258       -1     
Impacted Files              Coverage Δ
balsam/site/job_source.py   65.92% <14.28%> (-2.30%) ⬇️

... and 3 files with indirect coverage changes

logger.exception("Failed to communicate with server")
logger.exception(str(exec))
continue
jobs = self.session.acquire_jobs(**params)
Contributor

Perhaps this should still be guarded against a timeout error, because even if the client catches the error, it'll only retry a few times before backoff() gives up and re-raises an error, which would crash the site/launcher.

Collaborator

Hi Misha! We are still catching the timeout. There is probably another round of site stability that we'll have to do, but this one guards against HTTPErrors which were making sites unresponsive immediately.
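
For readers following the thread, here is a minimal sketch of the pattern under discussion: a polling step that treats both timeouts and 5xx responses as transient, logs them, and returns nothing instead of letting the exception propagate and take down the caller. This is not the code from this PR. `RequestTimeout`, `ServerError`, and `acquire_with_guard` are hypothetical stand-ins introduced only for illustration, and the retry count and delay are assumed values.

```python
import logging
import time

logger = logging.getLogger(__name__)


class RequestTimeout(Exception):
    """Hypothetical stand-in for a client-side request timeout."""


class ServerError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status_code):
        super().__init__(f"server returned {status_code}")
        self.status_code = status_code


def acquire_with_guard(acquire, max_attempts=5, delay=0.0):
    """Call acquire() until it succeeds or max_attempts is exhausted.

    Timeouts and 5xx errors are logged and retried. Any other HTTP
    status is re-raised, since retrying would not help. If every
    attempt fails, an empty list is returned so the caller's loop
    keeps running instead of crashing.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return acquire()
        except RequestTimeout:
            logger.warning("Attempt %d: request timed out", attempt)
        except ServerError as exc:
            if exc.status_code < 500:
                raise  # not a server-side failure; do not mask it
            logger.warning("Attempt %d: %s", attempt, exc)
        time.sleep(delay)
    return []
```

Returning an empty list on exhaustion is one possible design choice. It reflects the concern raised above that a retry helper which eventually re-raises would still crash the site or launcher unless the caller also catches the error.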

4 participants