🐛 Fix infinite "Waiting for cluster" when dask-scheduler is restarted #5252

Conversation

@sanderegg (Member) commented Jan 18, 2024

What do these changes do?

This PR fixes #5237 by:

  • NOT swallowing a raised exception (see the sketch below)
  • a bit of refactoring (I intentionally did not undertake heavy refactoring now)
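To illustrate the first point, here is a minimal sketch of this class of bug; it is not the actual director-v2 code, and the names `probe_scheduler` and `ComputationalBackendNotConnectedError` are illustrative stand-ins. A broad `except Exception` in a wait-for-cluster loop swallows the error raised when the dask-scheduler restarts, so the loop keeps "Waiting for cluster" forever:

```python
import asyncio


class ComputationalBackendNotConnectedError(Exception):
    """The dask client lost its connection to the (restarted) scheduler."""


async def probe_scheduler() -> None:
    """Stand-in for pinging the dask scheduler; raises when unreachable."""
    raise ComputationalBackendNotConnectedError


async def wait_for_cluster_buggy() -> None:
    while True:
        try:
            await probe_scheduler()
            return
        except Exception:  # swallows the raised exception...
            await asyncio.sleep(1)  # ...so this loops forever: "Waiting for cluster"


async def wait_for_cluster_fixed() -> None:
    # let the exception propagate so the caller can recreate the dask client
    # (or mark the run as failed) instead of polling a stale connection
    await probe_scheduler()
```

Letting the exception bubble up gives the computational scheduler a chance to rebuild its client against the new dask-scheduler instance instead of spinning on a dead one.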

Related issue/s

fixes #5237

How to test

Dev Checklist

DevOps Checklist

@sanderegg added the a:director-v2 label (issue related with the director-v2 service) Jan 18, 2024
@sanderegg added this to the "This is Sparta!" milestone Jan 18, 2024
@sanderegg self-assigned this Jan 18, 2024

codecov bot commented Jan 18, 2024

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

Comparing base (42d7630, 86.1% coverage) to head (fbba568, 68.7% coverage).

Additional details and impacted files

@@            Coverage Diff            @@
##           master   #5252      +/-   ##
=========================================
- Coverage    86.1%   68.7%   -17.4%     
=========================================
  Files         864     535     -329     
  Lines       38093   26897   -11196     
  Branches      347     202     -145     
=========================================
- Hits        32814   18491   -14323     
- Misses       5202    8356    +3154     
+ Partials       77      50      -27     
Flag              Coverage Δ
integrationtests  65.0% <60.0%> (+10.8%) ⬆️
unittests         84.6% <70.7%> (-0.5%) ⬇️

Flags with carried forward coverage won't be shown.

Files                                                   Coverage Δ
...mcore_service_director_v2/models/dask_subsystem.py   100.0% <100.0%> (ø)
...rector_v2/modules/comp_scheduler/dask_scheduler.py    89.7% <100.0%> (-0.1%) ⬇️
...tor_v2/modules/db/repositories/comp_tasks/_core.py    98.0% <100.0%> (ø)
...r-v2/src/simcore_service_director_v2/utils/dask.py    89.8% <100.0%> (+<0.1%) ⬆️
...simcore_service_director_v2/modules/dask_client.py    92.5% <84.8%> (-0.7%) ⬇️
...rector_v2/modules/comp_scheduler/base_scheduler.py    86.2% <41.6%> (-0.3%) ⬇️

... and 592 files with indirect coverage changes

@sanderegg marked this pull request as ready for review January 19, 2024 15:28
@pcrespov (Member) left a comment

Excellent. We did a pair review and came up with some interesting ideas.

@sanderegg force-pushed the comp-backend/bugfix/waiting-for-cluster branch from 01ec040 to b866bc0 on January 19, 2024 16:48
@GitHK (Contributor) left a comment

👍

@sanderegg force-pushed the comp-backend/bugfix/waiting-for-cluster branch from b866bc0 to fbba568 on January 22, 2024 07:58

sonarcloud bot commented Jan 22, 2024

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@sanderegg merged commit e32946f into ITISFoundation:master Jan 22, 2024
55 checks passed
@sanderegg deleted the comp-backend/bugfix/waiting-for-cluster branch January 22, 2024 08:49
@matusdrobuliak66 mentioned this pull request Feb 14, 2024
Labels
a:director-v2 issue related with the director-v2 service
Development

Successfully merging this pull request may close these issues.

directorv2 fails to reconnect to new scheduler and remains "waiting for cluster"