Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set max consecutive futile launches #102

Merged
merged 4 commits into from
Jul 28, 2023
Merged

Set max consecutive futile launches #102

merged 4 commits into from
Jul 28, 2023

Conversation

wlandau
Copy link
Owner

@wlandau wlandau commented Jul 28, 2023

Prework

Related GitHub issues and pull requests

Summary

This PR implements a new launch_max argument to the controllers and launchers which sets the maximum number of allowed launch attempts in a row which do not complete any tasks. @shikokuchuo suggested this as a robust uniform solution to wlandau/crew.cluster#19 and shikokuchuo/mirai#20. Indeed it prevents a nightmare of indefinitely racking up costs on a malfunctioning platform, but given #79, launch_max can almost never be zero. So at best, this PR is only an approximate solution. I hope there is a more precise / less wasteful way to solve this long-term. Maybe it will have to involve understanding why some workers become backlogged in the first place and to prevent that outcome entirely.

FYI @multimeric

@wlandau wlandau changed the title Set max futile launches Set max consecutive futile launches Jul 28, 2023
@codecov-commenter
Copy link

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (cd1f4e9) 100.00% compared to head (49f3039) 100.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #102   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           22        22           
  Lines         1261      1285   +24     
=========================================
+ Hits          1261      1285   +24     
Files Changed Coverage Δ
R/crew_controller_local.R 100.00% <100.00%> (ø)
R/crew_launcher.R 100.00% <100.00%> (ø)
R/crew_launcher_local.R 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shikokuchuo
Copy link
Contributor

Re. your comment on #79, whether a worker is 'backlogged' shouldn't affect the actual completed count - if a worker does a task this will increment regardless. I think you'd want to set it higher than 1 to allow for random errors though.

@wlandau
Copy link
Owner Author

wlandau commented Jul 28, 2023

By the way: @shikokuchuo, it's starting to look like shikokuchuo/mirai@3f15ead really fixed #76 and ropensci/targets#1101! I almost always get lots of host segfaults and dangling dispatchers when I develop and test crew for a feature like this, but this time around all those problems went away! Not a single instance.

@wlandau
Copy link
Owner Author

wlandau commented Jul 28, 2023

Re. your comment on #79, whether a worker is 'backlogged' shouldn't affect the actual completed count - if a worker does a task this will increment regardless. I think you'd want to set it higher than 1 to allow for random errors though.

So then I can trust that backlogged workers will have at least 1 completed task? I thought if completed < assigned then it might be the case that completed = 0 when it should be 1 (or at least 1 task should have actually completed).

@shikokuchuo
Copy link
Contributor

shikokuchuo commented Jul 28, 2023

Re. your comment on #79, whether a worker is 'backlogged' shouldn't affect the actual completed count - if a worker does a task this will increment regardless. I think you'd want to set it higher than 1 to allow for random errors though.

So then I can trust that backlogged workers will have at least 1 completed task? I thought if completed < assigned then it might be the case that completed = 0 when it should be 1 (or at least 1 task should have actually completed).

Yes, it is hard to remember how things were... but now it's all cumulative stats. So if 'backlogged' as you've termed it - assigned will be 1 larger than complete. At most 1. All that means is there is a task waiting to be completed at the socket. If a daemon dials in and completes that task complete will increment. Conversely, if it wasn't backlogged, assigned = complete to start with, then it gets sent a task, it completes it, both increase. So whether it is 'backlogged' shouldn't affect this feature. I just wanted to be sure there wasn't something getting mixed up here.

@wlandau wlandau merged commit 464d5c2 into main Jul 28, 2023
@wlandau wlandau deleted the 101 branch July 28, 2023 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants