wreck: add support for parallel debuggers with wreckrun #16
This is useful for coordination, @grondo! Do you plan to backport PID support in your branch? If not, I can do that as well.
Codecov Report
@@ Coverage Diff @@
## master #16 +/- ##
==========================================
- Coverage 81.87% 80.18% -1.69%
==========================================
Files 325 196 -129
Lines 52209 35131 -17078
==========================================
- Hits 42745 28170 -14575
+ Misses 9464 6961 -2503
01dc877 to d3e8aff
If FLUX_WRECKRUN_JOBID_FD is set in environment, write jobid to this file descriptor as soon as the jobid is known and close the fd. This allows synchronization with an external command executing wreckrun as a child process.
Add a wreck:setopt() method to allow setting options manually in wreckrun or submit.
Add a new --jobid option to flux-wreckrun to allow running a job across the same ranks as a previous job. By default, ntasks is set such that one task per rank is executed.
Problem: The --detach option did not exit wreckrun when a job scheduler is used (or without --immediate), yet it still disabled I/O, which is very confusing. When jobs are being scheduled, treat --detach just like --wait-until=runrequest. Fixes flux-framework#10
When stop-children-in-exec option is used, write a "proctable" entry in KVS for each task for parallel debugger support.
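The FLUX_WRECKRUN_JOBID_FD handshake described in the commit messages above can be sketched from the parent's side. This is a minimal illustration, not code from this PR: the helper name launch_and_read_jobid is hypothetical, and it assumes the child writes the jobid as plain text (surrounding whitespace is stripped, since the exact format is not stated here) and then closes the descriptor.

```python
import os
import subprocess


def launch_and_read_jobid(argv):
    """Start argv as a child process and return (process, jobid_text).

    Hypothetical helper: the child is expected to honor
    FLUX_WRECKRUN_JOBID_FD by writing the jobid to that file
    descriptor and closing it, as the commit message describes.
    """
    read_fd, write_fd = os.pipe()
    env = dict(os.environ, FLUX_WRECKRUN_JOBID_FD=str(write_fd))
    try:
        # pass_fds keeps write_fd open, under the same number, in the child.
        proc = subprocess.Popen(argv, env=env, pass_fds=(write_fd,))
    except Exception:
        os.close(read_fd)
        raise
    finally:
        # Close the parent's copy so read() sees EOF once the child closes its end.
        os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()  # blocks until the child closes the fd
    # Assumption: the jobid arrives as text; the exact format is not specified.
    return proc, data.decode().strip()
```

The parent blocks only until the descriptor is closed, not until the job finishes, which is what lets an external tool synchronize on the jobid while the child keeps running.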
9302bf7 to 6c29c6a
I've pushed a couple more commits here to add basic testing and documentation of the new wreckrun Given our dwindling emphasis on this older version, I'm not sure our time is best spent creating exhaustive tests for this feature, and perhaps we can think about merging this PR as is. What do you think, @dongahn?
Yes, we want to minimize our effort on this branch. I just used
You may want to first rebase your PR on top of this branch and ensure something I've done here doesn't inadvertently cause you trouble. If that goes well, then let's merge this PR and then you can rebase the
Nice work! 👍
Hm, I forgot to also test procdesc functionality. That shouldn't take too long, so I'll add that here quickly.
Ok. But procdesc should be exercised and tested with the job-debug tests. So even if you don't have tests for it, it will have some coverage.
Ok, I found a couple of other fixes needed in wreck, and added just 2 sanity checks for proctable support. Later I'll expand on the commit messages and double-check, but I think this PR will be ready to go (assuming it even passes Travis).
2375c93 to 722a32d
Add short description of the new flux-wreckrun --jobid option.
Add basic sanity check for flux wreckrun --jobid.
Problem: To release jobs from the "sync" state a SIGCONT signal needs to be sent, but flux-wreck kill will only kill a job in the running state. Allow jobs in "sync" state as well as "running" to be sent a signal.
Track when a job has been stopped in exec in the "sync" state and transition to "running" when SIGCONT is sent.
Add a very simple set of tests to ensure that wreck jobs emit per-task procdesc to kvs on '-o stop-children-in-exec' and the wreck.<id>.proctable event.
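The "sync" state handling described in the commit messages above can be pictured as a tiny state model. This is a toy sketch, not the wreck implementation: only the state names "sync" and "running" come from the commit messages, the function name deliver_signal is hypothetical, and the effect of signals other than SIGCONT on job state is deliberately left out.

```python
import signal

# States in which a job may be sent a signal, per the commit message.
SIGNALABLE_STATES = {"running", "sync"}


def deliver_signal(state, signum):
    """Return the job state after a signal is accepted for delivery.

    Toy model only: raises ValueError if the job is in a state that
    cannot be signaled, and does not model termination by other signals.
    """
    if state not in SIGNALABLE_STATES:
        raise ValueError(f"cannot signal job in state {state!r}")
    if state == "sync" and signum == signal.SIGCONT:
        # A job stopped in exec is released and becomes "running".
        return "running"
    return state
```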
722a32d to 4290ddf
Thanks @dongahn! I just force-pushed a fix for the doc bug you found, and removed "WIP" from the title.
Thanks!
Still a work in progress (needs testing), but I thought it might be useful to open a PR since I'll be out the next few days.
This PR is in support of #12. It adds the --jobid option to wreckrun as well as simplified support for FLUX_WRECKRUN_JOBID_FD as needed by the proposed flux job-debug.