wreck: easily get the nodelist/hostlist for a job #866
Comments
I can take a crack at this. Are you thinking an environment variable like FLUX_NODELIST or similar? Or something in the kvs? Or both?
I think @garlick had mentioned that it would be nice if
The next step would just be to have something during wreck launch translate the list of actual ranks allocated to their hostnames and store in kvs. This could perhaps be done by a lua plugin...
I was thinking it would be useful in the kvs, then the lua plugin that handles environment could just pull it out. For this case I was also thinking of using a wreck/lua plugin to generate the hostlist from resource.hwloc data in wrexecd on rank 0 at launch time.
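For illustration, the rank-to-hostname translation discussed above could be sketched as follows. This is a minimal Python sketch, not Flux or wreck code: the `rank_to_host` mapping, the function names, and the bracketed output format are all assumptions made for the example, and zero-padded host numbers are not preserved.

```python
import re


def ranks_to_hosts(ranks, rank_to_host):
    """Map allocated ranks to hostnames in rank order, dropping duplicates.

    `rank_to_host` is a hypothetical dict (rank -> hostname); in practice
    the mapping would have to come from somewhere like the kvs.
    """
    seen = set()
    hosts = []
    for rank in sorted(ranks):
        host = rank_to_host[rank]
        if host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def compress_hostlist(hosts):
    """Fold names such as node1,node2,node3 into node[1-3].

    Assumes a "prefix + decimal number" naming scheme; names without a
    numeric suffix are passed through unchanged.
    """
    numbers = {}  # prefix -> list of numeric suffixes
    order = []    # (name_or_prefix, is_numbered) in first-seen order
    for host in hosts:
        match = re.fullmatch(r"(.*?)(\d+)", host)
        if match is None:
            order.append((host, False))
            continue
        prefix, num = match.group(1), int(match.group(2))
        if prefix not in numbers:
            numbers[prefix] = []
            order.append((prefix, True))
        numbers[prefix].append(num)

    parts = []
    for name, is_numbered in order:
        if not is_numbered:
            parts.append(name)
            continue
        nums = sorted(set(numbers[name]))
        # Collapse consecutive numbers into (start, end) ranges.
        ranges = []
        start = prev = nums[0]
        for n in nums[1:]:
            if n != prev + 1:
                ranges.append((start, prev))
                start = n
            prev = n
        ranges.append((start, prev))
        body = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
        if len(nums) == 1:
            parts.append(f"{name}{body}")    # single host: no brackets
        else:
            parts.append(f"{name}[{body}]")
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical mapping of six ranks to six hosts.
    mapping = {rank: f"node{rank + 1}" for rank in range(6)}
    print(compress_hostlist(ranks_to_hosts([0, 1, 2, 5], mapping)))
```

With the hypothetical mapping in the example, ranks 0, 1, 2 and 5 fold into a single `node[1-3,6]` string, which is the kind of value that could then be stored once rather than rebuilt by every consumer.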
I'm already playing around with the lua plugin approach, but if you have other ideas or want to take this over just LMK @chu11. I actually only have an hour or so left before I have to take a kid to a cross country meet... I'll push up my resource-hwloc changes to a branch and reference here.
@grondo No particular ideas, just an area of code I haven't dug into yet. If you got it, that's cool.
Ok, I've saved state to my wreck-hostlist branch. Currently, it works well enough to export the hostlist to a |
BTW, I kind of backed off the approach in the wreck-hostlist branch mentioned here -- it is not going to be scalable to have rank 0 wrexecd walk through each rank dir to build the hostlist itself. For now, @chu11, I think the best way to generate a hostlist might be to just run |
@grondo Ok, haven't looked through that branch yet anyways. I'll do something like you suggest for the time being. It's minimally nicer than my xml grepping hack through the kvs.
This was predominantly an exercise to find subtle issues within Flux when running with Magpie. It was tried around the v0.5.0 release of Flux. See the following issues for examples of the discussion that occurred: flux-framework/flux-core#883, flux-framework/flux-core#864, flux-framework/flux-core#866. For how to run, see the magpie.sbatch-srun-spark-test.sh example: the top of the file explains how to run it, and the bottom lists workarounds for Flux.
The wreck exec system is worthless, remove it along with associated commands, tests, and support code. Since libjsc doesn't work without wreck, it is removed as well. Fixes flux-framework#1984 Closes flux-framework#1947 Closes flux-framework#1618 Closes flux-framework#1595 Closes flux-framework#1593 Closes flux-framework#1468 Closes flux-framework#1438 Closes flux-framework#1419 Closes flux-framework#1410 Closes flux-framework#915 Closes flux-framework#894 Closes flux-framework#866 Closes flux-framework#833 Closes flux-framework#774 Closes flux-framework#772 Closes flux-framework#335 Closes flux-framework#249
The wreck exec system is worthless, remove it along with associated commands, tests, and support code. Since libjsc doesn't work without wreck, it is removed as well. Fixes flux-framework#1984 Closes flux-framework#1947 Closes flux-framework#1618 Closes flux-framework#1595 Closes flux-framework#1593 Closes flux-framework#1534 Closes flux-framework#1468 Closes flux-framework#1443 Closes flux-framework#1438 Closes flux-framework#1419 Closes flux-framework#1410 Closes flux-framework#1407 Closes flux-framework#1393 Closes flux-framework#915 Closes flux-framework#894 Closes flux-framework#866 Closes flux-framework#833 Closes flux-framework#774 Closes flux-framework#772 Closes flux-framework#335 Closes flux-framework#249
closed by #1988
As @chu11 mentions in #864, it isn't currently straightforward to get a hostlist or even a rank list for a job run with flux wreckrun. This should not be too difficult to accomplish, and has been shown to be useful/required for 3rd party tools.