wreck: easily get the nodelist/hostlist for a job #866

Closed
grondo opened this issue Oct 21, 2016 · 9 comments

Comments

@grondo
Contributor

grondo commented Oct 21, 2016

As @chu11 mentions in #864, it isn't currently straightforward to get a hostlist or even a rank list for a job run with flux wreckrun. This should not be too difficult to accomplish, and has been shown to be useful/required for 3rd party tools.

@chu11
Member

chu11 commented Oct 21, 2016

I can take a crack at this. Are you thinking an environment variable like FLUX_NODELIST or similar? Or something in the kvs? Or both?

@chu11 chu11 self-assigned this Oct 21, 2016
@grondo
Contributor Author

grondo commented Oct 21, 2016

I think @garlick had mentioned that it would be nice if resource-hwloc kept hostname for each rank under resource.hwloc.by_rank.[rank].HostName. It turned out this was fairly trivial:

$ flux kvs get resource.hwloc.by_rank.0.HostName
hype356

The next step would just be to have something during wreck launch translate the list of actual ranks allocated to their hostnames and store in kvs -- this could perhaps be done by a lua plugin...
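The rank-to-hostname translation described above could be sketched roughly as follows. This is only an illustration, not the wreck or Lua plugin code: it shells out to the `flux kvs get` command shown in the comment and assumes the `resource.hwloc.by_rank.[rank].HostName` key is present. The names `kvs_get` and `hostnames_for_ranks` are hypothetical.

```python
import subprocess


def kvs_get(key):
    """Fetch one KVS value by shelling out to `flux kvs get` (assumes flux is on PATH)."""
    result = subprocess.run(
        ["flux", "kvs", "get", key],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def hostnames_for_ranks(ranks, get=kvs_get):
    """Map each allocated rank to its hostname, preserving rank order.

    `get` is injectable so the lookup can be exercised without a running
    Flux instance. The key layout is taken from the comment above and is
    an assumption about what resource-hwloc stores.
    """
    return [get(f"resource.hwloc.by_rank.{rank}.HostName") for rank in ranks]
```

Passing a dictionary lookup as `get` (for example `fake.__getitem__`) lets the function be tried without a Flux session. Note that it performs one lookup per rank.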

@grondo
Contributor Author

grondo commented Oct 21, 2016

> I can take a crack at this. Are you thinking an environment variable like FLUX_NODELIST or similar? Or something in the kvs? Or both?

I was thinking it would be useful in the kvs, then the lua plugin that handles environment could just pull it out. For this case I was also thinking of using a wreck/lua plugin to generate the hostlist from resource.hwloc data in wrexecd on rank 0 at launch time.

@grondo
Contributor Author

grondo commented Oct 21, 2016

I'm already playing around with the lua plugin approach, but if you have other ideas or want to take this over just LMK @chu11. I actually only have an hour or so left before I have to take a kid to a cross country meet... I'll push up my resource-hwloc changes to a branch and reference it here.

@chu11
Member

chu11 commented Oct 21, 2016

@grondo No particular ideas, just an area of code I haven't dug into yet. If you got it, that's cool.

@grondo
Contributor Author

grondo commented Oct 21, 2016

Ok, I've saved state to my wreck-hostlist branch. Currently, it works well enough to export the hostlist to a hostlist key in the kvs. However, export of the environment variable doesn't work; I'm guessing that is because there is no kvs commit between rexecd_init and rexecd_task_init.

@grondo
Contributor Author

grondo commented Oct 24, 2016

BTW, I kind of backed off the approach in the wreck-hostlist branch mentioned here -- it is not going to be scalable to have rank 0 wrexecd walk through each rank dir to build the hostlist itself.

For now, @chu11, I think the best way to generate a hostlist might be to just run flux exec hostname | hostlist or something similar. I'll still keep the mod to resource-hwloc to emit by_rank.N.HostName
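For readers unfamiliar with the second half of that pipeline, the folding step can be illustrated with a small pure-Python sketch. It is not the `hostlist` tool itself and does not reproduce its exact output format. It assumes one hostname per input item, such as the lines printed by `flux exec hostname`, and `compress_hostlist` is a hypothetical name.

```python
import re
from itertools import groupby


def compress_hostlist(hostnames):
    """Fold names such as hype356, hype357 into a bracketed range, hype[356-357]."""
    by_prefix = {}
    plain = []
    for name in hostnames:
        # Split off a trailing number without leading zeros, so any zero
        # padding stays in the prefix and the folded form still expands
        # back to the original names.
        match = re.fullmatch(r"(.*?)([1-9]\d*|0)", name)
        if match:
            by_prefix.setdefault(match.group(1), set()).add(int(match.group(2)))
        else:
            plain.append(name)

    parts = []
    for prefix in sorted(by_prefix):
        nums = sorted(by_prefix[prefix])
        ranges = []
        # Consecutive numbers share the same (value - index) difference.
        for _, run in groupby(enumerate(nums), key=lambda pair: pair[1] - pair[0]):
            values = [value for _, value in run]
            if len(values) == 1:
                ranges.append(str(values[0]))
            else:
                ranges.append(f"{values[0]}-{values[-1]}")
        if len(nums) == 1:
            parts.append(prefix + ranges[0])
        else:
            parts.append(f"{prefix}[{','.join(ranges)}]")
    # Names with no numeric suffix are appended as-is, deduplicated.
    return ",".join(parts + sorted(set(plain)))


print(compress_hostlist(["hype356", "hype357", "hype358", "hype360"]))
# prints hype[356-358,360]
```

Duplicate names collapse to a single entry, which matters when several ranks run on the same node.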

@chu11
Member

chu11 commented Oct 24, 2016

@grondo Ok, haven't looked through that branch yet anyways. I'll do something like you suggest for the time being. It's minimally nicer than my xml grepping hack through the kvs.

@chu11 chu11 removed their assignment Oct 27, 2016
chu11 added a commit to LLNL/magpie that referenced this issue Nov 3, 2016
This was predominantly an exercise to find subtle issues within
Flux when running w/ Magpie.  It was experimented on around the v0.5.0
release of Flux.  See the following issues for examples
of the discussion that occurred:

flux-framework/flux-core#883
flux-framework/flux-core#864
flux-framework/flux-core#866

For info on how to run see magpie.sbatch-srun-spark-test.sh example.
See information at the top for how to run and see info at the bottom
for workarounds for flux.
grondo added a commit to grondo/flux-core that referenced this issue Feb 5, 2019
The wreck exec system is worthless, remove it along with associated
commands, tests, and support code.

Since libjsc doesn't work without wreck, it is removed as well.

Fixes flux-framework#1984
Closes flux-framework#1947
Closes flux-framework#1618
Closes flux-framework#1595
Closes flux-framework#1593
Closes flux-framework#1468
Closes flux-framework#1438
Closes flux-framework#1419
Closes flux-framework#1410
Closes flux-framework#915
Closes flux-framework#894
Closes flux-framework#866
Closes flux-framework#833
Closes flux-framework#774
Closes flux-framework#772
Closes flux-framework#335
Closes flux-framework#249
grondo added a commit to grondo/flux-core that referenced this issue Feb 5, 2019
The wreck exec system is worthless, remove it along with associated
commands, tests, and support code.

Since libjsc doesn't work without wreck, it is removed as well.

Fixes flux-framework#1984

Closes flux-framework#1947
Closes flux-framework#1618
Closes flux-framework#1595
Closes flux-framework#1593
Closes flux-framework#1534
Closes flux-framework#1468
Closes flux-framework#1443
Closes flux-framework#1438
Closes flux-framework#1419
Closes flux-framework#1410
Closes flux-framework#1407
Closes flux-framework#1393
Closes flux-framework#915
Closes flux-framework#894
Closes flux-framework#866
Closes flux-framework#833
Closes flux-framework#774
Closes flux-framework#772
Closes flux-framework#335
Closes flux-framework#249
grondo added a commit to grondo/flux-core that referenced this issue Feb 9, 2019
@grondo
Contributor Author

grondo commented Feb 13, 2019

closed by #1988

@grondo grondo closed this as completed Feb 13, 2019
chu11 pushed a commit to chu11/flux-core that referenced this issue Feb 13, 2019