need a way to run across all resources allocated to a job #1393

Closed
trws opened this issue Mar 27, 2018 · 6 comments

Comments

@trws
Member

trws commented Mar 27, 2018

This is related to #866, but some way to run across an entire inner instance would be really good in support of job steps. I think @grondo mentioned something to this effect at the last meeting, but I just ran up against not having this and had to build a workaround for splash. It's something we'll have to have at some point.

@grondo
Contributor

grondo commented Mar 27, 2018

Yeah this is in the use cases we documented -- see Use Case 4.4 here.

I keep meaning to submit a PR for that particular Use Case to document your idea to make that behavior (select all resources) an option instead of the default behavior within a job. This solves many problems nicely (flux-run or whatever utility doesn't need a heuristic to decide when to run across all resources, and the default behavior is the same inside or outside of a "batch script", which makes it more consistent).

@trws
Member Author

trws commented Mar 28, 2018

As a note, for now all such jobs have to run in an inner instance and I'm using this to generate a hostlist:

flux kvs get -j $(flux kvs dir resource.hwloc.by_rank | sed -e 's/$/.HostName/')

@trws
Member Author

trws commented Mar 28, 2018

I should say, "have to run in an inner instance for now" until we have a way to avoid it.

@grondo
Contributor

grondo commented Mar 31, 2018

Just so I understand, something like flux wreckrun -N $(flux getattr size) doesn't also work to run across the entire instance?

Are you using the generated hostlist for input to mpirun or mpiexec or similar?

The hostname list could also be generated with flux exec -r all hostname; however, the flux kvs usage above is probably more efficient. In an rc script you could probably pre-fetch this and store it to a kvs key. If you wanted it in "hostlist" format, filter the output through hostlist --sort.
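As a rough sketch of the post-processing step (not something tested in this thread): if the launcher wants a plain comma-separated list rather than the compressed "hostlist" format, the per-rank hostnames can be de-duplicated and joined with standard tools. The helper name `unique_hosts` is hypothetical, and the example feeds it fixed input so it runs without a flux instance; inside an instance the input would presumably come from `flux exec -r all hostname`.

```shell
# unique_hosts is a hypothetical helper, not part of flux.
# It reads hostnames on stdin (one per line, possibly repeated and in any
# order) and prints them sorted, de-duplicated, and joined with commas.
unique_hosts() {
    sort -u | paste -sd, -
}

# Stand-in for `flux exec -r all hostname` output (assumed: one name per line).
printf '%s\n' node2 node1 node2 node3 | unique_hosts
# prints: node1,node2,node3
```

Whether a comma-separated list is the right input depends on the MPI launcher in use, so treat the output format as an assumption to check against its documentation.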

@trws
Member Author

trws commented Mar 31, 2018

I am at the moment; it's because of the Spectrum MPI issue #1382. Wreckrun like that works for launching non-MPI things on there right now, but I didn't think of the getattr option for size. That's a much better way to do it. Maybe putting it under an option on wreckrun/submit would be a quick way to deal with this.

grondo added a commit to grondo/flux-core that referenced this issue Feb 5, 2019
The wreck exec system is worthless, remove it along with associated
commands, tests, and support code.

Since libjsc doesn't work without wreck, it is removed as well.

Fixes flux-framework#1984

Closes flux-framework#1947
Closes flux-framework#1618
Closes flux-framework#1595
Closes flux-framework#1593
Closes flux-framework#1534
Closes flux-framework#1468
Closes flux-framework#1443
Closes flux-framework#1438
Closes flux-framework#1419
Closes flux-framework#1410
Closes flux-framework#1407
Closes flux-framework#1393
Closes flux-framework#915
Closes flux-framework#894
Closes flux-framework#866
Closes flux-framework#833
Closes flux-framework#774
Closes flux-framework#772
Closes flux-framework#335
Closes flux-framework#249
grondo added a commit to grondo/flux-core that referenced this issue Feb 9, 2019
@grondo
Contributor

grondo commented Feb 13, 2019

closed by #1988

@grondo grondo closed this as completed Feb 13, 2019