
Add totalview_jobid symbol into flux-job #3110

Closed
dongahn opened this issue Aug 4, 2020 · 9 comments

dongahn commented Aug 4, 2020

Tools like STAT and TotalView need to be able to launch and co-locate their tool daemons with the target MPI processes.

For TotalView, I was able to get around this by using its serial launching mode (using ssh or rsh to launch a daemon on each compute node). But I don't think I can get around this for STAT, which only supports bulk launching.

I can add the totalview_jobid symbol into flux job (which gets filled in with the jobid); the tools then extract it and use it to expand their launch strings.
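
Roughly, the flow would look like the sketch below. Everything here is illustrative: the launch string, the %J placeholder, and tool_daemon are made-up names, not an existing interface.

```sh
# Sketch only: LAUNCH_STRING, %J, and tool_daemon are hypothetical.
JOBID="ƒ7LvkmLT"                            # value the tool reads from totalview_jobid
LAUNCH_STRING='flux exec -r 0-3 tool_daemon --jobid=%J'
eval "${LAUNCH_STRING//%J/$JOBID}"          # expand %J, then bulk-launch the daemons
```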

But we currently don't have a way to co-locate tool daemons with the processes, given the jobid.

BTW, I think tool launching should bypass scheduling, so flux exec or something similar seems to make more sense as the bulk launcher. flux exec won't take a JOBID as its input though, correct? We probably don't have to worry about the scalability of flux exec just yet.

Maybe I can turn the hostname list into a rank list and use flux exec...

In general, it feels like some minimal support within flux-core can make tool launching easier.

@grondo: any ideas?


dongahn commented Aug 4, 2020

Tagging @lee218llnl and petertea.


dongahn commented Aug 4, 2020

Maybe we can just add a --jobid=<JOBID> option to flux exec? flux exec can fetch the rank set from the R of JOBID and convert it into --rank.

Tools are so used to only being able to launch one daemon per compute node (per rank, in our case) that this should suffice. Even when they need to launch multiple daemons, they first launch one "super"-daemon, which launches the rest.
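
For illustration, a minimal sketch of what such a --jobid option might do under the hood, in today's terms (the option itself is hypothetical; tool_daemon is a placeholder):

```sh
# Hypothetical expansion of `flux exec --jobid=$JOBID tool_daemon`:
JOBID="ƒ7LvkmLT"                           # placeholder jobid
RANKS=$(flux jobs -no '{ranks}' "$JOBID")  # fetch the job's rank set, e.g. "2-5"
flux exec --rank="$RANKS" tool_daemon      # one daemon per broker rank
```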


grondo commented Aug 4, 2020

flux exec is only accessible by the instance owner, so this would not work for multi-user jobs in general -- though perhaps in most cases the job being debugged would be running within a single-user instance (batch job), so maybe this is not a deal breaker.

Also, once jobs are being contained, using rsh or flux exec to launch debugger servers or tool daemons may not work unless there is a method to "enter" the container of the job.

Long term, we had planned to use the solution proposed in #2298. (Note that a flux exec --jobid=JOBID solution is proposed there as well.) The difference is that the exec server must run in the job shell so that 1) guests can gain exec access, and 2) the spawned subprocesses are launched in the same container as the job shell.

As it happens, I just did a proof of concept implementation and I think this is doable, so good timing.


dongahn commented Aug 5, 2020

> Also, once jobs are being contained, using rsh or flux exec to launch debugger servers or tool daemons may not work unless there is a method to "enter" the container of the job.

By containment, do you mean cgroups? At what level will the cgroup be imposed? The implicit Flux instance launched by flux mini batch will be contained, correct? How about the parallel jobs running inside that instance? Will they be contained too? I will look at #2298.

> As it happens, I just did a proof of concept implementation and I think this is doable, so good timing.

In terms of pursuing a parallel track, what is important for me at this point is the bulk launch interface itself. I can try that interface (whether it is flux exec or something else) to test STAT under a single-user Flux instance without cgroups. Then, when containment-capable bulk launching comes, I can retest.

With the exec service in the job shell, would the user interface still be flux exec, or something else?


grondo commented Aug 5, 2020

> By containment, do you mean cgroups?

cgroups and/or namespaces. E.g., when using a polyinstantiated /tmp, a login session to a node via ssh or rsh may not be able to see the /tmp used by the job, so the local FLUX_URI may not be available.

> With the exec service in the job shell, would the user interface still be flux exec, or something else?

I think that is TBD. It may depend on whether it makes sense to make flux exec job-aware, or whether it makes more sense to have a flux job rexec (the amount of code refactoring required might come into play).
My first inclination would be to use flux exec --jobid=JOBID as you described above.

BTW, for a single-user instance, flux exec would work for now. Try something like this:

```
ƒ(s=95,builddir) grondo@fluke2:~/git/flux-core.git$ flux exec -r `flux jobs -no {ranks} ƒ7LvkmLT` hostname
fluke12
fluke13
fluke11
fluke14
ƒ(s=95,builddir) grondo@fluke2:~/git/flux-core.git$ flux exec -r `flux jobs -no {ranks} ƒ6xb1b35` hostname
fluke9
fluke8
fluke7
fluke10
```


dongahn commented Aug 5, 2020

> cgroups and/or namespaces. E.g., when using a polyinstantiated /tmp, a login session to a node via ssh or rsh may not be able to see the /tmp used by the job, so the local FLUX_URI may not be available.

Ah... now I'm making the connection. This work actually has two use cases, then: 1) job listing for nested instances; 2) tool bulk launching support.

> BTW, for a single-user instance, flux exec would work for now. Try something like this:

Let me see if I can make some progress with this. Then, when the new interface comes, I can swap.

My guess is this should unblock me; we will see.

dongahn changed the title from "Bulk launch support for tool daemons" to "Add totalview_jobid symbol into flux-job" on Aug 5, 2020

dongahn commented Aug 5, 2020

I changed the issue title to "Add totalview_jobid symbol into flux-job".

This variable is not part of the MPIR debug interface, but it is used in RMs like SLURM to allow a debugging tool to fetch the target jobid directly from the address space of the launcher. Both STAT and TotalView depend on this variable for their bulk launching.

Please see LLNL/LaunchMON#50 (comment)
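
To illustrate the convention (a sketch, not a prescribed interface; LAUNCHER_PID stands in for the pid of the flux job launcher process, and gdb stands in for the tool's symbol-reading machinery):

```sh
# Read the jobid out of the launcher's address space, SLURM-style:
gdb --batch -p "$LAUNCHER_PID" -ex 'printf "%s\n", totalview_jobid'
```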


grondo commented Aug 31, 2020

@dongahn, can this issue be closed after the merge of #3130?


dongahn commented Aug 31, 2020

Yes!
