Updated LSF scheduler to accept number of nodes #5153
Thanks @nkeilbart!
No problem at all. Let me know if there is documentation somewhere that needs to be updated as well. Otherwise people might not know the option exists without looking at the source code. I'm still a bit of a novice with the git repository setup, but if you point me in the right direction I'll work on that as well.
Thanks @nkeilbart. The first question that sprang to my mind is: why did the original implementer say that Edit: I looked at the LSF documentation, and it doesn't mention this
Hi @sphuber, thanks for taking a look at this. I believe the answer to your question lies in some comments the original programmer left in the LSF scheduler. They state that you need to check that PARALLEL_SCHED_BY_SLOT=Y is NOT defined in lsb.params, and mention that you can check this with bparams -l. It turns out that here at the lab we do have this option enabled. I would imagine other sites have it configured in a similar fashion, so it would be nice to have this feature. I don't think my changes would break the scheduler for users who already have it set up to use only the number of processors; it should still write the job submission script correctly and run.

As for the -nnodes option, it comes up as an option when I look at "man bsub" on the computer here. The description says, "Specifies the number of compute nodes that are required for the CSM job." On the server here I am unable to simply specify the number of processors; I must specify the number of nodes instead, and then the number of processors later when executing the binary. I'm not sure whether we have a slightly modified version for the lab. I could email the people who run the server to get more detailed answers if you'd like.

I hope that answers your questions. Let me know what else I can provide.
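For anyone who wants to script the check described above, here is a minimal sketch. It assumes the text printed by bparams -l contains a line of the form PARALLEL_SCHED_BY_SLOT = Y when the option is enabled; that layout is an assumption and may differ between LSF versions and sites. The function name is hypothetical and is not part of the scheduler plugin.

```python
import re


def parallel_sched_by_slot_enabled(bparams_output: str) -> bool:
    """Return True if the given text sets PARALLEL_SCHED_BY_SLOT to Y.

    Assumption: the parameter appears at the start of a line as
    ``PARALLEL_SCHED_BY_SLOT = Y`` (whitespace around ``=`` is optional).
    """
    pattern = re.compile(
        r"^\s*PARALLEL_SCHED_BY_SLOT\s*=\s*(\w+)",
        re.MULTILINE | re.IGNORECASE,
    )
    match = pattern.search(bparams_output)
    # No match means the parameter is not defined in the given text.
    return bool(match) and match.group(1).upper() == "Y"
```

The text to inspect could be captured with subprocess.run(["bparams", "-l"], capture_output=True, text=True).stdout on a machine where LSF is installed.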
I also found some documentation here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=options-nnodes. The page seems to say that this option is available when easy mode LSF job submission is enabled, which appears to match the settings I have.
@nkeilbart I took the liberty of fixing the tests and adding new ones for the added functionality. I also cleaned the code up a bit for readability; I hope you don't mind. Let me know if you are still happy with this as is, and then I will accept and merge it.
Thanks for the contribution @nkeilbart
The scheduler on Lassen at LLNL is not able to accept the current settings. I have updated the scheduler so that if the option
use_num_machines : True
is set, you can specify both num_machines and tot_num_mpiprocs.
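As a rough illustration of the behaviour described above, the sketch below chooses between a node-based and a processor-based resource request. It is not the code from this PR: the function name lsf_resource_lines and its validation rules are hypothetical, and only the option names (use_num_machines, num_machines, tot_num_mpiprocs) and the bsub flags -n and -nnodes come from the discussion.

```python
def lsf_resource_lines(tot_num_mpiprocs, num_machines=None, use_num_machines=False):
    """Return illustrative #BSUB resource lines for a job script header.

    Hypothetical helper, not the scheduler plugin's actual implementation.
    """
    if tot_num_mpiprocs < 1:
        raise ValueError("tot_num_mpiprocs must be a positive integer")

    if use_num_machines:
        # Node-based request: the number of processors is then given later,
        # when the binary is launched.
        if num_machines is None or num_machines < 1:
            raise ValueError("num_machines must be a positive integer")
        return [f"#BSUB -nnodes {num_machines}"]

    # Processor-based request: this sketch rejects num_machines here.
    if num_machines is not None:
        raise ValueError("num_machines requires use_num_machines=True")
    return [f"#BSUB -n {tot_num_mpiprocs}"]
```

For example, lsf_resource_lines(8) requests processors only, while lsf_resource_lines(8, num_machines=2, use_num_machines=True) requests two nodes.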