-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dws: use static rabbit layout mapping for JGF #204
Conversation
Problem: admins prefer configuring clusters with a config file to doing so with R. However, the current design of flux-dws2jgf forces the usage of R as input to generate JGF. When flux-framework/flux-core#6245 is implemented, Flux will be able to combine a config file with JGF. Add an option to use a config file as input to the JGF generation, so administrators will be able to use a config file + JGF instead of R + JGF.
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. A solution would be to store some static rabbit data in ansible, generated by reading from kubernetes. This data could be read in to generate JGF. Add a script that generates a JSON file describing which nodes map to which rabbits and what the capacity of each rabbit is.
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. Change flux-dws2jgf to read from a static JSON file generated by the flux-rabbitmapping script, instead of from kubernetes.
Problem: tests for the flux-dws2jgf script need to be updated now that the script reads from a mapping generated by flux-rabbitmapping instead of by polling kubernetes. Change the tests and add a simple test for flux-rabbitmapping.
Problem: the error message for requesting too much lustre storage gives a float, but an integer would be better and more consistent with the other file system types. Provide an integer in the error message by doing integer division instead of normal float division.
8934cdd
to
bcf9be2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nice improvement. I just noticed one trivial issue in the tests.
@@ -33,27 +33,32 @@ test_expect_success HAVE_JQ 'smoke test to ensure the storage resources are expe | |||
test $(hostname) = compute-01 | |||
' | |||
|
|||
test_expect_success HAVE_JQ 'flux-rabbitmapping outputs expected mapping' ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has HAVE_JQ
but I don't see any use of jq in the test itself.
However, also note that flux-core tests and thus flux-sharness.sh
assume jq
availability, so at least in flux-core we've dropped use of HAVE_JQ
and just assume its there (flux-sharness throws an error if not).
(Maybe for future cleanup, no need to address this now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll open another PR for that. Setting MWP.
Problem: as described in #193,
JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes
before starting Flux is not a good option.
The solution I've decided to go with after some discussion is to store some
static rabbit layout data in ansible, generated by reading from kubernetes. This data in turn is then
read in to generate JGF. Unlike kubernetes, we can expect the static file to always exist.
Fixes #193