Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dws: use static rabbit layout mapping for JGF #204

Merged
merged 5 commits into from
Sep 4, 2024

Conversation

jameshcorbett
Copy link
Member

Problem: as described in #193,
JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes
before starting Flux is not a good option.

The solution I've decided to go with after some discussion is to store some
static rabbit layout data in ansible, generated by reading from kubernetes. This data in turn is then
read in to generate JGF. Unlike kubernetes, we can expect the static file to always exist.

Fixes #193

@jameshcorbett jameshcorbett requested a review from grondo August 31, 2024 15:42
Problem: admins prefer configuring clusters with a config file to
doing so with R. However, the current design of flux-dws2jgf forces
the usage of R as input to generate JGF.

When flux-framework/flux-core#6245 is
implemented, Flux will be able to combine a config file with JGF.

Add an option to use a config file as input to the JGF generation,
so administrators will be able to use a config file + JGF instead of
R + JGF.
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in
Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so
generating JGF from kubernetes before starting Flux is not an
option.

A solution would be to store some static rabbit data in ansible,
generated by reading from kubernetes. This data could be read in
to generate JGF.

Add a script that generates a JSON file describing which nodes map
to which rabbits and what the capacity of each rabbit is.
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in
Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so
generating JGF from kubernetes before starting Flux is not an
option.

Change flux-dws2jgf to read from a static JSON file generated by
the flux-rabbitmapping script, instead of from kubernetes.
Problem: tests for the flux-dws2jgf script need to be updated now
that the script reads from a mapping generated by flux-rabbitmapping
instead of by polling kubernetes.

Change the tests and add a simple test for flux-rabbitmapping.
Problem: the error message for requesting too much lustre storage
gives a float, but an integer would be better and more consistent
with the other file system types.

Provide an integer in the error message by doing integer division
instead of normal float division.
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice improvement. I just noticed one trivial issue in the tests.

@@ -33,27 +33,32 @@ test_expect_success HAVE_JQ 'smoke test to ensure the storage resources are expe
test $(hostname) = compute-01
'

test_expect_success HAVE_JQ 'flux-rabbitmapping outputs expected mapping' '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test has HAVE_JQ but I don't see any use of jq in the test itself.

However, also note that flux-core tests and thus flux-sharness.sh assume jq availability, so at least in flux-core we've dropped use of HAVE_JQ and just assume its there (flux-sharness throws an error if not).

(Maybe for future cleanup, no need to address this now)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll open another PR for that. Setting MWP.

@mergify mergify bot merged commit 5a3d0a1 into flux-framework:master Sep 4, 2024
8 checks passed
@jameshcorbett jameshcorbett deleted the rabbit-mapping branch September 4, 2024 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make dws2jgf read from a static file, not k8s
2 participants