Make `dws2jgf` read from a static file, not k8s #193

jameshcorbett · 2024-08-07T02:05:32Z

Problem: the dws2jgf.py script currently connects to kubernetes and generates a Fluxion resource graph in JGF format based on the compute-node-to-rabbit mapping it finds there. However, after discussion with admins and other members of the Flux team, it seems that JGF is very difficult to work with and modify (for instance, if a queue were to be added), and it can also be large enough to make a meaningful contribution to the overall size of the ansible repo.

A fix would be to generate the JGF dynamically whenever the management node restarts. However, @trws said that making the resource graph depend on a service (k8s) which may or may not be online is a bad idea. The solution we decided on offline is to keep a rabbit-to-compute-node mapping saved in ansible, which dws2jgf uses to generate its JGF. That file could be generated by a script that reads from k8s, and regenerated periodically as needed.

Changes needed:

- Make a script to go through k8s and write out a rabbit-to-compute-node mapping
- Change dws2jgf to read from the mapping and not k8s

Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. A solution would be to store some static rabbit data in ansible, generated by reading from kubernetes. This data could be read in to generate JGF. Add a script that generates a JSON file describing which nodes map to which rabbits and what the capacity of each rabbit is.
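
As a rough illustration of what such a mapping file could contain, here is a minimal sketch. The input records, the `build_rabbit_mapping` helper, and the `rabbits`/`computes` layout are all assumptions made for this example; the real script would derive its data from kubernetes and may use a different layout.

```python
import json


def build_rabbit_mapping(rabbit_records):
    """Build a static rabbit mapping from per-rabbit records.

    Each record is a hypothetical dict with "name", "capacity" (bytes),
    and "computes" (list of hostnames) keys. In practice this data would
    come from kubernetes; here it is passed in directly.
    """
    mapping = {"rabbits": {}, "computes": {}}
    for record in rabbit_records:
        name = record["name"]
        mapping["rabbits"][name] = {"capacity": record["capacity"]}
        for node in record["computes"]:
            # A compute node is expected to sit behind exactly one rabbit.
            if node in mapping["computes"]:
                raise ValueError(f"{node} is attached to more than one rabbit")
            mapping["computes"][node] = name
    return mapping


# Made-up example data, not taken from any real cluster.
records = [
    {"name": "rabbit1", "capacity": 10 * 1024**4, "computes": ["compute1", "compute2"]},
    {"name": "rabbit2", "capacity": 10 * 1024**4, "computes": ["compute3"]},
]
mapping = build_rabbit_mapping(records)

# Sorted keys keep the file stable across regenerations, which keeps
# diffs small when the file is checked into a repository.
mapping_json = json.dumps(mapping, indent=2, sort_keys=True)
```

Because the result is plain JSON, it can be regenerated and committed whenever the rabbit layout changes, without Flux needing kubernetes at startup.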

Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. Change flux-dws2jgf to read from a static JSON file generated by the flux-rabbitmapping script, instead of from kubernetes.
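
The reading side might look something like the sketch below, which loads a static JSON file and groups compute nodes by rabbit with no kubernetes connection. The file layout (`rabbits` and `computes` keys) and the helper names are hypothetical and chosen only for illustration; the actual flux-dws2jgf code may differ.

```python
import json
import tempfile
from pathlib import Path


def load_rabbit_mapping(path):
    """Load the static mapping file and check the keys this sketch assumes."""
    with open(path, encoding="utf-8") as fd:
        mapping = json.load(fd)
    for key in ("rabbits", "computes"):
        if key not in mapping:
            raise ValueError(f"mapping file {path} is missing the {key!r} key")
    return mapping


def nodes_by_rabbit(mapping):
    """Group compute nodes under their rabbit, ready for graph generation."""
    grouped = {name: [] for name in mapping["rabbits"]}
    for node, rabbit in sorted(mapping["computes"].items()):
        if rabbit not in grouped:
            raise ValueError(f"{node} refers to unknown rabbit {rabbit}")
        grouped[rabbit].append(node)
    return grouped


# Write a small made-up mapping to a temporary file, then read it back
# the way a JGF generator could at startup.
sample = {
    "rabbits": {"rabbit1": {"capacity": 1024}, "rabbit2": {"capacity": 2048}},
    "computes": {"compute2": "rabbit1", "compute1": "rabbit1", "compute3": "rabbit2"},
}
with tempfile.TemporaryDirectory() as tmpdir:
    mapping_path = Path(tmpdir) / "rabbitmapping.json"
    mapping_path.write_text(json.dumps(sample), encoding="utf-8")
    grouped = nodes_by_rabbit(load_rabbit_mapping(mapping_path))
```

Validating the file up front means a stale or hand-edited mapping fails with a clear error rather than producing a malformed resource graph.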

Problem: JGF is too large, as described in flux-framework#193. As a first step to shrink it, do not output default values, instead skipping them and letting the JGF reader supply them. Depends on changes introduced in flux-sched by flux-framework/flux-sched/pull/1293.
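
The idea of skipping default values can be sketched as follows. The default table below is hypothetical and exists only to show the technique; the fields and defaults the JGF reader actually fills in are defined by the flux-sched change referenced above.

```python
# Hypothetical defaults, for illustration only. The real set of fields
# and default values is whatever the JGF reader supplies.
HYPOTHETICAL_DEFAULTS = {"size": 1, "exclusive": False, "unit": "", "rank": -1}


def drop_defaults(metadata, defaults=HYPOTHETICAL_DEFAULTS):
    """Return a copy of a vertex metadata dict without default-valued fields."""
    slim = {}
    for key, value in metadata.items():
        is_default = (
            key in defaults
            # Compare types as well, since in Python 1 == True and 0 == False.
            and type(value) is type(defaults[key])
            and value == defaults[key]
        )
        if not is_default:
            slim[key] = value
    return slim


vertex = {
    "type": "node",
    "name": "compute1",
    "size": 1,
    "exclusive": False,
    "unit": "",
    "rank": 3,
}
slim_vertex = drop_defaults(vertex)
```

Only fields whose values differ from the defaults are kept, so the saving grows with the number of vertices in the graph.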

jameshcorbett mentioned this issue Aug 31, 2024

Dws: use static rabbit layout mapping for JGF #204

Merged

mergify bot closed this as completed in #204 Sep 4, 2024

jameshcorbett mentioned this issue Sep 5, 2024

dws2jgf: drop defaults for jgf simplification #209

Merged
