Make dws2jgf read from a static file, not k8s #193
Comments
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Aug 31, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. A solution would be to store some static rabbit data in ansible, generated by reading from kubernetes. This data could be read in to generate JGF. Add a script that generates a JSON file describing which nodes map to which rabbits and what the capacity of each rabbit is.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Aug 31, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. Change flux-dws2jgf to read from a static JSON file generated by the flux-rabbitmapping script, instead of from kubernetes.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 4, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. A solution would be to store some static rabbit data in ansible, generated by reading from kubernetes. This data could be read in to generate JGF. Add a script that generates a JSON file describing which nodes map to which rabbits and what the capacity of each rabbit is.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 4, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in Ansible. On the other hand, Flux's ability to start up and run jobs cannot be dependent on the responsiveness of kubernetes, so generating JGF from kubernetes before starting Flux is not an option. Change flux-dws2jgf to read from a static JSON file generated by the flux-rabbitmapping script, instead of from kubernetes.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 5, 2024
Problem: JGF is too large, as described in flux-framework#193. As a first step to shrink it, do not output default values, instead skipping them and letting the JGF reader supply them. Depends on changes introduced in flux-sched by flux-framework/flux-sched/pull/1293.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 18, 2024
Problem: JGF is too large, as described in flux-framework#193. As a first step to shrink it, do not output default values, instead skipping them and letting the JGF reader supply them. Depends on changes introduced in flux-sched by flux-framework/flux-sched/pull/1293.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 30, 2024
Problem: JGF is too large, as described in flux-framework#193. As a first step to shrink it, do not output default values, instead skipping them and letting the JGF reader supply them. Depends on changes introduced in flux-sched by flux-framework/flux-sched/pull/1293.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Oct 1, 2024
Problem: JGF is too large, as described in flux-framework#193. As a first step to shrink it, do not output default values, instead skipping them and letting the JGF reader supply them. Depends on changes introduced in flux-sched by flux-framework/flux-sched/pull/1293.
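To illustrate the "do not output default values" idea from the commit message above, here is a minimal sketch. The default values and the helper names (`ASSUMED_DEFAULTS`, `strip_defaults`, `restore_defaults`) are assumptions made for illustration only; the real defaults are whatever the JGF reader in flux-sched supplies, and they are not listed in this issue.

```python
# Assumed defaults, for illustration only. The actual set of defaults is
# defined by the JGF reader in flux-sched and is not given in this issue.
ASSUMED_DEFAULTS = {"size": 1, "exclusive": False, "unit": ""}


def strip_defaults(metadata, defaults=ASSUMED_DEFAULTS):
    """Return a copy of a vertex's metadata without entries equal to a default."""
    return {
        key: value
        for key, value in metadata.items()
        if key not in defaults or defaults[key] != value
    }


def restore_defaults(metadata, defaults=ASSUMED_DEFAULTS):
    """What a reader could do: fill in any omitted default values."""
    return {**defaults, **metadata}


# Hypothetical vertex metadata, written out in full.
full = {"type": "core", "name": "core0", "size": 1, "exclusive": False, "unit": ""}
compact = strip_defaults(full)
```

Under these assumptions `compact` keeps only the `type` and `name` entries, and `restore_defaults(compact)` reproduces the full metadata, so nothing is lost as long as the writer and reader agree on the defaults.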
Problem: the `dws2jgf.py` script currently establishes a connection to kubernetes and generates a Fluxion resource graph in JGF format based on the compute-node-to-rabbit mapping it finds there. However, after discussion with admins and other members of the Flux team, it seems that JGF is very difficult to deal with and manipulate (if for instance a queue were to be added) and also can be large enough that it makes a meaningful contribution to the overall size of the ansible repo.

A fix would be to generate the JGF dynamically whenever the management node restarts. However, @trws said that making the resource graph depend on a service (k8s) which may or may not be online is a bad idea. The solution we decided on offline is to keep a rabbit-to-compute-node mapping saved in ansible which `dws2jgf` generates its JGF against. That file could be generated by a script that reads from k8s and periodically updated as needed.

Changes needed:

- `dws2jgf` to read from the mapping and not k8s
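As a rough sketch of what reading from a static mapping could look like: the JSON layout below (a top-level `rabbits` object with a `capacity` and a list of `computes` per rabbit) is a hypothetical schema invented for this example, and `load_mapping` and `compute_to_rabbit` are illustrative names, not functions from `dws2jgf`. The point is only that the lookup needs no connection to k8s once the file exists.

```python
import json

# Hypothetical mapping layout; the real file's schema is not specified here.
EXAMPLE_MAPPING = """
{
  "rabbits": {
    "rabbit-a": {"capacity": 1000, "computes": ["node1", "node2"]},
    "rabbit-b": {"capacity": 2000, "computes": ["node3"]}
  }
}
"""


def load_mapping(text):
    """Parse the static mapping and check the (assumed) top-level key."""
    mapping = json.loads(text)
    if "rabbits" not in mapping:
        raise ValueError("mapping is missing the 'rabbits' key")
    return mapping


def compute_to_rabbit(mapping):
    """Invert the rabbit-to-compute-node mapping into a node -> rabbit lookup."""
    lookup = {}
    for rabbit, info in mapping["rabbits"].items():
        for node in info["computes"]:
            if node in lookup:
                raise ValueError(f"{node} is attached to more than one rabbit")
            lookup[node] = rabbit
    return lookup


mapping = load_mapping(EXAMPLE_MAPPING)
lookup = compute_to_rabbit(mapping)
```

In practice the text would come from the file kept in ansible rather than an inline string, and the graph generation would walk `lookup` and the per-rabbit capacities instead of querying kubernetes.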