Make dws2jgf read from a static file, not k8s #193

Closed
jameshcorbett opened this issue Aug 7, 2024 · 0 comments · Fixed by #204

Comments

@jameshcorbett
Member

Problem: the dws2jgf.py script currently connects to kubernetes and generates a Fluxion resource graph in JGF format based on the compute-node-to-rabbit mapping it finds there. However, after discussion with admins and other members of the Flux team, it seems that JGF is very difficult to work with and manipulate (for instance, if a queue were to be added), and it can be large enough to contribute meaningfully to the overall size of the ansible repo.

One fix would be to generate the JGF dynamically whenever the management node restarts. However, @trws said that making the resource graph depend on a service (k8s) that may or may not be online is a bad idea. The solution we decided on offline is to keep a rabbit-to-compute-node mapping saved in ansible, from which dws2jgf generates its JGF. That file could be generated by a script that reads from k8s, and it could be updated periodically as needed.

Changes needed:

  1. Make a script to go through k8s and write out a rabbit-to-compute-node mapping
  2. Change dws2jgf to read from the mapping and not k8s
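
As a rough illustration of the second change, the sketch below loads a static mapping file and inverts it into a compute-node-to-rabbit lookup, with no call to k8s. The JSON layout (a top-level `rabbits` object whose entries carry `capacity` and `computes`) and the names `invert_mapping` and `load_compute_to_rabbit` are assumptions made for illustration, not the format or API the actual scripts use.

```python
import json


def invert_mapping(data):
    """Return {compute_node: rabbit_name} from a parsed mapping.

    Assumed (hypothetical) layout:
    {"rabbits": {"rabbit-a": {"capacity": 1024, "computes": ["node1"]}}}
    """
    lookup = {}
    for rabbit, info in data["rabbits"].items():
        for compute in info["computes"]:
            # A compute node should belong to exactly one rabbit.
            if compute in lookup:
                raise ValueError(f"{compute} is mapped to more than one rabbit")
            lookup[compute] = rabbit
    return lookup


def load_compute_to_rabbit(path):
    """Read the static mapping file from disk; no service needs to be online."""
    with open(path, encoding="utf-8") as fd:
        return invert_mapping(json.load(fd))
```

Because the only input is a file on disk, this step works the same whether or not k8s is reachable when the management node restarts.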
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Aug 31, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in
Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so
generating JGF from kubernetes before starting Flux is not an
option.

A solution would be to store some static rabbit data in ansible,
generated by reading from kubernetes. This data could be read in
to generate JGF.

Add a script that generates a JSON file describing which nodes map
to which rabbits and what the capacity of each rabbit is.
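
A minimal sketch of what such a script could do once it has the rabbit data in hand. It deliberately makes no kubernetes client calls: `records` stands in for whatever the script reads from k8s, and its shape, the output layout, and the names `build_mapping` and `write_mapping` are all hypothetical.

```python
import json


def build_mapping(records):
    """Turn per-rabbit records into a static mapping dict.

    `records` is a hypothetical stand-in for data read from kubernetes:
    an iterable of dicts with "name", "capacity" and "computes" keys.
    """
    rabbits = {}
    for rec in records:
        rabbits[rec["name"]] = {
            "capacity": rec["capacity"],
            # Sort so the output is stable from one run to the next.
            "computes": sorted(rec["computes"]),
        }
    return {"rabbits": rabbits}


def write_mapping(mapping, path):
    """Write the mapping as indented, key-sorted JSON."""
    with open(path, "w", encoding="utf-8") as fd:
        json.dump(mapping, fd, indent=2, sort_keys=True)
        fd.write("\n")
```

Sorting keys and node lists keeps regenerated files byte-identical when nothing has changed, which keeps diffs small when the file is stored in a repository.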
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Aug 31, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in
Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so
generating JGF from kubernetes before starting Flux is not an
option.

Change flux-dws2jgf to read from a static JSON file generated by
the flux-rabbitmapping script, instead of from kubernetes.
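
The sketch below shows one way a static mapping could be turned into a JGF-style graph without touching kubernetes. It follows the general JSON Graph Format shape (`graph.nodes` and `graph.edges`), but the metadata fields, the `rabbit` and `ssd` resource types, the graph hierarchy, and the input layout are simplified assumptions, not what flux-dws2jgf actually emits.

```python
def mapping_to_jgf(mapping, cluster="cluster0"):
    """Build a simplified JGF-style dict from a parsed mapping.

    Assumed (hypothetical) input layout:
    {"rabbits": {"rabbit-a": {"capacity": 1024, "computes": ["node1"]}}}
    """
    nodes = [{"id": "0", "metadata": {"type": "cluster", "name": cluster}}]
    edges = []
    next_id = 1

    def add_vertex(parent_id, rtype, name, **extra):
        # Append a vertex and an edge from its parent; return the new id.
        nonlocal next_id
        vertex_id = str(next_id)
        next_id += 1
        metadata = {"type": rtype, "name": name, **extra}
        nodes.append({"id": vertex_id, "metadata": metadata})
        edges.append({"source": parent_id, "target": vertex_id})
        return vertex_id

    for rabbit, info in sorted(mapping["rabbits"].items()):
        rabbit_id = add_vertex("0", "rabbit", rabbit)
        # Model the rabbit's storage as one child vertex sized by capacity.
        add_vertex(rabbit_id, "ssd", f"{rabbit}-ssd", size=info["capacity"])
        for compute in info["computes"]:
            add_vertex(rabbit_id, "node", compute)
    return {"graph": {"nodes": nodes, "edges": edges}}
```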
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 4, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in
Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so
generating JGF from kubernetes before starting Flux is not an
option.

A solution would be to store some static rabbit data in ansible,
generated by reading from kubernetes. This data could be read in
to generate JGF.

Add a script that generates a JSON file describing which nodes map
to which rabbits and what the capacity of each rabbit is.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 4, 2024
Problem: as described in flux-framework#193, JGF is too unwieldy to be stored in
Ansible. On the other hand, Flux's ability to start up and run
jobs cannot be dependent on the responsiveness of kubernetes, so
generating JGF from kubernetes before starting Flux is not an
option.

Change flux-dws2jgf to read from a static JSON file generated by
the flux-rabbitmapping script, instead of from kubernetes.
@mergify mergify bot closed this as completed in #204 Sep 4, 2024
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 5, 2024
Problem: JGF is too large, as described in flux-framework#193.

As a first step to shrink it, do not output default values, instead
skipping them and letting the JGF reader supply them.

Depends on changes introduced in flux-sched by
flux-framework/flux-sched/pull/1293.
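
A minimal sketch of the idea: before writing a vertex, drop any metadata field whose value equals a known default. The `DEFAULTS` table and the name `strip_defaults` are hypothetical; the real set of defaults is whatever the JGF reader in flux-sched supplies.

```python
# Hypothetical defaults, for illustration only.
DEFAULTS = {"exclusive": False, "unit": "", "size": 1}


def strip_defaults(metadata, defaults=DEFAULTS):
    """Return a copy of metadata without fields that match their default."""
    stripped = {}
    for key, value in metadata.items():
        if key in defaults:
            default = defaults[key]
            # Compare types too, so that True is not mistaken for 1.
            if type(value) is type(default) and value == default:
                continue
        stripped[key] = value
    return stripped
```

Fields without a listed default, and fields whose value differs from the default, pass through unchanged, so a reader that fills in the same defaults can reconstruct the full metadata.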
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 18, 2024
Problem: JGF is too large, as described in flux-framework#193.

As a first step to shrink it, do not output default values, instead
skipping them and letting the JGF reader supply them.

Depends on changes introduced in flux-sched by
flux-framework/flux-sched/pull/1293.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Sep 30, 2024
Problem: JGF is too large, as described in flux-framework#193.

As a first step to shrink it, do not output default values, instead
skipping them and letting the JGF reader supply them.

Depends on changes introduced in flux-sched by
flux-framework/flux-sched/pull/1293.
jameshcorbett added a commit to jameshcorbett/flux-coral2 that referenced this issue Oct 1, 2024
Problem: JGF is too large, as described in flux-framework#193.

As a first step to shrink it, do not output default values, instead
skipping them and letting the JGF reader supply them.

Depends on changes introduced in flux-sched by
flux-framework/flux-sched/pull/1293.