Fluxion can't restart running jobs with match-format rv1_nosched
#991
I verified this case (reloading Anyway, adding this test to the test suite demonstrates the issue:
diff --git a/t/t1007-recovery-full.t b/t/t1007-recovery-full.t
index 5b2de636..0923eeaa 100755
--- a/t/t1007-recovery-full.t
+++ b/t/t1007-recovery-full.t
@@ -150,6 +150,18 @@ test_expect_success 'recovery: qmanager restarts (rv1_nosched->rv1_nosched)' '
test_expect_code 3 flux ion-resource info ${jobid5}
'
+test_expect_success 'recovery: both modules restart (rv1_nosched->rv1_nosched)' '
+ reload_resource match-format=rv1_nosched \
+ policy=high &&
+ reload_qmanager &&
+ flux module stats sched-fluxion-qmanager &&
+ flux module stats sched-fluxion-resource &&
+ flux ion-resource info ${jobid1} | grep "ALLOCATED" &&
+ flux ion-resource info ${jobid2} | grep "ALLOCATED" &&
+ flux ion-resource info ${jobid3} | grep "ALLOCATED" &&
+ flux ion-resource info ${jobid4} | grep "ALLOCATED"
+'
+
test_expect_success 'recovery: a cancel leads to a job schedule (rv1_nosched)' '
flux job cancel ${jobid1} &&
flux job wait-event -t 60 ${jobid5} start
I started to look into fixing this issue, but got lost fairly quickly. It seems like we need a way to convert an R object without the scheduling key into a JGF representation so it can be passed to the Perhaps one of the developers more familiar with Fluxion (@jameshcorbett, @milroy, or @trws) could take a crack at it or offer some advice. Thanks!
Reloading the
A test could just leave resource running.
Problem: t1008-recovery-none.t expects the job manager to abort the scheduler if a job fails to re-allocate resources during the hello handshake, but this behavior will change soon. Drop this test. The behavior it is looking for will either be addressed by a true fix to flux-framework#991 or the workaround proposed in flux-framework/flux-core#4894.
Problem: there is no test coverage for module reload with running jobs and rv1_nosched.
Add test proposed by @grondo in flux-framework#991, expecting failure for now. The test fails before and after the work-around proposed in flux-framework/flux-core#4894 because it checks for both:
- qmanager reload fails (fails before the work-around)
- job resources remain allocated (fails after the work-around)
Increase the broker stderr log verbosity so the fatal job exceptions generated by the work-around at LOG_INFO level are visible when the test is run with -v.
I started looking into this issue. While it is true that Fluxion features the
After looking at
I could be wrong, but I think the issue is the amount of data added to Can you expand on your statement that
The intention appears to have been to use the
# system-instance will use full-up rv1 writer
# so that R will contain scheduling key needed
# for failure recovery.
match-format = "rv1"
It may be possible to refactor Fluxion to change writers at runtime. If I understand correctly then if restarted under certain conditions Fluxion could switch writers from
Yes, here's the writer output for a job match with the
{"version": 1, "execution": {"R_lite": [{"rank": "-1", "children": {"core": "35"}}], "nodelist": ["node1"], "starttime": 0, "expiration": 3600}}
In contrast, JGF provides each resource's unique IDs and graph paths which allow for full resolution of the vertex:
{"graph": {"nodes": [{"id": "79", "metadata": {"type": "core", "basename": "core", "name": "core35", "id": 35, "uniq_id": 79, "rank": -1, "exclusive": true, "unit": "", "size": 1, "paths": {"containment": "/tiny0/rack0/node1/socket1/core35"}}}, {"id": "7", "metadata": {"type": "socket", "basename": "socket", "name": "socket1", "id": 1, "uniq_id": 7, "rank": -1, "exclusive": true, "unit": "", "size": 1, "paths": {"containment": "/tiny0/rack0/node1/socket1"}}}, {"id": "3", "metadata": {"type": "node", "basename": "node", "name": "node1", "id": 1, "uniq_id": 3, "rank": -1, "exclusive": false, "unit": "", "size": 1, "paths": {"containment": "/tiny0/rack0/node1"}}}, {"id": "1", "metadata": {"type": "rack", "basename": "rack", "name": "rack0", "id": 0, "uniq_id": 1, "rank": -1, "exclusive": false, "unit": "", "size": 1, "paths": {"containment": "/tiny0/rack0"}}}, {"id": "0", "metadata": {"type": "cluster", "basename": "tiny", "name": "tiny0", "id": 0, "uniq_id": 0, "rank": -1, "exclusive": false, "unit": "", "size": 1, "paths": {"containment": "/tiny0"}}}], "edges": [{"source": "7", "target": "79", "metadata": {"name": {"containment": "contains"}}}, {"source": "3", "target": "7", "metadata": {"name": {"containment": "contains"}}}, {"source": "1", "target": "3", "metadata": {"name": {"containment": "contains"}}}, {"source": "0", "target": "1", "metadata": {"name": {"containment": "contains"}}}]}}
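To make the contrast concrete, here is a minimal Python sketch of everything that can be read back out of the rv1_nosched R shown above. The helper names `expand_idset` and `rlite_resources` are hypothetical and are not part of Fluxion or flux-core; the idset parser is a simplified stand-in that assumes plain comma-separated ids and inclusive ranges. The rank is kept as the raw string because the sample uses "-1".

```python
import json

# R as emitted with match-format=rv1_nosched (copied from the comment above).
R_NOSCHED = json.loads("""
{"version": 1, "execution": {"R_lite": [{"rank": "-1", "children": {"core": "35"}}],
 "nodelist": ["node1"], "starttime": 0, "expiration": 3600}}
""")


def expand_idset(idset):
    """Expand a simple idset string such as "0-3,7" into a list of ints.

    Hypothetical helper for illustration: it handles only comma-separated
    non-negative ids and inclusive ranges, not bracketed forms.
    """
    ids = []
    for part in idset.split(","):
        lo, sep, hi = part.partition("-")
        ids.extend(range(int(lo), int(hi if sep else lo) + 1))
    return ids


def rlite_resources(R):
    """Yield (rank string, resource type, logical id) for each R_lite entry."""
    for entry in R["execution"]["R_lite"]:
        for rtype, idset in entry["children"].items():
            for rid in expand_idset(idset):
                yield entry["rank"], rtype, rid


resources = list(rlite_resources(R_NOSCHED))
print(resources)  # [('-1', 'core', 35)]
```

The result carries a rank, a resource type, and a logical index, plus the hostname from `nodelist`. Nothing in it corresponds to the `uniq_id`, `paths`, `exclusive`, or edge entries that the JGF output provides.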
I didn't know that, and I'm struggling to understand how that happens. Fluxion only supports
Sorry, I meant
I'm assuming that
I think
What I have in mind is fairly complicated and may not work in the end. It would consist of dumping the resources of running jobs to the KVS via
That might work for a planned and orderly shutdown, but we also have to support a restart after a broker crash, in which case this mechanism could not be used. Could we devise something to put into the
If that isn’t sufficient I’d be somewhat interested to know why not, because it seems like the kind of problem we could solve with an index if it’s not currently feasible. Do we currently require the full path keys?
Good point.
Yes, I think so. I'll refresh my memory on the graph
This came up again because the inability of Fluxion to be reloaded with running jobs is preventing us from reconfiguring resources (e.g. adding or modifying queues) without a downtime on production clusters. As an experiment, I tried setting
The job in question does seem to have JGF in R:
It is possible I performed the experiment incorrectly, so it may be good if someone else can verify this behavior. If so, then we'll need a plan to address this issue in the near term (i.e. this issue just got high priority).
The only line that should be able to produce that error is this one:
I haven't had time to go in and experiment with it yet, but this makes me think that the map of queues doesn't get repopulated on a restart, so we might not even be getting to the graph.
Ah, ok, that's a good point and something (hopefully) easy to fix as a start? I did verify that with no queues enabled and I'll open a separate issue.
Yeah, I don't think that will be all that bad. I'm a bit confused how that can happen, since everything is called in the right order, but if we have a reasonable reproducing test case this should be relatively straightforward. Glancing at it I'm guessing it's something like the list of queues in the initial config got cleared or some generated name changed or... 🤷 If we add in a bit of context to figure out which queue is missing and which queues exist it should fall out quickly.
Just following up on the meeting today: fluxion gets the initial Rv1 object from the flux-sched/resource/modules/resource_match.cpp Line 1282 in fe872c8
Fluxion then uses the
So my probably naive question is what required information is missing to allow fluxion to restart with running jobs and rv1_nosched? |
In our meeting today it was asserted that the Rv1 to graph uuid mapping would need to be preserved (in a file or KVS) across a restart in order to map the resources of running jobs to the resource graph. Afterward I'm once again wondering why. The uuids are purely internal to fluxion: fluxion won't receive a uuid from a running job that was allocated from a past instance of fluxion and need to map it to resources. It will receive rv1 with hostname/ranks and gpu/core logical indices. I don't see how having the old uuid mapping would help. What am I missing?
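The question above can be illustrated with a small Python sketch: given a freshly built graph, a vertex can be located from a hostname, a resource type, and a logical index alone, without any uuid from a previous Fluxion instance. This is not how Fluxion is implemented. `build_index` is a hypothetical helper, the vertex list is trimmed from the JGF example earlier in the thread, and the sketch assumes that the last component of a node's containment path equals its hostname. It only locates vertices and says nothing about edge data such as exclusivity.

```python
# Vertices trimmed from the JGF example earlier in this thread; only the
# fields used below are kept.
JGF_NODES = [
    {"type": "core", "id": 35, "uniq_id": 79,
     "path": "/tiny0/rack0/node1/socket1/core35"},
    {"type": "socket", "id": 1, "uniq_id": 7,
     "path": "/tiny0/rack0/node1/socket1"},
    {"type": "node", "id": 1, "uniq_id": 3, "path": "/tiny0/rack0/node1"},
]


def build_index(nodes):
    """Map (hostname, type, logical id) -> vertex for vertices at or below a node.

    Assumption: the last path component of a node vertex is its hostname.
    """
    node_paths = {n["path"]: n["path"].rsplit("/", 1)[-1]
                  for n in nodes if n["type"] == "node"}
    index = {}
    for vertex in nodes:
        for node_path, hostname in node_paths.items():
            at_node = vertex["path"] == node_path
            below_node = vertex["path"].startswith(node_path + "/")
            if at_node or below_node:
                index[(hostname, vertex["type"], vertex["id"])] = vertex
    return index


index = build_index(JGF_NODES)
vertex = index[("node1", "core", 35)]
print(vertex["uniq_id"], vertex["path"])
```

In this sketch the uuid is an output of the lookup rather than an input, which is the point raised in the comment. Whether the same holds for every subsystem and resource type in a real graph is not established here.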
I ended up deciding the vertex However, edge data (such as exclusivity tracking) can't be reconstructed without a fully specified format like JGF. Here's an example of what the JGF reader does to update the edges:
Thanks for that explanation! Well, I think having the rv1 reader, even a naive one, would be an excellent near term step since it would let the scheduler be unloaded and reconfigured or even updated without affecting running jobs. When the scheduler is misbehaving, that could be really handy. Edit: we could always augment that with some side band storage or whatever if it turns out to be required.
Problem: issue flux-framework#991 identified the need for an `rv1exec` implementation of update (). The need for the implementation is described in detail in the issue, but the primary motivation is to enable reloading Fluxion when using RV1 without the scheduling key and payload. The reader was not originally implemented due to the lack of information in the format. Examples include edges, exclusivity, paths with subsystems, and vertex sizes. To create a workable implementation, strong assumptions need to be made about resource exclusivity and edges. Add support for update () through helper functions that update vertex and edge metadata and infer exclusivity of node-level resources.
The Flux system instance needed to be restarted on tioga recently and there were two active jobs in CLEANUP state. This caused Fluxion to fail to restart with the following errors:
It appears that `parse_R()` in `resource/modules/resource_match.cpp` always requires a `scheduling` key be set with JGF, but this will not be the case for any job when `sched-fluxion-resource.match-format = "rv1_nosched"`.
Since Fluxion supports an rv1exec reader, `parse_R()` should fall back to this reader when there is no scheduling key in R for a job.
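The proposed fallback can be sketched in a few lines of Python. This is an illustration of the decision only, not the actual `parse_R()` code, which is C++. The function name `select_reader` and the returned labels are hypothetical.

```python
import json


def select_reader(R_str):
    """Return a label for the reader that could handle this R string.

    Hypothetical sketch of the fallback: prefer JGF when R carries a
    non-empty "scheduling" key, otherwise use the rv1exec reader.
    """
    R = json.loads(R_str)
    # Without "scheduling", only the "execution" section (R_lite,
    # nodelist, ...) is available.
    if R.get("scheduling"):
        return "jgf"
    return "rv1exec"


nosched = '{"version": 1, "execution": {"R_lite": [], "nodelist": []}}'
full = '{"version": 1, "execution": {}, "scheduling": {"graph": {"nodes": []}}}'
print(select_reader(nosched))  # rv1exec
print(select_reader(full))     # jgf
```

Treating an empty `scheduling` object the same as a missing one is a design choice of this sketch; the source only discusses the case where the key is absent.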