implement partial release of resources #1151
Comments
A fragment would contain all the resources allocated to the job on one or more execution targets (broker ranks). That is, it would not be further subdivided. One thought on the JGF problem is that perhaps a combination of job ID and the list of execution target ids from Rv1 would be sufficient to identify the resources being freed.
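The jobid-plus-target-ids idea above can be sketched as follows. This is illustrative Python, not the RFC wire format; `make_free_request` and `parse_idset` are hypothetical names, and the payload fields (`id`, `ranks`) are assumptions.

```python
# Hypothetical sketch: identify a freed fragment by job ID plus the
# execution target (broker rank) ids taken from the job's Rv1 R_lite.

def parse_idset(s):
    """Parse an RFC 22-style idset string like '0-3,7' into a set of ints."""
    out = set()
    for part in s.split(","):
        lo, _, hi = part.partition("-")
        out.update(range(int(lo), int(hi or lo) + 1))
    return out

def make_free_request(jobid, r_v1, freed_ranks):
    """Build a free request for a subset of a job's execution targets."""
    # Collect all ranks the job holds from its Rv1 R_lite entries.
    held = set()
    for entry in r_v1["execution"]["R_lite"]:
        held |= parse_idset(entry["rank"])
    unknown = set(freed_ranks) - held
    if unknown:
        raise ValueError(f"ranks {sorted(unknown)} not allocated to job {jobid}")
    return {"id": jobid, "ranks": sorted(freed_ranks)}
```

Since a fragment is never subdivided below an execution target, the (jobid, ranks) pair is enough to name it unambiguously.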
Problem: R has to be looked up from the KVS in the sched.free request handler, but now that the job manager caches R, this is an unnecessary extra step.

Add R to the sched.free request payload. Note that the `R.scheduling` key is not included. The current design of Fluxion, in which `R.scheduling` may contain a voluminous JGF object, made caching this part of R impractical.

Change libschedutil so that
- the sched.free message handler never looks up R in the KVS
- the free callback always sets its `R` argument to NULL
- the SCHEDUTIL_FREE_NOLOOKUP flag is a no-op

Update sched-simple's free callback to unpack R from the message instead of decoding the `R` argument. Note that Fluxion sets SCHEDUTIL_FREE_NOLOOKUP, so it already expects the free callback's R argument to be NULL.

Although this change increases the size of sched.free payloads with data that Fluxion currently does not use, the ranks in R will be required by Fluxion in the future to identify resource subsets for partial release (flux-framework/flux-sched#1151). This change should be accompanied by an update to RFC 27.

Update sched-simple unit test.

Fixes flux-framework#5775
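A minimal Python model of the handler change above, purely for illustration (the real code is the libschedutil C API, and `handle_free` is a hypothetical name): R now arrives in the sched.free payload, so the KVS round trip disappears.

```python
# Sketch only: models the libschedutil change, not the real C API.
# The sched.free request carries R in its payload (minus R.scheduling),
# so the handler no longer looks R up in the KVS.

def handle_free(msg, scheduler):
    """sched.free handler: unpack R from the message, skip the KVS."""
    jobid = msg["id"]
    R = msg["R"]  # previously fetched from the KVS per jobid
    if "scheduling" in R:
        raise ValueError("R.scheduling must not appear in sched.free payloads")
    scheduler.free(jobid, R)
```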
Problem: jobs get stuck in CLEANUP state while long epilog scripts run, causing sadness and idling resources.

Introduce a new type of epilog script called "housekeeping" that is ostensibly job independent. Instead of freeing resources directly to the scheduler, jobs free resources to housekeeping, post their free event, and may reach INACTIVE state. Meanwhile, housekeeping can run a script on the allocated resources and return the resources to the scheduler when complete.

The resources are still allocated to the job as far as the scheduler is concerned while housekeeping runs. However, since the job has transitioned to INACTIVE, the flux-accounting plugin will decrement the running job count for the user and stop billing the user for the resources. The 'flux resource list' utility shows the resources as allocated.

By default, resources are released all at once to the scheduler, as before. However, if configured, resources can be freed to the scheduler immediately as they complete housekeeping on each execution target, or a timer can be started on completion of the first target; when the timer expires, all the targets that have completed thus far are freed in one go, and following that, resources are freed to the scheduler immediately as they complete.

This works with sched-simple without changes, with the exception that the hello protocol does not currently support partial release, so, as noted in the code, housekeeping and a new job could overlap when the scheduler is reloaded on a live system. Some RFC 27 work is needed to resolve this.

The Fluxion scheduler does not currently support partial release (flux-framework/flux-sched#1151). But as discussed over there, the combination of receiving an R fragment and a jobid in the free request should be sufficient to get that working.
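The three release policies described above can be sketched with logical time in place of a real timer callback; the class and mode names here are illustrative, not the actual configuration keys.

```python
# Sketch of the housekeeping release policies: "all" (default, release
# everything at once), "immediate" (free each target as it finishes), and
# "timer" (armed on the first completion; on expiry, free everything
# completed so far, then free immediately thereafter).

class Housekeeping:
    def __init__(self, ranks, mode="all", release_after=0.0):
        self.pending = set(ranks)      # targets still running housekeeping
        self.done = set()              # completed, not yet freed
        self.freed = []                # batches handed back to the scheduler
        self.mode = mode
        self.release_after = release_after
        self.deadline = None           # set when the first target completes

    def _free_batch(self):
        if self.done:
            self.freed.append(sorted(self.done))
            self.done.clear()

    def complete(self, rank, now=0.0):
        """Called when housekeeping finishes on one execution target."""
        self.pending.discard(rank)
        self.done.add(rank)
        if self.mode == "immediate":
            self._free_batch()
        elif self.mode == "timer":
            if self.deadline is None:
                self.deadline = now + self.release_after  # arm on first completion
            elif now >= self.deadline:
                self._free_batch()     # expired: free all completed thus far
        if not self.pending:
            self._free_batch()         # all targets done: flush the remainder
```

In all modes the job itself has already reached INACTIVE; only the hand-back of its resources to the scheduler is being staged.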
With the assumption that a fragment contains a subset of the job's broker ranks but the entire R (i.e., the full R for each broker rank) for each fragment broker rank, adding this support should be straightforward. Mainly what's needed is to identify the broker ranks in the R fragment and iterate through the vertices in the
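Under that assumption, the rank-identification step can be sketched as below. This is a toy model: the real traversal walks Fluxion's resource graph, and `release_fragment` with a rank-to-vertices map is a hypothetical stand-in for it.

```python
# Sketch: find the broker ranks named in an Rv1 fragment, then visit the
# graph vertices filed under those ranks and mark them released.

def parse_idset(s):
    """Parse an RFC 22-style idset string like '0-3,7' into a set of ints."""
    out = set()
    for part in s.split(","):
        lo, _, hi = part.partition("-")
        out.update(range(int(lo), int(hi or lo) + 1))
    return out

def release_fragment(r_fragment, rank_to_vertices, released):
    """Mark every vertex under the fragment's broker ranks as released."""
    for entry in r_fragment["execution"]["R_lite"]:
        for rank in parse_idset(entry["rank"]):
            released.update(rank_to_vertices.get(rank, ()))
    return released
```

Because the fragment carries the full R for each of its ranks, no per-vertex subsetting is needed below the rank level.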
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current need is to release all resources managed by a single broker rank. In the future, support for releasing arbitrary subgraphs will be needed for cloud and converged use cases.

Modify the rem_* traverser functions to take a modification type and a type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and a partial job cancellation and issue the corresponding planner interface calls, handling errors as needed.
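A minimal sketch of the full- vs partial-cancel distinction, modeling planner bookkeeping as plain per-type counters. The real traverser operates on the resource graph and planner spans in C++; `remove_job` and its arguments are illustrative names mirroring the modification type and type_to_count map described above.

```python
# Sketch: full cancel removes everything the job holds; partial cancel
# removes only the counts listed in type_to_count, erroring if a count
# exceeds what the job actually holds (the error-handling case above).

def remove_job(allocations, jobid, mod_type="full", type_to_count=None):
    """Remove a job's resources, fully or only the listed per-type counts."""
    held = allocations[jobid]               # dict: resource type -> count
    removed = dict(held) if mod_type == "full" else dict(type_to_count)
    for rtype, n in removed.items():
        if n > held.get(rtype, 0):
            raise ValueError(f"cannot free {n} x {rtype}; job holds "
                             f"{held.get(rtype, 0)}")
        held[rtype] -= n
        if held[rtype] == 0:
            del held[rtype]                 # this type fully released
    if not held:
        del allocations[jobid]              # partial removals emptied the job
    return removed
```

A sequence of partial cancels that drains every type ends up equivalent to a full cancel, which is the invariant the planner calls must preserve.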
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single borker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: jobs get stuck in CLEANUP state while long epilog scripts run, causing sadness and idling resources. Introduce a new type of epilog script called "housekeeping" that is ostensibly job independent. Instead of freeing resources directly to the scheduler, jobs free resources to housekeeping, post their free event, and may reach INACTIVE state. Meanwhile, housekeeping can run a script on the allocated resources and return the resources to the scheduler when complete. The resources are still allocated to the job as far as the scheduler is concerned while housekeeping runs. However since the job has transitioned to INACTIVE, the flux-accounting plugin will decrement the running job count for the user and stop billing the user for the resources. 'flux resource list' utility shows the resources as allocated. By default, resources are released all at once to the scheduler, as before. However, if configured, resources can be freed to the scheduler immediately as they complete housekeeping on each execution target, or a timer can be started on completion of the first target, and when the timer expires, all the targets that have completed thus far are freed in one go. Following that, resources are freed to the scheduler immediately as they complete. This works with sched-simple without changes, with the exception that the hello protocol does not currently support partial release so, as noted in the code, housekeeping and a new job could overlap when the scheduler is reloaded on a live system. Some RFC 27 work is needed to resolve ths. The Fluxion scheduler does not currently support partial release (flux-framework/flux-sched#1151). But as discussed over there, the combination of receiving an R fragment and a jobid in the free request should be sufficient to get that working.
Problem: jobs get stuck in CLEANUP state while long epilog scripts run, causing sadness and idling resources. Introduce a new type of epilog script called "housekeeping" that is ostensibly job independent. Instead of freeing resources directly to the scheduler, jobs free resources to housekeeping, post their free event, and may reach INACTIVE state. Meanwhile, housekeeping can run a script on the allocated resources and return the resources to the scheduler when complete. The resources are still allocated to the job as far as the scheduler is concerned while housekeeping runs. However since the job has transitioned to INACTIVE, the flux-accounting plugin will decrement the running job count for the user and stop billing the user for the resources. 'flux resource list' utility shows the resources as allocated. By default, resources are released all at once to the scheduler, as before. However, if configured, resources can be freed to the scheduler immediately as they complete housekeeping on each execution target, or a timer can be started on completion of the first target, and when the timer expires, all the targets that have completed thus far are freed in one go. Following that, resources are freed to the scheduler immediately as they complete. This works with sched-simple without changes, with the exception that the hello protocol does not currently support partial release so, as noted in the code, housekeeping and a new job could overlap when the scheduler is reloaded on a live system. Some RFC 27 work is needed to resolve ths. The Fluxion scheduler does not currently support partial release (flux-framework/flux-sched#1151). But as discussed over there, the combination of receiving an R fragment and a jobid in the free request should be sufficient to get that working.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: jobs get stuck in CLEANUP state while long epilog scripts run, causing sadness and idling resources. Introduce a new type of epilog script called "housekeeping" that is ostensibly job independent. Instead of freeing resources directly to the scheduler, jobs free resources to housekeeping, post their free event, and may reach INACTIVE state. Meanwhile, housekeeping can run a script on the allocated resources and return the resources to the scheduler when complete. The resources are still allocated to the job as far as the scheduler is concerned while housekeeping runs. However since the job has transitioned to INACTIVE, the flux-accounting plugin will decrement the running job count for the user and stop billing the user for the resources. 'flux resource list' utility shows the resources as allocated. By default, resources are released all at once to the scheduler, as before. However, if configured, resources can be freed to the scheduler immediately as they complete housekeeping on each execution target, or a timer can be started on completion of the first target, and when the timer expires, all the targets that have completed thus far are freed in one go. Following that, resources are freed to the scheduler immediately as they complete. This works with sched-simple without changes, with the exception that the hello protocol does not currently support partial release so, as noted in the code, housekeeping and a new job could overlap when the scheduler is reloaded on a live system. Some RFC 27 work is needed to resolve ths. The Fluxion scheduler does not currently support partial release (flux-framework/flux-sched#1151). But as discussed over there, the combination of receiving an R fragment and a jobid in the free request should be sufficient to get that working.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functionality need is to release all resources managed by a single broker rank. In the future support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and partial job cancellation and issue corresponding planner interface calls, handling errors as needed.
Problem: jobs get stuck in CLEANUP state while long epilog scripts run, causing sadness and idling resources. Introduce a new type of epilog script called "housekeeping" that runs after the job. Instead of freeing resources directly to the scheduler, jobs free resources to housekeeping, post their free event, and may reach INACTIVE state. Meanwhile, housekeeping can run a script on the allocated resources and return the resources to the scheduler when complete. The resources are still allocated to the job as far as the scheduler is concerned while housekeeping runs. However, since the job has transitioned to INACTIVE, the flux-accounting plugin will decrement the running job count for the user and stop billing the user for the resources. The 'flux resource list' utility shows the resources as allocated. By default, resources are released to the scheduler only after all ranks complete housekeeping, as before. However, if configured, resources can be freed to the scheduler immediately as they complete housekeeping on each execution target, or a timer can be started on completion of the first target; when the timer expires, all the targets that have completed thus far are freed in one go, and following that, resources are freed to the scheduler immediately as they complete. This works with sched-simple without changes, with the exception that the hello protocol does not currently support partial release, so, as noted in the code, housekeeping and a new job could overlap when the scheduler is reloaded on a live system. Some RFC 27 work is needed to resolve this. The Fluxion scheduler does not currently support partial release (flux-framework/flux-sched#1151). But as discussed over there, the combination of receiving an R fragment and a jobid in the free request should be sufficient to get that working.
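As a sketch of the three release modes described above, a hypothetical configuration fragment follows. The table and key names here are illustrative assumptions drawn from this description, not a verified flux-core schema:

```toml
# Hypothetical flux-core configuration; names are assumptions based on
# the housekeeping behavior described above.

[job-manager.housekeeping]
# Script to run on each execution target after the job reaches INACTIVE.
command = ["/etc/flux/system/housekeeping.sh"]

# Release policy:
#   (unset)               - free all targets together when the last finishes
#   release-after = "0"   - free each target as soon as it finishes
#   release-after = "30s" - start a timer at the first completion; on expiry,
#                           free everything done so far, then free stragglers
#                           individually as they complete
release-after = "30s"
```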
My understanding is that on elcap systems, the scheduler will need to be initialized from JGF in order to understand the rabbit layout. Also, it will need to emit JGF for jobs in order to facilitate scheduler restart. The partial release will come in the form of R, but that's OK because of this simplifying assumption, right?
Problem: Fluxion issue flux-framework#1151 and flux-core issue flux-framework/flux-core#4312 identified the need for partial release of resources. The current functional need is to release all resources managed by a single broker rank; in the future, support for releasing arbitrary subgraphs will be needed for cloud and converged use cases. Modify the rem_* traverser functions to take a modification type and a type_to_count unordered_map. Add logic in the recursive job modification calls to distinguish between a full and a partial job cancellation and issue the corresponding planner interface calls, handling errors as needed. Switch cancellation behavior based on the job_modify_t enum class.
That's correct. The partial cancel/release just uses the Rlite fragment string contained in the free request.
Famous last words. Fortunately the PR is merged and the functionality is in Fluxion now.
@milroy, it looks like this one can be closed, so I'm closing it. If there's something we need to keep open here, feel free to re-open.
Problem: as discussed in flux-framework/flux-core#4312, the original plan for partial release of resources was to give the scheduler a `free` RPC for each R fragment of a job's resources that can be returned to the pool. In Fluxion, the R is ignored in the `free` callback and the jobid is used instead to free all resources allocated to the job.

An additional problem is that flux-core cannot fragment the contents of the opaque `scheduling` key in R.

Assuming we figure out a way in flux-core to release resources in parts, how can this be made to work in Fluxion?

Note that RFC 27 would need to be updated, as it currently describes a single free RPC.
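To make the R-fragment idea above concrete, a partial free payload might look like the following. This is a hedged sketch: the R fragment follows the Rv1 layout (`execution.R_lite` with per-rank children), but the top-level field names and values are illustrative assumptions, not the finalized RFC 27 payload.

```json
{
  "id": 1234,
  "R": {
    "version": 1,
    "execution": {
      "R_lite": [
        { "rank": "2-3", "children": { "core": "0-7" } }
      ],
      "starttime": 0.0,
      "expiration": 0.0
    }
  }
}
```

The combination of the jobid (`id`) and the execution target ids in `R_lite` is what lets the scheduler identify which subset of the job's allocation is being released, without needing the `scheduling` (JGF) key.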