
Dynamically reclaiming resources #78

Closed
Tracked by #636
ahg-g opened this issue Feb 26, 2022 · 31 comments · Fixed by #756
Labels
kind/feature, priority/important-longterm

Comments

@ahg-g
Contributor

ahg-g commented Feb 26, 2022

Currently a job's resources are reclaimed by Kueue only when the whole job finishes; for jobs with multiple pods, this means waiting until the last pod finishes. This is inefficient: a parallel job may have a few laggard pods that consume little compared to the quota the job as a whole still holds.

One solution is to continuously update the Workload object with the number of completed pods so that Kueue can gradually reclaim the resources of those pods.
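
To make the inefficiency concrete, here is a back-of-the-envelope sketch (illustrative numbers only, not Kueue code): a Job with 100 pods of 1 CPU each holds 100 CPUs of quota until its very last pod finishes, even when only a handful of laggards remain.

// Illustrative arithmetic only; all names here are made up.
package main

import "fmt"

func main() {
	parallelism := int32(100) // pods in the Job
	cpuPerPod := int32(1)     // CPUs requested by each pod
	completed := int32(97)    // pods that have already finished

	heldToday := parallelism * cpuPerPod                     // reclaimed only at job completion: 100 CPUs
	heldWithReclaim := (parallelism - completed) * cpuPerPod // reclaimed per completed pod: 3 CPUs
	fmt.Println("CPUs held:", heldToday, "vs", heldWithReclaim)
}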

@ahg-g ahg-g added kind/feature and priority/important-longterm labels Feb 26, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Aug 2, 2022
@thisisprasad
Contributor

@ahg-g I think we should add a new field in the Workload status to track the count of completed pods.
What do you think?

Is there a way to get all pods belonging to a workload?

@alculquicondor
Contributor

This is not something that kueue core controllers should do. It's specific to the kind of workload. In the case of Job, it should be done in pkg/controller/workload/job. And this controller doesn't need to look at Pods, just at the Job status.

I think we should add a new field in the status of workload, to track the count of completed pods.

Yes, we need that, but I would rather see a more complete design before adding the API fields. We can probably do this for the 0.3.0 release.
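
As a minimal sketch of "just looking at the Job status" (the package placement is hypothetical; the counters are real fields of k8s.io/api/batch/v1):

package wljob // hypothetical stand-in for pkg/controller/workload/job

import batchv1 "k8s.io/api/batch/v1"

// finishedPods reads the per-phase counters the built-in Job controller
// already maintains; no Pod listing is required.
func finishedPods(job *batchv1.Job) (succeeded, failed int32) {
	return job.Status.Succeeded, job.Status.Failed
}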

@alculquicondor
Contributor

If you are willing to write a design, please feel free to take this issue.

@thisisprasad
Contributor

@alculquicondor thanks for the information!

@thisisprasad
Contributor

I would like to work on this task.
/assign

@thisisprasad
Contributor

Proposed API field design:

// WorkloadStatus defines the observed state of Workload
type WorkloadStatus struct {
	// conditions hold the latest available observations of the Workload
	// current state.
	// +optional
	// +listType=map
	// +listMapKey=type
	Conditions []WorkloadCondition `json:"conditions,omitempty"`

	// The number of pods which reached phase Succeeded or Failed.
	// +optional
	CompletedPods int32 `json:"completedPods"`
}

@thisisprasad
Contributor

High-level flow:

  1. The number of completed pods in a Job is the sum of its succeeded and failed pods: sum(succeeded, failed).
  2. Update the Workload status whenever job.sum(succeeded, failed) > wl.CompletedPods (see the sketch below).
  3. Handle the resulting Workload update event in its reconciler.
  4. Update the ClusterQueue quota in the cache for the resource flavors requested by the completed pods.

Currently I don't see any scenario where the CompletedPods field will be used in the reconciliation routine of the Workload itself.
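
A minimal sketch of steps 1–3 (controller-runtime style; the kueue import path and the CompletedPods field are the proposal above, not settled API):

package wljob // hypothetical stand-in for pkg/controller/workload/job

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1alpha1" // import path assumed
)

// syncCompletedPods pushes the Job's finished-pod count into the
// Workload status (steps 1–2); the status update then triggers the
// Workload reconciler (step 3).
func syncCompletedPods(ctx context.Context, c client.Client, job *batchv1.Job, wl *kueue.Workload) error {
	completed := job.Status.Succeeded + job.Status.Failed // step 1
	if completed <= wl.Status.CompletedPods {             // step 2: only move forward
		return nil
	}
	wl.Status.CompletedPods = completed
	return c.Status().Update(ctx, wl)
}

Step 4 would then run inside the cache: on the Workload update, subtract the completed pods' requests from the ClusterQueue's usage for the matching resource flavor.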

@thisisprasad
Contributor

thisisprasad commented Aug 12, 2022

Please validate the above design and approach.

@alculquicondor
Contributor

Why would failed pods matter? The job controller would create a replacement pod, which should be taking quota.

Not sure if a github issue is the best avenue to provide feedback on a design. Could you start a google doc? Alternatively, we could start an enhancements folder where we can add design proposals with a format similar to https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md
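
If that objection holds, the counting in the sketch above would presumably drop Failed (a hypothetical revision pending the design doc; same imports as before):

// Only Succeeded pods are safely reclaimable: a Failed pod is replaced
// by the Job controller, and the replacement still consumes quota.
func reclaimablePods(job *batchv1.Job) int32 {
	return job.Status.Succeeded
}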

@thisisprasad
Contributor

Will start with the enhancements folder and add a design proposal.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Sep 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Oct 12, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


@alculquicondor
Contributor

/reopen

@thisisprasad is currently working on the proposal

@k8s-ci-robot k8s-ci-robot reopened this Oct 12, 2022
@k8s-ci-robot
Contributor

@alculquicondor: Reopened this issue.


@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Nov 11, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


@kerthcet
Contributor

Reopen for tracking.
/reopen

@k8s-ci-robot
Contributor

@kerthcet: Reopened this issue.


@k8s-ci-robot k8s-ci-robot reopened this Nov 14, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Dec 14, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


@tenzen-y
Member

/reopen

@k8s-ci-robot
Contributor

@tenzen-y: Reopened this issue.


@k8s-ci-robot k8s-ci-robot reopened this Dec 14, 2022
@tenzen-y
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Dec 14, 2022
@alculquicondor
Contributor

/unassign @thisisprasad
/assign @mwielgus

Thanks for the progress so far @thisisprasad

@k8s-ci-robot
Contributor

@alculquicondor: GitHub didn't allow me to assign the following users: mwielgus.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


@mwielgus
Contributor

/assign @mwielgus

@alculquicondor
Contributor

@kerthcet

@mwielgus
Contributor

/unassign

@trasc
Contributor

trasc commented Apr 27, 2023

/assign
