
[CELEBORN-1451] HPA support #2776

Open
wants to merge 3 commits into base: main
Conversation

lianneli
Contributor

What changes were proposed in this pull request?

  1. Add a HorizontalPodAutoscaler for the worker to the Helm chart.
  2. Add a HorizontalPodAutoscaler test.
  3. Add a lifecycle preStop hook to the worker StatefulSet: when the HPA scales down a worker, the worker triggers decommission through the HTTP RESTful API (see the sketch after this list).
  4. Delete the duplicated resources key in the worker and master StatefulSets.
  5. Change the app version to 0.6.0.
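
For illustration, a minimal sketch of what the values.yaml switch and the preStop hook could look like. The key names, replica bounds, port, and decommission endpoint path below are assumptions for illustration, not necessarily the exact ones used in this PR:

```yaml
# values.yaml (sketch; key names and defaults are assumptions, not the PR's exact schema)
worker:
  hpa:
    enabled: false            # switch is off by default, so there is no user-facing change
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
---
# Fragment of the worker StatefulSet container spec (sketch of the preStop hook).
# The decommission endpoint path and port are assumptions; check the Celeborn worker
# HTTP/REST API documentation for the actual values.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - curl -s -X POST http://localhost:9096/api/v1/workers/decommission
```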

Why are the changes needed?

For most of the daytime, the Spark workload is light and there is very little shuffle data, so Celeborn does not need many worker Pods and the idle Pods waste resources. HPA can manage this automatically.

Does this PR introduce any user-facing change?

No. The HPA is behind a switch whose default value is false.

How was this patch tested?

Tested locally and in dev environment.

@s0nskar
Contributor

s0nskar commented Sep 30, 2024

@lianneli This is a great feature. On what metrics will it upscale/downscale? Is there any documentation for this?

@lianneli
Contributor Author

lianneli commented Oct 8, 2024

> @lianneli This is a great feature. On what metrics will it upscale/downscale? Is there any documentation for this?

@s0nskar The official doc is https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/.

For convenience, I provide a default setting in values.yaml: CPU utilization of the worker pods is the decisive metric. When CPU utilization stays above 70% for 10s, the HPA scales up; when it stays below 70% for 300s, it scales down.
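
For reference, these defaults would roughly render to an autoscaling/v2 HorizontalPodAutoscaler along the lines of the sketch below; the resource names, replica bounds, and the exact template produced by this PR may differ:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: celeborn-worker        # name is an assumption for illustration
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: celeborn-worker      # assumed worker StatefulSet name
  minReplicas: 2               # assumed bounds
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 10    # scale up once CPU has stayed above target for ~10s
    scaleDown:
      stabilizationWindowSeconds: 300   # scale down only after ~300s below target
```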

There are still some risks involved, since a worker being scaled down may still have work in progress. Although worker pods trigger decommission before shutting down, it is highly recommended to set celeborn.client.push.replicate.enabled to true for more stable performance.

@RexXiong
Contributor

RexXiong commented Oct 8, 2024

Thanks @lianneli for adding HPA support to Celeborn. But for the Celeborn StatefulSet, I believe there are several shortcomings with HPA and the current implementation:

  1. The worker may still be doing work, even though, as you note, worker pods trigger decommission before they are closed.
  2. Once a worker pod has been decommissioned, there is currently no mechanism to recommission it. Consequently, if the cluster experiences increased demand after decommissioning, new worker pods must be spun up; the cluster may have low resource efficiency, or data may be lost because of point 1.
  3. Settings such as the stabilization window (stabilizationWindowSeconds), or the ability to dynamically enable/disable scaling up/down, are fixed at deployment time. Adjusting these settings or altering the number of replicas in the StatefulSet requires redeployment; IMO we should support changing these parameters/configurations at runtime.
  4. The solution is limited if we want to support custom scaling behavior/metrics (network, disk space, memory, CPU?) or a ResourceManager (internal platform) that does not talk to Kubernetes directly.

Someone in the community has also proposed a scaling solution (it may be sent to the dev mailing list later); we can then discuss these two approaches to scaling Celeborn.

@lianneli
Contributor Author

lianneli commented Oct 8, 2024

@RexXiong That solution sounds great. I will follow up on the discussion through the mailing list.


This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the stale label Oct 28, 2024