[CELEBORN-1451] HPA support #2776
base: main
add a new line in the end
@lianneli This is a great feature. On what metrics will it scale up or down? Is there any documentation for this?
@s0nskar The official doc is https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/. For convenience, I made a default setting in Values.yaml that uses the CPU utilization of the worker pods as the deciding metric: when CPU utilization stays above 70% for 10s, the workers scale up, and when it stays below 70% for 300s, they scale down. There are still some risks, since a worker may still be doing work when it is scaled down. Although worker pods trigger decommissioning before shutting down, it's highly recommended to set
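To make the default behaviour described above concrete, here is a minimal Python model of how the Kubernetes HPA turns CPU utilization into a replica count. It assumes the replica formula from the Kubernetes HPA documentation (desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)), the default 0.1 tolerance, and stabilization windows, which the HPA API exposes as `behavior.scaleUp.stabilizationWindowSeconds` and `behavior.scaleDown.stabilizationWindowSeconds`. The 70% target and the 10s/300s windows come from the comment; the class name `HpaSketch` and all CPU figures are hypothetical. The model deliberately ignores pod readiness, missing metrics and scaling-rate policies, so it is an illustration rather than the real controller.

```python
import math
from collections import deque


class HpaSketch:
    """Hypothetical toy model of the HPA replica calculation (not the real controller)."""

    def __init__(self, target_utilization=70, scale_up_window_s=10,
                 scale_down_window_s=300, tolerance=0.1,
                 min_replicas=1, max_replicas=10):
        self.target = target_utilization
        self.up_window = scale_up_window_s
        self.down_window = scale_down_window_s
        self.tolerance = tolerance
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self.history = deque()  # (timestamp_s, recommendation) pairs

    def raw_recommendation(self, current_replicas, usage_millicores, request_millicores):
        # Utilization is measured against the pods' CPU *requests*, which is
        # why the StatefulSets need a resources section for HPA to work.
        utilization = 100 * sum(usage_millicores) / sum(request_millicores)
        ratio = utilization / self.target
        if abs(ratio - 1.0) <= self.tolerance:
            return current_replicas  # within tolerance of the target: no change
        desired = math.ceil(current_replicas * ratio)
        return max(self.min_replicas, min(self.max_replicas, desired))

    def step(self, now_s, current_replicas, usage_millicores, request_millicores):
        rec = self.raw_recommendation(current_replicas, usage_millicores, request_millicores)
        self.history.append((now_s, rec))
        # Forget recommendations older than the longest window.
        longest = max(self.up_window, self.down_window)
        while now_s - self.history[0][0] > longest:
            self.history.popleft()
        # Scale up only as far as the lowest recommendation in the up-window;
        # scale down only as far as the highest recommendation in the down-window.
        up_rec = min(r for t, r in self.history if now_s - t <= self.up_window)
        down_rec = max(r for t, r in self.history if now_s - t <= self.down_window)
        replicas = current_replicas
        if replicas < up_rec:
            replicas = up_rec
        if replicas > down_rec:
            replicas = down_rec
        return replicas


if __name__ == "__main__":
    hpa = HpaSketch()
    two = [1000, 1000]  # hypothetical: each worker requests 1 CPU
    print(hpa.step(0, 2, [700, 700], two))    # 2  (exactly on target)
    print(hpa.step(5, 2, [900, 900], two))    # 2  (spike not yet sustained for 10s)
    print(hpa.step(11, 2, [900, 900], two))   # 3  (high load held past the 10s window)
    three = [1000, 1000, 1000]
    print(hpa.step(20, 3, [200] * 3, three))  # 3  (low load, but 300s window holds)
    print(hpa.step(320, 3, [200] * 3, three)) # 1  (low load held for 300s)
```

The asymmetric windows are what make scale-up fast and scale-down slow: a short CPU spike is ignored, while removing workers (and the shuffle data they may still hold) only happens after a long quiet period.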
Thanks @lianneli for adding HPA support to Celeborn. However, for the Celeborn StatefulSet, I believe there are several shortcomings with HPA and the current implementation:
Someone in the community has also proposed a scaling solution (it may be sent to the dev mailing list later); we can discuss both approaches to scaling Celeborn.
@RexXiong The solution is great. I will follow up on the discussion through the mailing list.
This PR is stale because it has been open 20 days with no activity. Remove the stale label or comment, or this will be closed in 10 days.
What changes were proposed in this pull request?
resources key in the worker and master StatefulSets.
Why are the changes needed?
For most of the daytime, there are very few Spark tasks and the shuffle data is nearly empty, so Celeborn does not need as many pods, and keeping them running wastes resources. HPA can adjust the number of pods automatically.
Does this PR introduce any user-facing change?
No. I added a switch for HPA, and its default value is false.
How was this patch tested?
Tested locally and in a dev environment.