Tune memory allocation of arm64 github actions workflows #17045
Comments
@jmhbnz would love to work on this!
Can work on this too :)
Many thanks @moficodes and @fykaa for volunteering! There are three files that need to be updated to include the new telemetry action.
@moficodes if you can add the initial telemetry action, then perhaps @fykaa could follow up with a second pull request after we get some profiling data, to update the amount of RAM requested for each workflow?
part of etcd-io#17045 Signed-off-by: Mofi Rahman <[email protected]>
Hi folks, we have our own variant of metering which has a bit less signal noise if you wanted to try it out for comparison: https://x.com/alexellisuk/status/1732327834922766785?s=20 https://gist.github.com/alexellis/1f33e581c75e11e161fe613c46180771 @moficodes @jmhbnz did you notice anything telling about RAM usage from the community workflow action you added? We haven't seen much RAM consumed on the CNCF servers when the etcd-io jobs are running, but these two actions run inside the VM and can give a better picture. Thanks, Alex
Thanks @alexellis - yes, I believe based on the samples so far we can dramatically reduce our RAM requests. We just wanted to collect samples over a few days; then I believe @fykaa will raise the pull request to propose much lower RAM values for our workflows.
I just saw there was some commentary on the PR; I think I saw 2GB and 4GB respectively, which is considerably lower than the original 32GB. I'd recommend going no lower than 8GB as a base value. Of course, this can also be tuned over time as required.
Glad to see telemetry in place for this. For context, these servers have (relatively speaking) lots of cores compared to what you are used to seeing on most x86 systems, and again relatively speaking less memory per core than you might anticipate. (80 arm64 cores, 256GB memory, so if you do the math that's approximately 3GB per core available - of course it's not exactly that in practice).
@alexellis the RAM usage was consistently below the 5GB mark across all arm64 tests. So, with a bit of headroom, we dropped the RAM requirement to 8GB. So far so good.
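For illustration, the tuning amounts to changing the `runs-on` label in each arm64 workflow file; this is only a sketch, assuming actuated's `actuated-arm64-<cpus>cpu-<ram>gb` label format (the exact labels should be taken from the actual pull request):

```yaml
jobs:
  test:
    # Before: runs-on: actuated-arm64-8cpu-32gb
    # After tuning, leaving headroom above the ~5GB peak observed in telemetry:
    runs-on: actuated-arm64-8cpu-8gb
```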
Closing - the memory tuning has been completed, thanks everyone 🙏🏻
Hey @jmhbnz, sorry for necro-bumping this issue, but should we do the same for bbolt? When we implemented the arm64 workflows, we made a best guess at a good amount of RAM for the runner. I can create an issue there if you think this makes sense.
Recent issue, no problem at all - this is actually a great catch. Let's re-open this issue and ensure we address it.
/reopen
What would you like to be added?
The etcd project currently uses managed arm64 CI runners provided by actuated.dev. This is defined in our GitHub Actions workflow files, for example for e2e: etcd/.github/workflows/e2e-arm64.yaml, lines 9 to 14 in 51eb29a.
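The embedded snippet is not reproduced above; below is a minimal sketch of what that runner request looks like, assuming actuated's `runs-on` label format of `actuated-arm64-<cpus>cpu-<ram>gb` (the job name and steps are placeholders):

```yaml
jobs:
  test:
    # 8 vCPUs and 32GB of RAM requested from the actuated arm64 runner pool
    runs-on: actuated-arm64-8cpu-32gb
    steps:
      - uses: actions/checkout@v4
```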
In that example we are requesting 8 CPUs and 32GB of RAM, but it's not clear that we actually need all of that RAM.
We can introduce https://github.com/catchpoint/workflow-telemetry-action to our arm64 workflows to gather simple charts of job memory consumption, and then tune each workflow.
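A minimal sketch of how the telemetry action could be wired in as the first step of an arm64 job; the version tag and the test command here are assumptions, not taken from the actual etcd workflows:

```yaml
jobs:
  test:
    runs-on: actuated-arm64-8cpu-32gb
    steps:
      # Collects CPU, memory and I/O charts for the job; it must run before
      # the steps we want to profile.
      - uses: catchpoint/workflow-telemetry-action@v2
      - uses: actions/checkout@v4
      - name: Run e2e tests
        run: make test-e2e   # placeholder for the workflow's real test step
```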
Why is this needed?
If we can tune the amount of RAM our jobs request closer to what they actually need, we will be better citizens within the actuated managed runner ecosystem, and potentially allow more runner slots to be made available, which will reduce how often our jobs get stuck in queues.