
Tune memory allocation of arm64 github actions workflows #17045

Closed

jmhbnz opened this issue Nov 30, 2023 · 11 comments · Fixed by etcd-io/bbolt#652

jmhbnz (Member) commented Nov 30, 2023:

What would you like to be added?

The etcd project currently uses managed arm64 CI runners provided by actuated.dev. This is defined in our GitHub Actions workflow files, for example for e2e:

```yaml
  runs-on: actuated-arm64-8cpu-32gb
  strategy:
    fail-fast: false
    matrix:
      target:
        - linux-arm64-e2e
```

In that example we are requesting 8 CPUs and 32GB of RAM, but it's not clear that we actually need all of that RAM.

We can introduce https://github.com/catchpoint/workflow-telemetry-action to our arm64 workflows to gather simple charts of job memory consumption and then tune each workflow (see the sketch below):

[image: sample per-job memory usage chart produced by workflow-telemetry-action]
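
As a rough sketch (not the exact change from the linked PR), adding the telemetry action amounts to one extra step at the top of each arm64 job. The version tag and the test step below are illustrative placeholders:

```yaml
jobs:
  test:
    runs-on: actuated-arm64-8cpu-32gb
    steps:
      # Records CPU, memory, and I/O metrics for the rest of the job and
      # publishes charts in the workflow summary (version tag illustrative).
      - uses: catchpoint/workflow-telemetry-action@v2
      - uses: actions/checkout@v4
      # Placeholder test step; the real workflows run the etcd test targets
      # for each linux-arm64 matrix entry.
      - run: make test-e2e
```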

Why is this needed?

If we tune the amount of RAM our jobs request closer to what they actually need, we will be better citizens within the actuated managed runner ecosystem and potentially free up capacity for more runner slots, which will reduce how often our jobs get stuck in queues.

moficodes (Member) commented:

@jmhbnz would love to work on this!

fykaa (Contributor) commented Nov 30, 2023:

can work on this too :)

jmhbnz (Member, Author) commented Nov 30, 2023:

Many thanks @moficodes and @fykaa for volunteering!

There are three workflow files that need to be updated to include the new workflow-telemetry-action.

@moficodes if you can add the initial telemetry action, then perhaps @fykaa could follow up with a second pull request, once we have some profiling data, to update the amount of RAM requested for each workflow?

moficodes added a commit to moficodes/etcd that referenced this issue Nov 30, 2023
alexellis (Contributor) commented:

Hi folks,

We have our own variant of metering, which has a bit less signal noise, if you wanted to try it out for comparison:

https://x.com/alexellisuk/status/1732327834922766785?s=20

https://gist.github.com/alexellis/1f33e581c75e11e161fe613c46180771

@moficodes @jmhbnz did you notice anything telling about RAM usage from the community workflow action you added? We haven't seen much RAM consumed on the CNCF servers when the etcd-io jobs are running, but these two actions run inside the VM and can give a better picture.

Thanks,

Alex

jmhbnz (Member, Author) commented Dec 6, 2023:

Thanks @alexellis - yes, based on the samples so far I believe we can dramatically reduce our RAM requests. We just wanted to collect samples over a few days; then @fykaa will raise a pull request to propose much lower RAM values for our workflows.

alexellis (Contributor) commented:

I just saw there was some commentary on the PR; I think I saw 2GB and 4GB respectively, which is considerably lower than the original 32GB. I'd recommend going no lower than 8GB as a base value. Of course this can also be tuned over time as required.

vielmetti commented:

Glad to see telemetry in place for this. For context, these servers have (relatively speaking) lots of cores compared to what you are used to seeing on most x86 systems, and, again relatively speaking, less memory per core than you might anticipate. (80 arm64 cores, 256G memory, so if you do the math that's approximately 3G per core available - of course it's not exactly that in practice.)

moficodes (Member) commented:

@alexellis the RAM usage was consistently below the 5GB mark across all arm64 tests, so with a bit of headroom we dropped the RAM requirement to 8GB. So far so good.
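
In workflow terms the tuning boils down to requesting a smaller runner profile. A minimal sketch, assuming actuated keeps the `<arch>-<cpu>-<ram>` label scheme shown earlier (the exact label below is not copied from the merged PR):

```yaml
  # Before: 32GB of RAM requested per arm64 job
  # runs-on: actuated-arm64-8cpu-32gb
  # After: 8GB, leaving headroom above the ~5GB peak seen in the telemetry
  runs-on: actuated-arm64-8cpu-8gb
```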

jmhbnz (Member, Author) commented Dec 12, 2023:

Closing - The memory tuning has been completed, thanks everyone 🙏🏻

jmhbnz closed this as completed Dec 12, 2023
ivanvc (Member) commented Dec 15, 2023:

Hey @jmhbnz, sorry for necro-bumping this issue, but should we do the same for bbolt? When we implemented the ARM64 workflows, we made a best guess at a good amount of RAM for the runner.

I can create an issue there if you think this makes sense.

jmhbnz (Member, Author) commented Dec 15, 2023:

Recent issue, no problem at all - this is actually a great catch. Let's re-open this issue and ensure we address bbolt, many thanks! 🙏🏻

/reopen
/assign ivanvc
