
Tune memory allocation of arm64 github actions workflows #17045

Closed

jmhbnz opened this issue Nov 30, 2023 · 11 comments · Fixed by etcd-io/bbolt#652

jmhbnz (Member) commented Nov 30, 2023:

What would you like to be added?

The etcd project currently uses managed arm64 CI runners provided by actuated.dev. This is defined in our GitHub Actions workflow files, for example for e2e:

```yaml
  runs-on: actuated-arm64-8cpu-32gb
  strategy:
    fail-fast: false
    matrix:
      target:
        - linux-arm64-e2e
```

In that example we are requesting 8 CPUs and 32GB of RAM, but it's not clear that we actually need all of that RAM.

We can introduce https://github.com/catchpoint/workflow-telemetry-action to our arm64 workflows to gather simple charts of job memory consumption and then tune each workflow (see the sketch below):

[image: sample per-job memory usage chart produced by workflow-telemetry-action]
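
As a rough sketch (not the exact change from the linked PR), adding the telemetry action amounts to one extra step at the top of each arm64 job. The version tag and the test step below are illustrative placeholders:

```yaml
jobs:
  test:
    runs-on: actuated-arm64-8cpu-32gb
    steps:
      # Records CPU, memory, and I/O metrics for the rest of the job and
      # publishes charts in the workflow summary (version tag illustrative).
      - uses: catchpoint/workflow-telemetry-action@v2
      - uses: actions/checkout@v4
      # Placeholder test step; the real workflows run the etcd test targets
      # for each linux-arm64 matrix entry.
      - run: make test-e2e
```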

Why is this needed?

If we tune the amount of RAM our jobs request closer to what they actually need, we will be better citizens within the actuated managed runner ecosystem and potentially free up capacity for more runner slots, which will reduce how often our jobs get stuck in queues.

moficodes (Member) commented:

@jmhbnz would love to work on this!

fykaa (Contributor) commented Nov 30, 2023:

can work on this too :)

jmhbnz (Member, Author) commented Nov 30, 2023:

Many thanks @moficodes and @fykaa for volunteering!

There are three workflow files that need to be updated to include the new workflow-telemetry-action.

@moficodes if you can add the initial telemetry action, then perhaps @fykaa could follow up with a second pull request, once we have some profiling data, to update the amount of RAM requested for each workflow?

moficodes added a commit to moficodes/etcd that referenced this issue Nov 30, 2023
alexellis (Contributor) commented:

Hi folks,

We have our own variant of metering, which has a bit less signal noise, if you wanted to try it out for comparison:

https://x.com/alexellisuk/status/1732327834922766785?s=20

https://gist.github.com/alexellis/1f33e581c75e11e161fe613c46180771

@moficodes @jmhbnz did you notice anything telling about RAM usage from the community workflow action you added? We haven't seen much RAM consumed on the CNCF servers when the etcd-io jobs are running, but these two actions run inside the VM and can give a better picture.

Thanks,

Alex

jmhbnz (Member, Author) commented Dec 6, 2023:

Thanks @alexellis - yes, based on the samples so far I believe we can dramatically reduce our RAM requests. We just wanted to collect samples over a few days; then @fykaa will raise a pull request to propose much lower RAM values for our workflows.

alexellis (Contributor) commented:

I just saw there was some commentary on the PR; I think I saw 2GB and 4GB respectively, which is considerably lower than the original 32GB. I'd recommend going no lower than 8GB as a base value. Of course this can also be tuned over time as required.

vielmetti commented:

Glad to see telemetry in place for this. For context, these servers have (relatively speaking) lots of cores compared to what you are used to seeing on most x86 systems, and, again relatively speaking, less memory per core than you might anticipate. (80 arm64 cores, 256G memory, so if you do the math that's approximately 3G per core available - of course it's not exactly that in practice.)

moficodes (Member) commented:

@alexellis the RAM usage was consistently below the 5GB mark across all arm64 tests, so with a bit of headroom we dropped the RAM requirement to 8GB. So far so good.
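
In workflow terms the tuning boils down to requesting a smaller runner profile. A minimal sketch, assuming actuated keeps the `<arch>-<cpu>-<ram>` label scheme shown earlier (the exact label below is not copied from the merged PR):

```yaml
  # Before: 32GB of RAM requested per arm64 job
  # runs-on: actuated-arm64-8cpu-32gb
  # After: 8GB, leaving headroom above the ~5GB peak seen in the telemetry
  runs-on: actuated-arm64-8cpu-8gb
```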

jmhbnz (Member, Author) commented Dec 12, 2023:

Closing - The memory tuning has been completed, thanks everyone 🙏🏻

jmhbnz closed this as completed Dec 12, 2023
ivanvc (Member) commented Dec 15, 2023:

Hey @jmhbnz, sorry for necro-bumping this issue, but should we do the same for bbolt? When we implemented the ARM64 workflows, we made a best guess at a good amount of RAM for the runner.

I can create an issue there if you think this makes sense.

jmhbnz (Member, Author) commented Dec 15, 2023:

Recent issue, no problem at all - this is actually a great catch. Let's re-open this issue and ensure we address bbolt, many thanks! 🙏🏻

/reopen
/assign ivanvc
