
[ws-daemon] Fix CPU limit annotation #9479

Closed
wants to merge 3 commits

Conversation

@utam0k (Contributor, Author) commented Apr 22, 2022

Description

Fix the CPU limit annotation. I verified the fix by annotating a workspace pod manually, and it worked.

Related Issue(s)

Fixes #9407

How to test

Annotate the workspace pod as follows and confirm that the cgroup's cpu.max is set.
Note: you have to enable the CPU limit configuration in ws-daemon.

$ kubectl annotate --overwrite pods <ws-pod> gitpod.io/cpuLimit=50000m
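To confirm the limit was applied, you can read cpu.max from the workspace's cgroup. A minimal sketch, assuming a cgroup v2 node and shell access inside the workspace container (the exact path depends on the setup):

    # inside the workspace container, on a cgroup v2 node (path is an assumption)
    $ cat /sys/fs/cgroup/cpu.max
    # prints "<quota> <period>" in microseconds; a quota of "max" means unlimited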

Release Notes

[ws-daemon] Fix CPU limit annotation

Documentation

@utam0k requested a review from a team April 22, 2022 07:37
@utam0k (Contributor, Author) commented Apr 22, 2022

/hold until we figure out why this was reverted last time.

@github-actions bot added the "team: workspace" label (Issue belongs to the Workspace team) Apr 22, 2022
@utam0k (Contributor, Author) commented Apr 22, 2022

@aledbf @sagor999 Do you remember why this was reverted last time? It seemed to me that it was working fine.
#9382


// if we didn't get the max bandwidth, but were throttled last time
// and there's still some bandwidth left to give, let's act as if we had
// never spent any CPU time and assume the workspace will spend its
// entire bandwidth at once.
var burst bool
if totalBandwidth < d.TotalBandwidth && ws.Throttled() {
    limit = d.BurstLimiter.Limit(ws.Usage())
    limiter := d.BurstLimiter
    if w := wsidx[id]; w.BaseLimit > 0 {
Review comment (Contributor):

shouldn't this be w.BurstLimiter > 0?
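For context on the question above: a minimal sketch of the annotation-driven limiter pattern this hunk appears to implement. The ResourceLimiter and FixedLimiter names mirror ws-daemon's cpulimit package, but the types and signatures here are assumptions, not the real API:

    package main

    import "fmt"

    // Bandwidth models CPU time per period (hypothetical simplification).
    type Bandwidth uint64

    // ResourceLimiter decides how much bandwidth a workspace may consume.
    type ResourceLimiter interface {
        Limit(budgetSpent uint64) Bandwidth
    }

    // fixedLimiter grants the same bandwidth regardless of prior usage.
    type fixedLimiter struct{ b Bandwidth }

    func (f fixedLimiter) Limit(_ uint64) Bandwidth { return f.b }

    // FixedLimiter pins a workspace to a fixed bandwidth.
    func FixedLimiter(b Bandwidth) ResourceLimiter { return fixedLimiter{b} }

    func main() {
        // BaseLimit would hold the value parsed from the gitpod.io/cpuLimit
        // annotation; zero means "no annotation set".
        var baseLimit Bandwidth = 50000 // hypothetical annotated value

        var limiter ResourceLimiter // would default to the dispatcher's limiter
        // The guard is on the parsed annotation value (BaseLimit), not on the
        // limiter itself, which may be why the hunk tests w.BaseLimit > 0.
        if baseLimit > 0 {
            limiter = FixedLimiter(baseLimit)
        }
        fmt.Println(limiter.Limit(0)) // always 50000 for a fixed limiter
    }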

@sagor999 (Contributor) commented:

@utam0k it was reverted because it was breaking CPU limiting for all workspaces. CPU limiting was not working at all with this change; it is not clear why.

@kylos101 (Contributor) commented:

@utam0k my recollection is that it did not work with cgroup v2 and gen34.
We are currently using cgroup v1 in production with gen41.
How did you do your test (above)? Did you test with cgroup v2 in workspace-preview?

@utam0k (Contributor, Author) commented Apr 24, 2022

> @utam0k my recollection is that it did not work with cgroup v2 and gen34. We are currently using cgroup v1 in production with gen41. How did you do your test (above)? Did you test with cgroup v2 in workspace-preview?

@kylos101
That's right. I tested using workspace-preview with cgroup v2.
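(For reference, one way to check which cgroup version a node is running, assuming shell access to the node:)

    $ stat -fc %T /sys/fs/cgroup/
    # "cgroup2fs" indicates cgroup v2 (unified hierarchy); "tmpfs" indicates v1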

@utam0k (Contributor, Author) commented Apr 24, 2022

@sagor999 @kylos101
Sorry, I haven't figured out how it broke yet, so let me confirm a few things.

  1. In what situations did the breakage occur? Was it when annotated?
  2. How was it broken? Were CPU limits not working, or was the cgroup not reflecting the expected values?
  3. Did the situation recover after the revert?

@kylos101 (Contributor) commented Apr 25, 2022

@utam0k I just updated #9407 slightly, my apologies for being unclear.

> @sagor999 @kylos101 Sorry, I haven't figured out how it broke yet, so let me confirm a few things.

> 1. In what situations did the breakage occur? Was it when annotated?

Ignore annotations for a moment. The problem was that CPU limiting was not working effectively after this PR was deployed to production. While we observed that CPU limiting was not working, I do not recall whether this annotation was set on workspaces. @Furisto, do you recall?

> 2. How was it broken? Were CPU limits not working, or was the cgroup not reflecting the expected values?

More than one workspace was getting 6 cores. The cgroup reflected this, confirming that we were allocating too much. (A sketch of how to verify this on a node follows this list.)

> 3. Did the situation recover after the revert?

Yes.
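For completeness, over-allocation like this could be confirmed on a cgroup v1 node by reading the CFS quota and period for the container. A sketch; the kubepods slice path below is an assumption and varies by container runtime:

    # on the node; the cgroup path is illustrative
    $ cat /sys/fs/cgroup/cpu/kubepods/pod<uid>/<container-id>/cpu.cfs_quota_us
    $ cat /sys/fs/cgroup/cpu/kubepods/pod<uid>/<container-id>/cpu.cfs_period_us
    # cores = quota / period, e.g. 600000 / 100000 = 6 cores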

@utam0k (Contributor, Author) commented Apr 25, 2022

I was able to reproduce this once during my trial and error. However, the dispatcher did not seem to be fully operational at the time, so the recovery may have been due not to reverting the PR but to ws-daemon restarting around then. I will investigate a little more, but there are still many riddles...

@kylos101 (Contributor) commented:

Flipping this to draft, @utam0k, as you are still investigating.

@kylos101 marked this pull request as draft April 25, 2022 15:29
@utam0k (Contributor, Author) commented Apr 25, 2022

I am closing this PR for now while I investigate on this branch.

@utam0k closed this Apr 25, 2022
Successfully merging this pull request may close these issues:

Why did fixing CPU limit annotation break CPU limiting with cgroup v2? #9407