Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use floor(quota) rather than ceil(quota) #13

Merged
merged 1 commit into from
Feb 23, 2018
Merged

Use floor(quota) rather than ceil(quota) #13

merged 1 commit into from
Feb 23, 2018

Conversation

jcorbin
Copy link
Contributor

@jcorbin jcorbin commented Feb 16, 2018

Apologies, this was an oversight on my part back when we wrote the x/runtime precursor that I helped to review. I'd since been working on the (it turns out false) assumption that this had been floor all along. Recent investigation into pervasive low-grade throttling tho led me to this discovery.

In short:

  • as far as the Go scheduler is concerned, there's no such thing as a fractional CPU
  • so let's say you have your quota set to N + p for some integer N and some 0.0 < p < 1.0
  • the only safe assumption then is that you're using that p value as a hedge for something like "systems effects" or "c libraries"
    • in that latter case, what you really might want to be able give maxprocs some value K of reserved CPUs for some parasite like a C library or sidecar; but that's probably more of an option for our Fx wrapper?

@codecov
Copy link

codecov bot commented Feb 16, 2018

Codecov Report

Merging #13 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #13   +/-   ##
=======================================
  Coverage   93.58%   93.58%           
=======================================
  Files           7        7           
  Lines         187      187           
=======================================
  Hits          175      175           
  Misses         10       10           
  Partials        2        2

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6fecebd...58c8fec. Read the comment docs.

Copy link
Contributor

@prashantv prashantv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests?

I don't have strong opinions here, but it seems like this will end up always underutilizing CPU, but avoid throttling. the current approach throttles, but doesn't underutilize the machine.

@jcorbin
Copy link
Contributor Author

jcorbin commented Feb 16, 2018

What sort of testing strategy do you envision?

@akshayjshah
Copy link
Contributor

akshayjshah commented Feb 16, 2018

This doesn't seem clearly more correct than the current behavior - we're basically choosing between pervasive underutilization and pervasive low-level throttling. Given our current internal environment, I think we should bias towards improving utilization from the top-level automaxprocs package. (@jasonlai likely has more informed opinions on that, though.)

However, I completely buy your argument that we need a way to reserve CPU for sidecars and other miscellaneous things in the container. What do you think of adding some more options to the maxprocs package? We could add an option to supply a func(float64) float64 to replace math.Ceil, or we could add an option to reserve some CPU shares for sidecars, or we could do both. We can also expose config-based methods to set those options in our maxprocs-fx integration internally.

Honestly, I'd have been more open to this change pre-1.0. It doesn't break the Go package API, but it'll be an unpleasant change for anyone with a fractional CPU quota that's not bothered by occasional throttling. Unless it's clearly better in the general case, I'd prefer to introduce more knobs & levers instead of changing the default.

Copy link
Contributor

@akshayjshah akshayjshah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss a bit. Happy to chat in person and capture the output here post-hoc if that's easier.

@jcorbin jcorbin mentioned this pull request Feb 16, 2018
Copy link
Collaborator

@jasonlai jasonlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akshayjshah has it correct that I chose pervasive low-level throttling over pervasive underutilization.

Practically, the latter is more suitable for our use cases here since we have more than just the Go service itself inside a container, like Mesos executors and/or sidecars. And Go runtime itself also imposes some overhead, enough to justify a margin for underutilization. So I'm approving the diff here to unblock you.

However, I do have some concerns honestly about the impact of underutilization on throughput by rounding down. I assume this could be non-trivial particularly when the CPU quota is less than 3, making the trimmed fractional part proportionally large. Perhaps we should thinking about rounding to the nearest interest with math.Round instead, so the behaviors of existing services won't be impacted too much?

That said, I really like #14 that gives people an option to tune the margin that they see fit. Let's move on to that.

Copy link
Contributor

@prashantv prashantv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the Compute folks have signed off on it, I don't have any issues with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants