Support for Quality of Service #114

Closed
vishh opened this issue Aug 25, 2015 · 6 comments

Comments

@vishh
Contributor

vishh commented Aug 25, 2015

The API has to let users express the notion of priority among containers. Consider the case when a web server is run alongside a logging container. The web server is more important than the logging sidecar container. When there is a resource crunch, the user doesn't mind one of the lower-priority containers being killed, which is logging in this case. Ideally, if all containers were run with limits, this might not be necessary. In reality though, setting limits is hard, users tend to over-provision resources, and that leads to poor resource utilization.

Since this notion of priority can be expressed in different ways, I propose letting users handle cgroup management and exposing only the following knobs:

  1. CgroupsPath - This lets users manage cgroup hierarchies.
  2. OomScoreAdj - Apply a custom 'oom_score_adj' to the init process of the container.
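
For illustration only, here is roughly what the two knobs could look like from a runtime caller's side. This is a sketch, not proposed spec wording: the `QosKnobs` type, its field names, and the helper functions are hypothetical. The only facts it relies on are that Linux exposes `/proc/<pid>/oom_score_adj` and that the kernel accepts values from -1000 to 1000 there.

```python
from dataclasses import dataclass

# Range the Linux kernel accepts for /proc/<pid>/oom_score_adj.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


@dataclass(frozen=True)
class QosKnobs:
    """Hypothetical holder for the two proposed knobs."""
    cgroups_path: str   # cgroup the caller wants the container placed in
    oom_score_adj: int  # adjustment for the container's init process

    def __post_init__(self):
        # Reject values the kernel would refuse anyway.
        if not OOM_SCORE_ADJ_MIN <= self.oom_score_adj <= OOM_SCORE_ADJ_MAX:
            raise ValueError(f"oom_score_adj out of range: {self.oom_score_adj}")


def oom_score_adj_file(pid: int) -> str:
    """Return the procfs file that holds oom_score_adj for `pid`."""
    return f"/proc/{pid}/oom_score_adj"


def apply_oom_score_adj(pid: int, knobs: QosKnobs) -> None:
    """Write the adjustment for `pid`; lowering it may require privileges."""
    with open(oom_score_adj_file(pid), "w") as f:
        f.write(str(knobs.oom_score_adj))
```

With something like this, the web server from the example above might get `QosKnobs("/web", -500)` and the logging sidecar `QosKnobs("/logging", 500)`; both paths and numbers are made up.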
@wking
Contributor

wking commented Aug 26, 2015

On Tue, Aug 25, 2015 at 04:48:58PM -0700, Vish Kannan wrote:

The API has to let users express the notion of priority among
containers. Consider the case when a web server is run alongside a
logging container…

We need to expose the underlying knobs to let the runtime-caller
manage this sort of thing when launching a container, but I think
explicitly handling relationships between containers is out-of-scope
for this spec 1.

  1. CgroupsPath - This lets users manage cgroup hierarchies.

We can do this with namespaces now 2. Similar handling in
linux.resources for joining existing cgroups (in addition to creating
new ones, which is the focus of the current Resources 3) sounds good
to me.

  2. OomScoreAdj - Apply a custom 'oom_score_adj' to the init
    process of the container.

We have disableOOMKiller, which landed without docs in #51. But yeah,
there appears to be no way to set oom_score_adj 4 directly.
Personally, I think this sort of thing is better handled via a
host-injected pre-start hook, since it's more of a deploy-time
decision than a bundle-author-time decision. But I'm still unclear on
the intended environment for those hooks 5.

 From: W. Trevor King
 Subject: Re: appc + oci harmonization progress
 Date: Tue, 18 Aug 2015 11:20:12 -0700
 Cc: [email protected]
 Message-ID: <[email protected]>

@vishh
Contributor Author

vishh commented Sep 2, 2015

As requested during the last meeting, the following is the rationale for this feature:

Setting resource limits is tricky for containers. More often than not, users end up allocating more resources than necessary.
To prevent this, we can let users specify a minimum requirement, and let them burst under load. When the application bursts, the node can become overcommitted, since we schedule containers based on the minimum resource requirement. Whenever there is system memory pressure, we do not want the containers that burst to affect the containers that have a fixed limit.
Essentially, we are treating containers that have limits as more important than the ones that just provide a minimum requirement.

To further improve node utilization, users can run batch tasks like map-reduce, which end up using resources that are not used by other categories of tasks. Whenever the higher-priority tasks need more resources, these batch tasks will be killed. These batch tasks are expected to tolerate failures.

A common pattern will be to run latency-sensitive, user-facing applications with preset limits, thereby guaranteeing them resources.
Parts of the application that can tolerate some amount of failure can be run with a minimum requirement and be allowed to burst.
Map-reduce can run as batch tasks.
With the features mentioned above, users should be able to run all these tasks together on the same node and guarantee isolation and performance to a certain degree.
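
To make the three categories concrete, here is a small sketch of how a higher layer might map them onto `oom_score_adj` values. The `Tier` names and the numbers are invented for illustration; nothing in this thread prescribes them. The kernel's actual victim selection also depends on memory usage, so the resulting ordering is only approximate.

```python
from enum import Enum


class Tier(Enum):
    LIMITED = "limited"      # runs with preset limits; most important
    BURSTABLE = "burstable"  # minimum requirement only; may burst
    BATCH = "batch"          # uses leftover resources; tolerates being killed


# Illustrative values only: a higher score makes a process a more
# likely OOM-kill victim.
OOM_SCORE_ADJ = {Tier.LIMITED: -900, Tier.BURSTABLE: 0, Tier.BATCH: 900}


def kill_order(containers):
    """Sort (name, tier) pairs so the most killable container comes first."""
    return sorted(containers, key=lambda c: OOM_SCORE_ADJ[c[1]], reverse=True)


node = [
    ("frontend", Tier.LIMITED),
    ("map-reduce", Tier.BATCH),
    ("worker", Tier.BURSTABLE),
]
print([name for name, _ in kill_order(node)])
```

Under these made-up values, the batch task is the first candidate under memory pressure and the container with preset limits is the last.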

cc @philips

@vishh
Contributor Author

vishh commented Sep 2, 2015

@wking: +1 for not handling relationships between containers. It is not necessary, and higher layers can manage that.
Using a pre-start hook for oom_score_adj sounds like a good idea as well.

@vishh
Contributor Author

vishh commented Oct 12, 2015

I take back my previous comment. How can we set oom_score_adj on pre-start hooks if the intention is to set the oom_score_adj value on the container's init process?

@wking
Contributor

wking commented Oct 12, 2015

On Mon, Oct 12, 2015 at 12:39:39PM -0700, Vish Kannan wrote:

I take back my previous comment. How can we set oom_score_adj on
pre-start hooks if the intention is to set the oom_score_adj value
on the container's init process?

What is the problem you expect? Everything you need for this should
be in my #115 example 1, which worked (as far as I can tell). Do
you see something wrong with that example?
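
For readers without #115 at hand, the general shape of such a hook might be something like the following. This is not the #115 example. It is a sketch under two assumptions: the runtime passes the container state to the hook as JSON on stdin with a `pid` field for the container's init process, and the hook runs on the host side where that PID is visible under `/proc`. The function names and the `proc_root` parameter are hypothetical.

```python
import json


def pid_from_state(state_json: str) -> int:
    """Extract the container init PID from a JSON state document."""
    return int(json.loads(state_json)["pid"])


def set_oom_score_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write `value` to the oom_score_adj file of `pid`."""
    with open(f"{proc_root}/{pid}/oom_score_adj", "w") as f:
        f.write(str(value))


def main(argv, stdin, proc_root: str = "/proc") -> int:
    """Hook entry point: argv[1] is the adjustment, stdin carries the state."""
    value = int(argv[1])  # chosen at deploy time by whoever injects the hook
    pid = pid_from_state(stdin.read())
    set_oom_score_adj(pid, value, proc_root)
    return 0
```

A host-injected hook script would call `main(sys.argv, sys.stdin)`. Since the init process already exists when a pre-start hook runs, the write targets that process directly.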

@vishh
Contributor Author

vishh commented Oct 12, 2015

#115 (comment)
