Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stage one of cgroup RFC #1627

Merged
merged 4 commits into from
Nov 5, 2021
Merged

Conversation

fearful-symmetry
Copy link
Contributor

This is the Stage 1 component of the RFC process. Sorry for the delay between the stages, I got a bit swept up with other work for a bit.

Hopefully the proposed/example cgroup.yml file is correct, I wasn't completely sure how things should be formatted.

@fearful-symmetry fearful-symmetry requested a review from a team October 12, 2021 23:33
@kgeller
Copy link
Contributor

kgeller commented Oct 13, 2021

Hi @fearful-symmetry ! No worries at all, there is truly no rush when it comes to the RFC process. It can go as quickly or as slowly as those involved need.

A couple notes:

  1. If you could identify sponsors/subject matter experts to also review and participate, that would be great. We in ECS simply try to shepherd, and let the experts in these areas form the specifics.

  2. I know the initial plan was to have a top level cgroup fieldset, but what do you think about putting this inside of process? We currently have a process.pgid, defined as Identifier of the group of processes the process belongs to. Now, I am far from an expert on cgroups, but that sounds like something similar?

  3. In your description, you mention a couple times how we currently are only including cgroup metric fields that are common between v1 and v2. What happens if//when v1 gets deprecated -- do we think we will add more fields? Similarly, what happens if/when a v3 comes along? Ie. is this current list of fields the ones we want to be tracking because it's what we want or is it because those are the common ones at the current time?
    The goal of ECS in general is for all different sources can effectively 'speak the same language' and I want to make sure that we go into this with the intention, and not so much of a how do we bridge this v1 v2 gap if that makes sense. I want to make sure that we have long lasting longevity and not just a stop-gap solution.

@fearful-symmetry
Copy link
Contributor Author

fearful-symmetry commented Oct 13, 2021

If you could identify sponsors/subject matter experts to also review and participate

I think I've become the default SME on cgroups, but I've pinged Jaime anyway, since I think he's tinkered with this before. @kgeller

We currently have a process.pgid, defined as Identifier of the group of processes the process belongs to. Now, I am far from an expert on cgroups, but that sounds like something similar?

So, linux process groups (used largely for signalling) are conceptually different from cgroups (used for controlling resource access and usage), so that might be confusing.

What happens if//when v1 gets deprecated -- do we think we will add more fields?

I think that's an ECS philosophy question, and I'm not sure of the answer. My thinking was to include fields that are the most likely to be used for monitoring and alerting, and not to cram in everything that I technically could.

The need for some kind of standard here springs from the fact that we report cgroup metrics in a relatively transparent fashion, mirroring the hierarchies and controller names used by that particular cgroup. These names and hierarchies vary a great deal between V1 and V2, so if (theoretically) a V3 comes along, we'll probably have a similar need to report "valuable" metrics in a consistent fashion.

"hybrid" cgroup systems are probably going to be around for a while, since a cgroup V2-only system requires both a V2-only OS and a V2-only application. Right now all the big Linux LTS distros are either hybrid, or V1 only, and there will be a decent bit of pressure to keep up "hybrid" OS releases, as a V2-only OS will require changes to applications, config management, etc. Right now, the only distros shipping with V2-only are "bleeding edge" releases like Fedora. So, by this, measure, it is technically a stop-gap, but my sysadmin experience tells me that V1 is going to be around for a long, long time.

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could identify sponsors/subject matter experts to also review and participate

I think I've become the default SME on cgroups, but I've pinged Jaime anyway, since I think he's tinkered with this before. @kgeller

@fearful-symmetry maybe you should add us to the "People" section of the RFC.

We currently have a process.pgid, defined as Identifier of the group of processes the process belongs to. Now, I am far from an expert on cgroups, but that sounds like something similar?

So, linux process groups (used largely for signalling) are conceptually different from cgroups (used for controlling resource access and usage), so that might be confusing.

I am on the fence here. On one side I think these metrics are usually going to be reported along with process data, so they could fit under the process group for practical reasons. But on the other side, they are not 1:1 related to process, many processes could belong to the same cgroup, and actually there could be a collector collecting only cgroup metrics, without associating them to any process. They could be also collected for containers, without any association with processes. Probably putting them outside of process is the safest path.

Or, would it make sense to think on a new fieldset for process groups, and put these metrics there? Though this could be confusing and can be difficult to find common fields. As Alex mentioned, only in Linux there are already different kinds of groups of processes, if we start thinking about different OSs the thing can go out of hand quickly.

As a side topic, perhaps we can also think on having some reusable fieldset for common resources, we already added cpu and memory metrics to host, and now these ones to cgroups. Pinging @kaiyan-sheng here that she has been thinking about this for inventary purpouses.

rfcs/text/0028-cgroups.md Show resolved Hide resolved
rfcs/text/0028/cgroups.yml Show resolved Hide resolved
@jsoriano
Copy link
Member

jsoriano commented Oct 14, 2021

but my sysadmin experience tells me that V1 is going to be around for a long, long time.

+1, I wouldn't make decisions thinking on the removal of v1 🙂

@kgeller
Copy link
Contributor

kgeller commented Oct 15, 2021

@fearful-symmetry maybe you should add us to the "People" section of the RFC.

That would be awesome! Otherwise I can add that in when getting everything ready for merge.

I think that's an ECS philosophy question, and I'm not sure of the answer. My thinking was to include fields that are the most likely to be used for monitoring and alerting, and not to cram in everything that I technically could.

I think this is perfect. Adding in those most-likely-to-be-used fields is exactly what we want. I just had hesitation that we were only going to be adding the fields convenient from v1 & v2 overlap, but that is not the case, so yay!


@fearful-symmetry @jsoriano thank you both for the insight into where these fields should sit. I totally see where you're coming from and can see cgroups getting their own top level field set. If @kaiyan-sheng has other inventory ideas from the work with other metrics, that would be cool to get insight on as well.

@kaiyan-sheng
Copy link
Contributor

@jsoriano @fearful-symmetry @kgeller We currently have these basic monitoring metrics (like CPU, disk...) added under host, and we are also working on adding them under container. Will it be good to have the common fields defined as separate field sets and reuse them if it's applicable?

@fearful-symmetry
Copy link
Contributor Author

Will it be good to have the common fields defined as separate field sets and reuse them if it's applicable?

Perhaps? We might want a few "basic" ones like memory, cpu, etc, but I don't think it should be a requirement.

@fearful-symmetry
Copy link
Contributor Author

Made a couple of the suggested changes. Apologies for the delay, I got sucked into bug squashing again.

As far as merging/combining this with other fields and metrics, my concern is the weird way that cgroups connect to other things. A cgroup can be linked to a single process, a group of processes, a container, etc. We risk "over linking" the data if we start connecting things to certain cgroups without being 100% sure that that connection makes sense.

@kaiyan-sheng
Copy link
Contributor

@fearful-symmetry Sounds good to me to keep cgroups fields separate for now. Thanks!

@fearful-symmetry
Copy link
Contributor Author

@kgeller anything else we need here?

@kgeller
Copy link
Contributor

kgeller commented Nov 5, 2021

@fearful-symmetry I think we are all set from an ECS perspective, and are good to merge as long as your sponsor(s) / SME(s) are on board!

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good from my side, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants