Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct/cg: refactor/improve/rename GetHugePageSize -> HugePageSizes #3234

Merged
merged 4 commits into from
Nov 29, 2021

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Oct 6, 2021

(this is based on and currently includes PR #3233)

  1. Since GetHugePageSize do not have any external users (checked by
    sourcegraph), and no internal user ever uses its second return value
    (the error), let's drop it.

  2. Rename GetHugePageSize -> HugePageSizes (drop the Get prefix as per
    Go guidelines, add suffix since we return many sizes).

  3. Implement lazy init for HugePageSizes

    I have noticed that libct/cg/fs allocates 8K during init on every runc
    execution:

    init github.com/opencontainers/runc/libcontainer/cgroups/fs @1.5 ms, 0.028 ms clock, 8512 bytes, 13 allocs

    Apparently this is caused by the global HugePageSizes variable initialization,
    which is only used from GetStats (i.e. it is never used by the runc binary itself).

    Remove it, and use HugePageSizes() directly instead. Make it init-once,
    so that GetStats (which, I guess, is periodically called by kubernetes)
    does not re-read huge page sizes over and over.

    This also removes 12 allocs and 8K from libct/cg/fs init section:

    $ time GODEBUG=inittrace=1 ./runc --help 2>&1 | grep cgroups/fs
    init github.com/opencontainers/runc/libcontainer/cgroups/fs @1.5 ms, 0.003 ms clock, 16 bytes, 1 allocs

  4. libct/cg: TestGetHugePageSizeImpl: use t.Run

    Move test case comments to doc strings, and use t.Run.

@kolyshkin kolyshkin added this to the 1.2.0 milestone Oct 6, 2021
@kolyshkin
Copy link
Contributor Author

This is definitely post-1.1 material

@kolyshkin
Copy link
Contributor Author

$ time GODEBUG=inittrace=1 ./runc --help 2>&1 | grep cgroups/fs

Speaking of this, I have a WIP set of patches to remove some init-time allocations.

thaJeztah
thaJeztah previously approved these changes Oct 6, 2021
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

left one suggestion (could be in a follow-up)

return
}

hugePageSizes, _ = getHugePageSizeFromFilenames(files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the only place the error returned from getHugePageSizeFromFilenames() is used, is in a unit test. That test already checks for warnings; wondering if we should change the errors to warnings (as we won't be handling them);

// check for warnings
if c.isWarn && warns.Len() == 0 {
t.Errorf("input %v, expected a warning, got none", c.input)
}

While at it, wondering if we should change that test to use the logrus testhook, like

hook := test.NewGlobal()
defer hook.Reset()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH we can simplify the whole thing, letting getHugePageSizeFromFilenames() always return errors, and log them as warnings in HugePageSizes()!

....

Implemented, PTAL @thaJeztah

@kolyshkin
Copy link
Contributor Author

Rebased on top of current git tip to repro #3266

1. Since GetHugePageSize do not have any external users (checked by
   sourcegraph), and no internal user ever uses its second return value
   (the error), let's drop it.

2. Rename GetHugePageSize -> HugePageSizes (drop the Get prefix as per
   Go guidelines, add suffix since we return many sizes).

Signed-off-by: Kir Kolyshkin <[email protected]>
I have noticed that libct/cg/fs allocates 8K during init on every runc
execution:

> init github.com/opencontainers/runc/libcontainer/cgroups/fs @1.5 ms, 0.028 ms clock, 8512 bytes, 13 allocs

Apparently this is caused by global HugePageSizes variable init, which
is only used from GetStats (i.e. it is never used by runc itself).

Remove it, and use HugePageSizes() directly instead. Make it init-once,
so that GetStats (which, I guess, is periodically called by kubernetes)
does not re-read huge page sizes over and over.

This also removes 12 allocs and 8K from libct/cg/fs init section:

> $ time GODEBUG=inittrace=1 ./runc --help 2>&1 | grep cgroups/fs
> init github.com/opencontainers/runc/libcontainer/cgroups/fs @1.5 ms, 0.003 ms clock, 16 bytes, 1 allocs

Signed-off-by: Kir Kolyshkin <[email protected]>
1. Instead of distinguishing between errors and warnings, let's treat all
   errors as warnings, thus simplifying the code. This changes the
   function behaviour for input like hugepages-BadNumberKb --
   previously, the error from Atoi("BadNumber") was considered fatal,
   now it's just another warnings.

2. Move the warning logging to HugePageSizes, thus simplifying the test
   case, which no longer needs to read what logrus writes. Note that we
   do not want to log all the warnings (as chances are very low we'll
   get any, and if we do this means the code need to be updated), only
   the first one.

Signed-off-by: Kir Kolyshkin <[email protected]>
thaJeztah
thaJeztah previously approved these changes Nov 19, 2021
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM

is the failure still the thing we're debugging?

libcontainer/cgroups/utils_test.go Outdated Show resolved Hide resolved
Move test case comments to doc strings, and use t.Run.

Suggested-by:  Sebastiaan van Stijn <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Copy link
Contributor Author

kolyshkin commented Nov 19, 2021

is the failure still the thing we're debugging?

No, it was a one time thing, totally unrelated, and fixed by #3273

@kolyshkin
Copy link
Contributor Author

Since 1.1 is delayed I think this can be included.

@kolyshkin kolyshkin modified the milestones: 1.2.0, 1.1.0 Nov 19, 2021
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kolyshkin
Copy link
Contributor Author

@AkihiroSuda PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants