This repository has been archived by the owner on May 12, 2021. It is now read-only.

versions: Bump to kernel 4.19.24 #1111

Merged
merged 1 commit on Feb 22, 2019

Conversation

sboeuf

@sboeuf sboeuf commented Jan 10, 2019

We need to bump the kernel version from 4.14.67 to 4.19.24 in order
to follow the recent kernel config bump.

Fixes #618
Fixes #1029

Signed-off-by: Sebastien Boeuf [email protected]

@sboeuf
Author

sboeuf commented Jan 10, 2019

/test

@sboeuf
Author

sboeuf commented Jan 10, 2019

/cc @jcvenegas @grahamwhaley @chavafg

@sboeuf
Author

sboeuf commented Jan 10, 2019

@grahamwhaley @chavafg
The first issue has been solved by PR kata-containers/tests/pull/1044, but the memory footprint grew...

10:42:13 Report Summary:
10:42:13 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
10:42:13 | P/F |         NAME         |  FLR  |  MEAN  |  CEIL  |  GAP  |  MIN   |  MAX   |  RNG  | COV  | ITS |
10:42:13 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
10:42:13 | P   | boot-times           | 93.0% | 101.2% | 107.0% | 14.0% | 98.2%  | 127.9% | 30.2% | 6.1% |  20 |
10:42:13 | *F* | memory-footprint     | 85.0% | 127.5% | 115.0% | 30.0% | 127.5% | 127.5% | 0.0%  | 0.0% |   1 |
10:42:13 | P   | memory-footprint-ksm | 95.0% | 101.2% | 105.0% | 10.0% | 101.2% | 101.2% | 0.0%  | 0.0% |   1 |
10:42:13 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
10:42:13 Fails: 1, Passes 2

@jcvenegas
Member

@sboeuf great, it seems everything is now tested as expected. @grahamwhaley we will need to find somebody to track down the regression in the kernel. We could quickly check the differences between the 4.14 and 4.19 configs to make sure we did not let something slip in last time.

@Pennyzct
Contributor

any update here? :)

@grahamwhaley
Contributor

Hi @Pennyzct, all. Yeah, I was running up the metrics report comparing kernels 4.14.67 against 4.19.10.
Summary then:

  • Memory footprint goes up. 17+% for non-KSM, and ~3.5% for KSM mode.
  • As you'd then expect, container density drops, maybe by ~10% (it is workload-size dependent)
  • Memory usage inside the container stays about the same.
  • 4.19 is fractionally better on boot time (~2.5% drop)
    • interestingly, it looks like we get to the kernel quicker, but spend more time in the kernel.
  • storage and networking look to be unaffected

So, we get a boot time win, but take a fair hit on footprint/density.

I'll see if I can grab the kernel config files used/generated in the packaging build process to see if we've picked up any new default CONFIG=y type items that may be affecting us.

@grahamwhaley
Contributor

I grabbed the configs for 4.14 and 4.19 (as configured/built by our tests/.ci and packaging scripts), sorted them, diffed them, and grep'd for =y - that gives us 57 lines of enabled-feature difference between the two. Heh, I just realised we keep copies of both over at https://github.com/kata-containers/packaging/tree/master/kernel/configs .

As ever, I think the only way currently to understand what the config files and differences mean is to go through the diff one line at a time by hand. But, this does raise back up the 'kernel config fragment' discussion as per kata-containers/packaging#234

I'll have a pass at hand view/diffing the configs and see if anything pops out.
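Roughly, the sort/diff/grep pipeline boils down to the sketch below (the file names and CONFIG_ symbols here are invented stand-ins for illustration; the real configs are the two files kept in the packaging repo):

```shell
# Illustrative stand-ins for the real 4.14/4.19 kernel config files.
cat > config-4.14 <<'EOF'
CONFIG_FOO=y
CONFIG_BAR=y
EOF
cat > config-4.19 <<'EOF'
CONFIG_FOO=y
CONFIG_BAZ=y
EOF

# Sort both configs so diff compares option-by-option, not by line order.
sort config-4.14 > old.sorted
sort config-4.19 > new.sorted

# Keep only options newly enabled in 4.19: the "> " side of the diff, =y only.
diff old.sorted new.sorted | grep '^> .*=y'
```

Against the real files this is what produces the 57-line list of newly enabled features.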

@Pennyzct
Contributor

@grahamwhaley thanks for the detailed update!!! It looks like quite a bit of work needs to be done, especially viewing/diffing the 57 enabled-feature differences. btw, kernel config fragments sound nice, and if we're doing this, I will try to find all the differences between arm64 and amd64. ;)

@jodh-intel
Contributor

Any update on this, folks?

@grahamwhaley
Contributor

Nominally stuck due to footprint increase, and now potentially tied to kata-containers/packaging#314 I think.
A decision needs to be made on whether we wait for kata-containers/packaging#314 to complete/land, or forge ahead and accept the 4.19 impact.
tbh, from the work of kata-containers/packaging#314 we could probably tweak a few things out of the full 4.19 config in this PR and see if that makes a difference.

@jodh-intel
Contributor

jodh-intel commented Jan 23, 2019

+1 for using the config fragments to track down what is causing the bloat.

Approved with PullApprove
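To make the fragment idea concrete, here is a minimal sketch of the last-definition-wins merge it relies on (file names and symbols below are invented for the example; the kernel's own scripts/kconfig/merge_config.sh is the real tool and works on the same principle):

```shell
# A base config plus a small override fragment.
cat > base.config <<'EOF'
CONFIG_NET=y
CONFIG_DEBUG_KERNEL=y
EOF
cat > footprint.frag <<'EOF'
# Trim debug features to reduce memory footprint
CONFIG_DEBUG_KERNEL=n
EOF

# Merge: the last file to define a CONFIG_ symbol wins.
awk -F= '/^CONFIG_/ { v[$1] = $0 } END { for (k in v) print v[k] }' \
    base.config footprint.frag | sort > merged.config
cat merged.config
```

With fragments like this, a footprint-sensitive tweak lives in one small reviewable file instead of a 57-line diff of a monolithic config.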

@jodh-intel
Contributor

Added dnm due to dependency.

Member

@gnawux gnawux left a comment


Discussed with @grahamwhaley

Due to https://nvd.nist.gov/vuln/detail/CVE-2018-16880

We should not merge this before the fix is merged into 4.19.y. (we guess it would be 4.19.19)

update: should be 4.19.20

@gnawux
Member

gnawux commented Feb 3, 2019

update: The fix for CVE-2018-16880 is merged into 4.19.20-rc1 and will be released with 4.19.20.

ref: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/log/?h=linux-4.19.y
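For anyone wanting to double-check once the release tags exist: in a clone of the stable tree, `git tag --contains <commit>` lists every release tag whose history includes a given fix. A self-contained demo using a throwaway repo (the commit message and tags below are illustrative stand-ins, not the real kernel history):

```shell
# Build a tiny repo: a base commit tagged v4.19.19, then a "fix" commit
# tagged v4.19.20, standing in for the CVE-2018-16880 fix landing.
git init -q demo && cd demo
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "base"
git tag v4.19.19
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty \
    -m "demo stand-in for the CVE-2018-16880 fix"
fix=$(git rev-parse HEAD)
git tag v4.19.20

# Only tags whose history contains the fix are printed: v4.19.20, not v4.19.19.
git tag --contains "$fix"
```

Against the real linux-stable clone you would substitute the upstream fix's commit hash for `$fix`.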

@jodh-intel added the "security" (Potential or actual security issue) label on Feb 4, 2019
@sboeuf
Author

sboeuf commented Feb 4, 2019

Thanks @gnawux for tracking this!

@grahamwhaley
Contributor

Looks like 4.19.20 is out - https://www.kernel.org/
We also need to land this kernel update for general security fixes across the board.
Even if we have some performance impacts, given the work I looked at for kernel config fragment builds, I think we just have to take those as inevitable with a significant kernel version bump, and work later to see if we can bring them back down. Security trumps size/speed here.

@sboeuf - can you refresh this PR to 4.19.20 pls, and let's run the CIs and get this in...

@sboeuf
Author

sboeuf commented Feb 12, 2019

@grahamwhaley ok I'll take care of that today!

@sboeuf
Author

sboeuf commented Feb 12, 2019

@grahamwhaley updated!

gnawux
gnawux previously approved these changes Feb 12, 2019
@gnawux changed the title from "versions: Bump to kernel 4.19.10" to "versions: Bump to kernel 4.19.20" on Feb 12, 2019
@gnawux
Member

gnawux commented Feb 12, 2019

Thank you @sboeuf

@sboeuf
Author

sboeuf commented Feb 12, 2019

/test

@grahamwhaley
Contributor

/test

We had a bunch of CI failures, but it looks like a variety of network/service-unavailable issues, maybe not this PR. Trying again...

@grahamwhaley
Contributor

I think we have the DNM label due to #973 (making kernel versions arch-dependent) - but I'd like us to land this one first and then tackle #973 again if we need to. @jodh-intel wdyt - can we drop DNM from here and try to land this (once the CIs are happy etc., of course...)?

@grahamwhaley
Contributor

Hi @Pennyzct - the ARM CI failed to install Docker on this PR (first run after the ARM CI fix merge??). I'll leave it for you to diagnose, and let you restart it if you need to:

Setting up docker-ce (18.06.2~ce~3-0~ubuntu) ...
15:31:00 Job for docker.service failed because the control process exited with error code.
15:31:00 See "systemctl status docker.service" and "journalctl -xe" for details.
15:31:00 invoke-rc.d: initscript docker, action "start" failed.
15:31:00 ● docker.service - Docker Application Container Engine
15:31:00    Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
15:31:00   Drop-In: /etc/systemd/system/docker.service.d
15:31:00            └─kata-containers.conf
15:31:00    Active: activating (auto-restart) (Result: exit-code) since Fri 2019-02-15 23:31:00 CST; 21ms ago
15:31:00      Docs: https://docs.docker.com
15:31:00   Process: 88707 ExecStart=/usr/bin/dockerd -D --add-runtime kata-runtime=/usr/local/bin/kata-runtime --default-runtime=kata-runtime --storage-driver=overlay2 (code=exited, status=1/FAILURE)
15:31:00  Main PID: 88707 (code=exited, status=1/FAILURE)
15:31:00    CGroup: /system.slice/docker.service
15:31:00 
15:31:00 Feb 15 23:31:00 testing-1 systemd[1]: docker.service: Failed with result 'exit-code'.
15:31:00 Feb 15 23:31:00 testing-1 systemd[1]: Failed to start Docker Application Container Engine.
15:31:00 dpkg: error processing package docker-ce (--configure):
15:31:00  installed docker-ce package post-installation script subprocess returned error exit status 1
15:31:00 Errors were encountered while processing:
15:31:00  docker-ce
15:31:01 E: Sub-process /usr/bin/dpkg returned an error code (1)
15:31:01 Build step 'Execute shell' marked build as failure

@grahamwhaley
Contributor

Mostly good. k8s fails on 16.04 - @sboeuf @chavafg - is this something we know about already, before we re-nudge that one?

ok 1 Liveness probe
1..1
not ok 1 Running with postStart and preStop handlers
# (in test file k8s-attach-handlers.bats, line 26)
#   `kubectl wait --for=condition=Ready pod "$pod_name"' failed
# error: the server doesn't have a resource type "runtimeclass"
# pod/handlers created
# error: timed out waiting for the condition
# pod "handlers" deleted
Makefile:73: recipe for target 'kubernetes' failed
make: *** [kubernetes] Error 1

@sboeuf
Author

sboeuf commented Feb 15, 2019

@grahamwhaley nothing that I'm aware of. Maybe @chavafg or @GabyCT have more input?

@chavafg
Contributor

chavafg commented Feb 15, 2019

Seems like it is good now. btw, I have seen that failure on some other jobs; I'll investigate, but it is definitely not related to this PR.

@Pennyzct
Contributor

Pennyzct commented Feb 18, 2019

@grahamwhaley ahhhh. So sorry, my bad.😭 We logged into the ARM CI and tried to fix the mount issue tests/#1191 and it kind of accidentally broke the docker cgroup mount point.
/test

@Pennyzct
Contributor

I couldn't retrigger the ARM CI last time, and I will try it again. ;)
/test

@grahamwhaley
Contributor

@Pennyzct recursive unmount PR merged - I've re-triggered the ARM build - but, we might need two builds to clear the mounts... we'll see...

@sboeuf
Author

sboeuf commented Feb 18, 2019

@grahamwhaley @Pennyzct
The error for ARM CI is now:

ERROR: failed to find default config /home/jenkins/workspace/kata-containers-runtime-ARM-18.04-PR/go/src/github.com/kata-containers/packaging/kernel/configs/arm64_kata_kvm_4.19.x

We're missing a config file for 4.19.X for ARM. @Pennyzct could you please submit one?

@Pennyzct
Contributor

Pennyzct commented Feb 19, 2019

Hi~ @sboeuf @grahamwhaley, my colleague @jongwu and I will file an issue to add a 4.19.X config file for ARM asap. Furthermore, extra kernel patches like 0003-memory-hotplug-by-probe.patch were only applied for 4.14.X and also need to be updated.
One question here: do I need to keep the old 4.14.X config as an additional option? ;)

@grahamwhaley
Contributor

@Pennyzct you should probably leave the 4.14 config for now, as the current CIs will test against that (until this PR lands). We can have a config file cleanup later on.

@egernst
Member

egernst commented Feb 20, 2019

This PR fixes #618

@sboeuf
Author

sboeuf commented Feb 20, 2019

/test

We need to bump the kernel version from 4.14.67 to 4.19.24 in order
to follow the recent kernel config bump.

Fixes kata-containers#618
Fixes kata-containers#1029

Signed-off-by: Sebastien Boeuf <[email protected]>
@sboeuf changed the title from "versions: Bump to kernel 4.19.23" to "versions: Bump to kernel 4.19.24" on Feb 20, 2019
@sboeuf
Author

sboeuf commented Feb 20, 2019

/test

@grahamwhaley
Contributor

Waiting on kata-containers/packaging#338 I believe... and then we can refire the ARM (and the failing fedora which is unrelated to that PR) I think...

@grahamwhaley
Contributor

ARM config file just merged in packaging - going to nudge the ARM CI to see what happens...

@sboeuf
Author

sboeuf commented Feb 21, 2019

@egernst @grahamwhaley ARM CI passed, should we merge now?

@grahamwhaley
Contributor

Yes. yes yes - let's merge.... let me just check there are no 'do not do this' messages in my inbox, and then I'll come do it...

@grahamwhaley
Contributor

I hear no objections - we know why the metrics CI fails ... let's merge it
/cc @egernst

Labels
security Potential or actual security issue
9 participants