This repository has been archived by the owner on May 12, 2021. It is now read-only.

virtiofs: Allow memory hotplug with virtiofs #1810

Merged
chavafg merged 1 commit into kata-containers:master from the virtiofs-hotplug branch on Jul 17, 2019

Conversation

ganeshmaharaj
Contributor

Kata with virtio-fs fails to hotplug memory. This is caused by the fact that
hotplugged memory is always backed by 'memory-backend-ram', while virtio-fs
expects it to be backed by a file and shared in order to work as intended.
This change allows using a file-based memory backend for virtio-fs, for
hugepages, or when the user prefers file-backed memory.

Fixes: #1745
Signed-off-by: Ganesh Maharaj Mahalingam [email protected]
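
For reference, a minimal sketch of the backend selection this change introduces (not the actual runtime code; the struct and helper names are illustrative and loosely follow the snippets quoted in the review below):

package main

import "fmt"

// memConfig is an illustrative stand-in for the relevant hypervisor settings.
type memConfig struct {
	sharedFS             string // e.g. "virtio-fs"
	fileBackedMemRootDir string
	hugePages            bool
	memoryPath           string // path used for file-backed guest memory
}

// selectMemoryBackend mirrors the decision this PR adds: hotplugged memory
// stays on memory-backend-ram unless virtio-fs, an explicit file-backed root
// dir, or hugepages require a shared, file-backed object.
func selectMemoryBackend(c memConfig) (memoryBack, target string, share bool) {
	memoryBack = "memory-backend-ram"
	switch {
	case c.hugePages:
		// match what govmm sets up when hugepages are enabled at boot time
		memoryBack = "memory-backend-file"
		target = "/dev/hugepages"
		share = true
	case c.sharedFS == "virtio-fs" || c.fileBackedMemRootDir != "":
		memoryBack = "memory-backend-file"
		target = c.memoryPath
		share = true
	}
	return memoryBack, target, share
}

func main() {
	fmt.Println(selectMemoryBackend(memConfig{sharedFS: "virtio-fs", memoryPath: "/dev/shm"}))
}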

@ganeshmaharaj ganeshmaharaj added the do-not-merge PR has problems or depends on another label Jun 18, 2019
@ganeshmaharaj
Contributor Author

This PR depends on kata-containers/govmm#96 and once that lands, we can vendor in the changes. Getting this out here for review while we wait for the govmm change to land.

@ganeshmaharaj ganeshmaharaj force-pushed the virtiofs-hotplug branch 2 times, most recently from b87a7a1 to bc8614a on June 18, 2019 18:01
@ganeshmaharaj ganeshmaharaj requested a review from a team as a code owner June 18, 2019 18:01
	share = true
} else if q.config.SharedFS == config.VirtioFS || q.config.FileBackedMemRootDir != "" {
	target = q.qemuConfig.Memory.Path
	memory_back = "memory-backend-file"
Contributor

This is the default virtio-fs case, right?

Contributor Author

Yes, this is the default case.

if q.qemuConfig.Knobs.HugePages {
	// we are setting all the bits that govmm sets when hugepages are enabled.
	// https://github.com/intel/govmm/blob/master/qemu/qemu.go#L1677
	target = "/dev/hugepages"
Contributor

Should we just move away from explicit hugepages and use the memory-backend-file method long term?


I think so. Even if this can be the most performant way to use virtio-fs, it has expectations of the host that most people's setups don't meet, so I think only people who know what they're doing should enable it.

Contributor Author

As of master and the 1.7.x release, virtio-fs uses /dev/shm and no longer uses hugepages. This change is only making sure we are compatible with hugepages and memory hotplug.
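
For reference, the compatibility in question boils down to a two-step QMP sequence: an object-add of a shared, file-backed memory object, followed by a device_add of a pc-dimm that uses it. A minimal sketch (property names assumed from QEMU's QMP schema of that era; qmpSend is a hypothetical stand-in for a real QMP client, not the runtime's code):

package main

import "fmt"

// qmpSend is a hypothetical stand-in for issuing a QMP command to QEMU.
func qmpSend(cmd string, args map[string]interface{}) {
	fmt.Println(cmd, args)
}

// hotplugFileBackedMemory sketches the two-step hotplug sequence virtio-fs
// needs: a shared, file-backed memory object plus a DIMM that uses it.
func hotplugFileBackedMemory(id, memPath string, sizeBytes int64) {
	qmpSend("object-add", map[string]interface{}{
		"qom-type": "memory-backend-file",
		"id":       id,
		"props": map[string]interface{}{
			"mem-path": memPath, // e.g. /dev/shm or /dev/hugepages
			"size":     sizeBytes,
			"share":    true, // required so virtiofsd can see the guest RAM
		},
	})
	qmpSend("device_add", map[string]interface{}{
		"driver": "pc-dimm",
		"id":     "dimm-" + id,
		"memdev": id,
	})
}

func main() {
	hotplugFileBackedMemory("hotmem0", "/dev/shm", 1<<30) // 1 GiB
}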

@mcastelino
Contributor

@ganeshmaharaj @egernst with this change we can use virtio-fs freely with Kubernetes, right? And also support dynamic updates to container memory sizes?

@ganeshmaharaj ganeshmaharaj removed the do-not-merge PR has problems or depends on another label Jun 18, 2019
@ganeshmaharaj
Contributor Author

/test-nemu

@ganeshmaharaj
Contributor Author

/test

@codecov

codecov bot commented Jun 18, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@3bd4bb6).
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master    #1810   +/-   ##
=========================================
  Coverage          ?   52.46%           
=========================================
  Files             ?      108           
  Lines             ?    13963           
  Branches          ?        0           
=========================================
  Hits              ?     7325           
  Misses            ?     5768           
  Partials          ?      870

@ganeshmaharaj
Contributor Author

@GabyCT @chavafg the cri-containerd tests are failing with

[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
E0618 23:54:02.402221  107987 reset.go:192] [reset] Failed to remove containers: failed to stop running pod 81d595cdda922ca7f36fcde5cfb1edccadf4e09c9953f377a2d3a3c66d1ecc5f: output: time="2019-06-18T23:53:53Z" level=fatal msg="stopping the pod sandbox \"81d595cdda922ca7f36fcde5cfb1edccadf4e09c9953f377a2d3a3c66d1ecc5f\" failed: rpc error: code = Unknown desc = failed to stop sandbox container \"81d595cdda922ca7f36fcde5cfb1edccadf4e09c9953f377a2d3a3c66d1ecc5f\" in '\\x01' state: wait sandbox container \"81d595cdda922ca7f36fcde5cfb1edccadf4e09c9953f377a2d3a3c66d1ecc5f\" stop timeout"
, error: exit status 1
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

81d595cdda922ca7f36fcde5cfb1edccadf4e09c9953f377a2d3a3c66d1ecc5f
ERROR: 1 pods left and found at /var/lib/vc/sbs
Makefile:115: recipe for target 'kubernetes' failed
make: *** [kubernetes] Error 1
Failed at 23: sudo -E PATH="$PATH" CRI_RUNTIME="containerd" bash -c "make kubernetes"

Seems like a new issue with the k8s teardown process. Does anything in that error log stand out to either of you?

@chavafg
Contributor

chavafg commented Jun 19, 2019

@ganeshmaharaj
On the containerd-cri job a pod was found at /var/lib/vc/sbs; I need to check with @GabyCT whether this could be because of a recent test that was added 2 days ago.

In addition, on the initrd job I see that there was a failure on a docker test:

[3] • Failure [44.578 seconds]
[3] run
[3] /tmp/jenkins/workspace/kata-containers-runtime-ubuntu-18-04-PR-initrd/go/src/github.com/kata-containers/tests/integration/docker/run_test.go:101
[3]   hot plug block devices
[3]   /tmp/jenkins/workspace/kata-containers-runtime-ubuntu-18-04-PR-initrd/go/src/github.com/kata-containers/tests/integration/docker/run_test.go:146
[3]     should be attached [It]
[3]     /tmp/jenkins/workspace/kata-containers-runtime-ubuntu-18-04-PR-initrd/go/src/github.com/kata-containers/tests/integration/docker/run_test.go:147
[3] 
[3]     Expected
[3]         <int>: 125
[3]     to be zero-valued
[3] [docker run --cidfile /tmp/cid199379043/Truo8V6ZhsBnrGsbnz6idB5gr5Zpn3 --runtime kata-runtime --device /dev/loop0 --device /dev/loop1 --device /dev/loop2 --device /dev/loop3 --device /dev/loop4 --device /dev/loop5 --device /dev/loop6 --device /dev/loop7 --device /dev/loop8 --device /dev/loop9 --rm --name Truo8V6ZhsBnrGsbnz6idB5gr5Zpn3 busybox stat /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6 /dev/loop7 /dev/loop8 /dev/loop9]
[3] Timeout: 120 seconds
[3] Exit Code: 125
[3] Stdout: 
[3] Stderr: docker: Error response from daemon: OCI runtime create failed: rpc error: code = DeadlineExceeded desc = Timeout reached after 3s waiting for device 0000:00:02.0/0000:01:09.0: unknown.
[3] 
[3] Running command '/usr/bin/docker [docker ps -a -f name=Truo8V6ZhsBnrGsbnz6idB5gr5Zpn3 --format {{.Status}}]'
[3] [docker ps -a -f name=Truo8V6ZhsBnrGsbnz6idB5gr5Zpn3 --format {{.Status}}]
[3] Timeout: 120 seconds
[3] Exit Code: 0
[3] Stdout: 
[3] Stderr: 

@egernst
Member

egernst commented Jun 19, 2019

Opened an issue for the test failure, as I've seen this on other PRs as well, @ganeshmaharaj @chavafg: kata-containers/tests#1745. /cc @GabyCT

@jodh-intel
Contributor

Any update on this @ganeshmaharaj?

chavafg added a commit to chavafg/tests-1 that referenced this pull request Jun 25, 2019
we still have some issues running memory hotplug using
virtiofs, so many of the cri-o and k8s tests do not run
as expected.
This should be resolved on kata-containers/runtime#1810,
but in the meantime run only docker related tests.

Signed-off-by: Salvador Fuentes <[email protected]>
@awprice
Contributor

awprice commented Jun 26, 2019

I've tested this PR, and can now specify memory limits/requests in Kubernetes and they work properly with virtio-fs. Looking forward to this being released.

@raravena80
Member

@ganeshmaharaj any updates?

@ganeshmaharaj
Contributor Author

@jodh-intel @raravena80 the patch is good to merge. I am currently out with very limited access to a PC for the next few weeks. I am not able to make sense of why the initrd job fails to run a container. The other cri-o issue seems to no longer exist.

@chavafg
Contributor

chavafg commented Jun 28, 2019

@ganeshmaharaj The initrd job fails consistently on the run hot plug block devices test

@ganeshmaharaj
Contributor Author

@chavafg ohh, I am curious why that's failing; this change shouldn't affect that. If anyone can get to this sooner than two weeks, please do. I will be able to look at this again in a couple of weeks when I am back with decent connectivity.

@egernst
Member

egernst commented Jul 1, 2019

May be worth splitting out the govmm vendoring, then, since the runtime changes shouldn't impact this test that is now failing consistently for this PR.

@egernst
Member

egernst commented Jul 1, 2019

Nack: the vendoring commit + adjustments to use it should be standalone, i.e., we should be able to bisect. I see compiler errors with just the first 3 commits.

@ganeshmaharaj
Contributor Author

@ganeshmaharaj The initrd job fails consistently on the run hot plug block devices test

@chavafg Just rebased the branch and locally tested initrd, and I am not able to reproduce the issue. These are the logs I got from my test:

ganeshma@virtiofs:~$ docker run -it --rm --runtime kata-runtime --device /dev/loop0 busybox stat /dev/loop0
  File: /dev/loop0
  Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 22h/34d Inode: 1141        Links: 1     Device type: fe,0
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Access: 2019-07-08 06:42:23.000000000
Modify: 2019-07-08 06:42:23.000000000
Change: 2019-07-08 06:42:23.000000000

Setup
Ran .ci/setup.sh from the tests repository with the following env variables:

OPENSHIFT=no
KUBERNETES=no
TEST_INITRD=yes

Nack: the vendoring commit + adjustments to use it should be standalone, ie, should be able to bisect. I see compiler errors with just the first 3 commits.

@egernst Fixed the patch per this comment. Now all commits are bisectable.

@ganeshmaharaj
Contributor Author

/test-initrd

@ganeshmaharaj
Contributor Author

/test

@ganeshmaharaj
Contributor Author

@chavafg that issue happens only when I try to hotplug more than 8 devices. In the failing case the error logs look like this:

Jul 08 09:40:06 virtio-fs kata-proxy[18094]: time="2019-07-08T09:40:06.420554653Z" level=info msg="[    1.557931] pci 0000:01:09.0: BAR 0: no space for [io  size 0x0080]\n" name=kata-proxy pid=18094 sandbox=4a2ac9f659baddbb81d932d06845e509a49884887e2c58bee83fcb8befba06e9 source=agent
Jul 08 09:40:06 virtio-fs kata-proxy[18094]: time="2019-07-08T09:40:06.420991944Z" level=info msg="[    1.558349] pci 0000:01:09.0: BAR 0: failed to assign [io  size 0x0080]\n" name=kata-proxy pid=18094 sandbox=4a2ac9f659baddbb81d932d06845e509a49884887e2c58bee83fcb8befba06e9 source=agent
Jul 08 09:40:06 virtio-fs kata-proxy[18094]: time="2019-07-08T09:40:06.421556459Z" level=info msg="[    1.558922] virtio-pci 0000:01:09.0: enabling device (0000 -> 0002)\n" name=kata-proxy pid=18094 sandbox=4a2ac9f659baddbb81d932d06845e509a49884887e2c58bee83fcb8befba06e9 source=agent
Jul 08 09:40:06 virtio-fs kata-proxy[18094]: time="2019-07-08T09:40:06.437137938Z" level=info msg="[    1.574434] virtio-pci 0000:01:09.0: virtio_pci: leaving for legacy driver\n" name=kata-proxy pid=18094 sandbox=4a2ac9f659baddbb81d932d06845e509a49884887e2c58bee83fcb8befba06e9 source=agent
Jul 08 09:40:06 virtio-fs kata-proxy[18094]: time="2019-07-08T09:40:06.452432996Z" level=info msg="[    1.589597] virtio-pci: probe of 0000:01:09.0 failed with error -12\n" name=kata-proxy pid=18094 sandbox=4a2ac9f659baddbb81d932d06845e509a49884887e2c58bee83fcb8befba06e9 source=agent
Jul 08 09:40:06 virtio-fs kata-proxy[18094]: time="2019-07-08T09:40:06.453150422Z" level=info msg="[    1.590290] probe of 0000:01:09.0 returned 0 after 31509 usecs\n" name=kata-proxy pid=18094 sandbox=4a2ac9f659baddbb81d932d06845e509a49884887e2c58bee83fcb8befba06e9 source=agent

Whereas without the patch, some of the same errors show up in the logs but the device is still plugged:

Jul 08 10:12:09 virtiofs kata-proxy[20206]: time="2019-07-08T10:12:09.607255566Z" level=info msg="[    1.966531] pci 0000:01:09.0: BAR 0: no space for [io  size 0x0080]\n" name=kata-proxy pid=20206 sandbox=b09b3b11d4d3a3d01aa54b06c4d1cd493f5a0f9aadac845dc3e843e7a653a985 source=agent
Jul 08 10:12:09 virtiofs kata-proxy[20206]: time="2019-07-08T10:12:09.607541745Z" level=info msg="[    1.966922] pci 0000:01:09.0: BAR 0: failed to assign [io  size 0x0080]\n" name=kata-proxy pid=20206 sandbox=b09b3b11d4d3a3d01aa54b06c4d1cd493f5a0f9aadac845dc3e843e7a653a985 source=agent
Jul 08 10:12:09 virtiofs kata-proxy[20206]: time="2019-07-08T10:12:09.608042943Z" level=info msg="[    1.967379] virtio-pci 0000:01:09.0: enabling device (0000 -> 0002)\n" name=kata-proxy pid=20206 sandbox=b09b3b11d4d3a3d01aa54b06c4d1cd493f5a0f9aadac845dc3e843e7a653a985 source=agent
Jul 08 10:12:09 virtiofs kata-proxy[20206]: time="2019-07-08T10:12:09.651096121Z" level=info msg="[    2.010187] virtio_blk virtio12: [vdi] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB)\n" name=kata-proxy pid=20206 sandbox=b09b3b11d4d3a3d01aa54b06c4d1cd493f5a0f9aadac845dc3e843e7a653a985 source=agent

@chavafg
Contributor

chavafg commented Jul 8, 2019

any idea @devimc?

@ganeshmaharaj
Contributor Author

This change makes it work:

diff --git a/vendor/github.com/intel/govmm/qemu/qmp.go b/vendor/github.com/intel/govmm/qemu/qmp.go
index 359e236..371ed04 100644
--- a/vendor/github.com/intel/govmm/qemu/qmp.go
+++ b/vendor/github.com/intel/govmm/qemu/qmp.go
@@ -1083,9 +1083,9 @@ func (q *QMP) ExecutePCIDeviceAdd(ctx context.Context, blockdevID, devID, driver
        if isVirtioPCI[DeviceDriver(driver)] {
                args["romfile"] = romfile

-               if disableModern {
-                       args["disable-modern"] = disableModern
-               }
+       //      if disableModern {
+       //              args["disable-modern"] = disableModern
+       //      }
        }

        return q.executeCommand(ctx, "device_add", args, nil)

@devimc

devimc commented Jul 8, 2019

@ganeshmaharaj we need disable-modern on azure

@ganeshmaharaj
Contributor Author

ganeshmaharaj commented Jul 9, 2019

@ganeshmaharaj we need disable-modern on azure

@devimc With the current code in master, disable-modern is effectively off for devices passed to docker containers with Kata. Please correct me if I am wrong, but this is what I am seeing. The call into govmm never executes the code at https://github.com/intel/govmm/blob/master/qemu/qmp.go#L1093-L1099, because the version of govmm we currently vendor does not have this change: https://github.com/intel/govmm/blob/master/qemu/qemu_arch_base.go#L50. This PR, with the vendor update, is where disable-modern becomes true. In short, Azure has been testing initrd with disable-modern set to off all this while. Am I missing something here?
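
For context, a minimal sketch (assumed shape and names, not govmm's actual code) of the device_add argument assembly being discussed. With the previously vendored govmm the disable-modern key was simply never sent; that likely matters because disable-modern forces virtio legacy mode, which needs the I/O-port BAR that runs out of space in the failing hotplug logs above, whereas modern devices work over MMIO.

package main

import "fmt"

// pciDeviceAddArgs sketches, with assumed argument names, the QMP device_add
// arguments built for a hotplugged virtio-blk-pci device. The only point of
// interest is the conditional disable-modern key shown in the diff above.
func pciDeviceAddArgs(blockdevID, devID, addr, bus string, disableModern bool) map[string]interface{} {
	args := map[string]interface{}{
		"driver": "virtio-blk-pci",
		"id":     devID,
		"drive":  blockdevID,
		"addr":   addr,
		"bus":    bus,
	}
	// With the older vendored govmm this branch never ran, so hotplugged
	// devices stayed in modern (MMIO) mode; the vendor update starts taking it.
	if disableModern {
		args["disable-modern"] = true
	}
	return args
}

func main() {
	fmt.Println(pciDeviceAddArgs("drive-9", "virtio-drive-9", "09", "pci-bridge-0", false))
}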

@ganeshmaharaj
Contributor Author

Once #1868 lands, I will rebase this change onto master. @egernst @jodh-intel @grahamwhaley I think these two changes should be backported to the 1.7.x branch, as the virtio-fs feature needs this. WDYT?

@grahamwhaley
Contributor

Backport sounds sane to me. /cc @gnawux @stefanha for views on backport suitability and the re-enablement of 'modern'.

Kata with virtio-fs fails to hotplug memory. This is caused by the fact that
hotplugged memory is always backed by 'memory-backend-ram', while virtio-fs
expects it to be backed by a file and shared in order to work as intended.
This change allows using a file-based memory backend for virtio-fs, for
hugepages, or when the user prefers file-backed memory.

Fixes: kata-containers#1745
Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
@ganeshmaharaj
Contributor Author

/test

@ganeshmaharaj
Contributor Author

Verified that this patch, after rebase, works as expected with virtio-fs. @jcvenegas @mcastelino @devimc

@coreyjohnston

Looks like #1868 has recently been merged. Are there any other dependencies on this @ganeshmaharaj? We're super keen to see this merged and give it a go so we can finally move away from 9p.

@ganeshmaharaj
Contributor Author

ganeshmaharaj commented Jul 17, 2019

@coreyjohnston nope, none that I can think of. This patch should be ready to land. Once this hits master, I will backport it to the 1.7.x tree to make sure we have a stable release with this fix. @egernst @devimc @mcastelino @grahamwhaley @jodh-intel any objections?

@chavafg
Contributor

chavafg commented Jul 17, 2019

Merging as it is already approved and all CI passed. Thanks @ganeshmaharaj

@chavafg chavafg merged commit e89195e into kata-containers:master Jul 17, 2019
@ganeshmaharaj ganeshmaharaj deleted the virtiofs-hotplug branch July 17, 2019 17:00
@coreyjohnston

Brilliant, thanks @ganeshmaharaj !

@ganeshmaharaj
Contributor Author

@coreyjohnston we will be backporting this to the 1.8 release, but not to 1.7. Hope that is alright. Since this is still an experimental feature, backporting a vendor change to 1.7 was judged too much of a risk on the stability front.

Successfully merging this pull request may close these issues.

virtio-fs: failed to launch kata using -m to specify memory on docker command