This repository has been archived by the owner on Mar 28, 2018. It is now read-only.

Unable to start a container in CentOS 7 with Docker #950

Closed
syed opened this issue Jun 6, 2017 · 35 comments

Comments

@syed

syed commented Jun 6, 2017

Hi,

I've basically followed the instructions mentioned at https://github.com/01org/cc-oci-runtime/blob/master/documentation/Installing-Clear-Containers-on-Centos-7.md. However, when I try to start a container using

sudo docker run -ti fedora bash

It just hangs. I ran the linux-check-config.sh script to see if there are any problems, but it doesn't report any:

root@host-7: ~ # ./check_clear.sh  container
Checking if host is capable of running Clear Linux* OS for Intel® Architecture in a container

SUCCESS: Intel CPU
SUCCESS: 64-bit CPU (lm)
SUCCESS: Streaming SIMD Extensions v4.1 (sse4_1)
SUCCESS: Virtualisation support (vmx)
SUCCESS: Kernel module kvm
SUCCESS: Kernel module kvm_intel
SUCCESS: Nested KVM support

I can see a qemu-lite process in the ps -e output. However, I don't see anything on the Docker side. I tried starting cc-proxy in debug mode and saw that it receives a hello but never responds with a success.

What would be the best way to debug this? Here are more details about my environment:

root@host-7: ~ # cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core) 
root@host-7: ~ # docker version
Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:
 OS/Arch:      linux/amd64
root@host-7: ~ # cc-oci-runtime version
cc-oci-runtime version: 2.1.9
spec version: 1.0.0-rc1
commit: 2b6b95b10ee776047a5e3e9b967f6d8c05b8d209
@jodh-intel
Contributor

Hi @syed - thanks for reporting this issue. A few questions:

  • Can I ask when you ran the ./installation/rhel-setup.sh script?

  • Can you start a container using a busybox docker image?

    $ sudo docker run -ti busybox sh
    
  • Can you create a docker container using the default docker runtime (runc) like this:

    $ sudo docker run -ti --runtime runc busybox sh
    

A few tests to try:

  • Configure the proxy to auto-start on boot:
    $ sudo systemctl enable cc-proxy
    
  • Configure the runtime for debug by following https://github.com/01org/cc-oci-runtime/wiki/Debugging
    • In summary, for you this will be:
    $ sudo mkdir -p /etc/cc-oci-runtime/
    $ export logdir=/tmp
    $ cat << EOT | sudo tee -a /etc/cc-oci-runtime/cc-oci-runtime.sh.cfg
    --debug
    --global-log
    $logdir/cc-oci-runtime.log
    --hypervisor-log-dir
    $logdir
    EOT
    $ sudo sed -ie 's/\(cc-oci-runtime\)/\1.sh/g' /etc/systemd/system/docker.service.d/clr-containers.conf
    $ sudo systemctl daemon-reload
    $ sudo systemctl restart docker
    
    • Start a container:
      $ sudo docker run -ti busybox sh
      
    • In another terminal, look at the following files (and ideally attach them to this issue if they are not empty):
      • /tmp/cc-oci-runtime.log
      • /tmp/*hypervisor.stdout
      • /tmp/*hypervisor.stderr

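After restarting docker, it's worth sanity-checking that the wrapper is actually in use and that the config file reads back as expected (a quick sketch; the exact docker info output varies by Docker version):

$ sudo docker info 2>/dev/null | grep -i runtime
$ cat /etc/cc-oci-runtime/cc-oci-runtime.sh.cfg

The first command should list cor among the runtimes and show it as the default.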
@syed
Author

syed commented Jun 6, 2017

Hi @jodh-intel,

  1. I ran the ./installation/rhel-setup.sh script on a fresh install of CentOS 7.

  2. I am able to create a container using the runc runtime. However, I had to restart the Docker daemon before doing so, as it was unresponsive after I had created a container using the cor runtime.

root@host-7: ~ # sudo docker run -ti --runtime runc busybox sh
/ # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 sh
    7 root       0:00 ps aux
/ # 

I enabled logging as per your directions. The /tmp/<container-id>-hypervisor.std{out,err} files are empty. I am attaching /tmp/cc-oci-runtime.log.

cc-oci-runtime.log.txt

I also started the proxy in debug mode; here are the logs from cc-proxy:

I0606 12:05:04.812957    2649 proxy.go:67] [client #3] client connected
I0606 12:05:04.950262    2649 proxy.go:77] [client #3] hello(containerId=c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd,ctlSerial=/var/run/cc-oci-runtime/c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd/ga-ctl.sock,ioSerial=/var/run/cc-oci-runtime/c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd/ga-tty.sock,console=/var/run/cc-oci-runtime/c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd/console.sock)
I0606 12:07:04.797133    2649 proxy.go:67] [client #4] client connected
I0606 12:07:04.797405    2649 proxy.go:77] [client #4] attach(containerId=c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd)
I0606 12:07:04.797988    2649 proxy.go:77] [client #4] hyper(cmd=destroypod, data={})

@jodh-intel
Contributor

jodh-intel commented Jun 6, 2017

Hi @syed - thanks for the debug info. The runtime log indicates a problem talking to the proxy. From cc-oci-runtime.log.txt:

writing message data to proxy socket: {"id":"hello","data":{"containerId":"c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd","ctlSerial":"/var/run/cc-oci-runtime/c0a1bccaf9530
731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd/ga-ctl.sock","ioSerial":"/var/run/cc-oci-runtime/c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd/ga-tty.sock","console":"/
var/run/cc-oci-runtime/c0a1bccaf9530731a7a17837fbbbc50b71da3241e88c61210778c62ed7dfccfd/console.sock"}}
failed to read proxy socket fd

Please could you run the following so we can get more detail from the proxy:

$ sudo systemctl stop cc-proxy.service
$ sudo systemctl stop cc-proxy.socket
$ script -cef 'sudo /usr/local/libexec/cc-proxy -v 4'

Then in another terminal:

  • sudo docker run -ti busybox sh

Finally, press Ctrl-C in the terminal running cc-proxy and attach the typescript file that gets generated.

@syed
Author

syed commented Jun 6, 2017

Interesting. I am pretty sure the proxy is getting the message; it just doesn't respond with a success. Here are the logs:

root@host-7: ~ # sudo /usr/local/libexec/cc-proxy -v 4
I0606 13:30:36.488874   31655 proxy.go:328] listening on /var/run/cc-oci-runtime/proxy.sock
I0606 13:30:36.488935   31655 proxy.go:372] proxy started
I0606 13:31:15.775085   31655 proxy.go:67] [client #1] client connected
I0606 13:31:15.919073   31655 proxy.go:77] [client #1] hello(containerId=3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d,ctlSerial=/var/run/cc-oci-runtime/3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d/ga-ctl.sock,ioSerial=/var/run/cc-oci-runtime/3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d/ga-tty.sock,console=/var/run/cc-oci-runtime/3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d/console.sock)
I0606 13:33:15.769474   31655 proxy.go:67] [client #2] client connected
I0606 13:33:15.769786   31655 proxy.go:77] [client #2] attach(containerId=3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d)
I0606 13:33:15.770384   31655 proxy.go:77] [client #2] hyper(cmd=destroypod, data={})

The corresponding entries in cc-oci-runtime.log:

2017-06-06T13:31:15.918448Z:31888:cc-oci-runtime:debug:communicating with proxy
2017-06-06T13:31:15.918550Z:31888:cc-oci-runtime:debug:sending message (length 447) to proxy socket
2017-06-06T13:31:15.918646Z:31888:cc-oci-runtime:debug:writing message data to proxy socket: {"id":"hello","data":{"containerId":"3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d","ctlSerial":"/var/run/cc-oci-runtime/3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d/ga-ctl.sock","ioSerial":"/var/run/cc-oci-runtime/3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d/ga-tty.sock","console":"/var/run/cc-oci-runtime/3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d/console.sock"}}
2017-06-06T13:33:15.755012Z:31895:cc-oci-runtime:critical:failed to read proxy socket fd
2017-06-06T13:33:15.763092Z:31973:cc-oci-runtime:debug:cc-oci-runtime 2.1.9 2b6b95b10ee776047a5e3e9b967f6d8c05b8d209 called as: /usr/local/bin/cc-oci-runtime delete 3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d
2017-06-06T13:33:15.768890Z:31973:cc-oci-runtime:debug:connecting to proxy cc-proxy
2017-06-06T13:33:15.769314Z:31973:cc-oci-runtime:debug:connected to proxy socket /var/run/cc-oci-runtime/proxy.sock
2017-06-06T13:33:15.769457Z:31973:cc-oci-runtime:debug:communicating with proxy
2017-06-06T13:33:15.769515Z:31973:cc-oci-runtime:debug:sending message (length 105) to proxy socket
2017-06-06T13:33:15.769545Z:31973:cc-oci-runtime:debug:writing message data to proxy socket: {"id":"attach","data":{"containerId":"3f38dc216b1edc2b02dfa02a32eb984de24a8deacb91b96e06a93d7dab3a7b1d"}}
2017-06-06T13:33:15.770017Z:31973:cc-oci-runtime:debug:proxy msg length: 37
2017-06-06T13:33:15.770077Z:31973:cc-oci-runtime:debug:message read from proxy socket: {"success":true,"data":{"version":1}}
2017-06-06T13:33:15.770194Z:31973:cc-oci-runtime:debug:msg received: {"success":true,"data":{"version":1}}
2017-06-06T13:33:15.770251Z:31973:cc-oci-runtime:debug:communicating with proxy
2017-06-06T13:33:15.770285Z:31973:cc-oci-runtime:debug:sending message (length 58) to proxy socket
2017-06-06T13:33:15.770311Z:31973:cc-oci-runtime:debug:writing message data to proxy socket: {"id":"hyper","data":{"hyperName":"destroypod","data":{}}}

From what I can understand, cc-oci-runtime sends the hello message to the proxy. The proxy receives it but never returns a response. After what looks like a two-minute timeout (13:31:15 → 13:33:15 in the logs above), cc-oci-runtime gives up and sends a destroypod message to the proxy to clean up.

@dlespiau
Contributor

dlespiau commented Jun 6, 2017

After the runtime sends a hello, the proxy is supposed to receive a message from hyperstart, the agent running inside the VM. That message is sent as soon as the VM has booted and hyperstart has started. Unfortunately, it seems to me that qemu isn't booting properly.

Could you confirm that /usr/local/bin/qemu-system-x86_64 is qemu-lite? How did you install qemu-lite?
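Independently of that, one way to test the boot theory directly is to watch the VM's console socket while the container starts (a sketch, assuming socat is installed; <container-id> is a placeholder for the ID shown in the proxy log):

$ sudo socat - UNIX-CONNECT:/var/run/cc-oci-runtime/<container-id>/console.sock

If the guest kernel comes up, its console output should appear on that socket. Note that the default kernel command line includes quiet, so even a healthy boot prints very little until that option is removed.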

@jodh-intel
Contributor

You can confirm your qemu has the correct support like this:

$  /usr/local/bin/qemu-system-x86_64 -machine help|grep ^pc-lite

It's a long shot, but did you capture the output of ./installation/rhel-setup.sh (maybe by running it under script(1))?

Also, are you running on bare metal, or in a virtualised environment? If virtualised, how much memory is available?

Another question: what do you get as output from these commands:

for module in vhost vhost_net; do modinfo $module; done

@jodh-intel
Contributor

Please could you paste the output of:

ls -l /usr/share/clear-containers/

... and attach files:

  • /usr/local/share/defaults/cc-oci-runtime/hypervisor.args
  • /usr/local/share/defaults/cc-oci-runtime/vm.json

Also, do you see any errors that might be related in:

$ sudo journalctl -la

@syed
Author

syed commented Jun 6, 2017

@dlespiau @jodh-intel Unfortunately I don't have the output of the installer script. I am running on a bare-metal host with two Xeon L5520 sockets, giving me 16 cores and 24 GB of RAM.

I checked for pc-lite support in qemu and it seems to be present.

root@host-7: ~ # qemu-system-x86_64 --version
QEMU emulator version 2.7.0, Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
root@host-7: ~ # /usr/local/bin/qemu-system-x86_64 -machine help|grep ^pc-lite
pc-lite              Light weight PC (alias of pc-lite-2.7)
pc-lite-2.7          Light weight PC
root@host-7: ~ # 

Here is the output for the vhost module info:

root@host-7: ~ # for module in vhost vhost_net; do modinfo $module; done
filename:       /lib/modules/3.10.0-514.21.1.el7.x86_64/kernel/drivers/vhost/vhost.ko
description:    Host kernel accelerator for virtio
author:         Michael S. Tsirkin
license:        GPL v2
version:        0.0.1
rhelversion:    7.3
srcversion:     3B40EDEB193740AC05D0BCC
depends:        
intree:         Y
vermagic:       3.10.0-514.21.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        8E:8A:72:DB:1A:FE:F8:5D:97:E6:C5:B2:28:DB:FC:11:CE:88:E8:8E
sig_hashalgo:   sha256
parm:           max_mem_regions:Maximum number of memory regions in memory map. (default: 64) (ushort)
filename:       /lib/modules/3.10.0-514.21.1.el7.x86_64/kernel/drivers/vhost/vhost_net.ko
alias:          devname:vhost-net
alias:          char-major-10-238
description:    Host kernel accelerator for virtio net
author:         Michael S. Tsirkin
license:        GPL v2
version:        0.0.1
rhelversion:    7.3
srcversion:     A34443770FABC9045984589
depends:        vhost,tun,macvtap
intree:         Y
vermagic:       3.10.0-514.21.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        8E:8A:72:DB:1A:FE:F8:5D:97:E6:C5:B2:28:DB:FC:11:CE:88:E8:8E
sig_hashalgo:   sha256
parm:           experimental_zcopytx:Enable Zero Copy TX; 1 -Enable; 0 - Disable (int)
root@host-7: ~ # 

The /usr/share/clear-containers/ directory:

root@host-7: ~ # ls -l /usr/share/clear-containers/
total 249556
-rwxr-xr-x. 1 root root 235929600 Jun  6 13:31 clear-15550-containers.img
lrwxrwxrwx. 1 root root        26 Jun  6 02:57 clear-containers.img -> clear-15550-containers.img
-rwxr-xr-x. 1 root root  15715448 Jun  5 15:53 vmlinux-4.9.30-59.1.container
lrwxrwxrwx. 1 root root        29 Jun  6 02:57 vmlinux.container -> vmlinux-4.9.30-59.1.container
-rwxr-xr-x. 1 root root   3897344 Jun  5 15:53 vmlinuz-4.9.30-59.1.container
lrwxrwxrwx. 1 root root        29 Jun  6 02:57 vmlinuz.container -> vmlinuz-4.9.30-59.1.container
root@host-7: ~ # 

I've attached the hypervisor.args and vm.json files. And I don't see anything in journalctl that might point to this...

hypervisor.args.txt
vm.json.txt

@jodh-intel
Contributor

Hi @syed - that all looks fine and your config files seem to be unmodified.

Could you try running this (assuming you only have a single qemu process running):

$ sudo cat /proc/$(pidof qemu-system-x86_64)/stack

@syed
Author

syed commented Jun 6, 2017

@jodh-intel I do have a qemu process running.

root      2759  0.0  0.0      0     0 ?        S    03:54   0:00  \_ [vhost-2732]
root      2136  0.0  0.0 112648   948 pts/4    S+   15:24   0:00  |   \_ grep 2732
root      2732  103  0.0 2526360 23948 ?       Rsl  03:54 713:19 /usr/local//bin/qemu-system-x86_64 -name 9fbbc682b584 -machine pc-lite,accel=kvm,kernel_irqchip,nvdimm -device nvdimm,memdev=mem0,id=nv0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-15550-containers.img,size=235929600 -m 2G,slots=2,maxmem=3G -kernel /usr/share/clear-containers/vmlinux-4.9.30-59.1.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug init=/usr/lib/systemd/systemd systemd.unit=cc-agent.target iommu=off quiet systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket systemd.show_status=false cryptomgr.notests net.ifnames=0 ip=::::::deeb49fca99f::off:: -device virtio-9p-pci,fsdev=workload9p,mount_tag=rootfs -fsdev local,id=workload9p,path=/var/lib/docker/devicemapper/mnt/d2daa9c80e1c98bf6b67e8b2bdac7b692355b665aa9195ad8501e62d1e1799c5/rootfs,security_model=none -smp 2,sockets=1,cores=2,threads=1 -cpu host -rtc base=utc,driftfix=slew -no-user-config -nodefaults -global kvm-pit.lost_tick_policy=discard -device virtio-serial-pci,id=virtio-serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,path=/var/run/cc-oci-runtime/deeb49fca99fbc7bfc15451c67e2261afcb2abc39df9713d4128d3a4732dc7a8/console.sock,server,nowait,id=charconsole0,signal=off -chardev socket,id=procsock,path=/var/run/cc-oci-runtime/deeb49fca99fbc7bfc15451c67e2261afcb2abc39df9713d4128d3a4732dc7a8/process.sock,server,nowait -chardev socket,id=charch0,path=/var/run/cc-oci-runtime/deeb49fca99fbc7bfc15451c67e2261afcb2abc39df9713d4128d3a4732dc7a8/ga-ctl.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch1,path=/var/run/cc-oci-runtime/deeb49fca99fbc7bfc15451c67e2261afcb2abc39df9713d4128d3a4732dc7a8/ga-tty.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -uuid d791687a-0586-41fa-b2a4-9fbbc682b584 -qmp unix:/var/run/cc-oci-runtime/deeb49fca99fbc7bfc15451c67e2261afcb2abc39df9713d4128d3a4732dc7a8/hypervisor.sock,server,nowait -nographic -vga none -netdev tap,ifname=ceth0,script=no,downscript=no,id=ceth0,vhost=on -device driver=virtio-net-pci,netdev=ceth0,mac=02:42:ac:11:00:02

Strangely, the stack looks empty:

root@host-7: ~ # cat /proc/2732/stack
[<ffffffffffffffff>] 0xffffffffffffffff
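Since the kernel stack is empty and the process state is R (running), qemu is presumably spinning in user space, or bouncing in and out of KVM, rather than being blocked in the kernel. Assuming perf and gdb are available on the host, a sketch for localising the spin:

$ sudo perf top -p 2732
$ sudo gdb -p 2732 -batch -ex 'thread apply all bt'

The gdb backtraces in particular should show which qemu thread is busy.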

@devimc
Contributor

devimc commented Jun 6, 2017

@syed are you using a custom Linux kernel on the host?

@syed
Author

syed commented Jun 6, 2017

@devimc AFAIK, no. I'm using the one shipped by default:

root@host-7: ~ # uname -a
Linux host-7.maas 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
root@host-7: ~ # 

@jodh-intel
Contributor

@syed - Could you try disabling quiet on the kernel command line like this (the sed just mangles the word so the guest kernel no longer recognises it):

$ sudo sed -i.orig -e 's/\(\<quiet\>\)/XXX\1/g' /usr/local/share/defaults/cc-oci-runtime/vm.json

Then, capture the debug proxy output again:

$ script -efc 'sudo /usr/local/libexec/cc-proxy -v 4'

And try creating a container once more. You should see more output from the proxy this time.

@syed
Author

syed commented Jun 6, 2017

@jodh-intel ... I don't see any extra output from the proxy:

root@host-7: ~ # sudo /usr/local/libexec/cc-proxy -v 4
I0606 17:06:33.258012    4664 proxy.go:328] listening on /var/run/cc-oci-runtime/proxy.sock
I0606 17:06:33.258145    4664 proxy.go:372] proxy started
I0606 17:07:00.316121    4664 proxy.go:67] [client #1] client connected
I0606 17:07:00.471221    4664 proxy.go:77] [client #1] hello(containerId=571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03,ctlSerial=/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/ga-ctl.sock,ioSerial=/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/ga-tty.sock,console=/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/console.sock)

Here is the qemu process for the container:

root      4862  101  0.0 2526360 23768 ?       Rsl  17:06   1:15              \_ /usr/local//bin/qemu-system-x86_64 -name 10b012aa394e -machine pc-lite,accel=kvm,kernel_irqchip,nvdimm -device nvdimm,memdev=mem0,id=nv0 -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-15550-containers.img,size=235929600 -m 2G,slots=2,maxmem=3G -kernel /usr/share/clear-containers/vmlinux-4.9.30-59.1.container -append root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k panic=1 console=hvc0 console=hvc1 initcall_debug init=/usr/lib/systemd/systemd systemd.unit=cc-agent.target iommu=off XXXquiet systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket systemd.show_status=false cryptomgr.notests net.ifnames=0 ip=::::::571e5438e23a::off:: -device virtio-9p-pci,fsdev=workload9p,mount_tag=rootfs -fsdev local,id=workload9p,path=/var/lib/docker/devicemapper/mnt/0dd62a5ce1271f7762f57cfd4300f25051f740944f75b8820ca965cc2f2f3954/rootfs,security_model=none -smp 2,sockets=1,cores=2,threads=1 -cpu host -rtc base=utc,driftfix=slew -no-user-config -nodefaults -global kvm-pit.lost_tick_policy=discard -device virtio-serial-pci,id=virtio-serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,path=/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/console.sock,server,nowait,id=charconsole0,signal=off -chardev socket,id=procsock,path=/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/process.sock,server,nowait -chardev socket,id=charch0,path=/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/ga-ctl.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charch0,id=channel0,name=sh.hyper.channel.0 -chardev socket,id=charch1,path=/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/ga-tty.sock,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charch1,id=channel1,name=sh.hyper.channel.1 -uuid b6c26f36-1ffd-4a1f-901a-10b012aa394e -qmp unix:/var/run/cc-oci-runtime/571e5438e23a93754d4ee110ec188f7cf7236a36574174249a49bf071591ac03/hypervisor.sock,server,nowait -nographic -vga none -netdev tap,ifname=ceth0,script=no,downscript=no,id=ceth0,vhost=on -device driver=virtio-net-pci,netdev=ceth0,mac=02:42:ac:11:00:02

@eadamsintel

@syed I ran into the same problem. I noticed that my /etc/systemd/system/docker.service.d/clr-containers.conf file had two "/" characters in the path to the cc-oci-runtime binary. When I removed the extra /, the cor runtime functioned properly for me. It should be "cor=/usr/local/bin/cc-oci-runtime". Can you check your conf file to see if the path to the cor binary is correct?

@syed
Author

syed commented Jun 6, 2017

@eadamsintel I removed the extra / from /etc/systemd/system/docker.service.d/clr-containers.conf. It now looks like this:

root@host-7: ~ # cat /etc/systemd/system/docker.service.d/clr-containers.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -D --add-runtime cor=/usr/local/bin/cc-oci-runtime.sh --default-runtime=cor
root@host-7: ~ # 

I restarted Docker and even rebooted the host afterwards, but the Docker daemon still hangs when I start a container.

@syed
Author

syed commented Jun 6, 2017

@jodh-intel is there something I can use to check whether qemu is working correctly? Can I start a simple VM from the command line using the Clear Containers kernel?

@syed
Author

syed commented Jun 6, 2017

At this point, this is what I have:

/usr/local/bin/qemu-system-x86_64 -name syed-test \
        -machine pc-lite,accel=kvm,kernel_irqchip,nvdimm \
        -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-15550-containers.img,size=235929600 \
        -kernel /usr/share/clear-containers/vmlinux-4.9.30-59.1.container \
        -chardev stdio,id=stdio,mux=on \
        -device virtio-serial-pci \
        -device virtconsole,chardev=stdio \
        -mon chardev=stdio \
        -display none \
        -append 'console=hvc0'

When I run this, I get no output at all. I also see that the qemu process is taking 100% CPU (all the other qemu processes started from docker run are also at 100% CPU).
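One caveat about this hand-built command: compared with the runtime-generated invocation captured earlier in the thread, it creates the memory-backend-file for the rootfs image but never attaches it as an nvdimm device, and the -append line carries no root= parameters, so even a healthy qemu-lite would have no root filesystem to boot into. A closer minimal reproduction, modeled on the full runtime command above (an untested sketch; paths as in this thread), would be:

$ /usr/local/bin/qemu-system-x86_64 -name syed-test \
        -machine pc-lite,accel=kvm,kernel_irqchip,nvdimm \
        -m 2G,slots=2,maxmem=3G \
        -object memory-backend-file,id=mem0,mem-path=/usr/share/clear-containers/clear-15550-containers.img,size=235929600 \
        -device nvdimm,memdev=mem0,id=nv0 \
        -kernel /usr/share/clear-containers/vmlinux-4.9.30-59.1.container \
        -append 'root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 console=hvc0' \
        -chardev stdio,id=stdio,mux=on \
        -device virtio-serial-pci \
        -device virtconsole,chardev=stdio \
        -mon chardev=stdio \
        -display none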

@syed
Author

syed commented Jun 6, 2017

The qemu-system-x86_64 binary was built from source:

root@host-7: ~ # md5sum /root/go/src/github/01org/cc-oci-runtime/dependencies/qemu-lite-741f430a960b5b67745670e8270db91aeb083c5f/x86_64-softmmu/qemu-system-x86_64
3a0f06e67970d24143271744be5c43c0  /root/go/src/github/01org/cc-oci-runtime/dependencies/qemu-lite-741f430a960b5b67745670e8270db91aeb083c5f/x86_64-softmmu/qemu-system-x86_64
root@host-7: ~ # md5sum /usr/local/bin/qemu-system-x86_64
3a0f06e67970d24143271744be5c43c0  /usr/local/bin/qemu-system-x86_64
root@host-7: ~ # 

@syed
Author

syed commented Jun 6, 2017

I repeated the installation steps on Ubuntu 16.04 and I still ran into this issue.

@eadamsintel

I ran your QEMU command and it ran at 100% CPU for me as well. For reference, the MD5 of my qemu was bb7fb35aaa35990f686053629a24fcb9, and it came from running the rhel script.

I set the / back in my script and restarted Docker, and the containerd issue came back. When I removed it again and restarted Docker, I continued to have the containerd issue. I killed the cc-oci-runtime process, restarted Docker again, and enabled the cc-proxy, and still had issues. A reboot fixed it up. There might be some caching of that configuration file, or I might have had something else going on that is different from you. My issue was that Docker could not connect to containerd, and qemu never spawned a process when I tried a docker run.
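One plausible explanation for the apparent caching: systemd only re-reads unit drop-ins such as clr-containers.conf after a daemon-reload, so edits to that file have no effect on the running service until you run:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker

A full reboot achieves the same thing, which would fit the behaviour described above.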

@syed
Author

syed commented Jun 7, 2017

@eadamsintel @jodh-intel it looks like qemu-system-x86_64 is stuck in an infinite loop consuming 100% CPU. Running strace on the qemu process shows that it is doing a lot of read and futex calls:

root@HOST-7: ~ # strace -p 28299 -c
Process 28299 attached
^CProcess 28299 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 89.48    2.011794         422      4770           read
  5.50    0.123613           6     22029      2306 futex
  1.22    0.027533          17      1590           ppoll
  1.13    0.025498          11      2386           ioctl
  0.63    0.014217           4      3180           open
  0.59    0.013362           2      7701           tgkill
  0.56    0.012517           4      3180           munmap
  0.30    0.006680           2      3180           mmap
  0.26    0.005908           2      3180           close
  0.18    0.003991           1      3180           fstat
  0.14    0.003186           1      3180           lseek
------ ----------- ----------- --------- --------- ----------------
100.00    2.248299                 57556      2306 total
root@HOST-7: ~ # 

I am attaching the repeating pattern that I found with strace. From what I can tell, qemu opens the /usr/share/clear-containers/vmlinux-4.9.30-59.1.container file, does a bunch of ioctl calls, and fails to acquire a mutex. I am not sure exactly what is failing here; maybe you can help me decipher it.

qemu_loop.txt
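Since qemu runs each vCPU from its own thread, a per-thread trace restricted to the interesting system calls may be more telling than the flat summary above (a sketch; the PID is from the run above, and with -f strace attaches to all existing threads of the process):

$ sudo strace -f -e trace=ioctl,read,futex -o /tmp/qemu-spin.log -p 28299

In a healthy guest the vCPU threads sit in long-running ioctl(KVM_RUN) calls; endlessly repeating, very short KVM_RUN calls would be consistent with a guest that cannot make forward progress.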

@jodh-intel
Contributor

Hi @syed - Can you clarify your comments on Ubuntu: are you saying you've followed the doc below and installed Ubuntu + Clear Containers on the same Xeon system you used for CentOS, and that you see (exactly?) the same behaviour with qemu spinning at 100% CPU?

If so, this sounds like a hardware issue. Do you have access to another physical system to try and rule that out?

@anthonyzxu - do you have any thoughts on this?

@syed
Author

syed commented Jun 7, 2017

@jodh-intel I tried the setup on a VM running inside a KVM host, and it works there. I don't have another physical box at the moment, but I will try to free one up and test on that. If it works, it would suggest some incompatibility between the processor and QEMU.

@jodh-intel
Contributor

Hi @syed - unfortunately, it's looking like your Xeon isn't new enough. (The aes flag, i.e. AES-NI, arrived with the Westmere generation, so its absence is a quick marker for an older part.) To prove that, can you run the following:

$ egrep -q "\<aes\>" /proc/cpuinfo && echo success || echo fail

@syed
Author

syed commented Jun 7, 2017

@jodh-intel You are absolutely right. It looks like the CPU doesn't have the aes flag:

root@host-7: ~ # egrep -q "\<aes\>" /proc/cpuinfo && echo success || echo fail
fail
root@host-7: ~ #

@jodh-intel
Contributor

Hi @syed - I'm sorry to hear that, but at least we eventually got to the bottom of the problem :)

We'll be updating the check-config script [1] to add the missing aes check in the next day or so, to stop other users from hitting this issue. Beyond that, we are looking at extra checks to avoid the need for [1] entirely.

Thanks for your patience, and I hope you are able to source another system so you can have the proper Clear Containers experience.


[1] - https://download.clearlinux.org/current/clear-linux-check-config.sh

@syed
Author

syed commented Jun 8, 2017

Thanks @jodh-intel and everyone for promptly answering my queries. It always brings me happiness to see the community willing to help someone who's starting out. Nice to see a healthy project 👍

I'll go ahead and close this issue for now. Thanks again.

@syed syed closed this as completed Jun 8, 2017
@jodh-intel
Contributor

Hi @syed - would you mind checking a couple of things that might give us a more precise answer as to why your system can't run Clear Containers:

$ cat /proc/cpuinfo|awk 'BEGIN { RS = "" ; } {printf ("%s\n", $0); exit(0);}'
$ cat /sys/module/kvm_intel/parameters/unrestricted_guest

Also, just to let you know that the script below has now been updated to perform the extra check (and hopefully save other users from the bad experience you've had).

@jodh-intel jodh-intel reopened this Jun 8, 2017
@syed
Author

syed commented Jun 8, 2017

@jodh-intel Absolutely. Here is the output:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           L5520  @ 2.27GHz
stepping	: 5
microcode	: 0x19
cpu MHz		: 2266.871
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 4533.74
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:
root@host-7: ~ # cat /sys/module/kvm_intel/parameters/unrestricted_guest
N

@jodh-intel
Contributor

Hi @syed - thanks very much - this is useful information. I believe the actual limitation of your system is the unrestricted_guest value (I'll ensure we update the check-config script to include this...).
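For anyone who wants to check a machine by hand before the updated script lands, the new test boils down to something like this (a sketch mirroring the script's SUCCESS/FAIL output style, not its actual code):

$ [ "$(cat /sys/module/kvm_intel/parameters/unrestricted_guest 2>/dev/null)" = "Y" ] \
      && echo "SUCCESS: Unrestricted guest KVM support" \
      || echo "FAIL: Unrestricted guest KVM support"

On the L5520 system above this prints FAIL, matching the N read back earlier in the thread.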

@jodh-intel
Contributor

The check script below has now been updated to check for unrestricted_guest:

Example run on a system that supports Clear Containers:

$ ./clear-linux-check-config.sh container
Checking if host is capable of running Clear Linux* OS for Intel® Architecture in a container

SUCCESS: Intel CPU
SUCCESS: 64-bit CPU (lm)
SUCCESS: Streaming SIMD Extensions v4.1 (sse4_1)
SUCCESS: Virtualisation support (vmx)
SUCCESS: Kernel module kvm
SUCCESS: Kernel module kvm_intel
SUCCESS: Nested KVM support
SUCCESS: Unrestricted guest KVM support
SUCCESS: Kernel module vhost
SUCCESS: Kernel module vhost_net

@jodh-intel
Contributor

For what it's worth, the new runtime (still in alpha) now has a built-in command called cc-check to determine if the system is capable of running a Clear Container. See:

@dato

dato commented Sep 17, 2017

Hi—

Sorry to barge into this closed issue. I followed all the debugging steps that @jodh-intel provided, which were superb, and everything looks like it should work.

But I still get 100% CPU usage from qemu-lite.

I tried both the old and new checking scripts¹, and both look fine. I’m attaching a bunch of other output files.

Many thanks in advance for your help.

$ ./clear-linux-check-config.sh container
Checking if host is capable of running Clear Linux* OS for Intel® Architecture in a container

SUCCESS: Intel CPU
SUCCESS: 64-bit CPU (lm)
SUCCESS: Streaming SIMD Extensions v4.1 (sse4_1)
SUCCESS: Virtualisation support (vmx)
SUCCESS: Kernel module kvm
SUCCESS: Kernel module kvm_intel
SUCCESS: Nested KVM support
SUCCESS: Unrestricted guest KVM support
SUCCESS: Kernel module vhost
SUCCESS: Kernel module vhost_net

$ ./cc-check¹
Found CPU attribute "Intel Architecture CPU" (GenuineIntel)
Found CPU flag "Virtualization support" (vmx)
Found CPU flag "64Bit CPU" (lm)
Found CPU flag "SSE4.1" (sse4_1)
Found kernel module "Kernel-based Virtual Machine" (kvm)
Found kernel module "Intel KVM" (kvm_intel)
Kernel module "Intel KVM" parameter "unrestricted_guest" has correct value
Kernel module "Intel KVM" parameter "nested" has correct value
Found kernel module "Host kernel accelerator for virtio" (vhost)
Found kernel module "Host kernel accelerator for virtio network" (vhost_net)

(¹) I only compiled the cc-check.go and utils.go files, as I would prefer
getting CC-2.1 to work, rather than trying to compile the whole of 3.0.

Attached files:

@dato

dato commented Sep 23, 2017

Hi again—

I didn’t know 3.0 was so close to being released!

I’ve now installed the new runtime, and it works perfectly.

Thank you!
