This repository has been archived by the owner on May 12, 2021. It is now read-only.

qemu: refactor maximum vcpus supported in aarch64 #585

Merged
merged 2 commits into from
Sep 6, 2018

Conversation

Pennyzct
Contributor

On aarch64, we support different GIC interrupt controllers.
The maximum number of vCPUs depends on the GIC version, or on how many redistributors we can fit into the memory map.

Fixes: #584

Signed-off-by: Penny Zheng [email protected]
Signed-off-by: Wei Chen [email protected]

@opendev-zuul

opendev-zuul bot commented Aug 15, 2018

Build failed (third-party-check pipeline) integration testing with
OpenStack. For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

@sboeuf sboeuf left a comment

Looks good to me!
Only a few comments.
Also, please extend the unit tests for the function MaxQemuVCPUs().

// MaxQemuVCPUs returns the maximum number of vCPUs supported
func MaxQemuVCPUs() uint32 {
bytes, err := ioutil.ReadFile(interruptFile)
if err != nil {
qemuArmLogger().WithError(err).Error("Failed to read /proc/interrrupts")

s/interrrupts/interrupts/ You've added one extra r ;)

Member

Can you use the variable interruptFile here instead?

qemuArmLogger().WithError(err).Error("Failed to read /proc/interrrupts")
}
for gicType, vCPUs := range gicList {
pattern := regexp.MustCompile(`\b` + gicType + `\b`)

The regex seems a bit too much here, maybe something like:

if strings.Contains(string(bytes), gicType) {
    return vCPUs
}

might be simpler and cost less to process.

Contributor Author

Yep, that's better. I'll update it ASAP.

When I added support for arm64, I was unable to start a VM with maxcpus greater than the actual number of physical CPUs. I think it's a KVM limitation. That said, I believe this PR will fail on systems with GICv3 and fewer than 123 physical CPUs. @Pennyzct, let me know when this PR is ready; I'd like to take a look.

Contributor Author

@devimc I have updated the PR. ptal.🙂

@amshinde
Member

@Pennyzct Looks good. Please add a unit test as well. You can override interruptFile in the tests.

@katacontainersbot
Contributor

PSS Measurement:
Qemu: 167392 KB
Proxy: 4173 KB
Shim: 9067 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003572 KB

@opendev-zuul

opendev-zuul bot commented Aug 17, 2018

Build failed (third-party-check pipeline) integration testing with
OpenStack. For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

@codecov

codecov bot commented Aug 17, 2018

Codecov Report

Merging #585 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #585   +/-   ##
=======================================
  Coverage   65.34%   65.34%           
=======================================
  Files          85       85           
  Lines        9846     9846           
=======================================
  Hits         6434     6434           
  Misses       2756     2756           
  Partials      656      656

@Pennyzct
Contributor Author

Hi~ @amshinde @sboeuf I have updated the PR, ptal.😁

Member

@bergwolf bergwolf left a comment

LGTM! Just one comment about the source of the GIC version vs. vcpu number mapping.

//on aarch64, we support different gic interrupt controllers
//maximum number of vCPUs depends on the GIC version, or on how
//many redistributors we can fit into the memory map.
var gicList = map[string]uint32{
Member

These are plain magic mappings. Is there any public documentation we can refer to? If so, please add it to the comments above.

Contributor Author

@Pennyzct Pennyzct Aug 17, 2018

I deduced it from the code in github.com/qemu/qemu/hw/arm/virt.c. The related lines are L135 and L1306, so maybe we can refer to this file in the comments.

Member

OK... That is really dark magic. Please refer to these places so that we know where to look if they need to change in the future.

Also, QEMU doesn't seem to have GIC version 4 (virt_get_gic_version). Was it added to be future-proof?

Contributor

@bergwolf Yes, QEMU currently doesn't support GICv4, but we can use gic-version=3 for QEMU on a host with GICv4, just as we did in runv for kvmtool, because GICv3 and GICv4 are the same in some respects. I also raised an issue about QEMU's gic-version parameter: #614

}
for gicType, vCPUs := range gicList {
if strings.Contains(string(bytes), gicType) {
return vCPUs

I tested this PR on a system with GICv3 and 96 cores. This function returns 123; unfortunately, that value won't be honoured, since the maximum number of vCPUs can't be exceeded:

runtime/cli/config.go

Lines 215 to 232 in 14bcd69

func (h hypervisor) defaultMaxVCPUs() uint32 {
	numcpus := uint32(goruntime.NumCPU())
	maxvcpus := vc.MaxQemuVCPUs()
	reqVCPUs := h.DefaultMaxVCPUs
	//don't exceed the number of physical CPUs. If a default is not provided, use the
	// numbers of physical CPUs
	if reqVCPUs >= numcpus || reqVCPUs == 0 {
		reqVCPUs = numcpus
	}
	// Don't exceed the maximum number of vCPUs supported by hypervisor
	if reqVCPUs > maxvcpus {
		return maxvcpus
	}
	return reqVCPUs
}

My concern with this patch is that the actual number of physical cores will be exceeded and the memory footprint will be big (again); see 07db945.

@Pennyzct have you tested this change on a system with GIC >= 3 and more than 123 physical CPUs?

Contributor Author

Hi~ @devimc That's how I found the problem and opened this PR 😊. I was installing and running Kata on a new ThunderX II machine, which exposes 224 CPUs.

:~# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                224
On-line CPU(s) list:   0-223
Thread(s) per core:    4
Core(s) per socket:    28
Socket(s):             2
NUMA node(s):          2
NUMA node0 CPU(s):     0-111
NUMA node1 CPU(s):     112-223

An error occurred, with the following output:
docker: Error response from daemon: OCI runtime create failed: qemu-system-aarch64: Number of SMP CPUs requested (224) exceeds max CPUs supported by machine 'mach-virt' (123): unknown
With my patch applied, the vCPU count is reduced to 123 in func defaultMaxVCPUs, as you mentioned above, and Kata runs successfully.

@Pennyzct
Contributor Author

Pennyzct commented Aug 21, 2018

Based on the upstream discussion, Wei (@Weichen81) and I opened #614 and #615 to discuss the missing GICv4 scenario and the maximum vCPU count varying with the QEMU version. This PR may need them to land first. 😊

@katacontainersbot
Contributor

PSS Measurement:
Qemu: 169588 KB
Proxy: 4094 KB
Shim: 8768 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003728 KB

@opendev-zuul

opendev-zuul bot commented Aug 27, 2018

Build failed (third-party-check pipeline) integration testing with
OpenStack. For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

@Pennyzct
Contributor Author

updated after #616 got merged. ptal. @bergwolf @devimc @amshinde @sboeuf 😊

return uint32(runtime.NumCPU())
if hostGICVersion != 0 {
return gicList[hostGICVersion]
} else {

this else is not needed

Contributor Author

update asap😃


assert.Equal(d.expectedResult, vCPUs)
}

Contributor

Nit: extra blank line.

Contributor Author

update asap😊.

@jodh-intel
Contributor

Hi @Pennyzct - once you've addressed @devimc's comment I think we can get this merged! 😄

@katacontainersbot
Contributor

PSS Measurement:
Qemu: 165727 KB
Proxy: 4089 KB
Shim: 9101 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003736 KB

@opendev-zuul

opendev-zuul bot commented Aug 29, 2018

Build failed (third-party-check pipeline) integration testing with
OpenStack. For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

@Pennyzct
Contributor Author

updated. ptal. @devimc @jodh-intel 😊

@jodh-intel
Contributor

jodh-intel commented Aug 29, 2018

lgtm

Approved with PullApprove

@jodh-intel
Contributor

Nasty 16.04 CI failure which seems to be hotplug-related. Using a Clear Linux image (kernel vmlinuz-4.14.51.10-135.container):

[    4.310460] CPU1 has been hot-added
[    4.311993] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    4.315868] CPU2 has been hot-added
[    4.321958] sd 0:0:0:1: [sdb] Attached SCSI disk
[    4.342716] smpboot: Booting Node 0 Processor 1 APIC 0x3
[    4.349186] smpboot: CPU 1 Converting physical 3 to logical package 1
[    4.356619] KVM setup async PF for cpu 1
[    4.357332] kvm-stealtime: cpu 1, msr 7fc94e84041d0c3594a9633180a185e27cb867d-7b0fc6496de6722f-hostname\\\"
.kubernetes.cri-o.SandboxID\\\"
[    4.361521] Will online and init hotplugged CPU: 1
[    4.368734] smpboot: Booting Node 0 Processor 2 APIC 0x2
[    1.595849] kvm-clock: cpu 2, msr 0:7ffdb081, secondary cpu clock
[    4.412607] KVM setup async PF for cpu 2
[    4.413179] kvm-stealtime: cpu 2, msr 7fd14e80
[    4.417536] Will online and init hotplugged CPU: 2
[    4.432537] run queue from wrong CPU 1, hctx active
[    4.433525] CPU: 1 PID: 148 Comm: kworker/1:0H Not tainted 4.14.51-135.container #1
[    4.434480] Workqueue: kblockd blk_mq_run_work_fn
[    4.434480] Call Trace:
[    4.434480]  dump_stack+0x5c/0x81
[    4.434480]  __blk_mq_run_hw_queue+0xc5/0xd0
[    4.434480]  process_one_work+0x110/0x320
[    4.434480]  worker_thread+0x42/0x430
[    4.434480]  kthread+0xf2/0x130
[    4.434480]  ? process_one_work+0x320/0x320
[    4.434480]  ? kthread_create_on_node+0x40/0x40
[    4.434480]  ret_from_fork+0x35/0x40

Note: There's a bit of corruption on the kvm-stealtime line - I think it should say:

kvm-stealtime: cpu 1, msr 7fc94e80

/cc @devimc, @grahamwhaley.

@jodh-intel
Contributor

lgtm

@jcvenegas
Member

@jodh-intel about the kernel trace: considering that it happens while CPU hotplug is in progress, and after taking a look at the code, it seems to fit the first documented case, so I think it is expected(?).

@mcastelino @devimc @grahamwhaley

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/block/blk-mq.c?h=v4.14.67#n1170

	 * We should be running this queue from one of the CPUs that
	 * are mapped to it.
	 *
	 * There are at least two related races now between setting
	 * hctx->next_cpu from blk_mq_hctx_next_cpu() and running
	 * __blk_mq_run_hw_queue():
	 *
	 * - hctx->next_cpu is found offline in blk_mq_hctx_next_cpu(),
	 *   but later it becomes online, then this warning is harmless
	 *   at all
	 *

@jodh-intel
Contributor

@jcvenegas - good find! It seems the kernel is dumping debug info to help the devs debug the races referred to there, but at least it's not a "real" crash :)

@opendev-zuul

opendev-zuul bot commented Sep 3, 2018

Build succeeded (third-party-check pipeline).

on aarch64, we support different gic interrupt controllers.
The maximum number of vCPUs depends on the GIC version, or on how
many redistributors we can fit into the memory map.

Fixes: kata-containers#584

Signed-off-by: Penny Zheng <[email protected]>
Signed-off-by: Wei Chen <[email protected]>

we should add unit test for func MaxQemuVCPUS in qemu_amd64_test.go

Signed-off-by: Penny Zheng <[email protected]>
Signed-off-by: Wei Chen <[email protected]>

@Pennyzct
Contributor Author

Pennyzct commented Sep 4, 2018

just re-based it on the latest master branch. 😊. @jodh-intel

@jodh-intel
Contributor

Thanks @Pennyzct! Still...

lgtm

Let's wait and see what @devimc says later today... 😄

@katacontainersbot
Contributor

PSS Measurement:
Qemu: 158035 KB
Proxy: 4204 KB
Shim: 8735 KB

Memory inside container:
Total Memory: 2043464 KB
Free Memory: 2003712 KB

@opendev-zuul

opendev-zuul bot commented Sep 4, 2018

Build failed (third-party-check pipeline) integration testing with
OpenStack. For information on how to proceed, see
http://docs.openstack.org/infra/manual/developers.html#automated-testing

@jcvenegas
Member

@devimc could you give this PR a final review?

@devimc devimc left a comment

lgtm

@devimc devimc merged commit 2f7a60a into kata-containers:master Sep 6, 2018
@sboeuf sboeuf added the enhancement Improvement to an existing feature label Sep 12, 2018
egernst pushed a commit to egernst/runtime that referenced this pull request Feb 9, 2021
agent: sandbox_pause should not take arguments
9 participants