Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

runtime: support memory hotplug via probe interface on aarch64 #1152

Merged
merged 4 commits into from
Apr 8, 2019

Conversation

Pennyzct
Copy link
Contributor

@Pennyzct Pennyzct commented Jan 18, 2019

Description of problem

There are 2 phases in Memory Hotplug. 1) Physical Memory Hotplug phase. 2) Logical Memory Hotplug phase.
The First phase is to communicate hardware/firmware and make/erase environment for hotplugged memory.
If firmware supports notification of connection of new memory to OS, this phase is triggered automatically. ACPI can notify this event. This also what kata supports, memory hotplug via acpi-driven.
But if not, there is another option, probe operation by hand.
And since memory hotplug via acpi is missing on qemu-system-aarch64, we hope to support the other probe solution.
the whole detailed info about probe interface could be found here.
This whole implementation of memory hotplug via probe interface is divided into two phases, the first is to check if guest kernel is capable of memory hot-add via probe interface, which is located at /sys/devices/system/memory/probe.
The second is let kata-agent tell guest kernel the physical address of new memory block.
echo start_address_of_new_memory > /sys/devices/system/memory/probe

Expected result

root@entos-thunderx2-desktop:~# docker run --runtime kata-runtime -it --rm -m 3G ubuntu
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
root@9ebdb553ba9b:/# lsmem
RANGE                                 SIZE  STATE REMOVABLE     BLOCK
0x0000000040000000-0x00000000bfffffff   2G online        no       1-2
0x0000020000000000-0x00000200bfffffff   3G online        no 2048-2050

Memory block size:         1G
Total online memory:       5G
Total offline memory:      0B
root@9ebdb553ba9b:/# ls /sys/devices/system/memory/
auto_online_blocks  block_size_bytes  memory1  memory2  memory2048  memory2049  memory2050  power  probe  uevent

@raravena80
Copy link
Member

@Pennyzct ping, any updates? Thx. Is this waiting for review?

@ttx
Copy link
Member

ttx commented Feb 15, 2019

/zuul-recheck

func (k *kataAgent) memHotplugByProbe(addr uint64, sizeMB uint32, memorySectionSizeMB uint32) error {
// hot-added memory device should be sliced into the size of memory section, which is the basic unit for
// memory hotplug
numSection := uint64(sizeMB / memorySectionSizeMB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about returning an error if memorySectionSizeMB is zero? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes,yes,yes.;) definitely need to avoid divided by zero, update asap. ;)

@grahamwhaley
Copy link
Contributor

Hi @Pennyzct - is this PR still alive/valid ?

@Pennyzct
Copy link
Contributor Author

Pennyzct commented Mar 6, 2019

Hi~ @grahamwhaley yes,yes,yes. So sorry for the long-time hanging up, I constantly got sidetracked.😭. since related agent/#443 commit has already been merged, i will dep ensure to include the change. update asap. ;)

@Pennyzct Pennyzct force-pushed the memory_hotplug branch 2 times, most recently from 9ae365b to b8621a8 Compare March 6, 2019 09:21
@Pennyzct
Copy link
Contributor Author

Pennyzct commented Mar 6, 2019

Hi~ all. When I dep ensure -update github.com/kata-conatiners/agent, do i need to git add the files under vendor/? Since Gopkg.lock and Gopkg.toml is not enough, the travis ci is still failing.

@jodh-intel
Copy link
Contributor

Hi @Pennyzct - yes, you need to git add vendor/ after a dep ensure. I'll update the contributor guide as that isn't currently specified...

@jodh-intel
Copy link
Contributor

PR raised - kata-containers/community#89.

@umarcor
Copy link

umarcor commented Mar 6, 2019

@Pennyzct, @jodh-intel just because of curiosity, why are you using dep instead of go mod?

@grahamwhaley
Copy link
Contributor

@umarcor - at this point, it may be mostly due to historical reasons - the kata, and forerunner, code bases have been around a few years now... I think go mod came in with golang 1.11 ?, and we still have our baseline golang supported version as 1.10.4 :-)

Now, if you'd like to propose a move to go mod, and submit a PR ;-) But, first probably best if it was discussed. /cc @sboeuf

@jodh-intel
Copy link
Contributor

Yes - at the time we introduced it, dep was the best tool available.

@sboeuf
Copy link

sboeuf commented Mar 6, 2019

@umarcor @grahamwhaley @jodh-intel

I recently discussed this with @mcastelino and I think it is time for us to move to go modules. We need to open issues on every Kata repositories for this. And then let's start submit PRs.

@umarcor is it something you're interested in helping with?

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Pennyzct !

lgtm

Ping @kata-containers/runtime.

@Pennyzct
Copy link
Contributor Author

/test

@jodh-intel
Copy link
Contributor

Hi @Pennyzct - the CI's are failing on this:

virtcontainers/kata_agent_test.go:262:32: cannot use g (type *gRPCProxy) as type "github.com/kata-containers/runtime/vendor/github.com/kata-containers/agent/protocols/grpc".AgentServiceServer in argument to "github.com/kata-containers/runtime/vendor/github.com/kata-containers/agent/protocols/grpc".RegisterAgentServiceServer:
	*gRPCProxy does not implement "github.com/kata-containers/runtime/vendor/github.com/kata-containers/agent/protocols/grpc".AgentServiceServer (missing MemHotplugByProbe method)
FAIL	github.com/kata-containers/runtime/virtcontainers [build failed]

It looks like you need to add a dummy implementation of MemHotplugByProbe() to virtcontainers/kata_agent_test.go.

@Pennyzct
Copy link
Contributor Author

/test

@jodh-intel
Copy link
Contributor

/retest

@Pennyzct
Copy link
Contributor Author

02:33:23 time="2019-03-13T18:33:23Z" level=warning msg="Failed Maxval (59001.600000 < 66545.530000) for [memory-footprint-ksm]"
02:33:23 time="2019-03-13T18:33:23Z" level=warning msg="Check for [memory-footprint-ksm] failed [Failed]"
02:33:23 time="2019-03-13T18:33:23Z" level=warning msg=" with [[*F* memory-footprint-ksm 95.0% 118.4% 105.0% 10.0% 118.4% 118.4% 0.0% 0.0% 1]]"
02:33:23 time="2019-03-13T18:33:23Z" level=warning msg="Overall we failed"
02:33:23 
02:33:23 Report Summary:
02:33:23 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
02:33:23 | P/F |         NAME         |  FLR  |  MEAN  |  CEIL  |  GAP  |  MIN   |  MAX   |  RNG  | COV  | ITS |
02:33:23 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
02:33:23 | P   | boot-times           | 95.0% | 100.8% | 105.0% | 10.0% | 95.3%  | 109.4% | 14.8% | 3.2% |  20 |
02:33:23 | P   | memory-footprint     | 95.0% | 100.3% | 105.0% | 10.0% | 100.3% | 100.3% | 0.0%  | 0.0% |   1 |
02:33:23 | *F* | memory-footprint-ksm | 95.0% | 118.4% | 105.0% | 10.0% | 118.4% | 118.4% | 0.0%  | 0.0% |   1 |
02:33:23 +-----+----------------------+-------+--------+--------+-------+--------+--------+-------+------+-----+
02:33:23 Fails: 1, Passes 2
02:33:23 Failed
02:33:23 checkmetrics FAILED (1)
02:33:23 restoring KSM settings

Hi~ @grahamwhaley @jodh-intel metrix ci failed. Do we need to re-trigger or just skip it?

@jodh-intel
Copy link
Contributor

Hi @Pennyzct - This PR is possibly being affected by the need to "reset" the metrics thresholds. If so, I think we can merge. Hoever, I've re-triggered to see how that compares to the last run...

/cc @chavafg.

@raravena80
Copy link
Member

Hi @Pennyzct any updates? Thx!

This was referenced Mar 26, 2019
In order to support memory hotplug via probe interface in kata-runtime,
firstly, we need to verify whether guest kernel is capable of that.

Fixes: kata-containers#1149

Signed-off-by: Penny Zheng <[email protected]>
If kata-runtime supports memory hotplug via probe interface, we need to reconstruct
memoryDevice to store relevant status, which are addr and probe. addr specifies the
physical address of the memory device, and probe determines it is hotplugged via
acpi-driven or probe interface.

Fixes: kata-containers#1149

Signed-off-by: Penny Zheng <[email protected]>
we need to notify guest kernel about memory hot-added event via probe interface.
hot-added memory deivce should be sliced into the size of memory section.

Fixes: kata-containers#1149

Signed-off-by: Penny Zheng <[email protected]>
We need to change the constraints of kata agent into the memory-hotplug
related commit, to keep track of the revisons in kata agent.

Fixes: kata-containers#1149

Signed-off-by: Penny Zheng <[email protected]>
@Pennyzct
Copy link
Contributor Author

Pennyzct commented Apr 4, 2019

/test

@Pennyzct
Copy link
Contributor Author

Pennyzct commented Apr 4, 2019

Hi~ guys~ @kata-containers/runtime Do you remember this one? (I indeed totally forget it. 😓) I have already re-based. pls review~thanks. ;)

@Pennyzct
Copy link
Contributor Author

Pennyzct commented Apr 8, 2019

Hi~ @grahamwhaley @jodh-intel @devimc vsock CI is failing.

Running command '/usr/local/bin/kata-runtime [kata-runtime --log=/tmp/bundle160734754/log run --bundle=/tmp/bundle160734754 --console= --pid-file=/tmp/bundle160734754/pid --detach muijkDXHgC48j6oaHysi]'
command failed error 'exit status 1'
[kata-runtime --log=/tmp/bundle160734754/log run --bundle=/tmp/bundle160734754 --console= --pid-file=/tmp/bundle160734754/pid --detach muijkDXHgC48j6oaHysi]
Timeout: 120 seconds
Exit Code: 1
Stdout: 
Stderr: Failed to check if grpc server is working: context deadline exceeded

FWIT, context deadline exceeded is related to this issue #1319, I also addressed a comment there about same error on ARM machine.
I had no clue why initrd CI machine failed. I will let you guys to decide to re-trigger or what~ ;)

@grahamwhaley
Copy link
Contributor

@chavafg - that vsock issue is pretty old - did we fix/bypass it? If so, I wonder why we hit it again here?

@devimc
Copy link

devimc commented Apr 8, 2019

@Pennyzct that error is not related to this pr, I'll take a look later,

@devimc devimc merged commit 34e2064 into kata-containers:master Apr 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants