This repository has been archived by the owner on Jun 28, 2024. It is now read-only.

docker: Run tests in parallel #1257

Merged
merged 1 commit into from
Mar 6, 2019

Conversation

chavafg
Contributor

@chavafg chavafg commented Feb 28, 2019

Most of our docker tests can be executed in parallel.
Use the ginkgo -p option to achieve this.
The ones that cannot be run in parallel are now tagged in
the test name as [Serial Test] to run them after we run
the tests that can be run in parallel.
This should reduce CI time by at least 10 minutes.

Fixes: #1256.

Signed-off-by: Salvador Fuentes [email protected]
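
The two-pass scheme described above (parallel tests first, then the ones tagged `[Serial Test]`) can be illustrated with a minimal sketch. It assumes the split is driven by a regular expression over test names, using ginkgo's `-skip` and `-focus` flags together with `-nodes`; the thread does not show the actual Makefile lines, so the function names and the exact command lines here are hypothetical.

```python
import re

# Tag named in the PR description; the regex form escapes the brackets.
SERIAL_REGEX = r"\[Serial Test\]"


def split_by_tag(test_names, regex=SERIAL_REGEX):
    """Return (parallel, serial) lists of test names, keeping input order."""
    pattern = re.compile(regex)
    parallel = [name for name in test_names if not pattern.search(name)]
    serial = [name for name in test_names if pattern.search(name)]
    return parallel, serial


def ginkgo_invocations(nodes, regex=SERIAL_REGEX):
    """Build two hypothetical command lines: parallel pass, then serial pass.

    Assumption: the parallel pass skips tagged tests and the serial pass
    focuses on them. This is not copied from the project's Makefile.
    """
    parallel_cmd = ["ginkgo", f"-nodes={nodes}", f"-skip={regex}"]
    serial_cmd = ["ginkgo", f"-focus={regex}"]
    return parallel_cmd, serial_cmd


if __name__ == "__main__":
    # Illustrative test names, not taken from the test suite.
    names = ["run container", "[Serial Test] hot plug", "exec"]
    print(split_by_tag(names))
    print(ginkgo_invocations(6))
```

Keeping the tag in the test name means no separate list of serial tests has to be maintained; a test is moved between passes simply by renaming it.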

Contributor

@jodh-intel jodh-intel left a comment

@chavafg - nice find!!

Contributor

@grahamwhaley grahamwhaley left a comment

lgtm
🤞

@grahamwhaley
Contributor

/test

@chavafg
Contributor Author

chavafg commented Feb 28, 2019

/test

Makefile (outdated review thread, resolved)
@chavafg
Contributor Author

chavafg commented Feb 28, 2019

/test

@GabyCT
Contributor

GabyCT commented Feb 28, 2019

👍

@chavafg
Contributor Author

chavafg commented Feb 28, 2019

Got 1 failure on the fedora-vsocks job:

[Fail] Hot plug CPUs container with CPU constraint [It] should have 1 CPUs 

And a lot of failures on the firecracker job.
Will investigate more.

@chavafg chavafg changed the title docker: Run tests in parallel WIP: docker: Run tests in parallel Feb 28, 2019
@chavafg
Contributor Author

chavafg commented Feb 28, 2019

/test

@chavafg
Contributor Author

chavafg commented Mar 4, 2019

/test

@chavafg
Contributor Author

chavafg commented Mar 4, 2019

/test

@chavafg
Contributor Author

chavafg commented Mar 5, 2019

So I needed to decrease the number of parallel ginkgo processes used for the docker tests to $(nproc) - 2.
With that, the docker tests seem stable enough for the CI.
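
As a minimal sketch of the `$(nproc) - 2` rule, the node count could be computed as below. The function name, the `reserve` parameter, and the floor of one process on small machines are assumptions added for illustration; the thread only states the `$(nproc) - 2` value. Note also that `os.cpu_count()` reports all CPUs in the system, while `nproc` reports the CPUs available to the current process, so the two can differ under CPU affinity limits.

```python
import os


def parallel_nodes(cpu_count=None, reserve=2):
    """Return the number of parallel ginkgo processes to spawn.

    Mirrors `$(nproc) - 2` from the comment above. The lower bound of 1
    is an assumption so that 1- and 2-CPU machines still run the tests.
    """
    if cpu_count is None:
        # os.cpu_count() may return None when the count is undetermined.
        cpu_count = os.cpu_count() or 1
    return max(1, cpu_count - reserve)


if __name__ == "__main__":
    print(f"ginkgo -nodes={parallel_nodes()}")
```

Passing `cpu_count` explicitly makes the rule easy to check for a given CI machine size without running on that machine.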

I currently see errors on the ARM CI:

Mar 05 07:57:07 testing-1 kata-runtime[95037]: time="2019-03-05T07:57:07.178632617+08:00" level=error msg="Unable to launch /usr/bin/qemu-system-aarch64: exit status 1" arch=arm64 command=create container=118ad72c248fd88f1c83e71f6512365a3625eadf455a4eab470ebce7cfa102a9 name=kata-runtime pid=95037 source=virtcontainers subsystem=qmp
Mar 05 07:57:07 testing-1 kata-runtime[95037]: time="2019-03-05T07:57:07.17886441+08:00" level=error msg="qemu-system-aarch64: warning: Invalid CPU topology deprecated: sockets (1) * cores (1) * threads (1) != maxcpus (96)\nqemu-system-aarch64: -device virtio-blk,disable-modern=false,drive=image-7f02c55e0ce3289f,scsi=off,config-wce=off,romfile=: Failed to get \"write\" lock\nIs another process using the image [/usr/share/kata-containers/kata-containers-2019-03-05-07:36:42.203313919+0800-osbuilder-da9f541-agent-a2037c0]?\n" arch=arm64 command=create container=118ad72c248fd88f1c83e71f6512365a3625eadf455a4eab470ebce7cfa102a9 name=kata-runtime pid=95037 source=virtcontainers subsystem=qmp

Any idea @Pennyzct ?

@Pennyzct
Contributor

Pennyzct commented Mar 5, 2019

Hi~ @chavafg, I have been working on this for a while. Briefly: when the guest rootfs is exposed through virtio-blk rather than NVDIMM, only one container can be launched at a time. The write lock error stops you from launching another.
You can find details in issue runtime/843.
However, now that the kernel has been updated to v4.19.X, arm64 can support NVDIMM with a few modifications; the related PR is on the way. ;)

@grahamwhaley
Contributor

@chavafg - do you know why you need $(nproc) - 2? If we know we are running out of memory or something, fine, that would make sense. If we only know the tests fail without -2, then that would make me nervous that we have some sort of race/dependency that we don't fully understand, which will come back and bite us as we add more tests or they take different amounts of time and overlap in different ways.

Maybe it is to do with launching multiple containers in parallel and over-subscribing the vCPUs or similar - but I think we need to know a little detail.

@chavafg
Contributor Author

chavafg commented Mar 5, 2019

Thanks @Pennyzct :)

@grahamwhaley hard to tell. I don't get failures all the time, but when I do, it is because we hit the 60s timeout of each test, with no errors in the kata-runtime log.
Doing some local testing, using all CPUs, I see 50-70% usage.

I also got this error once on the CI:

<string>: docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: unknown.

but haven't been able to reproduce locally.

Anyway, using $(nproc) - 2 still saves time on the CI, so I think we should keep it. I'll rework the PR to not run the ARM tests in parallel until Penny's PRs land.
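
The per-job decision discussed here (ARM not run in parallel for now, firecracker fully serialized, everything else at `$(nproc) - 2`) could be sketched as follows. The function name, the `arch` and `hypervisor` strings, and the way a job would identify itself are all hypothetical; the thread does not show how the CI scripts detect the platform.

```python
def nodes_for_job(cpus, arch, hypervisor, reserve=2):
    """Pick the number of ginkgo processes for one CI job.

    Assumptions for illustration only:
    - `arch` is a `uname -m` style string such as "x86_64" or "aarch64".
    - `hypervisor` is a label such as "qemu" or "firecracker".
    """
    # ARM hits the virtio-blk write lock when launching containers
    # concurrently, and firecracker proved unstable in parallel runs,
    # so both fall back to a single process.
    if arch == "aarch64" or hypervisor == "firecracker":
        return 1
    # Everything else uses CPUs minus `reserve`, never below one process.
    return max(1, cpus - reserve)


if __name__ == "__main__":
    print(nodes_for_job(8, "x86_64", "qemu"))
    print(nodes_for_job(96, "aarch64", "qemu"))
```

Keeping the exceptions in one place would make it straightforward to re-enable parallel runs on ARM once NVDIMM support lands.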

Most of our docker tests can be executed in parallel.
Use the `ginkgo -nodes` option to achieve this.
The ones that cannot be run in parallel are now tagged in
the test name as [Serial Test] to run them after we run
the tests that can be run in parallel.

After running several tests locally on different distros,
the number of processes that we can spawn in parallel for
these tests is `$(nproc) - 2`. If using `$(nproc)` or
`$(nproc) - 1`, some jobs become unstable.

For firecracker, run all tests serialized.

Fixes: kata-containers#1256.

Signed-off-by: Salvador Fuentes <[email protected]>
@chavafg
Contributor Author

chavafg commented Mar 5, 2019

/test

@chavafg chavafg changed the title WIP: docker: Run tests in parallel docker: Run tests in parallel Mar 5, 2019
@grahamwhaley
Contributor

it's green - OK @chavafg , if you are happy that running in parallel will not make the tests less stable (and tbh, if they are, it is probably you who will end up chasing it down ;-) ), then let's merge it!

@grahamwhaley grahamwhaley merged commit 662b7a2 into kata-containers:master Mar 6, 2019
@chavafg chavafg deleted the topic/parallel-docker branch April 8, 2019 16:58

6 participants