
Try to fix EventService test failures #154

Merged
2 commits merged into main from 115-event-service-fix
Oct 21, 2021

Conversation

@yitsushi (Contributor) commented Oct 20, 2021

I was able to reproduce the issue with a helping hand from cgroups.
Limiting the test to one CPU with a heavy quota (0.01% of my CPU) revealed
the issue.

Note before I describe what happened:
Containerd does not send messages to subscribers retrospectively; subscribers
only receive events published after the point at which they subscribed.

The original test published N events and only after that created the
subscribers. The connection between the test and containerd is much slower
than the execution of the test itself, so by the time containerd sends the
events out to all subscribers, they are usually already there to receive
them. That's why it works on my machine and why it can sometimes pass on
GitHub Actions.

However, on a slow machine with only one vCPU, the test and containerd race
for their share of CPU time. In that scenario the events are published before
the subscribers are ready to receive them.

Solution:
Create subscribers first and then publish events.
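
For illustration, a minimal sketch of the corrected ordering; the interface
and names below are hypothetical, not the actual flintlock test code:

```go
package events_test

import (
	"context"
	"testing"
	"time"
)

// Hypothetical subset of the event service under test, only to show the order.
type eventService interface {
	Publish(ctx context.Context, topic string, event interface{}) error
	Subscribe(ctx context.Context) (<-chan interface{}, <-chan error)
}

func testSubscribeThenPublish(t *testing.T, svc eventService, numEvents int) {
	ctx := context.Background()

	// 1. Create the subscriber first, so nothing is published before
	//    anyone is listening.
	evtCh, errCh := svc.Subscribe(ctx)

	// 2. Only then publish the events.
	for i := 0; i < numEvents; i++ {
		if err := svc.Publish(ctx, "/test/topic", i); err != nil {
			t.Fatalf("publish: %v", err)
		}
	}

	// 3. Every published event should now be delivered.
	for received := 0; received < numEvents; {
		select {
		case <-evtCh:
			received++
		case err := <-errCh:
			t.Fatalf("subscriber error: %v", err)
		case <-time.After(10 * time.Second):
			t.Fatalf("timed out after %d of %d events", received, numEvents)
		}
	}
}
```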

Disclaimer:
There is a chance that all I wrote above is not entirely correct, but that's
how I understand it. It does not matter much as long as the logic in the test
is right: originally it was publish->subscribe, which is not correct, we need
subscribers before we publish events.

Fixes #115

Checklist:

  • squashed commits into logical changes
  • includes documentation

@yitsushi yitsushi added kind/bug Something isn't working area/testing Indicates an issue related to test labels Oct 20, 2021
@yitsushi yitsushi marked this pull request as draft October 20, 2021 14:17
@yitsushi (Contributor, Author) commented Oct 20, 2021

It does not fix it XD it still crashes painfully after 2 minutes. Obviously I can't see this on my local machine.

@codecov-commenter commented Oct 20, 2021

Codecov Report

Merging #154 (e6ce829) into main (12b503c) will decrease coverage by 0.45%.
The diff coverage is 28.04%.

❗ Current head e6ce829 differs from pull request most recent head e645e2e. Consider uploading reports for the commit e645e2e to get more accurate results

@@            Coverage Diff             @@
##             main     #154      +/-   ##
==========================================
- Coverage   40.09%   39.64%   -0.46%     
==========================================
  Files          32       36       +4     
  Lines        1444     1695     +251     
==========================================
+ Hits          579      672      +93     
- Misses        789      926     +137     
- Partials       76       97      +21     
Impacted Files Coverage Δ
core/application/app.go 100.00% <ø> (ø)
core/application/errors.go 0.00% <0.00%> (ø)
core/application/reconcile.go 0.00% <0.00%> (ø)
core/plans/microvm_update.go 0.00% <0.00%> (ø)
infrastructure/containerd/convert.go 51.21% <0.00%> (ø)
infrastructure/containerd/errors.go 0.00% <0.00%> (ø)
infrastructure/containerd/lease.go 51.85% <0.00%> (-18.15%) ⬇️
infrastructure/firecracker/config.go 0.00% <0.00%> (ø)
infrastructure/firecracker/create.go 0.00% <0.00%> (ø)
infrastructure/firecracker/provider.go 0.00% <0.00%> (ø)
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2941cce...e645e2e.

@yitsushi yitsushi marked this pull request as ready for review October 20, 2021 17:43
@yitsushi (Contributor, Author) commented Oct 20, 2021

Squash and better description: coming soon.

@yitsushi yitsushi force-pushed the 115-event-service-fix branch from 6f4201b to b70bad3 Compare October 20, 2021 18:06
@yitsushi (Contributor, Author)

Nope still not... why

@yitsushi yitsushi force-pushed the 115-event-service-fix branch from b70bad3 to 2d86efb Compare October 20, 2021 18:56
@yitsushi (Contributor, Author)

Now it seems it's happy.

@yitsushi yitsushi requested a review from richardcase October 20, 2021 19:07
@yitsushi yitsushi force-pushed the 115-event-service-fix branch from 2d86efb to b6f1379 Compare October 20, 2021 19:10
@yitsushi (Contributor, Author)

About the sleep: when we subscribe to events, establishing the connection to containerd can be slow (milliseconds). The WaitGroup only waits for the goroutine that sends out the subscribe request; it can't track whether the subscription is actually ready. That's why the small sleep is in there; 10 microseconds should be enough.
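
A rough sketch of that pattern (the subscriber interface and helper names are
made up for illustration, not the real test code):

```go
package events_test

import (
	"context"
	"sync"
	"time"
)

// Hypothetical subscriber interface, only to illustrate why the sleep exists.
type subscriber interface {
	Subscribe(ctx context.Context) (<-chan interface{}, <-chan error)
}

func subscribeAll(ctx context.Context, svc subscriber, n int) []<-chan interface{} {
	var wg sync.WaitGroup
	chans := make([]<-chan interface{}, n)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			// Returning from Subscribe means the request went out, not that
			// containerd has finished registering this subscriber.
			chans[idx], _ = svc.Subscribe(ctx)
		}(i)
	}

	wg.Wait()                         // every subscribe request has been sent
	time.Sleep(10 * time.Microsecond) // give containerd a moment to register them

	return chans
}
```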

@richardcase (Member) left a comment


LGTM - just the question around assertion packages.


. "github.com/onsi/gomega"
"github.com/stretchr/testify/assert"
@richardcase (Member):


We don't need 2 assertion packages. If the consensus is testify/assert then I'm good with that. I like gomega, but not so much that I wouldn't be happy using something else.

@yitsushi (Contributor, Author):


I used it because it was already in there (no addition to go.mod).

@yitsushi (Contributor, Author):


My only issue is with the way gomega is used: I don't like the dot imports, because they can mess with the namespace. They are packages and should be used as packages, not as a pack of functions injected into our test module.
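
For example (a minimal sketch), gomega also works without the dot import by
going through NewWithT and package-qualified matchers:

```go
package events_test

import (
	"testing"

	"github.com/onsi/gomega"
)

func TestWithoutDotImport(t *testing.T) {
	g := gomega.NewWithT(t)

	result := 1 + 1
	// Matchers are referenced through the package name instead of being
	// injected into the test file's namespace by a dot import.
	g.Expect(result).To(gomega.Equal(2))
}
```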

@yitsushi (Contributor, Author) Oct 21, 2021


Updated the test to use only Gomega.

@yitsushi yitsushi merged commit ddca2ef into liquidmetal-dev:main Oct 21, 2021
Labels
area/testing Indicates an issue related to test kind/bug Something isn't working
Development

Successfully merging this pull request may close these issues.

Event service integration test flake
3 participants