Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventStream index not respected #12538

Closed
tgross opened this issue Apr 11, 2022 · 4 comments · Fixed by #12548
Closed

EventStream index not respected #12538

tgross opened this issue Apr 11, 2022 · 4 comments · Fixed by #12548
Assignees
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/events Issues related to events type/bug

Comments

@tgross
Copy link
Member

tgross commented Apr 11, 2022

In our nightly E2E suite we're seeing failures in TestE2E/Events/*events.EventsTest/TestStartIndex in the last assertion (ref events.go#L254) that we don't get any events older than the provided index. This fails consistently both in the E2E infra and running against a local cluster.

With a little printf debugging...

patch
diff --git a/e2e/events/events.go b/e2e/events/events.go
index 28c0614a1..10bd02e39 100644
--- a/e2e/events/events.go
+++ b/e2e/events/events.go
@@ -234,10 +234,13 @@ func (tc *EventsTest) TestStartIndex(f *framework.F) {
        err = e2eutil.Register(jobID2, "events/input/deploy.nomad")
        require.NoError(t, err)

+       fmt.Printf("startIndex: %d\n", startIndex)
+
        // ensure there is a deployment promotion event
        foundUnexpected := false
        testutil.WaitForResult(func() (bool, error) {
                for _, e := range jobEvents {
+                       fmt.Printf("[%d] %s - %s\n", e.Index, e.Type, e.Key)
                        if e.Type == "JobRegistered" {
                                if e.Index <= startIndex {
                                        foundUnexpected = true

... we get the following output:

$ go test -v . -suite Events -run 'TestE2E/Events/\*events\.EventsTest/TestStartIndex'
=== RUN   TestE2E
=== RUN   TestE2E/Events
=== RUN   TestE2E/Events/*events.EventsTest
=== RUN   TestE2E/Events/*events.EventsTest/TestStartIndex
startIndex: 122
[121] JobRegistered - deployment-2d23e072
[122] JobRegistered - deployment2-2d23e072
    events.go:257:
                Error Trace:    events.go:257
                Error:          Should be false
                Test:           TestE2E/Events/*events.EventsTest/TestStartIndex
                Messages:       found events from earlier-than-expected indices
=== RUN   TestE2E/TaskEvents
    framework.go:133: skipping suite 'TaskEvents': only running suite "Events"
--- FAIL: TestE2E (1.17s)
    --- FAIL: TestE2E/Events (1.17s)
        --- FAIL: TestE2E/Events/*events.EventsTest (0.94s)
            --- FAIL: TestE2E/Events/*events.EventsTest/TestStartIndex (0.91s)
    --- SKIP: TestE2E/TaskEvents (0.00s)
FAIL
FAIL    github.com/hashicorp/nomad/e2e  1.873s
FAIL

If I add a sleep between registering the initial.nomad job and querying it so we can get the start index, the test passes and shows several PlanResult events.

@tgross tgross added type/bug theme/events Issues related to events labels Apr 11, 2022
@tgross tgross self-assigned this Apr 11, 2022
@tgross
Copy link
Member Author

tgross commented Apr 11, 2022

With git bisect I found the test failure was introduced by 8863d1e (#12436), but then when I tried to document a working example I found it to be failing on Nomad 1.2.6 as well. This suggests that there might be a reliable timing issue here (i.e. a race condition with a wide window) rather than a originating in #12436.

If we look at the Events API docs, we see:

Specifies the index to start streaming events from. If the requested index is no longer in the buffer the stream will start at the next available index.

Which isn't what the test is asserting. The docs look to be in agreement with event_buffer.go#L155-L164:

// StarStartAtClosest returns the closest bufferItem to a requested starting
// index as well as the offset between the requested index and returned one.
func (b *eventBuffer) StartAtClosest(index uint64) (*bufferItem, int) {
	item := b.Head()
	if index < item.Events.Index {
		return item, int(item.Events.Index) - int(index)
	}
	if item.Events.Index == index {
		return item, 0
	}

So this looks like a test bug -- note that the test checks both >= and <= in the assertion, which is a good sign that this was just a typo. PR incoming with a fix.

@tgross
Copy link
Member Author

tgross commented Apr 11, 2022

Nope, sorry, it's an actual bug in the code. With that fix I'm still getting events that are before the start index:

$ go test -v . -suite Events -run 'TestE2E/Events/\*events\.EventsTest/TestStartIndex'
=== RUN   TestE2E
=== RUN   TestE2E/Events
=== RUN   TestE2E/Events/*events.EventsTest
=== RUN   TestE2E/Events/*events.EventsTest/TestStartIndex
startIndex 93
[92] JobRegistered deployment-7855ad90
[93] JobRegistered deployment2-7855ad90
[94] PlanResult deployment2-7855ad90
    events.go:257:
                Error Trace:    events.go:257
                Error:          Should be false
                Test:           TestE2E/Events/*events.EventsTest/TestStartIndex
                Messages:       found events from earlier-than-expected indices
=== RUN   TestE2E/TaskEvents
    framework.go:133: skipping suite 'TaskEvents': only running suite "Events"
--- FAIL: TestE2E (1.07s)
    --- FAIL: TestE2E/Events (1.06s)
        --- FAIL: TestE2E/Events/*events.EventsTest (0.92s)
            --- FAIL: TestE2E/Events/*events.EventsTest/TestStartIndex (0.91s)
    --- SKIP: TestE2E/TaskEvents (0.00s)
FAIL
FAIL    github.com/hashicorp/nomad/e2e  1.673s
FAIL

Going to mark this as accepted for a follow-up.

@tgross tgross added the stage/accepted Confirmed, and intend to work on. No timeline committment though. label Apr 11, 2022
@tgross tgross assigned tgross and unassigned tgross Apr 11, 2022
@tgross
Copy link
Member Author

tgross commented Apr 11, 2022

Ok, so it was a testing issue, and I've opened #12548 to fix that.

@github-actions
Copy link

github-actions bot commented Oct 9, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/events Issues related to events type/bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant