-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Frontend acknowledge backfill #1293
Frontend acknowledge backfill #1293
Conversation
Store Backfills in a sorted set to retrieve all expired backfills by time.
It has the same input as output of GetExpiredBackfillIDs function. Some backfillIDs can be acknowledged in between,But this should not affect the work of cleanup service.
Used CreateBackfill implementation from another PR.
CreateBackfill() should also query AcknowledgeBackfill in Redis, just to set initial initial timestamp. |
Return back missing test after the merge.
c9c664e
to
ec34b3b
Compare
Needed in 2 tests - AcknowledgeBackfill and ProposedBackfillCreate.
panic(err) | ||
} | ||
return result | ||
actual, err := om.Frontend().GetBackfill(ctx, &pb.GetBackfillRequest{BackfillId: bfID}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you test GetBackfill here?
The name of the function is createMatchWithBackfill and you are passing backfill as a parameter.
I think if you want to test GetBackfill you should do it out of this method in order to not overload one with a lot of code. createMatchWithBackfill does more than a simple create match step here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a refactor of @yeukovichd test. Now it is used in 3 place, so that we can associate ticket with backfill using a match proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more additional check is not bad. So I will not remove this line.
c93f0b1
to
151e7a8
Compare
test fixes.
151e7a8
to
f43894c
Compare
if req.GetBackfillId() == "" { | ||
return nil, status.Errorf(codes.InvalidArgument, ".BackfillId is required") | ||
} | ||
if req.GetAssignment().Connection == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Assignment is nil you will fail here with invalid memory address or nil pointer dereference
error
|
||
ticketID := createMatchWithBackfill(ctx, om, createdBf, t) | ||
conn := "127.0.0.1:4242" | ||
getBF, err := om.Frontend().AcknowledgeBackfill(ctx, &pb.AcknowledgeBackfillRequest{BackfillId: createdBf.Id, Assignment: &pb.Assignment{Connection: conn, Extensions: map[string]*any.Any{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests with parameters validation are missing. seems you are testing only a happy path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I think parameters validation can form a new test - frontend_service_test.go
051c741
to
7b4f7db
Compare
@@ -132,20 +131,39 @@ func TestBackfillFrontendLifecycle(t *testing.T) { | |||
require.Nil(t, get) | |||
} | |||
|
|||
func TestProposedBackfillCreate(t *testing.T) { | |||
ctx := context.Background() | |||
func TestAcknowledgeBackfill(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this test does not cover AcknowledgeBackfill at all, otherwise, you would notice that you do not call the Acknowledge backfill method from a statestore. This test should be completely rewritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is E2E test, so it is hard to test this.
I know that there is an additional Redis state change - this would be covered in a Unit test.
It covers one of main effects of acknowledgment - all tickets get assigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see why you can not cover AcknowledgeBackfill step in e2e, you still can do that in unit test to check the redis state, that is what I meant. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added UT.
@akremsa Addressed your comments. Tests which are flaky on CI are TestProposedBackfillCreate and TestProposedBackfillUpdate |
9f7440c
to
c0b35d8
Compare
When FetchMatches takes more than 200 ms pendingReleaseTimeout Ticket got expired and no Tickets can be queried thereafter.
if req.GetAssignment() == nil { | ||
return nil, status.Errorf(codes.InvalidArgument, ".Assignment is required") | ||
} | ||
if req.GetAssignment().Connection == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connection is not a required field on assignments.
return nil, err | ||
} | ||
} | ||
return bf, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the tickets are assigned, they need to be removed from the associated ticket list? Otherwise they'll be assigned every acknowledge call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated: now removing only assigned tickets from Associated tickets list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And inside the mutex section
Add Mutex lock, UpdateBackfill accordingly after UpdateAssignments call.
All comments are resolved |
@HazWard @calebatwd would love to have y'all review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good from over here - seems all the comments have been addressed as well.
return nil, status.Errorf(codes.InvalidArgument, ".Assignment is required") | ||
} | ||
|
||
m := s.store.NewMutex(req.GetBackfillId()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am curious what this mutex is protecting? I assume it protects the backfill structure itself from getting updated when we are doing this. However, I see that the UpdateAssignments call is independent of the backfill structure - so you could potentially have another parallel UpdateAssignments stomp on the tickets when this method is doing so too. Is that OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that on line 406 we execute UpdateBackfill
and Backfill internal protobuf format includes also ticketed associated with this Backfill. So this lock is necessary here.
https://github.com/googleforgames/open-match/pull/1293/files#diff-fbf6a8aad66158048250071262ad0c5fd8c08c9034840c5802784888408881f3R406
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that the lock is necessary here prevent the backfill from being modified from concurrent calls. I am not sure if it is sufficient. Can you explain how it protects the UpdateAssignments call from modifying the tickets while they are being updated from the AcknowledgeBackfill call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tickets are immutable, so we don't need to keep track of their state. They can only be assigned, but searchFields could not be updated, this is the main difference with Backfill which has generation (version) number.
That's why no changes to that end is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not mention updating the searchFields at all.
TicketA is a part of BackfillA. You receive a call to AcknowledgeBackfill for BackfillA - and are updating the assignment on the TicketA by making UpdateAssignments call - Line 389.
Concurrently, the Backend receives AssignTickets method setting Assignment for TicketA to a different assignment.
Is there anything that prevents the AssignTickets method from calling UpdateAssignments call on TicketA concurrently with AcknowledgeBackfill calliong UpdateAssignments for TicketA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is not possible to have one ticket being associated to two backfills or used in a normal match. Evaluator would not make this happen. Only one match would be selected with a ticketA and later on it would be moved into pendingRelease
tickets not visible by QueryTickets
later on. We are working on scale tests it will let us know if concurrent issues like this would arise.
return nil, err | ||
} | ||
|
||
// all assigned tickets should be removed from the associatedTickets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For my understanding)
I see us select all tickets on the backfill in associatedTickets (from call to s.store.GetBackfill), We then assign all those tickets the assignment in s.store.UpdateAssignments - which returns us tickets. We later update the backfill with the delta between associatedTickets and this 'tickets' which are essentially unassigned.
Why does this 'unassigned tickets' exist? didnt we intend to update assignment on all the tickets when calling the UpdateAssignment call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that UpdateAssignment
can return not the full list and some tickets are left unassigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code in UpdateAssignment, any ticket left unassigned is because it was not found. (See: AssignmentFailure_TICKET_NOT_FOUND). So currently, any ticket that is returned as not found from UpdateAssignment is still left on the Backfill? We should atleast log the tickets that we leave unassigned here as UpdateAssignment returns a list of errors along with a list of assigned tickets (currently we ignore the errors list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added logging of the errors on updateAssignment
request.
We decided to remove all tickets after UpdateAssignment()
call, since those unassignedTickets are not found.
We can add this unassigned tickets back in a separate PR if a need arise.
Actually I have chosen probably not the best name for statestore function |
New function name seems more reasonable as it do only Redis timestamp updates, updateAcknowledgmentTimestamp.
Simplified the |
@sawagh I have applied all your comments. PTAL |
We need to deindex tickets and add error logging for all NotFound tickets.
@sawagh I have added error logging with Test case containing one of two tickets got deleted before the AcknowledgeBackfill call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we dont resolve the concurrency comment, I am ok checking this in and tracking that in a separate issue.
Add AcknowledgeBackfill() request handler.
What this PR does / Why we need it:
Create AssignmentGroup for all associated tickets and use Redis stateStore, not backend to assign all tickets.
Which issue(s) this PR fixes:
Work on #1240
Special notes for your reviewer:
Contains part of #1279 as well as #1288 , so check only latest commit when viewing files changes. Namely:
9b5cfde
AcknowledgeBackfill do not update Generation, unlike other UpdateBackfill methods (from GS and Frontend).