Query backfills #1280

Closed
wants to merge 35 commits into from

hsorellana
Contributor

@hsorellana hsorellana commented Nov 11, 2020

What this PR does / Why we need it:
This PR is a work in progress that adds the ability to query backfill objects. It still needs a lot of work.

Special notes for your reviewer:
Please ignore the Frontend and Statestore function additions. They are unrelated to this PR, but I needed some functionality from those services to get QueryService working.

This is related to the Backfill feature.

@hsorellana hsorellana changed the title Query bf Query backfills Nov 11, 2020
@google-cla google-cla bot added the cla: yes label Nov 11, 2020
@Laremere

We need to cache results here rather than loading all backfills for every query. That means we need a backfill cache like the tickets cache we already have.

Some context: #1123

We used to filter tickets inside Redis, and the performance was frankly awful. As the number of queries goes up, the load on Redis (which is single-threaded!) goes up linearly. By joining multiple query calls into one cache update, the load scales only with the number of query service instances. Additionally, it reduces the number of times a given ticket/backfill is loaded from Redis (per version) to once per query instance, instead of once per query.
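To make the scaling argument concrete, here is a minimal sketch that counts store reads with and without batching. Everything in it (`store`, `loadAll`, `serveUncached`, `serveBatched`, `countLoads`) is a hypothetical stand-in, not code from this repository, and it handles a batch synchronously, whereas the ticket cache coordinates concurrent callers.

```go
package main

import "fmt"

// store is a hypothetical stand-in for Redis; loads counts how often it is read.
type store struct {
	data  map[string]string
	loads int
}

// loadAll returns a copy of everything in the store and records the read.
func (s *store) loadAll() map[string]string {
	s.loads++
	out := make(map[string]string, len(s.data))
	for k, v := range s.data {
		out[k] = v
	}
	return out
}

// query is a callback that reads a snapshot of the stored objects.
type query func(snapshot map[string]string)

// serveUncached reads the store once per query, so load grows with query count.
func serveUncached(s *store, queries []query) {
	for _, q := range queries {
		q(s.loadAll())
	}
}

// serveBatched reads the store once and lets every pending query share the result.
func serveBatched(s *store, queries []query) {
	snapshot := s.loadAll()
	for _, q := range queries {
		q(snapshot)
	}
}

// countLoads runs n identical queries and reports how many store reads happened.
func countLoads(batched bool, n int) int {
	s := &store{data: map[string]string{"bf-1": "open"}}
	queries := make([]query, n)
	for i := range queries {
		queries[i] = func(snapshot map[string]string) { _ = snapshot["bf-1"] }
	}
	if batched {
		serveBatched(s, queries)
	} else {
		serveUncached(s, queries)
	}
	return s.loads
}

func main() {
	fmt.Println("uncached loads:", countLoads(false, 100))
	fmt.Println("batched loads:", countLoads(true, 100))
}
```

In this sketch the batched path reads the store once no matter how many queries are pending, which is the property the cache is meant to provide per query service instance.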

@aLekSer
Collaborator

aLekSer commented Nov 17, 2020

We have discussed the need to update the design to include a Backfill Version field.
The point of having it: it gives an easy way to invalidate Backfill updates coming from the backend. When an MMF result was built from an outdated Backfill (a previous version), the resulting match proposal should not be taken into account, because an attempt to update the Backfill with an outdated version would be rejected.

Imagine a version 1 Backfill is held by the query service. It was used to form a matchProposal and led to an updated Backfill, so Redis now holds Backfill version 2.
In the next MMF round, queryBackfills for some reason returns the old version 1 Backfill, and the MMF forms a matchProposal from it. Even if this match passes the Evaluator, the Backfill update from version 1 to version 2 would be rejected, since Redis already knows version 2. So no Tickets would be associated with this Backfill.

Two options to achieve this behaviour:

  1. Include Version alongside Generation in the Backfill protobuf definition. It would then also be carried in the matchProposal as part of the Match.
  2. Include Version in the InternalBackfill representation that is stored in Redis (a Version field next to Backfill and TicketIds). This requires changing the Match protobuf: a new backfill_version field alongside the current Backfill and allocate_gameserver fields.
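The rejection described above is an optimistic version check. A minimal sketch of the idea follows, using an in-memory map in place of Redis; `backfill`, `memStore`, `update`, and `errStaleVersion` are hypothetical names for illustration and do not reflect the actual statestore API.

```go
package main

import (
	"errors"
	"fmt"
)

// backfill is a hypothetical, trimmed-down Backfill carrying only what the check needs.
type backfill struct {
	ID      string
	Version int64
}

var errStaleVersion = errors.New("backfill version is outdated")

// memStore is an in-memory stand-in for the Redis-backed store.
type memStore struct {
	backfills map[string]backfill
}

// update accepts a write only if the caller saw the currently stored version,
// then bumps the version so later writes based on the old copy are rejected.
func (s *memStore) update(seen backfill) (backfill, error) {
	cur, ok := s.backfills[seen.ID]
	if !ok {
		return backfill{}, fmt.Errorf("backfill %q not found", seen.ID)
	}
	if seen.Version != cur.Version {
		return backfill{}, errStaleVersion
	}
	cur.Version++
	s.backfills[seen.ID] = cur
	return cur, nil
}

func main() {
	s := &memStore{backfills: map[string]backfill{"bf-1": {ID: "bf-1", Version: 1}}}
	v1 := s.backfills["bf-1"] // the copy an MMF queried earlier

	updated, err := s.update(v1) // first proposal: accepted
	fmt.Println(updated.Version, err)

	_, err = s.update(v1) // second proposal built from the stale copy: rejected
	fmt.Println(errors.Is(err, errStaleVersion))
}
```

Either option in the list would supply the `Version` value that the caller passes back; the sketch does not depend on which one is chosen.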

@hsorellana
Contributor Author

ping @Laremere ☝️

@syntxerror syntxerror added this to the v1.2.0 milestone Nov 18, 2020
if !returnedBackfillByQuery(t, tc) {
require.Fail(t, "Expected to find backfill in pool but didn't.")
}
})
Copy link
Contributor Author

Hey @Laremere, I could use some help here. I'm reusing the existing test cases for querying backfills. Do you think they are useful scenarios that don't produce false positives, or should we revisit them?

Copy link

I'm not sure what you're asking? These tests seem good to me.

Copy link
Contributor Author

I was asking whether the current test cases are useful for testing this. If you think they are good for querying backfills, I'll leave them untouched.

@aLekSer
Collaborator

aLekSer commented Nov 20, 2020

I vote for splitting this PR into at least two parts. The first part should include IndexBackfill, which would be used when creating and updating Backfills.
We need to introduce a version for Backfills, use HSET (or more complex keys in the set) to store the version, and remove old versions in IndexBackfill.
Then switch query_service.go to:

		if b, ok := bc.backfills[id]; !ok || b.Version != id.Version {
			toFetch = append(toFetch, id)
		}
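A self-contained sketch of that check, assuming the index is read into a map from backfill ID to version (for example from a Redis hash). `cachedBackfill` and `staleIDs` are hypothetical names, and the result is sorted only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// cachedBackfill is a hypothetical cache entry that remembers which version it holds.
type cachedBackfill struct {
	ID      string
	Version int64
}

// staleIDs returns the IDs that must be fetched again: those missing from the
// cache and those whose cached version differs from the indexed version.
func staleIDs(indexed map[string]int64, cached map[string]cachedBackfill) []string {
	toFetch := []string{}
	for id, version := range indexed {
		if b, ok := cached[id]; !ok || b.Version != version {
			toFetch = append(toFetch, id)
		}
	}
	sort.Strings(toFetch)
	return toFetch
}

func main() {
	indexed := map[string]int64{"a": 1, "b": 2, "c": 1}
	cached := map[string]cachedBackfill{
		"a": {ID: "a", Version: 1}, // up to date
		"b": {ID: "b", Version: 1}, // outdated
		// "c" is not cached yet
	}
	fmt.Println(staleIDs(indexed, cached)) // prints [b c]
}
```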

Copy link
Collaborator

@aLekSer aLekSer left a comment

The tests are fine now. Please roll back the TicketCache changes and move them to another PR, so that we don't change the core logic here.

@hsorellana hsorellana marked this pull request as ready for review November 30, 2020 21:07
Copy link
Collaborator

@aLekSer aLekSer left a comment

Approving the PR. The main functionality is working now. The tests are flaky at the moment, but they pass locally most of the time. Let's skip them and unblock @yeukovichd's MMF and Backend changes.

@aLekSer
Collaborator

aLekSer commented Nov 30, 2020

@Laremere Scott, can you please review this PR? It is needed to proceed with the MMF changes.


pf, err := filter.NewPoolFilter(pool)
if err != nil {
return err
Copy link

Should it be an exact status code like in the return statements above? If so, please check other returns in this method.

Copy link
Collaborator

QueryTickets does not have them either, but we can make this better.

Copy link
Collaborator

I assume nothing needs to be done. NewPoolFilter already includes those codes.

Copy link

Yes, but what should be done with other returns down below?

Copy link
Contributor Author

NewPoolFilter returns status.Error(...) if an error occurs, so here that error is simply returned as is.

@aLekSer
Collaborator

aLekSer commented Nov 30, 2020

TestBackfillNotFound is failing (hard to reproduce locally):
Step #8 - "Test: Services": --- FAIL: TestBackfillNotFound (3.44s)
Step #8 - "Test: Services": --- FAIL: TestBackfillNotFound/QueryBackfill/double_range_minMaxAreNan#02 (0.06s)
Step #8 - "Test: Services": query_tickets_test.go:262:
Step #8 - "Test: Services": Error Trace: query_tickets_test.go:262

toFetch := []string{}
for id := range indexedBackfills {
if _, ok := bc.backfills[id]; !ok {
toFetch = append(toFetch, id)
Copy link

Why can't you call bc.store.GetBackfill right here, instead of iterating through toFetch down below? We can avoid 1 extra for loop.

Copy link
Collaborator

We could leave a TODO here to keep this similar to runRequest for Tickets, and optimise it later.

Copy link
Collaborator

@aLekSer aLekSer Nov 30, 2020

The tests are flaky even without this enhancement. The main difference is an HSET with the Generation in it versus a plain set.

Copy link
Contributor Author

I have some doubts here: if we keep the same behavior as TicketCache, there are some metrics I need to implement, and one of them uses the toFetch variable. So we could either do what you say, Alexey, or keep the same metrics as for tickets.
cc: @Laremere

}
}

for _, backfillToFetch := range toFetch {
Copy link

I'm worried this loop will be noticeably slow, because it could make many separate requests to Redis. That is especially likely because backfills are expected to change a lot and need refreshing. For tickets, there's a bulk get that uses a single Redis call. I'm not opposed to merging it in this state and adding that later (as long as adding it is a high priority).
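The difference can be sketched with a fake store that counts round trips. `fakeStore`, `getBackfill`, `getBackfills`, and `newFakeStore` are hypothetical names, not the statestore API; a real bulk read could be backed by a multi-key Redis command such as MGET, which is an assumption here rather than something this PR specifies.

```go
package main

import "fmt"

// backfill is a hypothetical, trimmed-down Backfill.
type backfill struct {
	ID string
}

// fakeStore stands in for Redis and counts round trips so the two strategies can be compared.
type fakeStore struct {
	data       map[string]backfill
	roundTrips int
}

// newFakeStore creates a store that already holds one backfill per ID.
func newFakeStore(ids []string) *fakeStore {
	s := &fakeStore{data: make(map[string]backfill, len(ids))}
	for _, id := range ids {
		s.data[id] = backfill{ID: id}
	}
	return s
}

// getBackfill reads a single backfill: one round trip per call.
func (s *fakeStore) getBackfill(id string) (backfill, bool) {
	s.roundTrips++
	b, ok := s.data[id]
	return b, ok
}

// getBackfills is a hypothetical bulk read: one round trip for all IDs.
// IDs that are no longer stored are skipped.
func (s *fakeStore) getBackfills(ids []string) []backfill {
	s.roundTrips++
	out := make([]backfill, 0, len(ids))
	for _, id := range ids {
		if b, ok := s.data[id]; ok {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	toFetch := []string{"a", "b", "c"}

	oneByOne := newFakeStore(toFetch)
	for _, id := range toFetch {
		oneByOne.getBackfill(id)
	}

	bulk := newFakeStore(toFetch)
	fetched := bulk.getBackfills(toFetch)

	fmt.Println(oneByOne.roundTrips, bulk.roundTrips, len(fetched)) // prints 3 1 3
}
```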


func newBackfillCache(b *appmain.Bindings, cfg config.View) *backfillCache {
bc := &backfillCache{
store: statestore.New(cfg),
Copy link

Both caches should share the same statestore. Make it an argument to newCache and create it once.

@@ -325,3 +371,146 @@ func (tc *ticketCache) update() {
logger.Debugf("Ticket Cache update: Previous %d, Deleted %d, Fetched %d, Current %d", previousCount, deletedCount, len(toFetch), len(tc.tickets))
tc.err = nil
}

type backfillCache struct {
Copy link

Simply copy-pasting here created a lot of duplicated code, which I mostly dislike because the common part is tricky concurrency code.

I think the best solution is to refactor ticketCache into a plain cache. Include in the struct (and in newCache) an update function of type func(store statestore.Service, state interface{}) error, and turn both update functions into static functions that can be passed to the constructor. You'll need a default value for the state as well.

Then change f in request to func(interface{}). In the update and request functions/closures, simply do a cast.

It's not pretty, and generics would be really nice here. However, all of the casts will trivially be exercised by the e2e tests, so I'm not really worried about a bug there. I think the cost of a few casts is worth de-duplicating the concurrency code.
Thoughts?

Copy link
Collaborator

I also saw a lot of duplicated code here; I just thought it could be merged into one set of functions later, at the refactoring stage, so that the beta (backfill) and release (ticket) code stay separate for now.
Overall only one condition changes: it compares the Generation number in memory (the map) with the one in the Redis HSET.

Copy link

Some further examples to clarify:

type ticketCache struct {
	store statestore.Service

	requests chan *cacheRequest

	// Single item buffered channel.  Holds a value when runQuery can be safely
	// started.  Basically a channel/select friendly mutex around runQuery
	// running.
	startRunRequest chan struct{}

	wg sync.WaitGroup

	update func(statestore.Service, interface{}) error

	// Multithreaded unsafe fields, only to be written by update, and read when
	// request given the ok.
	value interface{}
	err   error
}

func (s *queryService) QueryTicketIds(req *pb.QueryTicketIdsRequest, responseServer pb.QueryService_QueryTicketIdsServer) error {
	/// <code removed here for example simplicity>
	var results []string
	err = s.tc.request(ctx, func(value interface{}) {
		tickets := value.(map[string]*pb.Ticket)
		for id, ticket := range tickets {
			if pf.In(ticket) {
				results = append(results, id)
			}
		}
	})
	if err != nil {
		err = errors.Wrap(err, "QueryTicketIds: failed to run request")
		return err
	}
	/// <code removed here for example simplicity>
}

func updateTicketCache(store statestore.Service, value interface{}) error {
	tickets := value.(map[string]*pb.Ticket)
	/// etc....
}
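For a version of this idea that compiles on its own, here is a rough sketch. It is a simplification under stated assumptions: `Store`, `memStore`, `ListIDs`, `cache`, and `updateBackfillCache` are hypothetical names, the state is a plain `map[string]bool`, and a mutex replaces the channel-based request batching of the real ticket cache. The store is passed into the constructor, so two caches could share one instance.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical stand-in for statestore.Service.
type Store interface {
	ListIDs() []string
}

// memStore is an in-memory Store used only for this example.
type memStore struct {
	ids []string
}

func (m *memStore) ListIDs() []string { return m.ids }

// cache holds untyped state plus the function that knows how to refresh it.
type cache struct {
	store  Store
	update func(Store, interface{}) error

	mu    sync.Mutex // simplification: the real cache batches callers instead
	value interface{}
}

// newCache takes the shared store, the initial (default) state, and the update function.
func newCache(store Store, initial interface{}, update func(Store, interface{}) error) *cache {
	return &cache{store: store, update: update, value: initial}
}

// request refreshes the state and then lets f read it while the lock is held.
func (c *cache) request(f func(interface{})) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if err := c.update(c.store, c.value); err != nil {
		return err
	}
	f(c.value)
	return nil
}

// updateBackfillCache casts the state back to its concrete type and refreshes it in place.
func updateBackfillCache(store Store, value interface{}) error {
	backfills, ok := value.(map[string]bool)
	if !ok {
		return fmt.Errorf("unexpected cache state type %T", value)
	}
	for id := range backfills {
		delete(backfills, id)
	}
	for _, id := range store.ListIDs() {
		backfills[id] = true
	}
	return nil
}

func main() {
	shared := &memStore{ids: []string{"bf-1", "bf-2"}}
	bc := newCache(shared, map[string]bool{}, updateBackfillCache)

	count := 0
	err := bc.request(func(value interface{}) {
		count = len(value.(map[string]bool))
	})
	fmt.Println(count, err)
}
```

The sketch uses the checked form of the type assertion in the update function so that a wrongly typed default state surfaces as an error rather than a panic; the unchecked casts in the example above are equally valid if the e2e tests cover them.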

Copy link
Collaborator

This is covered by PR #1300. I think we can close this PR.

@@ -66,3 +70,59 @@ func TestGetPageSize(t *testing.T) {
})
}
}

func TestBackfillCache(t *testing.T) {
Copy link

Honestly, I'd skip testing here (as I did with tickets), and just make sure all these cases are covered by e2e.

Copy link

Specifically, the e2e test case "call create and see the backfill in query, then call update and see the updated backfill in query" is super important. It would be good to also check that delete properly makes it go away. (Do we have that test case for tickets? If not, we should...)

Doing those tests would make these tests redundant.
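The scenario can be sketched as a single lifecycle check. The code below is only an illustration against in-memory fakes: `service`, `memService`, `staleService`, and `checkLifecycle` are hypothetical names, and a real e2e test would call the frontend and query services instead.

```go
package main

import "fmt"

// backfill is a hypothetical, trimmed-down Backfill with one mutable field.
type backfill struct {
	ID   string
	Mode string
}

// service is the hypothetical surface the lifecycle check needs.
type service interface {
	create(b backfill)
	update(b backfill)
	remove(id string)
	query(id string) (backfill, bool)
}

// memService always answers queries from the latest stored state.
type memService struct {
	backfills map[string]backfill
}

func (s *memService) create(b backfill) { s.backfills[b.ID] = b }
func (s *memService) update(b backfill) { s.backfills[b.ID] = b }
func (s *memService) remove(id string)  { delete(s.backfills, id) }
func (s *memService) query(id string) (backfill, bool) {
	b, ok := s.backfills[id]
	return b, ok
}

// staleService never refreshes an entry once it has been queried,
// mimicking a cache that misses updates and deletes.
type staleService struct {
	memService
	seen map[string]backfill
}

func (s *staleService) query(id string) (backfill, bool) {
	if b, ok := s.seen[id]; ok {
		return b, true
	}
	b, ok := s.memService.query(id)
	if ok {
		s.seen[id] = b
	}
	return b, ok
}

// checkLifecycle runs create, update, and delete, querying after each step.
func checkLifecycle(s service) error {
	s.create(backfill{ID: "bf-1", Mode: "casual"})
	if b, ok := s.query("bf-1"); !ok || b.Mode != "casual" {
		return fmt.Errorf("created backfill not visible in query")
	}
	s.update(backfill{ID: "bf-1", Mode: "ranked"})
	if b, ok := s.query("bf-1"); !ok || b.Mode != "ranked" {
		return fmt.Errorf("updated backfill not visible in query")
	}
	s.remove("bf-1")
	if _, ok := s.query("bf-1"); ok {
		return fmt.Errorf("deleted backfill still visible in query")
	}
	return nil
}

func main() {
	fresh := &memService{backfills: map[string]backfill{}}
	stale := &staleService{
		memService: memService{backfills: map[string]backfill{}},
		seen:       map[string]backfill{},
	}
	fmt.Println(checkLifecycle(fresh))
	fmt.Println(checkLifecycle(stale))
}
```

The stale fake fails at the update step, which is the kind of regression this test case is meant to catch.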

})
if err != nil {
err = errors.Wrap(err, "QueryBackfills: failed to run request")
return err
Copy link

@hsorellana I mean, should you return a status code here?

Copy link
Contributor Author

Oh, you're right!

@hsorellana hsorellana closed this Dec 8, 2020
@hsorellana hsorellana deleted the query-bf branch December 16, 2020 19:36
@syntxerror syntxerror added the wontfix This will not be worked on label Mar 29, 2021
Labels
cla: yes wontfix This will not be worked on
5 participants