Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graphql: Updated mutation rewriting to fix OOM issue #5536

Merged
merged 24 commits into from
Jun 16, 2020

Conversation

harshil-goel
Copy link
Contributor

@harshil-goel harshil-goel commented May 28, 2020

Fixes GRAPHQL-451
Fixes #5400

I also wrote a small benchmark to see the profile difference.

On the current release branch, Benchmark fails for 100 elements in the innermost loop. (Starts to fail at 20 elements). The memory profile is for 19 case

Benchmark3LevelDeep5-12             1036           1077422 ns/op
Benchmark3LevelDeep19-12               1        41056766621 ns/op
signal: terminated
Showing nodes accounting for 11704.49MB, 95.88% of 12207.24MB total
Dropped 10 nodes (cum <= 61.04MB)
Showing top 10 nodes out of 31
      flat  flat%   sum%        cum   cum%
 2730.32MB 22.37% 22.37%  8434.67MB 69.10%  encoding/json.mapEncoder.encode
 1914.68MB 15.68% 38.05%  1914.68MB 15.68%  reflect.mapiterinit
 1527.52MB 12.51% 50.56%  1527.52MB 12.51%  reflect.copyVal
 1286.11MB 10.54% 61.10%  3952.80MB 32.38%  reflect.Value.MapKeys
 1182.70MB  9.69% 70.79%  9617.37MB 78.78%  encoding/json.Marshal
 1105.79MB  9.06% 79.85%  1866.95MB 15.29%  github.com/dgraph-io/dgraph/graphql/resolve.squashFragments
  976.04MB  8.00% 87.84%   976.04MB  8.00%  internal/reflectlite.Swapper
  347.59MB  2.85% 90.69%   347.59MB  2.85%  github.com/dgraph-io/dgraph/graphql/resolve.squashIntoList
  328.08MB  2.69% 93.38%   328.08MB  2.69%  github.com/dgraph-io/dgraph/graphql/resolve.squashIntoObject.func1
  305.67MB  2.50% 95.88%   305.67MB  2.50%  strings.(*Builder).grow

As you can see, in the PR, even 100 elements in the innermost loop work. The memory profile (for 19) is attached along.

[Decoder]: Using assembly version of decoder
Benchmark3LevelDeep5-12             4693            229221 ns/op
Benchmark3LevelDeep19-12            1591            834201 ns/op
Benchmark3LevelDeep100-12            286           3938028 ns/op
Benchmark3LevelDeep100-12            343           3321058 ns/op               
Benchmark3LevelDeep1000-12            36          32947170 ns/op               
Benchmark3LevelDeep10000-12            3         337876067 ns/op  
howing nodes accounting for 297.55MB, 70.16% of 424.10MB total
Dropped 47 nodes (cum <= 2.12MB)
Showing top 10 nodes out of 48
      flat  flat%   sum%        cum   cum%
   58.01MB 13.68% 13.68%   270.05MB 63.68%  github.com/dgraph-io/dgraph/graphql/resolve.rewriteObject
   53.51MB 12.62% 26.30%    72.01MB 16.98%  github.com/dgraph-io/dgraph/graphql/resolve.addInverseLink
   37.51MB  8.84% 35.14%    44.01MB 10.38%  github.com/dgraph-io/dgraph/graphql/resolve.xidQuery
   28.50MB  6.72% 41.86%    86.01MB 20.28%  encoding/json.mapEncoder.encode
      27MB  6.37% 48.23%   142.01MB 33.49%  github.com/dgraph-io/dgraph/graphql/resolve.mutationsFromFragments
   25.50MB  6.01% 54.24%    25.50MB  6.01%  reflect.mapiterinit
   19.50MB  4.60% 58.84%    19.50MB  4.60%  github.com/dgraph-io/dgraph/graphql/schema.(*fieldDefinition).Type
      18MB  4.24% 63.09%       18MB  4.24%  github.com/dgraph-io/dgraph/graphql/resolve.newFragment
      15MB  3.54% 66.62%    94.52MB 22.29%  github.com/dgraph-io/dgraph/graphql/resolve.asXIDReference
      15MB  3.54% 70.16%    15.50MB  3.66%  fmt.Sprintf

This change is Reviewable

Docs Preview: Dgraph Preview

@harshil-goel harshil-goel added the area/graphql Issues related to GraphQL support on Dgraph. label May 28, 2020
Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the gist of the change, we are doing 2 mutations instead of 1. The first one creates the nested deep mutation object if it doesn't exist and then we create the actual objects assuming that the deep nested object already exists. We should add some comments mentioning this.

Looks like a nice change! Also looks like earlier the number of mutations would now be at max M + N where M is the number of top level objects and N the number of deep objects whereas we were earlier doing a lot more.

Also, would be good to have some benchmarking tests which try to add random data with 100, 1000, 10000 entities being created for both the xid and the other case.

Reviewed 9 of 9 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @abhimanyusinghgaur, @harshil-goel, and @MichaelJCompton)


graphql/resolve/add_mutation_test.yaml, line 30 at r1 (raw file):

-
  name: "Add deep mutation with variables"

This is fine but it's not the best place for doing load testing. We should also have a benchmark test which tests the time and allocations for adding say 100, 1000 authors and remove this after.


graphql/resolve/add_mutation_test.yaml, line 1169 at r1 (raw file):

  error:
    message: |-
      failed to rewrite mutation payload because duplicate XID found: S1

Was this incorrect before?


graphql/resolve/add_mutation_test.yaml, line 1755 at r1 (raw file):

      }
    }
  dgquery: |-

This might better be situated up there with dgmutation.


graphql/resolve/mutation.go, line 234 at r1 (raw file):

		if req.Query != "" && len(mutResp.GetJson()) != 0 {
			if err := json.Unmarshal(mutResp.GetJson(), &result); err != nil {

result would have the response of the final mutation, is that going to be OK.

Copy link
Contributor

@MichaelJCompton MichaelJCompton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @abhimanyusinghgaur, @harshil-goel, and @pawanrawal)


graphql/resolve/add_mutation_test.yaml, line 30 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

This is fine but it's not the best place for doing load testing. We should also have a benchmark test which tests the time and allocations for adding say 100, 1000 authors and remove this after.

Yeah, I think in general we should have something like all the simple examples just in yaml, then the complex ones should be benchmarks elsewhere.

Also, it might be worth having some larger tests that mint this sort of example up dynamically and just test that there's the right number of queries, right number of mutations, etc.


graphql/resolve/add_mutation_test.yaml, line 100 at r1 (raw file):

  dgquerysec: |-
    query {
      PostSecret4 as PostSecret4(func: eq(PostSecret.title, "ps1")) @filter(type(PostSecret)) {

I'm a bit confused here. How does the first and second query thing work? I'f we've just queried for it, why do we need to go back again, wouldn't we have the answer?


graphql/resolve/add_mutation_test.yaml, line 125 at r1 (raw file):

      }
    }
  dgmutations:

So we run dgquery and dgmutations as a single upsert? So, I'm expecting 1 case in the dgmutations for every possible object creation in the mutation - that's every id reference, xid reference and everything that's definitely a new object?

The we rerun the same queries and use the result in a single upsert. Questions

  • why do we need the condition on the final upsert? Shouldn't the intial case ensure success?
  • why do we need the second set of queries? Don't we know the id's already from the first queries?

graphql/resolve/add_mutation_test.yaml, line 1169 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Was this incorrect before?

Nice - it's removing a duplicate!


graphql/resolve/add_mutation_test.yaml, line 1367 at r1 (raw file):

          "uid":"_:City1"
        }
      cond: "@if(eq(len(District3), 1))"

I really like the way this has made the mutations simpler. But I don't understand why the condition is still needed. It looks to me like the first mutation is set up to ensure that it's there.


graphql/resolve/mutation.go, line 216 at r1 (raw file):

	result := make(map[string]interface{})
	req := &dgoapi.Request{}

is req used outside the loop? Looks like you could just move it inside with the same instantiation as before.


graphql/resolve/mutation.go, line 217 at r1 (raw file):

	req := &dgoapi.Request{
		Query:     dgraph.AsString(upsert.Query),
		CommitNow: true,

why was that still CommitNow: true ... doesn't look right


graphql/resolve/mutation.go, line 219 at r1 (raw file):

	for _, upsert := range upserts {
		if len(upsert.Mutations) == 0 {

when can this happen?


graphql/resolve/mutation.go, line 238 at r1 (raw file):

						schema.GQLWrapf(err, "Couldn't unmarshal response from Dgraph mutation")),
					resolverFailed
			}

I think I understand why you've got the two sets of query now. It's so you can use this cute loop pattern and just run two upserts without having any other logic.

I think that's kinda nice, but it does look to me like another spot that'd be worth thinking about - is it faster to rerun those queries, or to pull out the data from the first query and do it in code. Though, those queries should be super fast because they are always ID or @id lookups.


graphql/resolve/mutation.go, line 250 at r1 (raw file):

	}

	if mutResp != nil {

when can mutResp be nil?


graphql/resolve/mutation_rewriter.go, line 234 at r1 (raw file):

		return mrw.handleMultipleMutations(m)
	}

I think it's time to squash these things together. Can we replace the below by a generic function that handles both cases? What's below looks like one of the cases of the multiple.

What was the original reason for the split


graphql/resolve/mutation_rewriter.go, line 257 at r1 (raw file):

	}

	return []*UpsertMutation{upsert}, schema.GQLWrapf(err, "failed to rewrite mutation payload")

one reason I'm worried about the two cases is that this doesn't look the same as the other case ... why can't this have creations as well?


graphql/resolve/mutation_rewriter.go, line 1007 at r1 (raw file):

Quoted 5 lines of code…
				if xid != nil && !atTopLevel {
					frags = &mutationRes{firstPass: []*mutationFragment{newFragment(val)}}
				} else {
					frags = &mutationRes{secondPass: []*mutationFragment{newFragment(val)}}
				}

so everything in firstPass is the creation steps, everything in secondPass is the reference parts?


graphql/resolve/mutation_rewriter.go, line 1431 at r1 (raw file):

	}

	left = append(left, right...)

is this right? I thought there were reasons why we made sure we didn't write into the existing fragments?

Copy link
Contributor

@abhimanyusinghgaur abhimanyusinghgaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 9 files reviewed, 18 unresolved discussions (waiting on @harshil-goel and @pawanrawal)


graphql/resolve/add_mutation_test.yaml, line 1169 at r1 (raw file):

Previously, MichaelJCompton (Michael Compton) wrote…

Nice - it's removing a duplicate!

Yeah! Earlier if there was an error, squashFragments would squash it, and so it would result in multiple errors for the same thing. But, now its better.


graphql/resolve/add_mutation_test.yaml, line 1691 at r2 (raw file):

  gqlmutation: |
    mutation addAuthor($auth: AddLabInput!) {
      addLab(input: [$auth]) {

its a bit confusing, it should be:

mutation addLab($lab: AddLabInput!) {
  addLab(input: [$lab]) {

And same for the variable name in gqlvariables.


graphql/resolve/add_mutation_test.yaml, line 1738 at r2 (raw file):

                        {
                            "ComputerOwner.computers": {
                                "uid": "_:Computer4"

shouldn't it be:

"uid": "uid(Computer4)"

As Computer4 was already created?
Will it work using a blank node from first pass in second pass?

Also, at present we don't have e2e tests for XIDs which are 3-4 level deep.
So, we should add two kind of e2e test cases each with 3-4 levels:

  1. there is no XID in top level object, but all the nested level objects use xid
  2. There is XID at every level.
    It can be done using the existing Teacher, Student schema in e2e.

graphql/resolve/mutation.go, line 234 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

result would have the response of the final mutation, is that going to be OK.

Also, as result would contain only the last upsert response, so should we skip the json unmarshalling for rest of the upserts?

@harshil-goel harshil-goel marked this pull request as ready for review June 8, 2020 13:13
@harshil-goel harshil-goel requested review from manishrjain and a team as code owners June 8, 2020 13:13
Copy link
Contributor

@abhimanyusinghgaur abhimanyusinghgaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 2 of 13 files reviewed, 18 unresolved discussions (waiting on @harshil-goel, @manishrjain, and @pawanrawal)

Copy link
Contributor

@MichaelJCompton MichaelJCompton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 2 of 13 files reviewed, 17 unresolved discussions (waiting on @harshil-goel, @manishrjain, and @pawanrawal)

@harshil-goel harshil-goel merged commit 5e89162 into release/v20.03 Jun 16, 2020
@harshil-goel harshil-goel deleted the harshil-goel/graphql-oom branch June 16, 2020 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/graphql Issues related to GraphQL support on Dgraph.
Development

Successfully merging this pull request may close these issues.

4 participants