Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize mutation and delta application #2987

Merged
merged 12 commits into from
Feb 8, 2019
Merged

Conversation

manishrjain
Copy link
Contributor

@manishrjain manishrjain commented Feb 7, 2019

Mutation and Txn Commit/Abort application happen in a serial order (among other less frequent operations). So, the latency of each operation has a cascading effect on the overall latency of transaction.

This PR optimizes mutation application by dividing up the task of each mutation proposal among goroutines. For a batch of 2000 NQuads, this would be divided up among 4 goroutines.

For commits, this PR makes 2 changes. It removes relying upon a Badger lookup to determine if the commit is being done at a lower version than the latest version of the key, instead picking this information up from the posting list. Secondly, it switches how txns are run to create fewer larger transactions instead of one txn per key-value.

With this PR, mutate latency goes from original 13.5ms to 7.9ms (>40% drop). Commit latency goes from 39ms to 19.7ms (50% drop). Either of these changes shouldn't have any side-effects and should be NOOPs in terms of functionality.

To make these measurements, this PR also adds metrics around each of these functions.


This change is Reviewable

… information about posting list version to detect if we are writing to a lower version. And create bigger fewer txns. This change leads to a 50% drop in latency (from 39ms to 19.5ms).
}
// tags = append(tags, tag.Upsert(x.KeyError, perr.Error()))
ms := x.SinceInMilliseconds(start)
ostats.RecordWithTags(context.Background(), tags, x.LatencyMs.M(ms))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of ostats.RecordWithTags is not checked (from errcheck)

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 9 files at r1, 3 of 5 files at r2.
Reviewable status: 7 of 8 files reviewed, 4 unresolved discussions (waiting on @manishrjain)


posting/list.go, line 80 at r2 (raw file):

}

func (l *List) MaxVersion() uint64 {

So far MaxVersion is only used inside the posting package so it'd be better to call it maxVersion so that it's not exported.

We can always rename it back to MaxVersion if it turns out other packages need it.


posting/mvcc.go, line 106 at r2 (raw file):

	txn.Unlock()

	// TODO: Simplify this. All we need to do is to get the PL for the key, and if it has the

Is this TODO still accurate?


posting/mvcc.go, line 112 at r2 (raw file):

	var idx int
	for idx < len(keys) {

What's the purpose of having the writer.Update statement wrapped in a for loop?

At first, I thought it was so that if writer.Update fails, you can retry from the point the previous call
of writer.Update stopped but I see that CommitToDisk returns if writer.Update returns an error.

I think a short comment explaining the reason would be helpful.


worker/draft.go, line 33 at r2 (raw file):

	ostats "go.opencensus.io/stats"
	tag "go.opencensus.io/tag"

package is already called tag (https://github.com/census-instrumentation/opencensus-go/blob/master/tag/context.go#L16) so you can just do

"go.opencensus.io/tag"

or perhaps you meant to call this package otag like that rest of the packages imported from opencensus.

Copy link
Contributor Author

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 8 files reviewed, 4 unresolved discussions (waiting on @martinmr)


posting/list.go, line 80 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

So far MaxVersion is only used inside the posting package so it'd be better to call it maxVersion so that it's not exported.

We can always rename it back to MaxVersion if it turns out other packages need it.

Done.


posting/mvcc.go, line 106 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Is this TODO still accurate?

Nope. Removed.


posting/mvcc.go, line 112 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

What's the purpose of having the writer.Update statement wrapped in a for loop?

At first, I thought it was so that if writer.Update fails, you can retry from the point the previous call
of writer.Update stopped but I see that CommitToDisk returns if writer.Update returns an error.

I think a short comment explaining the reason would be helpful.

Add a comment. writer.Update could return early if txn gets too big. In that case, we would still have to process the remaining keys.


worker/draft.go, line 33 at r2 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

package is already called tag (https://github.com/census-instrumentation/opencensus-go/blob/master/tag/context.go#L16) so you can just do

"go.opencensus.io/tag"

or perhaps you meant to call this package otag like that rest of the packages imported from opencensus.

Done. Looks like it was auto-imported like this.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: golangcibot is still complaining about the return value of ostats.RecordWithTags not being used but I'll let you decide what to do with that warning.

Reviewable status: 4 of 8 files reviewed, all discussions resolved (waiting on @martinmr)

@manishrjain manishrjain merged commit 2197155 into master Feb 8, 2019
@manishrjain manishrjain deleted the mrjn/mutate-concurrently branch February 8, 2019 00:37
@manishrjain
Copy link
Contributor Author

No need to check that error.

manishrjain added a commit that referenced this pull request May 2, 2019
Mutation and Txn Commit/Abort application happen in a serial order (among other less frequent operations). So, the latency of each operation has a cascading effect on the overall latency of transaction.

This PR optimizes mutation application by dividing up the task of each mutation proposal among goroutines. For a batch of 2000 NQuads, this would be divided up among 4 goroutines.

For commits, this PR makes 2 changes. It removes relying upon a Badger lookup to determine if the commit is being done at a lower version than the latest version of the key, instead, it picks this information up from the posting list. Secondly, it switches how txns are run to create fewer larger transactions instead of one txn per key-value.

With this PR, mutate latency goes from original 13.5ms to 7.9ms (>40% drop). Commit latency goes from 39ms to 19.7ms (50% drop). Either of these changes shouldn't have any side-effects and should be NOOPs in terms of functionality.

To make these measurements, this PR also adds metrics around each of these functions.

BREAKING: With these changes, the mutations within a single call are rearranged. So, no assumptions must be made about the order in which they get executed.

Changes:
* Allow mutations to happen concurrently
* Optimize how txns get committed. In particular, use the already known information about posting list version to detect if we are writing to a lower version. And create bigger fewer txns. This change leads to a 50% drop in latency (from 39ms to 19.5ms).
* Bring >= back.
* Unlock in defer dumbass.
* Address Martin's comments
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Mutation and Txn Commit/Abort application happen in a serial order (among other less frequent operations). So, the latency of each operation has a cascading effect on the overall latency of transaction.

This PR optimizes mutation application by dividing up the task of each mutation proposal among goroutines. For a batch of 2000 NQuads, this would be divided up among 4 goroutines.

For commits, this PR makes 2 changes. It removes relying upon a Badger lookup to determine if the commit is being done at a lower version than the latest version of the key, instead, it picks this information up from the posting list. Secondly, it switches how txns are run to create fewer larger transactions instead of one txn per key-value.

With this PR, mutate latency goes from original 13.5ms to 7.9ms (>40% drop). Commit latency goes from 39ms to 19.7ms (50% drop). Either of these changes shouldn't have any side-effects and should be NOOPs in terms of functionality.

To make these measurements, this PR also adds metrics around each of these functions.

BREAKING: With these changes, the mutations within a single call are rearranged. So, no assumptions must be made about the order in which they get executed.

Changes:
* Allow mutations to happen concurrently
* Optimize how txns get committed. In particular, use the already known information about posting list version to detect if we are writing to a lower version. And create bigger fewer txns. This change leads to a 50% drop in latency (from 39ms to 19.5ms).
* Bring >= back.
* Unlock in defer dumbass.
* Address Martin's comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants