[dbnode] Concurrent time series indexing within a single batch #2146
Conversation
)

var (
	errDocNotFound = errors.New("doc not found")
)

const (
	// Slightly buffer the work to avoid blocking main thread.
	indexQueueSize = 2 << 7
Nice, yeah perhaps we even increase this to 1024?
nit: Also I think I've seen the actual number usually presented as a comment, i.e. indexQueueSize = 2 << 7 // 128
		NoFinalizeKey: true,
	})
}
b.Unlock()
We actually should be ok to avoid locking here yeah if we're sharding by name?
Ah, I thought that the Go runtime would complain about concurrent map access even if we were accessing different keys. Lemme try without.
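For context, a minimal sketch (not from the PR) of why the lock looked necessary: the runtime's concurrent-map-write check guards the whole map, not individual keys, so writers touching different keys can still trip it.

	m := make(map[int]int)
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(k int) {
			defer wg.Done()
			// Each goroutine writes a distinct key, yet the program may still
			// abort with "fatal error: concurrent map writes" (and the race
			// detector will flag it) because the map itself is shared state.
			m[k] = k
		}(i)
	}
	wg.Wait()

Sharding so that each worker owns its own map sidesteps this entirely, which is what the later change in this thread ends up doing.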
	job.batchErr.AddWithLock(index.BatchError{Err: err, Idx: job.idx})
}
if newField {
	b.Lock()
Guess we can't avoid this lock, however, since unique fields are shared. We could always create a uniqueFields per worker though and iterate over a list of lists, possibly in the NewOrderedBytesSliceIter iterator (would need a special implementation of sort.Interface to handle the fact that it's a slice of slices).
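For illustration, a rough sketch of such a sort.Interface over a slice of slices of byte slices (this is not the PR's implementation; it assumes the standard library's bytes and sort packages and maps a flat index onto the two levels):

	// Sketch only: flat index i maps to (outer, inner) so sort.Sort can order
	// all fields across per-worker slices without copying them together first.
	type sortableSliceOfSliceOfByteSlicesAsc struct {
		data [][][]byte
		n    int
	}

	func newSortableSliceOfSliceOfByteSlicesAsc(data [][][]byte) *sortableSliceOfSliceOfByteSlicesAsc {
		s := &sortableSliceOfSliceOfByteSlicesAsc{data: data}
		for _, d := range data {
			s.n += len(d)
		}
		return s
	}

	func (s *sortableSliceOfSliceOfByteSlicesAsc) Len() int { return s.n }

	func (s *sortableSliceOfSliceOfByteSlicesAsc) Less(i, j int) bool {
		return bytes.Compare(s.at(i), s.at(j)) < 0
	}

	func (s *sortableSliceOfSliceOfByteSlicesAsc) Swap(i, j int) {
		oi, ii := s.locate(i)
		oj, ij := s.locate(j)
		s.data[oi][ii], s.data[oj][ij] = s.data[oj][ij], s.data[oi][ii]
	}

	// locate converts a flat index into (outer slice, offset within it).
	func (s *sortableSliceOfSliceOfByteSlicesAsc) locate(idx int) (int, int) {
		for outer, d := range s.data {
			if idx < len(d) {
				return outer, idx
			}
			idx -= len(d)
		}
		panic("index out of range")
	}

	func (s *sortableSliceOfSliceOfByteSlicesAsc) at(idx int) []byte {
		o, i := s.locate(idx)
		return s.data[o][i]
	}

sort.Sort(newSortableSliceOfSliceOfByteSlicesAsc(perWorkerFields)) would then order all fields in place across the per-worker slices.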
I'll look into this.
Added sharding for both unique fields and fields map. Removed all locks within the index worker.
Force-pushed from e78beca to 619d427.
Codecov Report
@@            Coverage Diff            @@
##           master   #2146    +/-   ##
========================================
+ Coverage    71.4%   72.2%    +0.8%
========================================
  Files        1018    1019       +1
  Lines       88346   88430      +84
========================================
+ Hits        63085   63890     +805
+ Misses      20940   20245     -695
+ Partials     4321    4295      -26
========================================
Continue to review full report at Codecov.
Force-pushed from bbbc8c6 to f61d08c.
src/m3ninx/index/batch.go (Outdated)
e.Lock()
defer e.Unlock()
if err.Err == nil {
	return
}
e.errs = append(e.errs, err)
nit: can probably be a little bit more efficient as:

if err.Err == nil {
	return
}
e.Lock()
e.errs = append(e.errs, err)
e.Unlock()
Also, seems to only be used in one place; might be more straightforward to do the locking there rather than adding a mutex on the struct? Take it or leave it though
Just moved the locking code around.
@@ -78,6 +78,7 @@ var (

func TestBuilderFields(t *testing.T) {
	builder, err := NewBuilderFromDocuments(testOptions)
	defer builder.Close()
Since this is a test, instead of deferring close here, maybe add require.NoError(t, builder.Close()) at the end of the test?
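A minimal sketch of the suggested pattern (testOptions and NewBuilderFromDocuments come from the diff above; the explicit close assertion is the reviewer's suggestion, not the PR's final code):

	func TestBuilderFields(t *testing.T) {
		builder, err := NewBuilderFromDocuments(testOptions)
		require.NoError(t, err)

		// ... exercise the builder ...

		// Assert the close error instead of silently deferring it.
		require.NoError(t, builder.Close())
	}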
@@ -105,6 +106,7 @@ func TestBuilderFields(t *testing.T) {

func TestBuilderTerms(t *testing.T) {
	builder, err := NewBuilderFromDocuments(testOptions)
	defer builder.Close()
Since this is a test, instead of deferring close here, maybe add require.NoError(t, builder.Close()) at the end of the test?
	data []*fieldsMap
}

func newShardedFieldsMap(
nit: may be a good idea to add sanity checks for shardInitialCapacity and numShards
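For illustration only, one shape such checks could take (the constructor body is assumed; newFieldsMapWithCapacity is a stand-in for however each per-shard map is actually built):

	func newShardedFieldsMap(
		numShards int,
		shardInitialCapacity int,
	) *shardedFieldsMap {
		// Sanity checks: clamp to safe minimums rather than building an
		// unusable map (returning an error would be the stricter alternative).
		if numShards < 1 {
			numShards = 1
		}
		if shardInitialCapacity < 0 {
			shardInitialCapacity = 0
		}
		data := make([]*fieldsMap, 0, numShards)
		for i := 0; i < numShards; i++ {
			data = append(data, newFieldsMapWithCapacity(shardInitialCapacity))
		}
		return &shardedFieldsMap{data: data}
	}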
t, found := fieldMap.Get(k)
if found {
	return t, found
}
nit: maybe more go-ish as

if t, found := fieldMap.Get(k); found {
	return t, true
}
	s.data[shard].SetUnsafe(k, v, opts)
}

// ResetTerms keeps fields around but resets the terms set for each one.
nit: mismatch between comment and func name
@@ -25,29 +25,32 @@ import (
	"sort"

	"github.com/m3db/m3/src/m3ninx/index/segment"

	"github.com/twotwotwo/sorts"
This is BSD-3, pretty sure that has Apache-2 compatibility, but just looking to double-check? (Also, think it may need to be added to our glide file?)
edit: we're good
@@ -86,6 +90,51 @@ func (b *OrderedBytesSliceIter) Close() error {
	return nil
}

type sortableSliceOfSliceOfByteSlices struct {
nit: think we typically postpend this with the direction (Asc or Desc)
func NewBuilderFromDocuments(opts Options) (segment.DocumentsBuilder, error) {
	return &builder{
func NewBuilderFromDocuments(opts Options) (segment.CloseableDocumentsBuilder, error) {
	concurrency := runtime.NumCPU()
Can we make this come from Options and default to runtime.NumCPU()?
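A sketch of what that option could look like (the opts struct and accessor names here are assumptions, not the PR's final API):

	// Hypothetical accessors on the builder Options implementation.
	func (o *opts) SetConcurrency(value int) Options {
		oCopy := *o
		oCopy.concurrency = value
		return &oCopy
	}

	func (o *opts) Concurrency() int {
		if o.concurrency > 0 {
			return o.concurrency
		}
		// Default to one index worker per CPU when the option is unset.
		return runtime.NumCPU()
	}

NewBuilderFromDocuments would then read concurrency := opts.Concurrency() instead of calling runtime.NumCPU() directly.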
if shardInitialCapcity > 0 {
	shardInitialCapcity /= concurrency
}
shardUniqueFields := make([][]byte, 0, shardInitialCapcity)
This may be more performant if you allocate the entire [][]byte as a full block, then index into it; i.e.

shardUniqueFields := make([][]byte, 0, concurrency*shardInitialCapcity)
for i := 0; i < concurrency; i++ {
	// ...
	b.uniqueFields = append(b.uniqueFields, shardUniqueFields[i*shardInitialCapcity:(i+1)*shardInitialCapcity])
	// ...
}
Wouldn't be able to grow a shard in this case. Or it would be difficult to do so.
}
shardUniqueFields := make([][]byte, 0, shardInitialCapcity)
b.uniqueFields = append(b.uniqueFields, shardUniqueFields)
b.fields = newShardedFieldsMap(concurrency, shardInitialCapcity)
May be possible to bulk-allocate these similarly to b.uniqueFields?
Same comment as before. Growing a shard would be very painful.
	job.batchErr.AddWithLock(index.BatchError{Err: err, Idx: job.idx})
}
if newField {
	b.uniqueFields[job.shard] = append(b.uniqueFields[job.shard], job.field.Name)
Should this happen if the post failed?
Ah, good call, moving around the logic here.
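A sketch of the reordering being discussed (the postings-insert call is hypothetical; only the surrounding names come from the diff above):

	// Only record the field as newly seen when the insert actually succeeded.
	if err := terms.post(job.id, job.postingsListID); err != nil { // hypothetical insert call
		job.batchErr.AddWithLock(index.BatchError{Err: err, Idx: job.idx})
	} else if newField {
		b.uniqueFields[job.shard] = append(b.uniqueFields[job.shard], job.field.Name)
	}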
if newField {
	b.uniqueFields[job.shard] = append(b.uniqueFields[job.shard], job.field.Name)
}
b.wg.Done()
I may be misreading this, but it looks like there's a disconnect between the wg.Add and the wg.Done, where a single wait group may be used by multiple inserts and it's not clear that they complete on time? Maybe rather than adding a wait group on builder, may be better to explicitly init one and pass it into index and add it to the indexJob?
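A sketch of the explicit-wait-group approach being suggested (field and method names are taken loosely from the diff; this is not the final code):

	// indexJob carries its batch's wait group so the worker signals completion
	// for exactly the batch that enqueued it.
	type indexJob struct {
		wg    *sync.WaitGroup
		shard int
		idx   int
		field doc.Field
		// ... whatever else the worker needs ...
	}

	func (b *builder) InsertBatch(batch index.Batch) error {
		var wg sync.WaitGroup // scoped to this batch, never shared across inserts
		for i, d := range batch.Docs {
			for _, f := range d.Fields {
				wg.Add(1) // one Add per enqueued job, matched by the worker's wg.Done()
				b.index(indexJob{wg: &wg, idx: i, field: f /* ... */})
			}
		}
		wg.Wait()
		return nil
	}

The worker would then call job.wg.Done() instead of b.wg.Done().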
Force-pushed from 569a4fb to 86cfdd5.
src/dbnode/storage/index.go (Outdated)
@@ -748,6 +748,7 @@ func (i *nsIndex) Flush(

builderOpts := i.opts.IndexOptions().SegmentBuilderOptions()
builder, err := builder.NewBuilderFromDocuments(builderOpts)
defer builder.Close()
Need to move this after the err != nil { return err } since if err != nil then builder will be nil and calling builder.Close() will cause a nil ptr panic.
src/dbnode/storage/index/block.go (Outdated)
})
// Free segment builder resources.
if b.compact.segmentBuilder != nil {
	b.compact.segmentBuilder.Close()
Need to check the error returned from segment builder too here and emit an invariant error if cannot close.
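One possible shape for that check (the instrument helper and options wiring here are assumptions about the surrounding m3 code, not part of the PR):

	// Free segment builder resources, surfacing close failures loudly.
	if b.compact.segmentBuilder != nil {
		if err := b.compact.segmentBuilder.Close(); err != nil {
			// Assumed helper: emit an invariant violation so a leaked builder
			// doesn't fail silently.
			instrument.EmitAndLogInvariantViolation(b.opts.InstrumentOptions(), func(l *zap.Logger) {
				l.Error("could not close segment builder", zap.Error(err))
			})
		}
		b.compact.segmentBuilder = nil
	}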
	"github.com/m3db/m3/src/m3ninx/doc"
	"github.com/m3db/m3/src/m3ninx/index"
	"github.com/m3db/m3/src/m3ninx/index/segment"
	"github.com/m3db/m3/src/m3ninx/postings"
	"github.com/m3db/m3/src/m3ninx/util"
	"go.uber.org/atomic"
nit: This should be grouped next to other third party imports such as "github.com/cespare/xxhash"
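For illustration, the grouping the reviewer means, with third-party imports (xxhash, atomic) kept together and separate from the m3 packages:

	import (
		"github.com/m3db/m3/src/m3ninx/doc"
		"github.com/m3db/m3/src/m3ninx/index"
		"github.com/m3db/m3/src/m3ninx/index/segment"
		"github.com/m3db/m3/src/m3ninx/postings"
		"github.com/m3db/m3/src/m3ninx/util"

		"github.com/cespare/xxhash"
		"go.uber.org/atomic"
	)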
func (s *shardedFieldsMap) Get(k []byte) (*terms, bool) {
	for _, fieldMap := range s.data {
		if t, found := fieldMap.Get(k); found {
Hm, can we perhaps just hash the k to work out which map to find? Otherwise we have to do 32 map accesses (on a 32-core machine) vs checking just 1 map. This is done for every single field during a compaction too.
Or just make the Get(k) take a shard too, i.e. Get(shard int, k []byte)
Ah, I already have a ShardedGet method. The (b *builder) Terms(field []byte) call doesn't already have shard pre-computed so I added this regular Get. But you are right, it looks like it's cheaper to compute the hash vs multiple map accesses.
Here are some benchmark results:
notbdu @ Bos-MacBook-Pro.local m3 (bdu/concurrent-index) $ go test -v -bench . ./bench_test.go
goos: darwin
goarch: amd64
BenchmarkMapAccess-8 20000000 83.9 ns/op
BenchmarkHashing-8 200000000 6.21 ns/op
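The bench_test.go contents aren't part of the PR excerpt; a sketch of how the comparison might be written (assumes the testing package and github.com/cespare/xxhash imports):

	var sink uint64

	func BenchmarkHashing(b *testing.B) {
		key := []byte("some.metric.field.name")
		for i := 0; i < b.N; i++ {
			// Cost of computing the shard hash for a field name.
			sink = xxhash.Sum64(key)
		}
	}

	func BenchmarkMapAccess(b *testing.B) {
		m := map[string]int{"some.metric.field.name": 1}
		key := []byte("some.metric.field.name")
		var v int
		for i := 0; i < b.N; i++ {
			// Cost of a single map probe, including the []byte-to-string conversion.
			v = m[string(key)]
		}
		_ = v
	}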
Force-pushed from 6f350ea to 77ffa29.
@@ -751,6 +751,7 @@ func (i *nsIndex) Flush(
	if err != nil {
		return err
	}
	defer builder.Close()
Good catch.
go b.indexWorker(indexQueue)

// Give each shard a fraction of the configured initial capacity.
shardInitialCapcity := opts.InitialCapacity()
shardInitialCapcity should be shardInitialCapacity?
@@ -236,7 +298,9 @@ func (b *builder) Fields() (segment.FieldsIterator, error) {
}

func (b *builder) Terms(field []byte) (segment.TermsIterator, error) {
	terms, ok := b.fields.Get(field)
	// NB(bodu): The # of indexQueues and field map shards are equal.
	shard := int(xxhash.Sum64(field) % uint64(len(b.indexQueues)))
Can we make this a method on the builder to reuse this code? I see it replicated on this line and line 230, i.e.

func (b *builder) shardForField(field []byte) int {
	return int(xxhash.Sum64(field) % uint64(len(b.indexQueues)))
}
LGTM other than remaining comments
Co-Authored-By: Rob Skillington <[email protected]>
What this PR does / why we need it:
Improves indexing performance.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: