New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[dbnode] Streaming writer #2618

Merged

linasm merged 13 commits into master from gg/large-tiles-writer

Sep 18, 2020

Collaborator

gediminasgu commented Sep 9, 2020

Writer for Large Tiles. Basically it's the same writer just in place of caching entries in memory it writes index file at the same time when writing data to disk.

Special notes for your reviewer:
The later usage of this writer can be found here: https://github.com/m3db/m3/blob/linasm/fix-large-tiles-resource-management/src/dbnode/storage/shard.go#L2701

NONE

NONE


          [dbnode] Large tiles writer

8ccb4b8

gediminasgu added the area:tiles label

gediminasgu mentioned this pull request

[dbnode] Large tiles writer tests #2608

Closed


          minor refactorings

7706fd8

arnikola reviewed

View reviewed changes

src/dbnode/persist/fs/write.go Outdated Show resolved Hide resolved

arnikola reviewed

View reviewed changes

src/dbnode/persist/fs/write.go Outdated Show resolved Hide resolved

ryanhall07 reviewed

View reviewed changes

src/dbnode/persist/fs/write_large_tiles.go Outdated Show resolved Hide resolved

mmihic reviewed

View reviewed changes

src/dbnode/persist/fs/write_large_tiles.go Outdated Show resolved Hide resolved

gediminasgu and others added 4 commits

September 11, 2020 14:11


          minor refactoring

d6a22b7


          Skip tagsEncoder, use encodedTags directly


          Rename LargeTilesWriter to StreamingWriter

564e2d6


          Merge branch 'master' into gg/large-tiles-writer

ee1d37c

linasm requested a review from arnikola

September 14, 2020 07:26

linasm and others added 3 commits

September 14, 2020 15:02


          Merge branch 'master' into gg/large-tiles-writer

b5fbcd6


          Merge branch 'master' into gg/large-tiles-writer

25cec74


          Add FIXME wrt stegment.Tail.Finalize

030a7c7

linasm reviewed

View reviewed changes

src/dbnode/persist/fs/streaming_write.go Outdated

Comment on lines 130 to 139

+              	stream, ok := encoder.Stream(ctx)
+              	if !ok {
+              		// None of the datapoints passed the predicate.
+              		return nil
+              	}
+              	defer stream.Finalize()
+              	segment, err := stream.Segment()
+              	if err != nil {
+              		return err
+              	}

Collaborator

linasm Sep 15, 2020

Wouldn't it be better just to do segment := encoder.Discard() here?

Collaborator

linasm Sep 17, 2020

Will change it.


          Merge branch 'master' into gg/large-tiles-writer

9c6d582

robskillington reviewed

View reviewed changes

src/dbnode/persist/fs/streaming_write.go Outdated

+              	}
+              	m, k := bloom.EstimateFalsePositiveRate(
+              		opts.PlannedRecordsCount,

Collaborator

robskillington Sep 17, 2020

Perhaps check that this is not zero? Otherwise you will get a really bad "m" and "k", I would make this being non-zero a requirement.

Collaborator

linasm Sep 17, 2020

Probably there can exist empty (zero-record) filesets? Then I guess in case of 0 I'll just bump it to 1 in order not to burden the call site with this requirement.

Collaborator

linasm Sep 17, 2020

Also making the same change in the old writer (the changed test suite pointed this out to me), hope it's fine.

robskillington reviewed

View reviewed changes

src/dbnode/persist/fs/streaming_write.go Outdated

+              func (w *streamingWriter) Write(
+              	ctx context.Context,
+              	encoder encoding.Encoder,

Collaborator

robskillington Sep 17, 2020

I would just take a []byte here for encoded data so that the caller can control the lifecycle of the data, etc.

Collaborator

robskillington Sep 17, 2020

Would structure the args similar to the other fs.Writer so that Head/Tail can be passed in but also allow a single buffer to be written:

func (w *streamingWriter) Write(
  id ident.ID,
  encodedTags []byte,
  data []byte,
) error {
  w.singleBufferArray[0] = data
  return w.WriteAll(id, encodedTags, w.singleBufferArray)
}

func (w *streamingWriter) WriteAll(
  id ident.ID,
  encodedTags []byte,
  data [][]byte,
) error {
 // .. write multiple data buffers
}

Collaborator

linasm Sep 17, 2020

Got it. I think we need just the WriteAll though.

robskillington reviewed

View reviewed changes

src/dbnode/persist/fs/streaming_write.go Outdated

+              	}
+              	// Warning: metadata is not set here and should not be used anywhere below.
+              	entry := &indexEntry{

Collaborator

robskillington Sep 17, 2020

Would avoid using a pointer here, just use a struct and return by value so that this isn't a heap allocation?

Collaborator

robskillington Sep 17, 2020

If you need to be able to return nil to imply empty, just return a bool instead? i.e. (indexEntry, bool, error) as return type and check the bool.

Collaborator

linasm Sep 17, 2020

👍

robskillington reviewed

View reviewed changes

src/dbnode/persist/fs/streaming_write.go Outdated

+              		return nil, nil
+              	}
+              	// Warning: metadata is not set here and should not be used anywhere below.

Collaborator

robskillington Sep 17, 2020

This seems dangerous, can we remove metadata from the indexEntry altogether? If the default fs.Writer needs can we add a container type, i.e.

type indexEntryAndMetadata struct {
  entry indexEntry
  metadata // ... metadata type
}

Collaborator

linasm Sep 17, 2020

Good point, this makes things cleaner all over the place.

linasm and others added 2 commits

September 17, 2020 22:18


          Address PR feedback

8ba1de2


          Merge branch 'master' into gg/large-tiles-writer

772b395

linasm requested a review from robskillington

September 17, 2020 19:18

linasm changed the title ~~[dbnode] Large tiles writer~~ [dbnode] Streaming writer


          Merge branch 'master' into gg/large-tiles-writer

ff028c6

robskillington approved these changes

View reviewed changes

Collaborator

robskillington left a comment

LGTM

linasm merged commit 3476b4e into master

linasm deleted the gg/large-tiles-writer branch

September 18, 2020 14:24

ChrisChinchilla pushed a commit that referenced this pull request


          Squashed commit of the following:

00c8613

commit 4f3778dded83ad4d81aaeaad62608f6c7a0f9461
Author: ChrisChinchilla <[email protected]>
Date:   Mon Oct 5 12:53:40 2020 +0200

    fix code inclusion

    Signed-off-by: ChrisChinchilla <[email protected]>

commit daefb435243ed98041744c520d52ac4a92421694
Author: ChrisChinchilla <[email protected]>
Date:   Mon Oct 5 12:53:34 2020 +0200

    Add redirects

    Signed-off-by: ChrisChinchilla <[email protected]>

commit 612cfbb319b30177501530f9cd7bdd626ea6b107
Author: ChrisChinchilla <[email protected]>
Date:   Mon Oct 5 12:09:40 2020 +0200

    Remove versioning for now

    Signed-off-by: ChrisChinchilla <[email protected]>

commit cda48e18dce956a4e684dd4d6d7943df808e6846
Author: ChrisChinchilla <[email protected]>
Date:   Fri Oct 2 18:09:35 2020 +0200

    Switch netlify directory

    Signed-off-by: ChrisChinchilla <[email protected]>

commit c5d0b1decc95d042a565cbe97d25c9eda94549a9
Author: ChrisChinchilla <[email protected]>
Date:   Fri Oct 2 18:00:34 2020 +0200

    Move theme to module

    Signed-off-by: ChrisChinchilla <[email protected]>

commit 855d8c7af1bf45978ed180469ecb1b753ec1abdb
Author: ChrisChinchilla <[email protected]>
Date:   Thu Oct 1 14:59:14 2020 +0200

    Netlify dev

    Signed-off-by: ChrisChinchilla <[email protected]>

commit a8a8eda0f2f3b8fa1b3aefa2316942ff692dc8e5
Author: ChrisChinchilla <[email protected]>
Date:   Thu Oct 1 14:03:34 2020 +0200

    Update Hugo version

    Signed-off-by: ChrisChinchilla <[email protected]>

commit f19ffadb7caad18663dad99ec5230e8357125954
Author: ChrisChinchilla <[email protected]>
Date:   Thu Oct 1 13:54:41 2020 +0200

    Random file to fix odd git issues

    Signed-off-by: ChrisChinchilla <[email protected]>

commit a6413db7891f2f741510e4b36300e3cb93904f1d
Author: ChrisChinchilla <[email protected]>
Date:   Thu Oct 1 13:11:42 2020 +0200

    Update versions

    Signed-off-by: ChrisChinchilla <[email protected]>

commit 9919f004633ace0f88bb9cb721a6373416f28781
Author: ChrisChinchilla <[email protected]>
Date:   Thu Oct 1 12:45:25 2020 +0200

    Convert docs theme to module

    Signed-off-by: ChrisChinchilla <[email protected]>

commit 3a1e013338b6671eb601df3f5cbdd89d770ca171
Author: ChrisChinchilla <[email protected]>
Date:   Thu Oct 1 12:27:31 2020 +0200

    Remove subtree again, not working

    Signed-off-by: ChrisChinchilla <[email protected]>

commit 282ca2d870eb656ba15e20ade3d8aec6493523fe
Author: ChrisChinchilla <[email protected]>
Date:   Thu Oct 1 12:20:11 2020 +0200

    Add versions to config

    Signed-off-by: ChrisChinchilla <[email protected]>

commit 73f03d34990b740d5bb1c57df1c5fbb328d0f701
Author: ChrisChinchilla <[email protected]>
Date:   Wed Sep 30 18:29:35 2020 +0200

    Consolidate all commits

commit f2ebf5c
Author: teddywahle <[email protected]>
Date:   Mon Sep 21 12:33:37 2020 -0700

    [query] Implemented the Graphite `integralByInterval` function  (#2596)

commit a66fb7d
Author: arnikola <[email protected]>
Date:   Mon Sep 21 14:33:57 2020 -0400

    [dbnode] Tile iterators for wide aggregations (#2646)

commit 9ea5682
Author: teddywahle <[email protected]>
Date:   Sun Sep 20 21:50:31 2020 -0700

    [query] Implemented the Graphite `divideSeriesLists` function (#2585)

commit 35cac59
Author: Rob Skillington <[email protected]>
Date:   Mon Sep 21 00:21:58 2020 -0400

    [coordinator] Update OpenAPI specs for namespace update endpoint (#2629)

commit ef83ec4
Author: Rob Skillington <[email protected]>
Date:   Sun Sep 20 21:57:26 2020 -0400

    [changelog] Add changelog for 0.15.15 (#2649)

commit 091f833
Author: Rob Skillington <[email protected]>
Date:   Fri Sep 18 11:30:57 2020 -0400

    [coordinator] Allow configuration of tag validation (#2647)

commit 3476b4e
Author: Gediminas Guoba <[email protected]>
Date:   Fri Sep 18 17:24:09 2020 +0300

    [dbnode] Streaming writer (#2618)

    * [dbnode] Large tiles writer

    * minor refactorings

    * minor refactoring

    * Skip tagsEncoder, use encodedTags directly

    * Rename LargeTilesWriter to StreamingWriter

    * Add FIXME wrt stegment.Tail.Finalize

    * Address PR feedback

    Co-authored-by: Linas Medziunas <[email protected]>
    Co-authored-by: Linas Medžiūnas <[email protected]>

commit 88164cf
Author: Ryan Hall <[email protected]>
Date:   Thu Sep 17 16:38:04 2020 -0700

    Only read the commit log once during bootstrapping (#2645)

    * Only read the commit log once during bootstrapping

    A recent refactoring of cold writes ( #2508) introduced a regression that increases the
    chances the commit log is read twice while bootstrapping. The referenced PR changed the commitlog
    bootstrapper to read all requested time ranges, even if a range had been fulfilled by a previous
    bootstrapper. This was necessary since the commitlog may have cold writes that were never commmited
    to a fileset. The fileystem bootstrapper would report a time range as fulfilled, but might be missing
    cold writes only in the commit log.

    It should be noted this bug was always theoretically possible, but unlikely since the commitlog bootstrapper
    typically wouldn't run in the first pass (cold time ranges) since the filesystem would fulfill all cold ranges
    and short circuit the first pass of the boostrapper.

    This change only reads the commit log on the first pass of the boostrapper and caches the result to skip reading
    it in subsequent passes. It doesn't actually matter which pass we read the commit log, the first was just chosen
    arbitrarily.

    Other attempts at fixing this bug attempted to disable the entire commit log bootstrapper during a pass, but that's
    not possible since the commit log bootstrapper is actually 2 bootstrappers in one, both the the commit log and snapshot
    files. To minimize the refactoring changes we still want to only read the snapshot files of the requested ranges.

commit 3d2915f
Author: nate <[email protected]>
Date:   Thu Sep 17 13:03:48 2020 -0400

    [dbnode] Make caching after block retrieval a configuration option (#2613)

commit 0ef7aba
Author: nate <[email protected]>
Date:   Thu Sep 17 12:33:53 2020 -0400

    [docs] Add documentation on fileset migrations (#2630)

Signed-off-by: ChrisChinchilla <[email protected]>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

linasm linasm left review comments

ryanhall07 ryanhall07 left review comments

mmihic mmihic left review comments

robskillington robskillington approved these changes

arnikola Awaiting requested review from arnikola

Labels