Remove CacheAllMetadata caching policy #1110

justinjc · 2018-10-17T20:09:46Z

No description provided.

prateek · 2018-10-17T20:12:55Z

src/dbnode/storage/block/types.go

@@ -168,10 +168,6 @@ type DatabaseBlock interface {
 	// merged during Stream().
 	HasMergeTarget() bool

-	// IsRetrieved returns whether the block is already retrieved. Only


Hm, shouldn't this be required for some cache policies (e.g. LRU/RecentlyRead)?

I trusted the comment! I believe the other caching policies use IsCachedBlock() and "retrieved" means we have metadata but we may not have the data for it.

Yeah I think IsRetrieved was specific to that policy (you can see the same logic implemented in the IsCachedBlock function. I think we definitely want to get rid of this function and move towards using IsCachedBlock or honestly at this point we can delete IsCachedBlock as well and only use WasRetrievedFromDisk since they will have the exact same logic at that point. The only place we use IsCachedBlock is in series.FetchBlocksMetadata and its just an optimization there (and not a particularly accurate one at that since it doesn't work in the recently_read policy)

Also, the only place where i see IsRetrieved being used outside the context of the CacheAllMetadata policy is in the series buffer:

func (b *dbBufferBucket) merge() (mergeResult, error) { if !b.needsMerge() { // Save unnecessary work return mergeResult{}, nil } merges := 0 bopts := b.opts.DatabaseBlockOptions() encoder := bopts.EncoderPool().Get() encoder.Reset(b.start, bopts.DatabaseBlockAllocSize()) // If we have to merge bootstrapped from disk during a merge then this // can make ticking very slow, ensure to notify this bug if len(b.bootstrapped) > 0 { unretrieved := 0 for i := range b.bootstrapped { if !b.bootstrapped[i].IsRetrieved() { unretrieved++ } } if unretrieved > 0 { log := b.opts.InstrumentOptions().Logger() log.Warnf("buffer merging %d unretrieved blocks", unretrieved) } }

This is a very weird code path that I don't think we would ever follow at this point. If we want to be extra cautious could replace that IsRetrieved() call with WasReadFromDisk()/IsCachedBlock as an invariant violated.

@robskillington

Yeah let's remove it, don't think this is plausible now.

prateek · 2018-10-17T20:14:12Z

src/dbnode/storage/shard.go

 		// API will only return active block metadata (mutable and cached)
 		// hence this call is invalid
 		return nil, nil, fmt.Errorf(
 			"fetch blocks metadata v1 endpoint invalid with cache policy: %s",
 			s.opts.SeriesCachePolicy().String())
 	}

-	// For v1 endpoint we always include cached blocks because when using
-	// CacheAllMetadata the blocks will appear cached
+	// For v1 endpoint we always include cached blocks


Should be able to remove the V1 endpoint code at this point too /cc @robskillington

delete all the things. We agreed a long time ago we could delete v1 API code whenever we wanted because if we ever needed to do a migration we can always just roll an older version of M3DB.

@justinjc I would say just delete all the V1 endpoint code and tests. In addition, we can rename all the "v2" code to not use the word "v2" anymore (except the thrift API will still include v2 in the name because its not worth doing a migration to change that)

lets definitely def delete all V1 code/tests. Don't do the rename tho - the 1-1 parity between go methods and the thrift code is nice. Plus it means if we ever add v3 or w/e it'd be a lot more explicit

prateek · 2018-10-17T20:14:42Z

@justinjc could you update the buildkite pipelines to remove the integration test target for this policy

richardartoul · 2018-10-18T15:02:19Z

src/dbnode/storage/block/block.go

@@ -263,13 +263,6 @@ func (b *dbBlock) HasMergeTarget() bool {
 	return hasMergeTarget
 }

-func (b *dbBlock) IsRetrieved() bool {


Thank god, I never understood this. Does this mean you can remove the retriever field from the block as well as any logic dependent on that being present?

The following functions still seem to be using the retriever field:

streamWithRLock resetNewBlockStartWithLock resetRetrievableWithLock

We might even get a small perf boost cause this is will be million of less pointers for the G.C to chase

richardartoul · 2018-10-18T15:12:26Z

src/dbnode/storage/bootstrap/bootstrapper/peers/source.go

-// retrievable (so that we can retrieve data for the blocks that we're caching
-// the metadata for).
-// In addition, if the caching policy is not CacheAll or CacheAllMetadata, then
+// caching policy the function is either done.


This sentence terminates too early now

richardartoul · 2018-10-18T15:14:05Z

src/dbnode/storage/series/series.go

@@ -235,29 +226,9 @@ func (s *dbSeries) updateBlocksWithLock() (updateBlocksResult, error) {
 		}

 		if shouldUnwire {
-			switch cachePolicy {


So much cleanup :D

richardartoul · 2018-10-18T15:29:53Z

src/dbnode/storage/block/block.go

-	b.RUnlock()
-	return retrieved
-}
-
 func (b *dbBlock) WasRetrievedFromDisk() bool {


Another random thing, I think we might be able to delete the OnRetrieveBlock method in the DatabaseBlock. I think its primary purpose was to allow us to separate the following two actions:

Resetting a block
and

Fetching the blocks data

Without the CacheAllMetadata policy, these two steps will never be separate so you can probably combine it all into one function call. For example in the series we do this:

b.ResetRetrievable(startTime, blockSize, s.blockRetriever, metadata) // Use s.id instead of id here, because id is finalized by the context whereas // we rely on the G.C to reclaim s.id. This is important because the block will // hold onto the id ref, and (if the LRU caching policy is enabled) the shard // will need it later when the WiredList calls its OnEvictedFromWiredList method. // Also note that ResetRetrievable will mark the block as not retrieved from disk, // but OnRetrieveBlock will then properly mark it as retrieved from disk so subsequent // calls to WasRetrievedFromDisk will return true. b.OnRetrieveBlock(s.id, tags, startTime, segment)

It seems like that could just become

b.Reset(startTime, blockSize, segment)

And then we could delete the OnRetrieveBlock and ResetRetrievable methods (and their helper methods like resetRetrievableWithLock) since they're only required in a world where data can be lazily fetched

@robskillington

codecov · 2018-10-18T22:18:06Z

Codecov Report

Merging #1110 into master will decrease coverage by 0.18%.
The diff coverage is 80.76%.

@@            Coverage Diff             @@
##           master    #1110      +/-   ##
==========================================
- Coverage   70.17%   69.99%   -0.19%     
==========================================
  Files         670      669       -1     
  Lines       55643    55082     -561     
==========================================
- Hits        39050    38556     -494     
+ Misses      14072    14025      -47     
+ Partials     2521     2501      -20

Flag	Coverage Δ
#aggregator	`81.63% <ø> (ø)`	⬆️
#collector	`78.13% <ø> (ø)`	⬆️
#dbnode	`81.11% <80.76%> (-0.25%)`	⬇️
#m3em	`73.21% <ø> (ø)`	⬆️
#m3ninx	`75.33% <ø> (ø)`	⬆️
#m3nsch	`51.19% <ø> (ø)`	⬆️
#metrics	`18.38% <ø> (ø)`	⬆️
#msg	`75.11% <ø> (+0.12%)`	⬆️
#query	`63.64% <ø> (+0.06%)`	⬆️
#x	`75.1% <ø> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1504486...2b5ac80. Read the comment docs.

robskillington · 2018-10-22T17:01:23Z

src/dbnode/client/session.go

-					// ever called
-					err = xerrors.NewNonRetryableError(errInvalidFetchBlocksMetadataVersion)
-				}
+				currPageToken, err = s.streamBlocksMetadataFromPeerV2(namespace, shardID,


nit: Maybe rename streamBlocksMetadataFromPeerV2 to just streamBlocksMetadataFromPeer?

robskillington · 2018-10-22T17:02:05Z

src/dbnode/client/session.go

@@ -2034,168 +1993,6 @@ func (s *session) streamBlocksMetadataFromPeers(
 // TODO(r): Delete this once we delete the V1 code path
 type pageToken interface{}


We can delete this now too, since the page token will always be of the same type now (that is []byte).

yeah I would do a grep for "Delete this once we delete the V1 code path" I think I used that exact string in a bunch of places so we could grep for it all

robskillington · 2018-10-22T17:03:02Z

src/dbnode/client/session_fetch_bulk_blocks_test.go

-		} else {
-			expectFetchMetadataAndReturn(client, result, opts)
-		}
+		expectFetchMetadataAndReturnV2(client, result, opts)


nit: Maybe we can rename this from expectFetchMetadataAndReturnV2 to just expectFetchMetadataAndReturn?

robskillington · 2018-10-22T17:03:26Z

src/dbnode/client/session_fetch_bulk_blocks_test.go

@@ -2198,64 +2003,12 @@ func expectFetchMetadataAndReturnV2(
 type fetchMetadataReqMatcher struct {
 	shard        int32
 	limit        int64
-	pageToken    *int64
 	pageTokenV2  []byte


nit: Maybe we can rename pageTokenV2 to pageToken now?

robskillington · 2018-10-22T17:14:40Z

src/dbnode/storage/block/block.go

@@ -55,7 +55,6 @@ type dbBlock struct {

 	mergeTarget DatabaseBlock

-	retriever  DatabaseShardBlockRetriever
 	retrieveID ident.ID


Can you rename retrieveID since it's not used for retrieving any more, it's just used for accounting purposes, mainly in the wired list.

So just rename it to seriesID here and in the wiredListEntry it should also be renamed to seriesID.

That sounds good?

richardartoul · 2018-10-22T17:28:07Z

src/dbnode/storage/block/block.go

@@ -186,26 +167,6 @@ func (b *dbBlock) Checksum() (uint32, error) {
 	return b.checksum, nil
 }

-func (b *dbBlock) OnRetrieveBlock(


@justinjc Now that you've deleted this method, how does the seriesID get set? I think you might need to make seriesID a required argument for the Reset method otherwise it will mess up the LRU caching policy

As discussed offline, going to make a separate ResetFromDisk function that's basically Reset with setting the seriesID

robskillington · 2018-10-22T19:22:37Z

src/dbnode/storage/block/wired_list.go

+					xlog.NewField("wasRetrievedFromDisk", entry.wasRetrievedFromDisk),
+				).Errorf("wired list tried to close a block that was not from disk")
+			}
+			onEvict.OnEvictedFromWiredList(entry.seriesID, entry.startTime)


I believe yeah we only want to do this in the else case to this if condition, otherwise we're sending an arg down the callchain that might not be nil-checked later.

robskillington · 2018-10-22T19:23:23Z

src/dbnode/storage/types.go

@@ -371,7 +371,7 @@ type databaseShard interface {
 		starts []time.Time,
 	) ([]block.FetchBlockResult, error)

-	// FetchBlocksMetadataV2 retrieves blocks metadata.
+	// FetchBlocksMetadata retrieves blocks metadata.
 	FetchBlocksMetadataV2(


Hm did you mean to update this method too? Only the comment got updated.

Oops, I think we want to keep the "V2" in the name in this one because it's part of the chain from RPC code. I'll revert this comment.

robskillington

LGTM when build is passing

justinjc added 2 commits October 17, 2018 16:07

Remove CacheAllMetadata caching policy

69546a4

Regenerate mock

c252c9b

prateek reviewed Oct 17, 2018

View reviewed changes

Remove buildkite pipeline

a289cd7

richardartoul reviewed Oct 18, 2018

View reviewed changes

justinjc added 2 commits October 18, 2018 18:06

Remove retriever from block

449854f

Remove retrievable properties and methods

8b26fe7

Remove FetchMetdata v1 code

c8663c3

justinjc force-pushed the juchan/rip-cacheallmetadata branch from 7a9ff29 to c8663c3 Compare October 18, 2018 22:58

justinjc added 3 commits October 19, 2018 10:49

Remove more v1 code

3336894

Fix unit test

37f4cb8

Remove more FetchBlocksMetadata (v1) code

444880e

robskillington reviewed Oct 22, 2018

View reviewed changes

richardartoul reviewed Oct 22, 2018

View reviewed changes

justinjc added 2 commits October 22, 2018 14:46

Remove 'V2' from some functions; more clean up

0a41825

Fix test

a86d18f

robskillington reviewed Oct 22, 2018

View reviewed changes

justinjc added 2 commits October 22, 2018 15:24

Fix invariant violation log

296be93

Fix comment

2b5ac80

robskillington approved these changes Oct 22, 2018

View reviewed changes

justinjc merged commit 6c086cf into master Oct 22, 2018

justinjc deleted the juchan/rip-cacheallmetadata branch October 22, 2018 21:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove CacheAllMetadata caching policy #1110

Remove CacheAllMetadata caching policy #1110

justinjc commented Oct 17, 2018

prateek Oct 17, 2018

justinjc Oct 17, 2018

richardartoul Oct 18, 2018 •

edited

Loading

robskillington Oct 22, 2018

prateek Oct 17, 2018

richardartoul Oct 18, 2018 •

edited

Loading

prateek Oct 19, 2018

prateek commented Oct 17, 2018

richardartoul Oct 18, 2018

richardartoul Oct 18, 2018

richardartoul Oct 18, 2018

richardartoul Oct 18, 2018

richardartoul Oct 18, 2018

codecov bot commented Oct 18, 2018 •

edited

Loading

robskillington Oct 22, 2018

robskillington Oct 22, 2018

richardartoul Oct 22, 2018

robskillington Oct 22, 2018

robskillington Oct 22, 2018

robskillington Oct 22, 2018

richardartoul Oct 22, 2018

justinjc Oct 22, 2018

robskillington Oct 22, 2018

robskillington Oct 22, 2018 •

edited

Loading

justinjc Oct 22, 2018

robskillington left a comment

		@@ -2034,168 +1993,6 @@ func (s *session) streamBlocksMetadataFromPeers(
		// TODO(r): Delete this once we delete the V1 code path
		type pageToken interface{}

Remove CacheAllMetadata caching policy #1110

Remove CacheAllMetadata caching policy #1110

Conversation

justinjc commented Oct 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richardartoul Oct 18, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richardartoul Oct 18, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

prateek commented Oct 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Oct 18, 2018 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robskillington Oct 22, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robskillington left a comment

Choose a reason for hiding this comment

richardartoul Oct 18, 2018 •

edited

Loading

richardartoul Oct 18, 2018 •

edited

Loading

codecov bot commented Oct 18, 2018 •

edited

Loading

robskillington Oct 22, 2018 •

edited

Loading