Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove CacheAllMetadata caching policy #1110

Merged
merged 13 commits into from
Oct 22, 2018
Merged

Conversation

justinjc
Copy link
Collaborator

No description provided.

@@ -168,10 +168,6 @@ type DatabaseBlock interface {
// merged during Stream().
HasMergeTarget() bool

// IsRetrieved returns whether the block is already retrieved. Only
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, shouldn't this be required for some cache policies (e.g. LRU/RecentlyRead)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trusted the comment! I believe the other caching policies use IsCachedBlock() and "retrieved" means we have metadata but we may not have the data for it.

Copy link
Contributor

@richardartoul richardartoul Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think IsRetrieved was specific to that policy (you can see the same logic implemented in the IsCachedBlock function. I think we definitely want to get rid of this function and move towards using IsCachedBlock or honestly at this point we can delete IsCachedBlock as well and only use WasRetrievedFromDisk since they will have the exact same logic at that point. The only place we use IsCachedBlock is in series.FetchBlocksMetadata and its just an optimization there (and not a particularly accurate one at that since it doesn't work in the recently_read policy)

Also, the only place where i see IsRetrieved being used outside the context of the CacheAllMetadata policy is in the series buffer:

func (b *dbBufferBucket) merge() (mergeResult, error) {
	if !b.needsMerge() {
		// Save unnecessary work
		return mergeResult{}, nil
	}

	merges := 0
	bopts := b.opts.DatabaseBlockOptions()
	encoder := bopts.EncoderPool().Get()
	encoder.Reset(b.start, bopts.DatabaseBlockAllocSize())

	// If we have to merge bootstrapped from disk during a merge then this
	// can make ticking very slow, ensure to notify this bug
	if len(b.bootstrapped) > 0 {
		unretrieved := 0
		for i := range b.bootstrapped {
			if !b.bootstrapped[i].IsRetrieved() {
				unretrieved++
			}
		}
		if unretrieved > 0 {
			log := b.opts.InstrumentOptions().Logger()
			log.Warnf("buffer merging %d unretrieved blocks", unretrieved)
		}
	}

This is a very weird code path that I don't think we would ever follow at this point. If we want to be extra cautious could replace that IsRetrieved() call with WasReadFromDisk()/IsCachedBlock as an invariant violated.

@robskillington

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's remove it, don't think this is plausible now.

// API will only return active block metadata (mutable and cached)
// hence this call is invalid
return nil, nil, fmt.Errorf(
"fetch blocks metadata v1 endpoint invalid with cache policy: %s",
s.opts.SeriesCachePolicy().String())
}

// For v1 endpoint we always include cached blocks because when using
// CacheAllMetadata the blocks will appear cached
// For v1 endpoint we always include cached blocks
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to remove the V1 endpoint code at this point too /cc @robskillington

Copy link
Contributor

@richardartoul richardartoul Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete all the things. We agreed a long time ago we could delete v1 API code whenever we wanted because if we ever needed to do a migration we can always just roll an older version of M3DB.

@justinjc I would say just delete all the V1 endpoint code and tests. In addition, we can rename all the "v2" code to not use the word "v2" anymore (except the thrift API will still include v2 in the name because its not worth doing a migration to change that)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets definitely def delete all V1 code/tests. Don't do the rename tho - the 1-1 parity between go methods and the thrift code is nice. Plus it means if we ever add v3 or w/e it'd be a lot more explicit

@prateek
Copy link
Collaborator

prateek commented Oct 17, 2018

@justinjc could you update the buildkite pipelines to remove the integration test target for this policy

@@ -263,13 +263,6 @@ func (b *dbBlock) HasMergeTarget() bool {
return hasMergeTarget
}

func (b *dbBlock) IsRetrieved() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank god, I never understood this. Does this mean you can remove the retriever field from the block as well as any logic dependent on that being present?

The following functions still seem to be using the retriever field:

streamWithRLock
resetNewBlockStartWithLock
resetRetrievableWithLock

We might even get a small perf boost cause this is will be million of less pointers for the G.C to chase

// retrievable (so that we can retrieve data for the blocks that we're caching
// the metadata for).
// In addition, if the caching policy is not CacheAll or CacheAllMetadata, then
// caching policy the function is either done.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence terminates too early now

@@ -235,29 +226,9 @@ func (s *dbSeries) updateBlocksWithLock() (updateBlocksResult, error) {
}

if shouldUnwire {
switch cachePolicy {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much cleanup :D

b.RUnlock()
return retrieved
}

func (b *dbBlock) WasRetrievedFromDisk() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another random thing, I think we might be able to delete the OnRetrieveBlock method in the DatabaseBlock. I think its primary purpose was to allow us to separate the following two actions:

  1. Resetting a block
    and
  2. Fetching the blocks data

Without the CacheAllMetadata policy, these two steps will never be separate so you can probably combine it all into one function call. For example in the series we do this:

b.ResetRetrievable(startTime, blockSize, s.blockRetriever, metadata)
	// Use s.id instead of id here, because id is finalized by the context whereas
	// we rely on the G.C to reclaim s.id. This is important because the block will
	// hold onto the id ref, and (if the LRU caching policy is enabled) the shard
	// will need it later when the WiredList calls its OnEvictedFromWiredList method.
	// Also note that ResetRetrievable will mark the block as not retrieved from disk,
	// but OnRetrieveBlock will then properly mark it as retrieved from disk so subsequent
	// calls to WasRetrievedFromDisk will return true.
	b.OnRetrieveBlock(s.id, tags, startTime, segment)

It seems like that could just become

b.Reset(startTime, blockSize, segment)

And then we could delete the OnRetrieveBlock and ResetRetrievable methods (and their helper methods like resetRetrievableWithLock) since they're only required in a world where data can be lazily fetched

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codecov
Copy link

codecov bot commented Oct 18, 2018

Codecov Report

Merging #1110 into master will decrease coverage by 0.18%.
The diff coverage is 80.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1110      +/-   ##
==========================================
- Coverage   70.17%   69.99%   -0.19%     
==========================================
  Files         670      669       -1     
  Lines       55643    55082     -561     
==========================================
- Hits        39050    38556     -494     
+ Misses      14072    14025      -47     
+ Partials     2521     2501      -20
Flag Coverage Δ
#aggregator 81.63% <ø> (ø) ⬆️
#collector 78.13% <ø> (ø) ⬆️
#dbnode 81.11% <80.76%> (-0.25%) ⬇️
#m3em 73.21% <ø> (ø) ⬆️
#m3ninx 75.33% <ø> (ø) ⬆️
#m3nsch 51.19% <ø> (ø) ⬆️
#metrics 18.38% <ø> (ø) ⬆️
#msg 75.11% <ø> (+0.12%) ⬆️
#query 63.64% <ø> (+0.06%) ⬆️
#x 75.1% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1504486...2b5ac80. Read the comment docs.

@justinjc justinjc force-pushed the juchan/rip-cacheallmetadata branch from 7a9ff29 to c8663c3 Compare October 18, 2018 22:58
// ever called
err = xerrors.NewNonRetryableError(errInvalidFetchBlocksMetadataVersion)
}
currPageToken, err = s.streamBlocksMetadataFromPeerV2(namespace, shardID,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe rename streamBlocksMetadataFromPeerV2 to just streamBlocksMetadataFromPeer?

@@ -2034,168 +1993,6 @@ func (s *session) streamBlocksMetadataFromPeers(
// TODO(r): Delete this once we delete the V1 code path
type pageToken interface{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can delete this now too, since the page token will always be of the same type now (that is []byte).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I would do a grep for "Delete this once we delete the V1 code path" I think I used that exact string in a bunch of places so we could grep for it all

} else {
expectFetchMetadataAndReturn(client, result, opts)
}
expectFetchMetadataAndReturnV2(client, result, opts)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe we can rename this from expectFetchMetadataAndReturnV2 to just expectFetchMetadataAndReturn?

@@ -2198,64 +2003,12 @@ func expectFetchMetadataAndReturnV2(
type fetchMetadataReqMatcher struct {
shard int32
limit int64
pageToken *int64
pageTokenV2 []byte
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe we can rename pageTokenV2 to pageToken now?

@@ -55,7 +55,6 @@ type dbBlock struct {

mergeTarget DatabaseBlock

retriever DatabaseShardBlockRetriever
retrieveID ident.ID
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename retrieveID since it's not used for retrieving any more, it's just used for accounting purposes, mainly in the wired list.

So just rename it to seriesID here and in the wiredListEntry it should also be renamed to seriesID.

That sounds good?

@@ -186,26 +167,6 @@ func (b *dbBlock) Checksum() (uint32, error) {
return b.checksum, nil
}

func (b *dbBlock) OnRetrieveBlock(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinjc Now that you've deleted this method, how does the seriesID get set? I think you might need to make seriesID a required argument for the Reset method otherwise it will mess up the LRU caching policy

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, going to make a separate ResetFromDisk function that's basically Reset with setting the seriesID

xlog.NewField("wasRetrievedFromDisk", entry.wasRetrievedFromDisk),
).Errorf("wired list tried to close a block that was not from disk")
}
onEvict.OnEvictedFromWiredList(entry.seriesID, entry.startTime)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe yeah we only want to do this in the else case to this if condition, otherwise we're sending an arg down the callchain that might not be nil-checked later.

@@ -371,7 +371,7 @@ type databaseShard interface {
starts []time.Time,
) ([]block.FetchBlockResult, error)

// FetchBlocksMetadataV2 retrieves blocks metadata.
// FetchBlocksMetadata retrieves blocks metadata.
FetchBlocksMetadataV2(
Copy link
Collaborator

@robskillington robskillington Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm did you mean to update this method too? Only the comment got updated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I think we want to keep the "V2" in the name in this one because it's part of the chain from RPC code. I'll revert this comment.

Copy link
Collaborator

@robskillington robskillington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM when build is passing

@justinjc justinjc merged commit 6c086cf into master Oct 22, 2018
@justinjc justinjc deleted the juchan/rip-cacheallmetadata branch October 22, 2018 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants