-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove CacheAllMetadata caching policy #1110
Conversation
@@ -168,10 +168,6 @@ type DatabaseBlock interface { | |||
// merged during Stream(). | |||
HasMergeTarget() bool | |||
|
|||
// IsRetrieved returns whether the block is already retrieved. Only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, shouldn't this be required for some cache policies (e.g. LRU/RecentlyRead)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I trusted the comment! I believe the other caching policies use IsCachedBlock()
and "retrieved" means we have metadata but we may not have the data for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think IsRetrieved
was specific to that policy (you can see the same logic implemented in the IsCachedBlock
function. I think we definitely want to get rid of this function and move towards using IsCachedBlock
or honestly at this point we can delete IsCachedBlock
as well and only use WasRetrievedFromDisk
since they will have the exact same logic at that point. The only place we use IsCachedBlock
is in series.FetchBlocksMetadata
and its just an optimization there (and not a particularly accurate one at that since it doesn't work in the recently_read
policy)
Also, the only place where i see IsRetrieved
being used outside the context of the CacheAllMetadata
policy is in the series buffer:
func (b *dbBufferBucket) merge() (mergeResult, error) {
if !b.needsMerge() {
// Save unnecessary work
return mergeResult{}, nil
}
merges := 0
bopts := b.opts.DatabaseBlockOptions()
encoder := bopts.EncoderPool().Get()
encoder.Reset(b.start, bopts.DatabaseBlockAllocSize())
// If we have to merge bootstrapped from disk during a merge then this
// can make ticking very slow, ensure to notify this bug
if len(b.bootstrapped) > 0 {
unretrieved := 0
for i := range b.bootstrapped {
if !b.bootstrapped[i].IsRetrieved() {
unretrieved++
}
}
if unretrieved > 0 {
log := b.opts.InstrumentOptions().Logger()
log.Warnf("buffer merging %d unretrieved blocks", unretrieved)
}
}
This is a very weird code path that I don't think we would ever follow at this point. If we want to be extra cautious could replace that IsRetrieved() call with WasReadFromDisk()
/IsCachedBlock
as an invariant violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's remove it, don't think this is plausible now.
src/dbnode/storage/shard.go
Outdated
// API will only return active block metadata (mutable and cached) | ||
// hence this call is invalid | ||
return nil, nil, fmt.Errorf( | ||
"fetch blocks metadata v1 endpoint invalid with cache policy: %s", | ||
s.opts.SeriesCachePolicy().String()) | ||
} | ||
|
||
// For v1 endpoint we always include cached blocks because when using | ||
// CacheAllMetadata the blocks will appear cached | ||
// For v1 endpoint we always include cached blocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to remove the V1 endpoint code at this point too /cc @robskillington
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete all the things. We agreed a long time ago we could delete v1 API code whenever we wanted because if we ever needed to do a migration we can always just roll an older version of M3DB.
@justinjc I would say just delete all the V1 endpoint code and tests. In addition, we can rename all the "v2" code to not use the word "v2" anymore (except the thrift API will still include v2 in the name because its not worth doing a migration to change that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets definitely def delete all V1 code/tests. Don't do the rename tho - the 1-1 parity between go methods and the thrift code is nice. Plus it means if we ever add v3 or w/e it'd be a lot more explicit
@justinjc could you update the buildkite pipelines to remove the integration test target for this policy |
@@ -263,13 +263,6 @@ func (b *dbBlock) HasMergeTarget() bool { | |||
return hasMergeTarget | |||
} | |||
|
|||
func (b *dbBlock) IsRetrieved() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank god, I never understood this. Does this mean you can remove the retriever field from the block as well as any logic dependent on that being present?
The following functions still seem to be using the retriever field:
streamWithRLock
resetNewBlockStartWithLock
resetRetrievableWithLock
We might even get a small perf boost cause this is will be million of less pointers for the G.C to chase
// retrievable (so that we can retrieve data for the blocks that we're caching | ||
// the metadata for). | ||
// In addition, if the caching policy is not CacheAll or CacheAllMetadata, then | ||
// caching policy the function is either done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence terminates too early now
@@ -235,29 +226,9 @@ func (s *dbSeries) updateBlocksWithLock() (updateBlocksResult, error) { | |||
} | |||
|
|||
if shouldUnwire { | |||
switch cachePolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So much cleanup :D
b.RUnlock() | ||
return retrieved | ||
} | ||
|
||
func (b *dbBlock) WasRetrievedFromDisk() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another random thing, I think we might be able to delete the OnRetrieveBlock
method in the DatabaseBlock. I think its primary purpose was to allow us to separate the following two actions:
- Resetting a block
and - Fetching the blocks data
Without the CacheAllMetadata policy, these two steps will never be separate so you can probably combine it all into one function call. For example in the series we do this:
b.ResetRetrievable(startTime, blockSize, s.blockRetriever, metadata)
// Use s.id instead of id here, because id is finalized by the context whereas
// we rely on the G.C to reclaim s.id. This is important because the block will
// hold onto the id ref, and (if the LRU caching policy is enabled) the shard
// will need it later when the WiredList calls its OnEvictedFromWiredList method.
// Also note that ResetRetrievable will mark the block as not retrieved from disk,
// but OnRetrieveBlock will then properly mark it as retrieved from disk so subsequent
// calls to WasRetrievedFromDisk will return true.
b.OnRetrieveBlock(s.id, tags, startTime, segment)
It seems like that could just become
b.Reset(startTime, blockSize, segment)
And then we could delete the OnRetrieveBlock
and ResetRetrievable
methods (and their helper methods like resetRetrievableWithLock
) since they're only required in a world where data can be lazily fetched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## master #1110 +/- ##
==========================================
- Coverage 70.17% 69.99% -0.19%
==========================================
Files 670 669 -1
Lines 55643 55082 -561
==========================================
- Hits 39050 38556 -494
+ Misses 14072 14025 -47
+ Partials 2521 2501 -20
Continue to review full report at Codecov.
|
7a9ff29
to
c8663c3
Compare
src/dbnode/client/session.go
Outdated
// ever called | ||
err = xerrors.NewNonRetryableError(errInvalidFetchBlocksMetadataVersion) | ||
} | ||
currPageToken, err = s.streamBlocksMetadataFromPeerV2(namespace, shardID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe rename streamBlocksMetadataFromPeerV2
to just streamBlocksMetadataFromPeer
?
src/dbnode/client/session.go
Outdated
@@ -2034,168 +1993,6 @@ func (s *session) streamBlocksMetadataFromPeers( | |||
// TODO(r): Delete this once we delete the V1 code path | |||
type pageToken interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can delete this now too, since the page token will always be of the same type now (that is []byte
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I would do a grep for "Delete this once we delete the V1 code path" I think I used that exact string in a bunch of places so we could grep for it all
} else { | ||
expectFetchMetadataAndReturn(client, result, opts) | ||
} | ||
expectFetchMetadataAndReturnV2(client, result, opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe we can rename this from expectFetchMetadataAndReturnV2
to just expectFetchMetadataAndReturn
?
@@ -2198,64 +2003,12 @@ func expectFetchMetadataAndReturnV2( | |||
type fetchMetadataReqMatcher struct { | |||
shard int32 | |||
limit int64 | |||
pageToken *int64 | |||
pageTokenV2 []byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe we can rename pageTokenV2
to pageToken
now?
src/dbnode/storage/block/block.go
Outdated
@@ -55,7 +55,6 @@ type dbBlock struct { | |||
|
|||
mergeTarget DatabaseBlock | |||
|
|||
retriever DatabaseShardBlockRetriever | |||
retrieveID ident.ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename retrieveID
since it's not used for retrieving any more, it's just used for accounting purposes, mainly in the wired list.
So just rename it to seriesID
here and in the wiredListEntry
it should also be renamed to seriesID
.
That sounds good?
@@ -186,26 +167,6 @@ func (b *dbBlock) Checksum() (uint32, error) { | |||
return b.checksum, nil | |||
} | |||
|
|||
func (b *dbBlock) OnRetrieveBlock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justinjc Now that you've deleted this method, how does the seriesID get set? I think you might need to make seriesID a required argument for the Reset
method otherwise it will mess up the LRU caching policy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, going to make a separate ResetFromDisk
function that's basically Reset
with setting the seriesID
xlog.NewField("wasRetrievedFromDisk", entry.wasRetrievedFromDisk), | ||
).Errorf("wired list tried to close a block that was not from disk") | ||
} | ||
onEvict.OnEvictedFromWiredList(entry.seriesID, entry.startTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe yeah we only want to do this in the else
case to this if condition, otherwise we're sending an arg down the callchain that might not be nil-checked later.
@@ -371,7 +371,7 @@ type databaseShard interface { | |||
starts []time.Time, | |||
) ([]block.FetchBlockResult, error) | |||
|
|||
// FetchBlocksMetadataV2 retrieves blocks metadata. | |||
// FetchBlocksMetadata retrieves blocks metadata. | |||
FetchBlocksMetadataV2( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm did you mean to update this method too? Only the comment got updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I think we want to keep the "V2" in the name in this one because it's part of the chain from RPC code. I'll revert this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM when build is passing
No description provided.