Cache deletion mark files together with meta files. #2457
Conversation
Signed-off-by: Peter Štibraný <[email protected]>
As commented above, but also:
I would say we always have to fetch; that's why I would think deletion marks should rather be handled even before filters.
Otherwise makes sense. I guess once we see a deletion mark we cache it and no longer check for it, right?
The question is: do we even need to delete a deletion mark to undo deletion?
That's exactly what the code does now (https://github.com/pstibrany/thanos/blob/3a2276aa182a918e888413c0cac3b088788a87fc/pkg/block/fetcher.go#L377). However, there are some places in the UI where no filters are used, and I was thinking that perhaps in those cases we don't need to fetch deletion mark files.
Similar to meta.json, we only check if it still exists on the storage, but don't read it anymore. If it does, we use the local cache. If it doesn't, we report that the mark file no longer exists (even if we have it locally).
This would still work!
Which in retrospect doesn't save that much: before this PR it's one operation to read the mark file, after this PR it's one operation to check if it exists (+ another one to read it, if it's not yet cached). 🤔
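For illustration, a minimal sketch of the existence-check pattern described above (all names are hypothetical, not the actual Thanos code): a cached deletion mark is never re-read, we only verify it still exists in object storage.

```go
package sketch

import (
	"context"
	"path"

	"github.com/oklog/ulid"
	"github.com/thanos-io/thanos/pkg/block/metadata"
	"github.com/thanos-io/thanos/pkg/objstore"
)

type markFetcher struct {
	bkt   objstore.Bucket
	cache map[ulid.ULID]*metadata.DeletionMark
}

func (f *markFetcher) deletionMark(ctx context.Context, id ulid.ULID) (*metadata.DeletionMark, error) {
	if cached, ok := f.cache[id]; ok {
		exists, err := f.bkt.Exists(ctx, path.Join(id.String(), metadata.DeletionMarkFilename))
		if err != nil {
			return nil, err
		}
		if !exists {
			// Removed remotely: report the mark as gone even though we hold a local copy.
			delete(f.cache, id)
			return nil, nil
		}
		return cached, nil // Still present: trust the local copy and skip the read.
	}
	mark, err := f.readDeletionMark(ctx, id) // One full read on first sight, then cached.
	if err == nil && mark != nil {
		f.cache[id] = mark
	}
	return mark, err
}

// readDeletionMark is a stub standing in for reading and unmarshalling
// <id>/deletion-mark.json from the bucket.
func (f *markFetcher) readDeletionMark(ctx context.Context, id ulid.ULID) (*metadata.DeletionMark, error) {
	return nil, nil
}
```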
Exactly. I think with deletion marks we can go much further. We don't expect them to ever change, so we can cache once and always ignore such a block. Readers can also ignore meta.json after the ignore time (:
As agreed offline, I will try to embed this code into the fix for #2459.
Maybe in two steps |
Actually, let's have a quick fix for v0.12.1 maybe.
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Added caching for non-existent markers as well. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Cache entry time-to-live durations are currently hardcoded, but can be made configurable if required.
…Aaaarrrrggggghhh. 😱 Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Fix unit test by flushing the cache. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Back to draft... I've added caching of negative results too, but that has some consequences that I'd rather not tackle in this PR. I will remove that and then un-draft it.
negative result?
When the deletion marker doesn't exist on the remote storage, my PR was caching that info for a short while.
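For illustration, a minimal sketch of such negative-result caching (hypothetical names; the PR's actual code may differ): a confirmed "not found" is remembered for a short TTL so repeated syncs can skip the lookup.

```go
package sketch

import (
	"sync"
	"time"
)

type negativeCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]time.Time // key -> time the "not found" result expires
}

func newNegativeCache(ttl time.Duration) *negativeCache {
	return &negativeCache{ttl: ttl, entries: map[string]time.Time{}}
}

func (c *negativeCache) markMissing(key string, now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = now.Add(c.ttl)
}

// knownMissing reports whether we recently confirmed the key is absent.
func (c *negativeCache) knownMissing(key string, now time.Time) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	exp, ok := c.entries[key]
	if !ok {
		return false
	}
	if now.After(exp) {
		delete(c.entries, key) // Expired: must recheck the storage.
		return false
	}
	return true
}
```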
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Good job @pstibrany. The overall design LGTM, and passing the deletion marks to Filter() looks good for future extensibility from my perspective. I left a few comments here and there, all small things.
ttl := f.deletionMarkPositiveCacheEntryTTL

return &cachedDeletionMark{
	nextCheck: now.Add(ttl/2 + time.Duration(rand.Int63n(ttl.Nanoseconds()))),
So the effective TTL is between ttl/2 and ttl*1.5. Is that what we want? I would have expected deletionMarkPositiveCacheEntryTTL to be the upper bound, not the average TTL, but I don't have strong opinions as long as we document it.
I mostly wanted to avoid doing all rechecks at the same time, and to spread them a little instead. You're right that the TTL could be made an upper bound instead, and then we can compute the effective TTL, for example, as a random value in [TTL/2, TTL]. I also don't have strong opinions on this. I think the naming is a bit confusing here.
I think it could be made a bit clearer if we had two hardcoded values: min TTL and max TTL. Then the actual TTL is a random value between min and max. Alternatively, TTL + jitter. But making it explicit would help to clarify it in the code.
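A minimal sketch of that min/max variant (the constant values are made up, not the ones from this PR): the next check time is drawn uniformly from [min TTL, max TTL), so rechecks spread out instead of firing simultaneously.

```go
package sketch

import (
	"math/rand"
	"time"
)

const (
	deletionMarkCacheMinTTL = 5 * time.Minute
	deletionMarkCacheMaxTTL = 10 * time.Minute
)

// nextCheckTime returns a timestamp in [now+minTTL, now+maxTTL).
func nextCheckTime(now time.Time) time.Time {
	jitter := time.Duration(rand.Int63n(int64(deletionMarkCacheMaxTTL - deletionMarkCacheMinTTL)))
	return now.Add(deletionMarkCacheMinTTL + jitter)
}
```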
Signed-off-by: Peter Štibraný <[email protected]>
Thanks for the review @pracucci, I've addressed your feedback.
Good job @pstibrany. Changes LGTM. I left a few last comments/questions.
func (r response) deletionMarksCopyForFilter() map[ulid.ULID]*metadata.DeletionMark {
	marks := make(map[ulid.ULID]*metadata.DeletionMark, len(r.marks))
	for id, m := range r.marks {
		marks[id] = &m.mark
The reason why I commented about storing by copy/reference in cachedDeletionMark is that here we work with pointers, while in the cache we store by copy. It just looked weird to me, but I don't feel strongly about this.
if metaErr == nil {
	mark, markErr = f.loadDeletionMark(ctx, id, now)
}
Thinking out loud: the contract of this fetchMetadata() function is that it returns the metas it found even when fetching some meta fails (so it's OK to return a partial view). I'm wondering if it's correct to never fetch the deletion mark in case of any error: i.e., should we fetch it anyway in case of ErrorSyncMetaCorrupted? I don't have an answer honestly, but I would like to have a discussion on it.
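For illustration, a sketch of that alternative contract, assuming the exported ErrorSyncMetaCorrupted sentinel from pkg/block/fetcher.go and pkg/errors-style wrapping (the helper name is hypothetical):

```go
package sketch

import (
	"github.com/pkg/errors"

	"github.com/thanos-io/thanos/pkg/block"
)

// shouldLoadDeletionMark loads the mark both when the meta loaded fine and
// when it was merely corrupted, so the partial view still carries deletion
// information.
func shouldLoadDeletionMark(metaErr error) bool {
	return metaErr == nil || errors.Cause(metaErr) == block.ErrorSyncMetaCorrupted
}
```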
}
-	f.cached = cached
+	f.cached = resp.metasCopy()
+	f.marks = resp.marks // no need to copy, as it's not going to be modified
Given deletion marks are relatively small (compared to metas), could it be more future-proof to always copy them?
}

// deletionMarksCopyForFilter makes a copy of deletion marks map, suitable for passing to Filter method.
func (r response) deletionMarksCopyForFilter() map[ulid.ULID]*metadata.DeletionMark {
Calling it ForFilter() looks a bit weird to me. The usage we're going to make of a copy of the deletion marks shouldn't be reflected in the function name, IMO.
expectedMetas:         ULIDs(1, 3, 6),
expectedCorruptedMeta: ULIDs(5),
expectedNoMeta:        ULIDs(4, 2),
expectedMetaErr:       errors.New("incomplete view: unexpected meta file: 00000000070000000000000000/meta.json version: 20"),
Shouldn't we also expect an error for the corrupted deletion mark?
We have deployed updated Cortex with this PR today, and the results are somewhat disappointing... If we focus on get and exists operations only, we can see that we now issue even more operations to the storage. It seems that the reason for this is a combination of several factors:
Previously, deletion marks were only fetched for blocks that passed the "is this block interesting" filter, which resulted in fewer operations.
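A rough illustration with made-up numbers: if a tenant has 1,000 blocks and only 100 of them pass the earlier filters, the old code issued on the order of 100 mark reads per sync, while with this PR each sync issues around 1,000 exists checks, plus reads for any marks not yet cached.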
In light of these findings, let's not merge this yet. I'd like to write a design document to suggest a more generic caching approach around storage.
Changes
This PR adds caching of deletion mark files, similar to how meta.json files are cached. Deletion mark files are stored in the same place.
In order to make this work, deletion mark files are fetched at the same time as meta.json files. They are fetched for each block, stored in a map and on the disk (if they exist), and then passed to filters.

Open questions: one idea is adding a RequiresDeletionMarks() bool function to the Filter interface, to figure out if we need them, and passing the OR-ed result from all filters to BaseFetcher (see the sketch below).
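For illustration, a minimal sketch of that interface extension (hypothetical shape, not code from this PR): each filter declares whether it needs deletion marks, and the fetcher only fetches them if any filter does.

```go
package sketch

import (
	"context"

	"github.com/oklog/ulid"
	"github.com/thanos-io/thanos/pkg/block/metadata"
)

type MetadataFilter interface {
	Filter(ctx context.Context, metas map[ulid.ULID]*metadata.Meta, marks map[ulid.ULID]*metadata.DeletionMark) error
	// RequiresDeletionMarks tells the fetcher whether this filter needs marks at all.
	RequiresDeletionMarks() bool
}

// needDeletionMarks ORs the requirement across all configured filters.
func needDeletionMarks(filters []MetadataFilter) bool {
	for _, f := range filters {
		if f.RequiresDeletionMarks() {
			return true
		}
	}
	return false
}
```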
I personally think this idea can be extended so that Fetcher returns mark files as well. Compactor could use that information, instead of relying on ignoreDeletionMarkFilter like it does today. But that would be a different PR.

Verification
Not tested yet, and unit tests are broken. I will continue working on tests once I know this is a valid approach.