[dbnode] Add claims for index segments volume index #2846
Conversation
Left a couple nits, but changes LGTM.
Seems like this logic works fine w/ how we determine whether we've warm flushed an index block:
func (i *nsIndex) hasIndexWarmFlushedToDisk(
	infoFiles []fs.ReadIndexInfoFileResult,
	blockStart time.Time,
) bool {
	var hasIndexWarmFlushedToDisk bool
	// NB(bodu): We consider the block to have been warm flushed if there are any
	// filesets on disk. This is consistent with the "has warm flushed" check in the db shard.
	// Shard block starts are marked as having warm flushed if an info file is successfully read from disk.
	for _, f := range infoFiles {
		indexVolumeType := idxpersist.DefaultIndexVolumeType
		if f.Info.IndexVolumeType != nil {
			indexVolumeType = idxpersist.IndexVolumeType(f.Info.IndexVolumeType.Value)
		}
		if f.ID.BlockStart == blockStart && indexVolumeType == idxpersist.DefaultIndexVolumeType {
			hasIndexWarmFlushedToDisk = true
		}
	}
	return hasIndexWarmFlushedToDisk
}
We check the index volume type so there is no dependency on the volume index.
// Now check if previous claim exists.
blockStart := xtime.ToUnixNano(blockStartTime)
namespace := namespaceMetadata.ID()
key := fmt.Sprintf("%s/%s/%d", filePathPrefix, namespace.String(),
nit: is it better to have a nested map here so we're not generating a ton of keys with duplicated strings for long retention?
Also, isn't file path prefix static after set? I don't think we have multiple file paths and/or dynamically changing file paths at runtime?
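For illustration, a minimal sketch of the nested-map alternative this nit suggests, keyed by namespace and then block start instead of a formatted string key. The type and names here are assumptions for the sketch, not the PR's actual fields; block starts are shown as int64 unix nanos to keep it self-contained.

// Sketch only: namespace -> block start (unix nanos) -> claim, avoiding one
// "<prefix>/<ns>/<blockStart>" string allocation per block start.
type volumeIndexClaim struct {
	volumeIndex int
}

var indexVolumeIndexClaims = make(map[string]map[int64]volumeIndexClaim)

func claimFor(namespace string, blockStartUnixNanos int64) (volumeIndexClaim, bool) {
	byBlockStart, ok := indexVolumeIndexClaims[namespace]
	if !ok {
		return volumeIndexClaim{}, false
	}
	claim, ok := byBlockStart[blockStartUnixNanos]
	return claim, ok
}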
rp, bs, t := retOpts.RetentionPeriod(), indexOpts.BlockSize(), nowFn()
earliestBlockStart := retention.FlushTimeStartForRetentionPeriod(rp, bs, t)
earliestBlockStartUnixNanos := xtime.ToUnixNano(earliestBlockStart)
for key, claim := range indexVolumeIndexClaims {
nit: prob a hyper optimization but would it be better to just iterate over block starts beginning w/ edge of retention and going backwards in time until we can't find any claim and bail then? That way we're only ever iterating over block starts that are out of retention (if any). If there are none, it will just check the map once and bail early.
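As a sketch of this suggestion (names and the map shape are illustrative, reusing the volumeIndexClaim type from the sketch above and assuming the standard time package is imported):

// Walk block starts from the edge of retention backwards in time and bail
// at the first block start that has no claim, so only out-of-retention
// entries (if any) are ever visited.
func deleteOutOfRetentionClaims(
	claims map[int64]volumeIndexClaim, // block start (unix nanos) -> claim
	earliestBlockStartUnixNanos int64,
	blockSize time.Duration,
) {
	for bs := earliestBlockStartUnixNanos - blockSize.Nanoseconds(); ; bs -= blockSize.Nanoseconds() {
		if _, ok := claims[bs]; !ok {
			// Nothing claimed this far back; anything older was already removed.
			break
		}
		delete(claims, bs)
	}
}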
// to validate the index segment.
// If fails validation will rebuild since missing from
// fulfilled range.
fsOpts = fsOpts.SetIndexReaderAutovalidateIndexSegments(true)
+1 on this change. Are there other places where we'd also want to validate index segments on read, or are we already doing that?
Self f/u on this: it looks like the only other place we read in index segments is in the persist manager, after flushing the segments to disk. We've just written to disk at that point, so that should be fine.
LG pending code gen + test fix ups
@@ -654,3 +657,70 @@ func (pm *persistManager) SetRuntimeOptions(value runtime.Options) {
	pm.currRateLimitOpts = value.PersistRateLimitOptions()
	pm.Unlock()
}

var (
	indexVolumeIndexClaimsLock sync.Mutex
If we decide between a global persist manager instance vs. a global shared lock between persist managers: currently flush and cold flush each have their own persist manager instance. We also create one for the fs/peers bootstrappers, but those run serially.
Spoke offline; decided to go with a global index volume index claims manager approach instead.
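Roughly, a shared claims manager along these lines (a sketch under assumed names, not the PR's final API; assumes sync is imported, and it would also need out-of-retention cleanup as discussed above):

// A single shared manager guarding volume index claims per namespace and
// block start behind one mutex.
type indexClaimsManager struct {
	sync.Mutex

	// namespace -> block start (unix nanos) -> next volume index to hand out.
	volumeIndexClaims map[string]map[int64]int
}

func newIndexClaimsManager() *indexClaimsManager {
	return &indexClaimsManager{
		volumeIndexClaims: make(map[string]map[int64]int),
	}
}

// ClaimNextIndexFileSetVolumeIndex hands out the next unused volume index
// for the given namespace and block start.
func (m *indexClaimsManager) ClaimNextIndexFileSetVolumeIndex(
	namespace string,
	blockStartUnixNanos int64,
) int {
	m.Lock()
	defer m.Unlock()
	byBlockStart, ok := m.volumeIndexClaims[namespace]
	if !ok {
		byBlockStart = make(map[int64]int)
		m.volumeIndexClaims[namespace] = byBlockStart
	}
	volumeIndex := byBlockStart[blockStartUnixNanos]
	byBlockStart[blockStartUnixNanos] = volumeIndex + 1
	return volumeIndex
}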
Force-pushed from 7bda086 to a8e0720 (compare)
// NewIndexClaimsManager returns an instance of the index claim manager. This manages
// concurrent claims for volume indices per ns and block start.
// NB(bodu): There should be only a single shared index claim manager among all threads
nit: does it make sense to make this method cache an instance and just return one if it already exists?
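What this suggestion could look like as a sketch, caching the first instance behind sync.Once (illustrative names, reusing the indexClaimsManager sketch above; as the reply below notes, this is not the approach the PR ends up taking):

var (
	globalIndexClaimsManagerOnce sync.Once
	globalIndexClaimsManager     *indexClaimsManager
)

// NewIndexClaimsManager constructs the manager on first call and returns
// the same cached instance on every subsequent call.
func NewIndexClaimsManager() *indexClaimsManager {
	globalIndexClaimsManagerOnce.Do(func() {
		globalIndexClaimsManager = newIndexClaimsManager()
	})
	return globalIndexClaimsManager
}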
Hmm, so there are unit tests that rely on the CTOR returning separate instances of the index claims manager.
I could add a reset method to the index claims manager and explicitly reset before each test, but that feels kind of awkward, especially just for tests...
I think it's better to just keep it as is for now. We have some confidence we're using a single instance since we set it at the storage.Options level and propagate it through the db with validation checks if it's not set.
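For illustration, the set-once-and-validate approach could look roughly like this (field and method names are assumptions, not the exact storage.Options API; assumes the errors package is imported):

type Options struct {
	indexClaimsManager *indexClaimsManager
}

// SetIndexClaimsManager sets the single shared claims manager on the options.
func (o Options) SetIndexClaimsManager(m *indexClaimsManager) Options {
	o.indexClaimsManager = m
	return o
}

// Validate fails fast at startup if no claims manager was set, so the db
// never runs with a missing (or accidentally duplicated) instance.
func (o Options) Validate() error {
	if o.indexClaimsManager == nil {
		return errors.New("index claims manager not set")
	}
	return nil
}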
Still LG @notbdu. Codegen failure looks lint related so nbd. 👍
Force-pushed from 544f20e to 5fb09f5 (compare)
Codecov Report
@@           Coverage Diff            @@
##           master    #2846    +/-  ##
========================================
  Coverage    72.1%    72.1%
========================================
  Files        1100     1101      +1
  Lines       99968   100071    +103
========================================
+ Hits        72077    72184    +107
+ Misses      22941    22933      -8
- Partials     4950     4954      +4
* master: (28 commits)
  [dbnode] Add claims for index segments volume index (#2846)
  [dbnode] Remove namespaces from example config and integration tests (#2866)
  [dbnode] Resurrect flaky test skip (#2868)
  [aggregator] Fix checkCampaignStateLoop (#2867)
  [dbnode] implement deletion method in namespace kvadmin service (#2861)
  Replace closer with resource package (#2864)
  Add coding style guide (#2831)
  Add GOVERNANCE.md to describe governance (#2830)
  Add COMPATIBILITY.md to describe version compatibility (#2829)
  Refactor etcd config as discovery section with convenience types (#2843)
  Refactor x/lockfile into dbnode/server (#2862)
  [lint] Disable nlreturn linter (#2865)
  [m3cluster] Expose placement algorithm in placement service (#2858)
  [etcd] Set reasonable cluster connection/sync settings by default (#2860)
  [dbnode] Use bits.LeadingZeros64 to improve encoder performance (#2857)
  Cleanup m3nsch leftovers (#2856)
  Update ci-scripts to correct coverage tracking (#2854)
  [aggregator] Process campaign state without waiting for first campaign check interval (#2855)
  Bump go to 1.14 (#2853)
  [query] Remove single series error from M3
  ...
What this PR does / why we need it:
This adds the ability to safely flush index segments for the same block start in parallel, without concerns about concurrency or writes racing on the same file paths. It also makes it safe to do more compactions/updates/etc. to the same block start for index segments.
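As a rough usage sketch of the idea (reusing the illustrative claims manager from the review discussion above; writeIndexSegmentFileSet is a hypothetical stand-in for the actual fileset writer, and the real flow goes through the persist manager):

// Two flushers targeting the same namespace and block start each claim a
// distinct volume index first, so their filesets never collide on disk.
func flushIndexSegmentsInParallel(
	mgr *indexClaimsManager,
	namespace string,
	blockStartUnixNanos int64,
) {
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			volumeIndex := mgr.ClaimNextIndexFileSetVolumeIndex(namespace, blockStartUnixNanos)
			// writeIndexSegmentFileSet (hypothetical) writes the fileset for
			// this (namespace, blockStart, volumeIndex) tuple.
			writeIndexSegmentFileSet(namespace, blockStartUnixNanos, volumeIndex)
		}()
	}
	wg.Wait()
}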
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: