
[dbnode] Avoid loading blocks in memory for namespaces with snapshots disabled during bootstrapping #3919

Merged: 7 commits, Nov 17, 2021


@soundvibe (Collaborator) commented Nov 11, 2021

Use `persist.FileSetFlushType` instead of `persist.FileSetSnapshotType` for the second target data and index ranges if the bootstrapping namespace has snapshots disabled (`!snapshotEnabled`). With this change, bootstrappers no longer keep second-range blocks in memory for snapshot-disabled namespaces (`shouldPersist` evaluates to true for them).
This should reduce dbnode's memory usage during bootstrapping: snapshot-disabled namespaces usually have long retentions and large block sizes, so keeping all of their blocks in memory could lead to OOM.
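A minimal Go sketch of the rule described above, written without the real m3 packages. `FileSetType`, `rangeConfig`, `secondRangeConfig` and `shouldPersist` are hypothetical stand-ins for `persist.FileSetFlushType`/`persist.FileSetSnapshotType` and the bootstrapper's persist decision. It assumes, for illustration only, that a range is persisted (not held in memory) when persistence is enabled and its fileset type is flush.

```go
package main

// FileSetType is a hypothetical stand-in for the m3 persist fileset types
// named in the PR (persist.FileSetFlushType, persist.FileSetSnapshotType).
type FileSetType int

const (
	FileSetFlushType FileSetType = iota
	FileSetSnapshotType
)

// rangeConfig is a hypothetical description of how one bootstrap target
// range is handled.
type rangeConfig struct {
	PersistEnabled bool
	FileSetType    FileSetType
}

// shouldPersist assumes (illustration only) that a range is written to disk,
// rather than kept in memory, when persistence is enabled and the range is
// written as a flush fileset.
func (c rangeConfig) shouldPersist() bool {
	return c.PersistEnabled && c.FileSetType == FileSetFlushType
}

// secondRangeConfig picks the fileset type for the second target range:
// flush when snapshots are disabled for the namespace, snapshot otherwise
// (the previous behavior).
func secondRangeConfig(snapshotEnabled bool) rangeConfig {
	if !snapshotEnabled {
		return rangeConfig{PersistEnabled: true, FileSetType: FileSetFlushType}
	}
	return rangeConfig{PersistEnabled: true, FileSetType: FileSetSnapshotType}
}
```

Under these assumptions, `secondRangeConfig(false).shouldPersist()` is true, so a snapshot-disabled namespace's second-range blocks go to disk instead of staying resident, while `secondRangeConfig(true)` keeps the snapshot type and `shouldPersist()` stays false.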


…e` for second target data and index ranges if the bootstrapping namespace is read-only. Because of this change, bootstrappers won't keep second-range blocks in memory for read-only namespaces (`shouldPersist` will evaluate to true for them).
// If yes, return an error to force a retry
if persistConf := ns.DataRunOptions.RunOptions.PersistConfig(); persistConf.Enabled &&
	persistConf.FileSetType == persist.FileSetSnapshotType {
	if runIndex == 1 {
Collaborator Author:

I think checking for the second run (`runIndex == 1`) should be enough to replace the previous condition. This is needed because second-run blocks will no longer always have `persist.FileSetSnapshotType` set.
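To illustrate why the guard in the snippet above has to change, here is a hypothetical Go comparison of the two conditions; all names are assumptions, not m3 code. It assumes the run index is 0 for the first range and 1 for the second, and that the whole fileset-type condition is replaced by the run-index check (the thread does not show the final merged code).

```go
package main

// FileSetType is a hypothetical stand-in for the m3 persist fileset types.
type FileSetType int

const (
	FileSetFlushType FileSetType = iota
	FileSetSnapshotType
)

// persistConfig is a hypothetical stand-in for a run's persist config.
type persistConfig struct {
	Enabled     bool
	FileSetType FileSetType
}

// oldShouldCheckAdvance mirrors the previous guard: only a persisted,
// snapshot-typed second run checks whether the target ranges advanced.
func oldShouldCheckAdvance(conf persistConfig, runIndex int) bool {
	return conf.Enabled && conf.FileSetType == FileSetSnapshotType && runIndex == 1
}

// newShouldCheckAdvance mirrors the proposed guard: the second run is
// checked regardless of the fileset type it was given.
func newShouldCheckAdvance(runIndex int) bool {
	return runIndex == 1
}
```

For a snapshot-disabled namespace the second run now carries the flush type, so `oldShouldCheckAdvance` would return false and the advance check would be skipped; keying the check on the run index avoids that.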

@Antanukas (Collaborator) commented Nov 11, 2021:

Do I understand correctly that this is not really required to avoid loading blocks in memory for read-only namespaces? Can we split this into a separate PR? Mainly because, with my limited knowledge, this seems to be a dangerous change that might also affect non-read-only namespaces, and we might need to revert it.

Collaborator Author:

This is needed because, if this condition is left unchanged, we won't be able to check whether target ranges have advanced for read-only namespaces. I don't think this change is dangerous, because it behaves the same way as before (only the condition is different).

Collaborator:

Hm, I agree the logic is still similar, but I'd rather not change the file set type to something other than Snapshot.

I'll keep reviewing and give a recommendation once I understand the larger change, but it seems we are potentially going from type-safe to type-unsafe.

Collaborator:

Would this make more sense if we based the decision on the "snapshotEnabled": false namespace property (instead of readOnly)?

Collaborator Author:

Agreed that a decision based on "snapshotEnabled": false could be considered as well.
As for Rob's concerns, I don't think type safety gives us much here. The most important thing we're doing is checking that the target ranges have not advanced since the last run, i.e. since the latest ranges we calculated. I am not sure this should depend on the snapshot fileset type, because we're solving a timing problem here, not a persistence-type problem.

Collaborator Author:

Updated PR to use snapshotEnabled instead.

codecov bot commented Nov 11, 2021

Codecov Report

Merging #3919 (7f2a91f) into master (13e4c45) will decrease coverage by 0.3%.
The diff coverage is 94.2%.

❗ Current head 7f2a91f differs from pull request most recent head e205a44. Consider uploading reports for the commit e205a44 to get more accurate results


@@           Coverage Diff            @@
##           master   #3919     +/-   ##
========================================
- Coverage    56.9%   56.6%   -0.4%     
========================================
  Files         555     555             
  Lines       63456   63326    -130     
========================================
- Hits        36152   35882    -270     
- Misses      24119   24240    +121     
- Partials     3185    3204     +19     
Flag Coverage Δ
aggregator 62.2% <ø> (+0.1%) ⬆️
cluster ∅ <ø> (∅)
collector 58.4% <ø> (ø)
dbnode 60.3% <94.2%> (-0.5%) ⬇️
m3em 46.4% <ø> (ø)
metrics 19.7% <ø> (ø)
msg 74.6% <ø> (-0.2%) ⬇️

Flags with carried forward coverage won't be shown.



Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 13e4c45...e205a44.

* master:
  [m3msg] Remove unnecessary ConsumeHandler interface (#3918)
  [query] Prom converter supporting value decrease tolerance (#3914)
src/dbnode/storage/bootstrap/process.go (outdated, resolved)

@@ -215,23 +219,24 @@ func (b bootstrapProcess) Run(
 },
 })
 secondRanges := b.newShardTimeRanges(
-dataRanges.secondRangeWithPersistFalse.Range, namespace.Shards)
+dataRanges.secondRange.Range, namespace.Shards)
Collaborator:

Hm, why are we using “secondRange” instead of “secondRangeWithPersistFalse” here?

Collaborator Author:

Because if the namespace is read-only, it resolves to shouldPersist=true for the second range. Other namespaces still resolve to shouldPersist=false, as they did before these changes.

@soundvibe soundvibe changed the title [dbnode] Avoid loading blocks in memory for read-only namespaces during bootstrapping [dbnode] Avoid loading blocks in memory for namespaces with snapshots disabled during bootstrapping Nov 15, 2021
@linasm (Collaborator) left a comment:

LGTM, just asked for some alignment with the recent readOnly -> !snapshotEnabled change (comments inline).

src/dbnode/storage/bootstrap/process_test.go (outdated, resolved)
src/dbnode/storage/bootstrap/process.go (resolved)
* master:
  [agg] Use timestamp (not start aligned) for expiring forward versions (#3922)
  [tests] Add support for calls to label APIs in resources.Coordinator (#3916)
  [tests] Convert repair_and_replication Docker Integration Test to In-process (#3903)
  Always Close the conn if failed to write acks (#3855)
  [m3msg] Add receive and handle latency to consumers (#3920)
@soundvibe soundvibe merged commit e2b4a8a into master Nov 17, 2021
@soundvibe soundvibe deleted the linasn/bootstrap-readonly-ns branch November 17, 2021 09:14
4 participants