-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: add virtual backing protection mechanism #3400
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1.
Reviewable status: 2 of 7 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde)
internal/manifest/virtual_backings.go
line 53 at r1 (raw file):
// protectionCount is used by Protect to temporarily prevent a backing from // being reported as unused. protectionCount int32
I am curious why this needs to be a separate count field from useCount
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 7 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)
internal/manifest/virtual_backings.go
line 53 at r1 (raw file):
Previously, sumeerbhola wrote…
I am curious why this needs to be a separate count field from
useCount
.
responsibleForGarbageBytes
wants to know how many tables use a backing. Also, I'd probably prefer to keep them separate anyway just for debuggability.
d10bed9
to
671cca1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r3, 2 of 5 files at r4, all commit messages.
Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde)
version_set.go
line 529 at r4 (raw file):
// Update virtualBackings and populate ve.RemovedBackingTables. for _, b := range ve.CreatedBackingTables { b.Ref()
odd to see the Ref
here, instead of inside VirtualBackings.Add
.
version_set.go
line 537 at r4 (raw file):
} } for _, m := range ve.DeletedFiles {
I'm getting very confused here, in that the logic was nicely abstracted in getZombiesAndUpdateVirtualBackings
and now it is harder to understand.
What is the key insight that I am missing in why this is needed? Is this to delay unreffing until after the write to the manifest? The previous version is still alive, so why does it matter if the unref happens now?
internal/manifest/version.go
line 405 at r4 (raw file):
// // In addition, a reference count is taken for every backing in the latest // version's VirtualBackings (necessary to support Protect/Unprotect).
I'm getting a bit confused about the lifetime. Something will be in the latest version's VirtualBackings
if the version has a file reffing it. If it doesn't have a file reffing it, unused
will return it before the version is installed and the file will be removed from VirtualBackings
. I suppose with the change in this PR if the file is being protected, it will not be included in the return value from unused
and so it will stay in the VirtualBacking
. But if there isn't a ref protecting deletion of the file, it will get deleted while VirtualBackings
still thinks it is alive.
Some code commentary would help:
- Pointing out that the only thing standing in the way of file deletion is
FileBacking.refs
. - We don't know which of those refs are coming from the latest version and which from earlier versions, hence
backingFileMetadata.useCount
which counts just the latest ones, so we can compute the rightversionEdit
. - If the latest version has useCount > 0, it implies
FileBacking.refs > 0
but not vice versa. - With this change if useCount + protectionCount > 0, it is possible that useCount = 0, so
VirtualBackings
needs to hold a ref to prevent file deletion.
The other question is why we need to delay returning something from unused
until useCount + protectionCount > 0
. Isn't it ok to return if useCount == 0
? All it does is remove it from the version. I think the answer is that VirtualBackings.Add
corresponds to it being added in a VersionEdit.CreatedBackingTables
, so if it stops existing in the latest version and remains in VirtualBackings
we will not add the metadata for it in a future version edit when the useCount
increases above 0. That is, VirtualBackings
needs to mirror what is in the latest version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @sumeerbhola)
version_set.go
line 537 at r4 (raw file):
Previously, sumeerbhola wrote…
I'm getting very confused here, in that the logic was nicely abstracted in
getZombiesAndUpdateVirtualBackings
and now it is harder to understand.
What is the key insight that I am missing in why this is needed? Is this to delay unreffing until after the write to the manifest? The previous version is still alive, so why does it matter if the unref happens now?
Yes its so we Unref after. Some of the unused backings might not be in the previous version (thet's the entire point of the protection mechanism). Maybe we can Unref here and only do the addOnsolete below?
internal/manifest/version.go
line 405 at r4 (raw file):
Previously, sumeerbhola wrote…
I'm getting a bit confused about the lifetime. Something will be in the latest version's
VirtualBackings
if the version has a file reffing it. If it doesn't have a file reffing it,unused
will return it before the version is installed and the file will be removed fromVirtualBackings
. I suppose with the change in this PR if the file is being protected, it will not be included in the return value fromunused
and so it will stay in theVirtualBacking
. But if there isn't a ref protecting deletion of the file, it will get deleted whileVirtualBackings
still thinks it is alive.Some code commentary would help:
- Pointing out that the only thing standing in the way of file deletion is
FileBacking.refs
.- We don't know which of those refs are coming from the latest version and which from earlier versions, hence
backingFileMetadata.useCount
which counts just the latest ones, so we can compute the rightversionEdit
.- If the latest version has useCount > 0, it implies
FileBacking.refs > 0
but not vice versa.- With this change if useCount + protectionCount > 0, it is possible that useCount = 0, so
VirtualBackings
needs to hold a ref to prevent file deletion.The other question is why we need to delay returning something from
unused
untiluseCount + protectionCount > 0
. Isn't it ok to return ifuseCount == 0
? All it does is remove it from the version. I think the answer is thatVirtualBackings.Add
corresponds to it being added in aVersionEdit.CreatedBackingTables
, so if it stops existing in the latest version and remains inVirtualBackings
we will not add the metadata for it in a future version edit when theuseCount
increases above 0. That is,VirtualBackings
needs to mirror what is in the latest version.
Yes the point is to prevent it from being removed via DeletedBackingTables (not just to keep the object alive). Perhaps another way is to add it back via CreatedBackingTables as necessary. There might be some problems with reusing backing numbers in that way.
To me it makes more sense for VirtualBackings to always mirror the backings you have in the latest version) those you would have if you read the manifest).
Previously, RaduBerinde wrote…
Perhaps I should look at always delaying putting backings in DeletedBackimgTables until they are obsolete (ie ref count goes to 0)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @sumeerbhola)
version_set.go
line 529 at r4 (raw file):
Previously, sumeerbhola wrote…
odd to see the
Ref
here, instead of insideVirtualBackings.Add
.
Unref should happen at the same layer.. If we add Unref inside Delete, we have to handle the case where it's the last ref. Felt cleaner to not entangle these in VirtualBackings
version_set.go
line 537 at r4 (raw file):
Maybe we can Unref here and only do the addOnsolete below?
I tried this (in a separate commit r5) and reinstated the getZombies function, it does feel cleaner. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde)
version_set.go
line 537 at r4 (raw file):
Previously, RaduBerinde wrote…
Maybe we can Unref here and only do the addOnsolete below?
I tried this (in a separate commit r5) and reinstated the getZombies function, it does feel cleaner. Let me know what you think.
I haven't looked at the latest code changes yet, but was thinking about it this morning and it looks to me that the current getZombiesAndUpdateVirtualBackings (gzauvb) structure is viable.
Let's say we are going from version V1 to version V2, using version edit VE2. The current VirtualBackings is VB1. VB1 contains exactly the virtual sst backings that are in V1. For a backing in V1, we have one of two cases:
- Both V1 and VB1 have refs.
- Only VB1 has a ref. This is when a protect call prevented it from becoming unused when V1 was being installed, even though useCount was 0, i.e., it was not returned by the unused method. It is possible that useCount+protectionCount is now 0, but that is fine since we still hold the ref. Consider two such file backings f1 and f2, that have useCount+protectionCount equal to 0.
VB2 is going to be the next value of VirtualBackings. We have the partial VE2 that does not contain RemovedBackingTables, and we call gzauvb(VE2, VB1):
- VE2 contains some new file(s) that are backed by f1. We only allow physical to virtual backing transitions, so these new file must be virtual. We have not yet taken a ref in V2, since V2 has not been created yet. But we will call AddFile here, that will prevent it from becoming unused. When we exit gzauvb, the only ref to f1 is held by VB2, and when V2 is generated it will additionally hold a ref.
- VE2 contains no new file that is backed by f2. gzauvb will see it returned from unused, and will call VirtualBackings.Remove. The Remove method should also return whether the ref count dropped to 0. If so, the file belongs in both zombies and obsolete, and both should be returned from gzauvb. After vs.append we would explicitly (inside logAndApply) call versionSet.addObsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 7 files reviewed, 6 unresolved discussions (waiting on @RaduBerinde)
version_set.go
line 689 at r5 (raw file):
delete(zombies, nf.Meta.FileBacking.DiskFileNum) } }
Based on my latest comment, do we even need this structure to change?
It was simple to understand that we
- first handle all the physical backings, then move to virtual backings.
- for virtual backings the first thing is to call VirtualBackings.Add and it suffices to do those for CreatedBackingTables.
- then handle the AddTable and RemoveTable, and finally Unused.
- All the zombie and obsolete logic happen when we call Unused.
Perhaps this is because this code is trying to avoid entangling VirtualBackings
in reffing and unreffing. I am not sure why. We need to maintain the invariant that everything in VirtualBackings
has a ref held by it, so it should do it. If we add a code comment that explains things in a manner akin to my latest comment, I think it should be easy-ish for the reader.
version_set.go
line 820 at r5 (raw file):
fileNum, minUnflushedLogNum base.DiskFileNum, nextFileNum uint64, virtualBackings []*fileBacking,
related to the other comments, IIUC, we shouldn't need this change to accomplish the protect stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 7 files reviewed, 6 unresolved discussions (waiting on @sumeerbhola)
version_set.go
line 537 at r4 (raw file):
Previously, sumeerbhola wrote…
I haven't looked at the latest code changes yet, but was thinking about it this morning and it looks to me that the current getZombiesAndUpdateVirtualBackings (gzauvb) structure is viable.
Let's say we are going from version V1 to version V2, using version edit VE2. The current VirtualBackings is VB1. VB1 contains exactly the virtual sst backings that are in V1. For a backing in V1, we have one of two cases:
- Both V1 and VB1 have refs.
- Only VB1 has a ref. This is when a protect call prevented it from becoming unused when V1 was being installed, even though useCount was 0, i.e., it was not returned by the unused method. It is possible that useCount+protectionCount is now 0, but that is fine since we still hold the ref. Consider two such file backings f1 and f2, that have useCount+protectionCount equal to 0.
VB2 is going to be the next value of VirtualBackings. We have the partial VE2 that does not contain RemovedBackingTables, and we call gzauvb(VE2, VB1):
- VE2 contains some new file(s) that are backed by f1. We only allow physical to virtual backing transitions, so these new file must be virtual. We have not yet taken a ref in V2, since V2 has not been created yet. But we will call AddFile here, that will prevent it from becoming unused. When we exit gzauvb, the only ref to f1 is held by VB2, and when V2 is generated it will additionally hold a ref.
- VE2 contains no new file that is backed by f2. gzauvb will see it returned from unused, and will call VirtualBackings.Remove. The Remove method should also return whether the ref count dropped to 0. If so, the file belongs in both zombies and obsolete, and both should be returned from gzauvb. After vs.append we would explicitly (inside logAndApply) call versionSet.addObsolete.
Yeah this is exactly what the latest commit does. I haven't moved the Red/Unref inside Add/Remove but can easily do that now since they happen at the same place.
version_set.go
line 820 at r5 (raw file):
Previously, sumeerbhola wrote…
related to the other comments, IIUC, we shouldn't need this change to accomplish the protect stuff.
We do.. this function tries to determine the backings in the latest version based on the tables. Protected backings will dissapear. If virtualBackings was a COW structure, this would just consult the list in the currenr version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 7 files reviewed, 6 unresolved discussions (waiting on @RaduBerinde)
version_set.go
line 820 at r5 (raw file):
Previously, RaduBerinde wrote…
We do.. this function tries to determine the backings in the latest version based on the tables. Protected backings will dissapear. If virtualBackings was a COW structure, this would just consult the list in the currenr version.
I think I see what is happening now. Let me try to play back my mental model and you can correct me.
We had this dedup structure, which meant we didn't need to appeal to anything in VirtualBackings
. We discovered exactly the backings we needed and and wrote them as CreatedBackingTables
. It didn't matter that we had already updated VirtualBackings
before calling createManifest
.
Now with the protect feature, we have a ref inside VirtualBackings
. That ref could go away when VirtualBackings
is updated, say for file f, which has no remaining refs. If the machine crashes after creating this manifest and before writing the version edit, we are still ok since we delete file f after writing the version edit.
But f, by definition, is one where the current version has no mentions. So if we reconstruct CreatedBackingTables
purely based on the mentions in this version, we won't write f in CreatedBackingTables
. This will create an inconsistency in the case when we don't crash, since the next version edit expects to find f.
We could have passed the zombies here, since it is a superset of files like f. But since we have to iterate over the whole version anyway, to write this snapshot, this code takes the simpler approach of grabbing all the backings from the VirtualBackings
before editing it, and passing it in. And that means the dedup logic goes away. Sounds good.
Previously, sumeerbhola wrote…
Exactly. Thanks for the thorough review, it helps a lot. I will work on improving the comments and I will add an example of the "normal path" where all this refcounting is inconsequential and an example of the "corner case" path. |
Previously, RaduBerinde wrote…
(and I will move the Ref/Unref inside VirtualBackings) |
dae13c6
to
29383fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Added a detailed comment about the protection mechanism in VirtualBackings.
Reviewable status: 1 of 8 files reviewed, 6 unresolved discussions (waiting on @sumeerbhola)
29383fe
to
d7326d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r4, 3 of 5 files at r6, 2 of 2 files at r7, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @RaduBerinde)
version_set.go
line 922 at r7 (raw file):
} vs.virtualBackings.ForEach(func(b *fileBacking) { m[b.DiskFileNum] = struct{}{}
Could you add a comment that virtualBackings contains both backings that are referenced by some virtual tables in the latest version (which are handled above), and those that are not but stayed alive due to protection. And this loop is ensuring the latter get added to the map.
internal/manifest/version_edit.go
line 929 at r7 (raw file):
for _, n := range ve.RemovedBackingTables { if _, ok := b.AddedFileBacking[n]; ok { delete(b.AddedFileBacking, n)
is this a bug in master in that when BVE accumulated many VersionEdits during open, it would not remove from the AddedFileBacking if a later VE had removed it? Or this was getting handled correctly when the BVE is applied to the Version?
d7326d0
to
130ac15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)
version_set.go
line 922 at r7 (raw file):
Previously, sumeerbhola wrote…
Could you add a comment that virtualBackings contains both backings that are referenced by some virtual tables in the latest version (which are handled above), and those that are not but stayed alive due to protection. And this loop is ensuring the latter get added to the map.
Done.
internal/manifest/version_edit.go
line 929 at r7 (raw file):
Previously, sumeerbhola wrote…
is this a bug in master in that when BVE accumulated many VersionEdits during open, it would not remove from the AddedFileBacking if a later VE had removed it? Or this was getting handled correctly when the BVE is applied to the Version?
These aren't even used when applying, since the version doesn't have a separate record of the backings. They are only used separately during manifest replay. All the AddedBackingTables
were added to the map and then the Removed ones were then removed. It wasn't incorrect, just potentially a bit wasteful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a problem with moving the Unref
to happen before the unlocked section (I hit it in the meta test, with backing deduplication on top of this PR). If some old version was also hanging on to the backing and that version gets unrefed, it could do the last Unref
on the backing before the zombieTables
get updated, and we get the obsolete table not marked as zombie
fatal error.
We could potentially move up the updating of the zombie tables (but I don't like it since it's less cleanly defined), or we could remove that check.
Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I can't move the Remove calls to after the manifest update because we don't want an ingestion to use any of the backings that are removed in the version edit.
Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is more than just that panic. Once something makes it into the obsolete tables, it could get deleted at any time. If we crash before writing to the manifest, we will find the backing in the manifest but the object could have been deleted already. We are going to delete it anyway so it won't matter in practice, but it feels subtle and not quite right.
Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The smallest diff to address the issue is to restore the later Unref
while keeping the existing structure the same: https://gist.github.com/RaduBerinde/9edca4a63605448c15d0c3e731a31009
Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is more than just that panic. Once something makes it into the obsolete tables, it could get deleted at any time. If we crash before writing to the manifest, we will find the backing in the manifest but the object could have been deleted already. We are going to delete it anyway so it won't matter in practice, but it feels subtle and not quite right.
I agree this is not right. IIUC, the problem is not when the table becomes obsolete in the getZombies
call, but that once we get rid of the ref in getZombies
any other concurrent action could cause it to become obsolete and deleted, and then we could crash before writing the manifest.
The diff looks good.
Reviewed 1 of 1 files at r8.
Reviewable status: all files reviewed (commit messages unreviewed), 8 unresolved discussions (waiting on @RaduBerinde)
We add a `Protect`/`Unprotect` API to `VirtualBackings` which temporarily keep backings alive even if they become unused. This will allow external ingestions to temporarily prevent a backing from being removed from the latest version. The comment on `VirtualBackings` explains how it works. The main change in the `versionSet` code is that backings can now become obsolete when removing them from the `VirtualBackings` set.
130ac15
to
a5c423e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated. I fixed up some comments and made the code that calls addObsolete
more straightforward, the DeleteFunc thing was hard to read.
Reviewable status: 5 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r9, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @RaduBerinde)
Thanks again for your help! |
manifest: add Protect API to VirtualBackings
Add
Protect
andUnprotect
which control a separate ref count oneach backing. This will allow external ingestions to temporarily
prevent a backing from being removed from the latest version.
db: handle protected backings in versionSet
This commit updates versionSet code to support protected virtual
backings. The main change is that we now also
Ref
all backings thatexist in
vs.virtualBackings
; we reorganize the code related tozombie backings in
logAndApply
to facilitate maintaining these refscorrectly.