Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: add virtual backing protection mechanism #3400

Merged
merged 1 commit into from
Mar 17, 2024

Conversation

RaduBerinde
Copy link
Member

manifest: add Protect API to VirtualBackings

Add Protect and Unprotect which control a separate ref count on
each backing. This will allow external ingestions to temporarily
prevent a backing from being removed from the latest version.

db: handle protected backings in versionSet

This commit updates versionSet code to support protected virtual
backings. The main change is that we now also Ref all backings that
exist in vs.virtualBackings; we reorganize the code related to
zombie backings in logAndApply to facilitate maintaining these refs
correctly.

@RaduBerinde RaduBerinde requested review from sumeerbhola and a team March 13, 2024 17:32
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1.
Reviewable status: 2 of 7 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde)


internal/manifest/virtual_backings.go line 53 at r1 (raw file):

	// protectionCount is used by Protect to temporarily prevent a backing from
	// being reported as unused.
	protectionCount int32

I am curious why this needs to be a separate count field from useCount.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 7 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)


internal/manifest/virtual_backings.go line 53 at r1 (raw file):

Previously, sumeerbhola wrote…

I am curious why this needs to be a separate count field from useCount.

responsibleForGarbageBytes wants to know how many tables use a backing. Also, I'd probably prefer to keep them separate anyway just for debuggability.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r3, 2 of 5 files at r4, all commit messages.
Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde)


version_set.go line 529 at r4 (raw file):

	// Update virtualBackings and populate ve.RemovedBackingTables.
	for _, b := range ve.CreatedBackingTables {
		b.Ref()

odd to see the Ref here, instead of inside VirtualBackings.Add.


version_set.go line 537 at r4 (raw file):

		}
	}
	for _, m := range ve.DeletedFiles {

I'm getting very confused here, in that the logic was nicely abstracted in getZombiesAndUpdateVirtualBackings and now it is harder to understand.
What is the key insight that I am missing in why this is needed? Is this to delay unreffing until after the write to the manifest? The previous version is still alive, so why does it matter if the unref happens now?


internal/manifest/version.go line 405 at r4 (raw file):

	//
	// In addition, a reference count is taken for every backing in the latest
	// version's VirtualBackings (necessary to support Protect/Unprotect).

I'm getting a bit confused about the lifetime. Something will be in the latest version's VirtualBackings if the version has a file reffing it. If it doesn't have a file reffing it, unused will return it before the version is installed and the file will be removed from VirtualBackings. I suppose with the change in this PR if the file is being protected, it will not be included in the return value from unused and so it will stay in the VirtualBacking. But if there isn't a ref protecting deletion of the file, it will get deleted while VirtualBackings still thinks it is alive.

Some code commentary would help:

  • Pointing out that the only thing standing in the way of file deletion is FileBacking.refs.
  • We don't know which of those refs are coming from the latest version and which from earlier versions, hence backingFileMetadata.useCount which counts just the latest ones, so we can compute the right versionEdit.
  • If the latest version has useCount > 0, it implies FileBacking.refs > 0 but not vice versa.
  • With this change if useCount + protectionCount > 0, it is possible that useCount = 0, so VirtualBackings needs to hold a ref to prevent file deletion.

The other question is why we need to delay returning something from unused until useCount + protectionCount > 0. Isn't it ok to return if useCount == 0? All it does is remove it from the version. I think the answer is that VirtualBackings.Add corresponds to it being added in a VersionEdit.CreatedBackingTables, so if it stops existing in the latest version and remains in VirtualBackings we will not add the metadata for it in a future version edit when the useCount increases above 0. That is, VirtualBackings needs to mirror what is in the latest version.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @sumeerbhola)


version_set.go line 537 at r4 (raw file):

Previously, sumeerbhola wrote…

I'm getting very confused here, in that the logic was nicely abstracted in getZombiesAndUpdateVirtualBackings and now it is harder to understand.
What is the key insight that I am missing in why this is needed? Is this to delay unreffing until after the write to the manifest? The previous version is still alive, so why does it matter if the unref happens now?

Yes its so we Unref after. Some of the unused backings might not be in the previous version (thet's the entire point of the protection mechanism). Maybe we can Unref here and only do the addOnsolete below?


internal/manifest/version.go line 405 at r4 (raw file):

Previously, sumeerbhola wrote…

I'm getting a bit confused about the lifetime. Something will be in the latest version's VirtualBackings if the version has a file reffing it. If it doesn't have a file reffing it, unused will return it before the version is installed and the file will be removed from VirtualBackings. I suppose with the change in this PR if the file is being protected, it will not be included in the return value from unused and so it will stay in the VirtualBacking. But if there isn't a ref protecting deletion of the file, it will get deleted while VirtualBackings still thinks it is alive.

Some code commentary would help:

  • Pointing out that the only thing standing in the way of file deletion is FileBacking.refs.
  • We don't know which of those refs are coming from the latest version and which from earlier versions, hence backingFileMetadata.useCount which counts just the latest ones, so we can compute the right versionEdit.
  • If the latest version has useCount > 0, it implies FileBacking.refs > 0 but not vice versa.
  • With this change if useCount + protectionCount > 0, it is possible that useCount = 0, so VirtualBackings needs to hold a ref to prevent file deletion.

The other question is why we need to delay returning something from unused until useCount + protectionCount > 0. Isn't it ok to return if useCount == 0? All it does is remove it from the version. I think the answer is that VirtualBackings.Add corresponds to it being added in a VersionEdit.CreatedBackingTables, so if it stops existing in the latest version and remains in VirtualBackings we will not add the metadata for it in a future version edit when the useCount increases above 0. That is, VirtualBackings needs to mirror what is in the latest version.

Yes the point is to prevent it from being removed via DeletedBackingTables (not just to keep the object alive). Perhaps another way is to add it back via CreatedBackingTables as necessary. There might be some problems with reusing backing numbers in that way.

To me it makes more sense for VirtualBackings to always mirror the backings you have in the latest version) those you would have if you read the manifest).

@RaduBerinde
Copy link
Member Author

internal/manifest/version.go line 405 at r4 (raw file):

Previously, RaduBerinde wrote…

Yes the point is to prevent it from being removed via DeletedBackingTables (not just to keep the object alive). Perhaps another way is to add it back via CreatedBackingTables as necessary. There might be some problems with reusing backing numbers in that way.

To me it makes more sense for VirtualBackings to always mirror the backings you have in the latest version) those you would have if you read the manifest).

Perhaps I should look at always delaying putting backings in DeletedBackimgTables until they are obsolete (ie ref count goes to 0)?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @sumeerbhola)


version_set.go line 529 at r4 (raw file):

Previously, sumeerbhola wrote…

odd to see the Ref here, instead of inside VirtualBackings.Add.

Unref should happen at the same layer.. If we add Unref inside Delete, we have to handle the case where it's the last ref. Felt cleaner to not entangle these in VirtualBackings


version_set.go line 537 at r4 (raw file):

Maybe we can Unref here and only do the addOnsolete below?

I tried this (in a separate commit r5) and reinstated the getZombies function, it does feel cleaner. Let me know what you think.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde)


version_set.go line 537 at r4 (raw file):

Previously, RaduBerinde wrote…

Maybe we can Unref here and only do the addOnsolete below?

I tried this (in a separate commit r5) and reinstated the getZombies function, it does feel cleaner. Let me know what you think.

I haven't looked at the latest code changes yet, but was thinking about it this morning and it looks to me that the current getZombiesAndUpdateVirtualBackings (gzauvb) structure is viable.

Let's say we are going from version V1 to version V2, using version edit VE2. The current VirtualBackings is VB1. VB1 contains exactly the virtual sst backings that are in V1. For a backing in V1, we have one of two cases:

  • Both V1 and VB1 have refs.
  • Only VB1 has a ref. This is when a protect call prevented it from becoming unused when V1 was being installed, even though useCount was 0, i.e., it was not returned by the unused method. It is possible that useCount+protectionCount is now 0, but that is fine since we still hold the ref. Consider two such file backings f1 and f2, that have useCount+protectionCount equal to 0.

VB2 is going to be the next value of VirtualBackings. We have the partial VE2 that does not contain RemovedBackingTables, and we call gzauvb(VE2, VB1):

  • VE2 contains some new file(s) that are backed by f1. We only allow physical to virtual backing transitions, so these new file must be virtual. We have not yet taken a ref in V2, since V2 has not been created yet. But we will call AddFile here, that will prevent it from becoming unused. When we exit gzauvb, the only ref to f1 is held by VB2, and when V2 is generated it will additionally hold a ref.
  • VE2 contains no new file that is backed by f2. gzauvb will see it returned from unused, and will call VirtualBackings.Remove. The Remove method should also return whether the ref count dropped to 0. If so, the file belongs in both zombies and obsolete, and both should be returned from gzauvb. After vs.append we would explicitly (inside logAndApply) call versionSet.addObsolete.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 6 unresolved discussions (waiting on @RaduBerinde)


version_set.go line 689 at r5 (raw file):

			delete(zombies, nf.Meta.FileBacking.DiskFileNum)
		}
	}

Based on my latest comment, do we even need this structure to change?
It was simple to understand that we

  • first handle all the physical backings, then move to virtual backings.
  • for virtual backings the first thing is to call VirtualBackings.Add and it suffices to do those for CreatedBackingTables.
  • then handle the AddTable and RemoveTable, and finally Unused.
  • All the zombie and obsolete logic happen when we call Unused.

Perhaps this is because this code is trying to avoid entangling VirtualBackings in reffing and unreffing. I am not sure why. We need to maintain the invariant that everything in VirtualBackings has a ref held by it, so it should do it. If we add a code comment that explains things in a manner akin to my latest comment, I think it should be easy-ish for the reader.


version_set.go line 820 at r5 (raw file):

	fileNum, minUnflushedLogNum base.DiskFileNum,
	nextFileNum uint64,
	virtualBackings []*fileBacking,

related to the other comments, IIUC, we shouldn't need this change to accomplish the protect stuff.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 6 unresolved discussions (waiting on @sumeerbhola)


version_set.go line 537 at r4 (raw file):

Previously, sumeerbhola wrote…

I haven't looked at the latest code changes yet, but was thinking about it this morning and it looks to me that the current getZombiesAndUpdateVirtualBackings (gzauvb) structure is viable.

Let's say we are going from version V1 to version V2, using version edit VE2. The current VirtualBackings is VB1. VB1 contains exactly the virtual sst backings that are in V1. For a backing in V1, we have one of two cases:

  • Both V1 and VB1 have refs.
  • Only VB1 has a ref. This is when a protect call prevented it from becoming unused when V1 was being installed, even though useCount was 0, i.e., it was not returned by the unused method. It is possible that useCount+protectionCount is now 0, but that is fine since we still hold the ref. Consider two such file backings f1 and f2, that have useCount+protectionCount equal to 0.

VB2 is going to be the next value of VirtualBackings. We have the partial VE2 that does not contain RemovedBackingTables, and we call gzauvb(VE2, VB1):

  • VE2 contains some new file(s) that are backed by f1. We only allow physical to virtual backing transitions, so these new file must be virtual. We have not yet taken a ref in V2, since V2 has not been created yet. But we will call AddFile here, that will prevent it from becoming unused. When we exit gzauvb, the only ref to f1 is held by VB2, and when V2 is generated it will additionally hold a ref.
  • VE2 contains no new file that is backed by f2. gzauvb will see it returned from unused, and will call VirtualBackings.Remove. The Remove method should also return whether the ref count dropped to 0. If so, the file belongs in both zombies and obsolete, and both should be returned from gzauvb. After vs.append we would explicitly (inside logAndApply) call versionSet.addObsolete.

Yeah this is exactly what the latest commit does. I haven't moved the Red/Unref inside Add/Remove but can easily do that now since they happen at the same place.


version_set.go line 820 at r5 (raw file):

Previously, sumeerbhola wrote…

related to the other comments, IIUC, we shouldn't need this change to accomplish the protect stuff.

We do.. this function tries to determine the backings in the latest version based on the tables. Protected backings will dissapear. If virtualBackings was a COW structure, this would just consult the list in the currenr version.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 7 files reviewed, 6 unresolved discussions (waiting on @RaduBerinde)


version_set.go line 820 at r5 (raw file):

Previously, RaduBerinde wrote…

We do.. this function tries to determine the backings in the latest version based on the tables. Protected backings will dissapear. If virtualBackings was a COW structure, this would just consult the list in the currenr version.

I think I see what is happening now. Let me try to play back my mental model and you can correct me.

We had this dedup structure, which meant we didn't need to appeal to anything in VirtualBackings. We discovered exactly the backings we needed and and wrote them as CreatedBackingTables. It didn't matter that we had already updated VirtualBackings before calling createManifest.

Now with the protect feature, we have a ref inside VirtualBackings. That ref could go away when VirtualBackings is updated, say for file f, which has no remaining refs. If the machine crashes after creating this manifest and before writing the version edit, we are still ok since we delete file f after writing the version edit.

But f, by definition, is one where the current version has no mentions. So if we reconstruct CreatedBackingTables purely based on the mentions in this version, we won't write f in CreatedBackingTables. This will create an inconsistency in the case when we don't crash, since the next version edit expects to find f.

We could have passed the zombies here, since it is a superset of files like f. But since we have to iterate over the whole version anyway, to write this snapshot, this code takes the simpler approach of grabbing all the backings from the VirtualBackings before editing it, and passing it in. And that means the dedup logic goes away. Sounds good.

@RaduBerinde
Copy link
Member Author

version_set.go line 820 at r5 (raw file):

Previously, sumeerbhola wrote…

I think I see what is happening now. Let me try to play back my mental model and you can correct me.

We had this dedup structure, which meant we didn't need to appeal to anything in VirtualBackings. We discovered exactly the backings we needed and and wrote them as CreatedBackingTables. It didn't matter that we had already updated VirtualBackings before calling createManifest.

Now with the protect feature, we have a ref inside VirtualBackings. That ref could go away when VirtualBackings is updated, say for file f, which has no remaining refs. If the machine crashes after creating this manifest and before writing the version edit, we are still ok since we delete file f after writing the version edit.

But f, by definition, is one where the current version has no mentions. So if we reconstruct CreatedBackingTables purely based on the mentions in this version, we won't write f in CreatedBackingTables. This will create an inconsistency in the case when we don't crash, since the next version edit expects to find f.

We could have passed the zombies here, since it is a superset of files like f. But since we have to iterate over the whole version anyway, to write this snapshot, this code takes the simpler approach of grabbing all the backings from the VirtualBackings before editing it, and passing it in. And that means the dedup logic goes away. Sounds good.

Exactly. Thanks for the thorough review, it helps a lot. I will work on improving the comments and I will add an example of the "normal path" where all this refcounting is inconsequential and an example of the "corner case" path.

@RaduBerinde
Copy link
Member Author

version_set.go line 820 at r5 (raw file):

Previously, RaduBerinde wrote…

Exactly. Thanks for the thorough review, it helps a lot. I will work on improving the comments and I will add an example of the "normal path" where all this refcounting is inconsequential and an example of the "corner case" path.

(and I will move the Ref/Unref inside VirtualBackings)

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Added a detailed comment about the protection mechanism in VirtualBackings.

Reviewable status: 1 of 8 files reviewed, 6 unresolved discussions (waiting on @sumeerbhola)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great
:lgtm:

Reviewed 2 of 5 files at r4, 3 of 5 files at r6, 2 of 2 files at r7, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @RaduBerinde)


version_set.go line 922 at r7 (raw file):

	}
	vs.virtualBackings.ForEach(func(b *fileBacking) {
		m[b.DiskFileNum] = struct{}{}

Could you add a comment that virtualBackings contains both backings that are referenced by some virtual tables in the latest version (which are handled above), and those that are not but stayed alive due to protection. And this loop is ensuring the latter get added to the map.


internal/manifest/version_edit.go line 929 at r7 (raw file):

	for _, n := range ve.RemovedBackingTables {
		if _, ok := b.AddedFileBacking[n]; ok {
			delete(b.AddedFileBacking, n)

is this a bug in master in that when BVE accumulated many VersionEdits during open, it would not remove from the AddedFileBacking if a later VE had removed it? Or this was getting handled correctly when the BVE is applied to the Version?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)


version_set.go line 922 at r7 (raw file):

Previously, sumeerbhola wrote…

Could you add a comment that virtualBackings contains both backings that are referenced by some virtual tables in the latest version (which are handled above), and those that are not but stayed alive due to protection. And this loop is ensuring the latter get added to the map.

Done.


internal/manifest/version_edit.go line 929 at r7 (raw file):

Previously, sumeerbhola wrote…

is this a bug in master in that when BVE accumulated many VersionEdits during open, it would not remove from the AddedFileBacking if a later VE had removed it? Or this was getting handled correctly when the BVE is applied to the Version?

These aren't even used when applying, since the version doesn't have a separate record of the backings. They are only used separately during manifest replay. All the AddedBackingTables were added to the map and then the Removed ones were then removed. It wasn't incorrect, just potentially a bit wasteful.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem with moving the Unref to happen before the unlocked section (I hit it in the meta test, with backing deduplication on top of this PR). If some old version was also hanging on to the backing and that version gets unrefed, it could do the last Unref on the backing before the zombieTables get updated, and we get the obsolete table not marked as zombie fatal error.

We could potentially move up the updating of the zombie tables (but I don't like it since it's less cleanly defined), or we could remove that check.

Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I can't move the Remove calls to after the manifest update because we don't want an ingestion to use any of the backings that are removed in the version edit.

Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is more than just that panic. Once something makes it into the obsolete tables, it could get deleted at any time. If we crash before writing to the manifest, we will find the backing in the manifest but the object could have been deleted already. We are going to delete it anyway so it won't matter in practice, but it feels subtle and not quite right.

Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The smallest diff to address the issue is to restore the later Unref while keeping the existing structure the same: https://gist.github.com/RaduBerinde/9edca4a63605448c15d0c3e731a31009

Reviewable status: 7 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is more than just that panic. Once something makes it into the obsolete tables, it could get deleted at any time. If we crash before writing to the manifest, we will find the backing in the manifest but the object could have been deleted already. We are going to delete it anyway so it won't matter in practice, but it feels subtle and not quite right.

I agree this is not right. IIUC, the problem is not when the table becomes obsolete in the getZombies call, but that once we get rid of the ref in getZombies any other concurrent action could cause it to become obsolete and deleted, and then we could crash before writing the manifest.

The diff looks good.

Reviewed 1 of 1 files at r8.
Reviewable status: all files reviewed (commit messages unreviewed), 8 unresolved discussions (waiting on @RaduBerinde)

We add a `Protect`/`Unprotect` API to `VirtualBackings` which
temporarily keep backings alive even if they become unused.  This will
allow external ingestions to temporarily prevent a backing from being
removed from the latest version.

The comment on `VirtualBackings` explains how it works. The main
change in the `versionSet` code is that backings can now become
obsolete when removing them from the `VirtualBackings` set.
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated. I fixed up some comments and made the code that calls addObsolete more straightforward, the DeleteFunc thing was hard to read.

Reviewable status: 5 of 8 files reviewed, 8 unresolved discussions (waiting on @sumeerbhola)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r9, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @RaduBerinde)

@RaduBerinde
Copy link
Member Author

Thanks again for your help!

@RaduBerinde RaduBerinde merged commit fa6da33 into cockroachdb:master Mar 17, 2024
11 checks passed
@RaduBerinde RaduBerinde deleted the backings-protect branch March 17, 2024 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants