Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: refactor replayWAL to use flushes to make versionEdits #3955

Merged
merged 1 commit into from
Sep 24, 2024

Conversation

itsbilal
Copy link
Member

Previously, replayWAL was too tightly coupled with the internals of a flush; it ran a flush directly and would produce a versionEdit that would then be accumulated manually (eg. without a BVE) using assumptions true only for memtable flushes. This required relatively intricate interactions with flushable ingests, and meant that flushable ingested sstables would always go into L0.

To pave the way for flushable ingests+excises, we refactor out all flush logic from replayWAL, and have replayWAL just replay WALs into the flushable queue. After this, Open() just schedules flushes until we're done.

Informs #2676.

@itsbilal itsbilal requested a review from jbowens September 23, 2024 19:35
@itsbilal itsbilal requested a review from a team as a code owner September 23, 2024 19:35
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @itsbilal)


open.go line 775 at r1 (raw file):

	addFileNum(encodedKey)

	for i := 1; i < int(b.Count()); i++ {

I didn't understand why the logic handles the first key of the batch individually before we enter the loop. Can this start from i := 0 if we remove the `br.Next()`` up above?


open.go line 803 at r1 (raw file):

		}
		if objMeta.IsRemote() {
			readable, err = d.objProvider.OpenForReading(context.TODO(), fileTypeTable, n, objstorage.OpenOptions{MustExist: true})

shouldn't we be able to do remove the explicit objProvider.Lookup call and just unconditionally call OpenForReading, letting the objProvider internally call into the vfs provider as necessary?


open.go line 879 at r1 (raw file):

			return
		}
		mem.writerUnref()

why do we need to unref here? won't mem still be sitting in the flushable queue, so it should maintain a reference until it's been flushed? I'm probably missing something obvious


open.go line 885 at r1 (raw file):

		if !d.opts.ReadOnly {
			entry.flushForced = true
		}

nit:

entry.flushForced = !d.opts.ReadOnly

@jbowens
Copy link
Collaborator

jbowens commented Sep 23, 2024

Looks like your PR surfaced an unrelated failure on the 24.2 branch in the crossversion metamorphic test run. I'll file an issue.

Actually, it looks like it wasn't on the 24.2 branch, so it may be an issue with this change.

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the failure was on this branch and it was a race between compactions deleting away the flushable ingested file during Open(), and the obsolete file scan in Open(). I've fixed it now, PTAL!

Reviewable status: 0 of 18 files reviewed, 3 unresolved discussions (waiting on @jbowens)


open.go line 775 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I didn't understand why the logic handles the first key of the batch individually before we enter the loop. Can this start from i := 0 if we remove the `br.Next()`` up above?

No reason; copy-paste code from before. Removed.


open.go line 803 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

shouldn't we be able to do remove the explicit objProvider.Lookup call and just unconditionally call OpenForReading, letting the objProvider internally call into the vfs provider as necessary?

Removed. Was copy-pasted from what was there before.


open.go line 879 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

why do we need to unref here? won't mem still be sitting in the flushable queue, so it should maintain a reference until it's been flushed? I'm probably missing something obvious

The writer unref is supposed to get zeroed when we're done writing to the memtable, which is the case here. The memtable starts off with one writer ref on itself (see memTable.init), and one reader ref on the flushableEntry wrapping it. We still keep the reader ref around until we flush the memtable. If we don't writerUnref here, we leak the memtable.

Previously, replayWAL was too tightly coupled with the internals
of a flush; it ran a flush directly and would produce a versionEdit
that would then be accumulated manually (eg. without a BVE)
using assumptions true only for memtable flushes. This required
relatively intricate interactions with flushable ingests, and
meant that flushable ingested sstables would always go into L0.

To pave the way for flushable ingests+excises, we refactor out
all flush logic from replayWAL, and have replayWAL just replay
WALs into the flushable queue. After this, Open() just schedules
flushes until we're done.

Informs cockroachdb#2676.
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Looks great!

Reviewed 15 of 17 files at r1, all commit messages.
Reviewable status: 15 of 18 files reviewed, 3 unresolved discussions (waiting on @itsbilal)


obsolete_files.go line 363 at r2 (raw file):

	// already been flushed.
	for _, fEntry := range d.mu.mem.queue {
		if f, ok := fEntry.flushable.(*ingestedFlushable); ok {

I didn't follow what necessitated passing the flushable ingests into this function as an argument, or was that a stylistic choice?

Also should we leave a TODO to rework this so that ingested flushable sstables receive a ref when they're replayed (and when ingested) and are unref'd after they've been 'flushed' and are now part of a version.


open.go line 879 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The writer unref is supposed to get zeroed when we're done writing to the memtable, which is the case here. The memtable starts off with one writer ref on itself (see memTable.init), and one reader ref on the flushableEntry wrapping it. We still keep the reader ref around until we flush the memtable. If we don't writerUnref here, we leak the memtable.

Ah thanks, I wasn't thinking and missed that this was a writer reference.


testdata/checkpoint line 21 at r2 (raw file):

sync: db
open-dir: db
sync: db/MANIFEST-000001

this sync is gone because we don't replay anything so we don't flush anything, whereas previously we'd unconditionally commit a version edit? is that right?

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 15 of 18 files reviewed, 3 unresolved discussions (waiting on @jbowens)


obsolete_files.go line 363 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I didn't follow what necessitated passing the flushable ingests into this function as an argument, or was that a stylistic choice?

Also should we leave a TODO to rework this so that ingested flushable sstables receive a ref when they're replayed (and when ingested) and are unref'd after they've been 'flushed' and are now part of a version.

The queue should at this point have no flushable ingests in it, because we flushed it away in the maybeScheduleFlush loop earlier in Open(). This didn't matter back when flushable ingests wouldn't leave the queue until after we had cleaned away obsolete files, but now we need to save the flushable ingests externally so we don't double-delete their files.

The ref you're mentioning in the TODO does get added. The issue is that our obsolete file scan in this method doesn't look at refs, it just looks at files not in the current version and deletes them. It's why it had to be explicitly taught about flushable ingests in the first place.


testdata/checkpoint line 21 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

this sync is gone because we don't replay anything so we don't flush anything, whereas previously we'd unconditionally commit a version edit? is that right?

that's right. we still create and sync the manifest once, we just don't do it as second time because we wouldn't do the empty versionEdit.

@itsbilal itsbilal merged commit 9035602 into cockroachdb:master Sep 24, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants