db: refactor replayWAL to use flushes to make versionEdits #3955

itsbilal · 2024-09-23T19:35:43Z

Previously, replayWAL was too tightly coupled with the internals of a flush; it ran a flush directly and would produce a versionEdit that would then be accumulated manually (eg. without a BVE) using assumptions true only for memtable flushes. This required relatively intricate interactions with flushable ingests, and meant that flushable ingested sstables would always go into L0.

To pave the way for flushable ingests+excises, we refactor out all flush logic from replayWAL, and have replayWAL just replay WALs into the flushable queue. After this, Open() just schedules flushes until we're done.

Informs #2676.

cockroach-teamcity · 2024-09-23T19:35:50Z

This change is

jbowens

Reviewed all commit messages.
Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @itsbilal)

open.go line 775 at r1 (raw file):

	addFileNum(encodedKey)

	for i := 1; i < int(b.Count()); i++ {

I didn't understand why the logic handles the first key of the batch individually before we enter the loop. Can this start from i := 0 if we remove the `br.Next()`` up above?

open.go line 803 at r1 (raw file):

		}
		if objMeta.IsRemote() {
			readable, err = d.objProvider.OpenForReading(context.TODO(), fileTypeTable, n, objstorage.OpenOptions{MustExist: true})

shouldn't we be able to do remove the explicit objProvider.Lookup call and just unconditionally call OpenForReading, letting the objProvider internally call into the vfs provider as necessary?

open.go line 879 at r1 (raw file):

			return
		}
		mem.writerUnref()

why do we need to unref here? won't mem still be sitting in the flushable queue, so it should maintain a reference until it's been flushed? I'm probably missing something obvious

open.go line 885 at r1 (raw file):

		if !d.opts.ReadOnly {
			entry.flushForced = true
		}

nit:

entry.flushForced = !d.opts.ReadOnly

jbowens · 2024-09-23T21:05:21Z

~~Looks like your PR surfaced an unrelated failure on the 24.2 branch in the crossversion metamorphic test run. I'll file an issue.~~

Actually, it looks like it wasn't on the 24.2 branch, so it may be an issue with this change.

itsbilal

Yeah, the failure was on this branch and it was a race between compactions deleting away the flushable ingested file during Open(), and the obsolete file scan in Open(). I've fixed it now, PTAL!

Reviewable status: 0 of 18 files reviewed, 3 unresolved discussions (waiting on @jbowens)

open.go line 775 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I didn't understand why the logic handles the first key of the batch individually before we enter the loop. Can this start from i := 0 if we remove the `br.Next()`` up above?

No reason; copy-paste code from before. Removed.

open.go line 803 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

shouldn't we be able to do remove the explicit objProvider.Lookup call and just unconditionally call OpenForReading, letting the objProvider internally call into the vfs provider as necessary?

Removed. Was copy-pasted from what was there before.

open.go line 879 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

why do we need to unref here? won't mem still be sitting in the flushable queue, so it should maintain a reference until it's been flushed? I'm probably missing something obvious

The writer unref is supposed to get zeroed when we're done writing to the memtable, which is the case here. The memtable starts off with one writer ref on itself (see memTable.init), and one reader ref on the flushableEntry wrapping it. We still keep the reader ref around until we flush the memtable. If we don't writerUnref here, we leak the memtable.

Previously, replayWAL was too tightly coupled with the internals of a flush; it ran a flush directly and would produce a versionEdit that would then be accumulated manually (eg. without a BVE) using assumptions true only for memtable flushes. This required relatively intricate interactions with flushable ingests, and meant that flushable ingested sstables would always go into L0. To pave the way for flushable ingests+excises, we refactor out all flush logic from replayWAL, and have replayWAL just replay WALs into the flushable queue. After this, Open() just schedules flushes until we're done. Informs cockroachdb#2676.

jbowens

Looks great!

Reviewed 15 of 17 files at r1, all commit messages.
Reviewable status: 15 of 18 files reviewed, 3 unresolved discussions (waiting on @itsbilal)

obsolete_files.go line 363 at r2 (raw file):

	// already been flushed.
	for _, fEntry := range d.mu.mem.queue {
		if f, ok := fEntry.flushable.(*ingestedFlushable); ok {

I didn't follow what necessitated passing the flushable ingests into this function as an argument, or was that a stylistic choice?

Also should we leave a TODO to rework this so that ingested flushable sstables receive a ref when they're replayed (and when ingested) and are unref'd after they've been 'flushed' and are now part of a version.

open.go line 879 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The writer unref is supposed to get zeroed when we're done writing to the memtable, which is the case here. The memtable starts off with one writer ref on itself (see memTable.init), and one reader ref on the flushableEntry wrapping it. We still keep the reader ref around until we flush the memtable. If we don't writerUnref here, we leak the memtable.

Ah thanks, I wasn't thinking and missed that this was a writer reference.

testdata/checkpoint line 21 at r2 (raw file):

sync: db
open-dir: db
sync: db/MANIFEST-000001

this sync is gone because we don't replay anything so we don't flush anything, whereas previously we'd unconditionally commit a version edit? is that right?

itsbilal

TFTR!

Reviewable status: 15 of 18 files reviewed, 3 unresolved discussions (waiting on @jbowens)

obsolete_files.go line 363 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I didn't follow what necessitated passing the flushable ingests into this function as an argument, or was that a stylistic choice?

Also should we leave a TODO to rework this so that ingested flushable sstables receive a ref when they're replayed (and when ingested) and are unref'd after they've been 'flushed' and are now part of a version.

The queue should at this point have no flushable ingests in it, because we flushed it away in the maybeScheduleFlush loop earlier in Open(). This didn't matter back when flushable ingests wouldn't leave the queue until after we had cleaned away obsolete files, but now we need to save the flushable ingests externally so we don't double-delete their files.

The ref you're mentioning in the TODO does get added. The issue is that our obsolete file scan in this method doesn't look at refs, it just looks at files not in the current version and deletes them. It's why it had to be explicitly taught about flushable ingests in the first place.

testdata/checkpoint line 21 at r2 (raw file):

Previously, jbowens (Jackson Owens) wrote…

this sync is gone because we don't replay anything so we don't flush anything, whereas previously we'd unconditionally commit a version edit? is that right?

that's right. we still create and sync the manifest once, we just don't do it as second time because we wouldn't do the empty versionEdit.

itsbilal requested a review from jbowens September 23, 2024 19:35

itsbilal requested a review from a team as a code owner September 23, 2024 19:35

itsbilal force-pushed the replay-wal-refactor branch from 6d9f003 to 52a8137 Compare September 23, 2024 19:44

itsbilal mentioned this pull request Sep 23, 2024

db: allow excises to unconditionally be flushable ingests #3897

Merged

jbowens reviewed Sep 23, 2024

View reviewed changes

itsbilal force-pushed the replay-wal-refactor branch from 52a8137 to c82638f Compare September 24, 2024 15:03

itsbilal commented Sep 24, 2024

View reviewed changes

itsbilal force-pushed the replay-wal-refactor branch from c82638f to 246ad9e Compare September 24, 2024 15:47

jbowens approved these changes Sep 24, 2024

View reviewed changes

itsbilal commented Sep 24, 2024

View reviewed changes

itsbilal merged commit 9035602 into cockroachdb:master Sep 24, 2024
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

db: refactor replayWAL to use flushes to make versionEdits #3955

db: refactor replayWAL to use flushes to make versionEdits #3955

itsbilal commented Sep 23, 2024

cockroach-teamcity commented Sep 23, 2024

jbowens left a comment

jbowens commented Sep 23, 2024 •

edited

Loading

itsbilal left a comment

jbowens left a comment

itsbilal left a comment

db: refactor replayWAL to use flushes to make versionEdits #3955

db: refactor replayWAL to use flushes to make versionEdits #3955

Conversation

itsbilal commented Sep 23, 2024

cockroach-teamcity commented Sep 23, 2024

jbowens left a comment

Choose a reason for hiding this comment

jbowens commented Sep 23, 2024 • edited Loading

itsbilal left a comment

Choose a reason for hiding this comment

jbowens left a comment

Choose a reason for hiding this comment

itsbilal left a comment

Choose a reason for hiding this comment

jbowens commented Sep 23, 2024 •

edited

Loading