Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: Wire up L0SubLevels with iterator and version creation #624

Merged

Conversation

itsbilal
Copy link
Contributor

This change effectively lowers read amplification in L0 by taking
advantage of knowledge about sublevels contained in
Version.L0SubLevels and using it to create levelIters for each sublevel
instead of directly adding one iterator per file to the mergingIter.
The get_iter is also updated to take advantage of this information.

The Version struct is also updated, as a result of this change, to
hold a reference to an L0SubLevels instance. That reference is initialized
during Version initialization in version_edit.

Most of the test changes are a result of always printing the sublevel
alongside the level for L0.

The first commit is from #614; only review the second commit.

@itsbilal itsbilal self-assigned this Apr 15, 2020
@petermattis
Copy link
Collaborator

This change is Reviewable

@itsbilal itsbilal force-pushed the l0-sublevels-iters-and-version branch from 23f1cfc to 3ab9a4d Compare April 16, 2020 15:23
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, but then I wrote a chunk of this code. 😃

Reviewable status: 0 of 30 files reviewed, 6 unresolved discussions (waiting on @itsbilal, @jbowens, @petermattis, and @sumeerbhola)


db.go, line 711 at r2 (raw file):

	levels := buf.levels[:]
	addLevelIterForFiles := func(files []*manifest.FileMetadata, level int) {

Can you check that this structure isn't causing extra allocations? Sometimes that can happen with the use of closures. There is a script in the Cockroach repo which can show which allocations escape to the heap. Another technique is to find a benchmark which uses this function and to run the benchmark with -memprofile mem.prof -memprofilerate=1 which will cause every memory allocation to be sampled.


metrics_test.go, line 173 at r2 (raw file):

			// Some iterators (eg. levelIter) do not instantiate the underlying
			// iterator until the first positioning call. Call First() to
			// ensure full instantiation.

Why is full instantiation needed for this test?

Update: Ok, I think I understand the need here. Instantiation didn't capture the need for me. How about something like: Position the iterator so that levelIters will have loaded an sstable.


internal/manifest/version.go, line 209 at r2 (raw file):

	refs int32

	// The level 0 sstables organized in a series of sublevels. Similar to the

s/organized/are organized/g


internal/manifest/version.go, line 217 at r2 (raw file):

	// sublevel n+1.
	//
	// L0SubLevels.Files contains L0 files ordered by sublevels.

Should probably also mention that L0Sublevels.Files contains every table in Files[0].


internal/manifest/version.go, line 218 at r2 (raw file):

	//
	// L0SubLevels.Files contains L0 files ordered by sublevels.
	L0SubLevels *L0SubLevels

Making this a pointer implies that the zero value for Version needs some initialization. Is being able to copy the pointer when L0 isn't changing saving a lot vs making this a value?


internal/manifest/version.go, line 437 at r2 (raw file):

		// Pass 1 as level instead of 0 to trigger the level-related SST overlap
		// checks.
		if err := CheckOrdering(cmp, format, 1, v.L0SubLevels.Files[sublevel]); err != nil {

Passing 1 will cause the error messages to be prefixed with L1 which is incorrect. Perhaps we should define a level type here which includes both level and sublevel components. Then if the sublevel component is some magic value we'd engage in the legacy L0 ordering check. This type could also provide the String() method which formats L<level>[.<sublevel>] correctly. I'm imagining something like:

const invalidSublevel = int32(-1)

type level struct {
  level int32
  sublevel int32
}

func makeSublevel(l, s int32) level {
  return level{
    level: l,
    sublevel: s,
  }
}

func makeLevel(level int32) level {
  return level{
    level: l,
    sublevel: invalidSublevel,
  }
}

func (l *level) String() string {
  if l.sublevel == invalidSublevel {
    return fmt.Sprintf("L%d", l.level)
  }
  return fmt.Sprintf("L%d.%d", l.level, l.sublevel)
}

The name level isn't create for this type. levelAndSublevel is more explicit, but also a bit of a mouthful to speak and type. Perhaps you have a better suggestion.

@itsbilal itsbilal force-pushed the l0-sublevels-iters-and-version branch from 3ab9a4d to 3119b4c Compare April 20, 2020 19:47
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: 0 of 31 files reviewed, 6 unresolved discussions (waiting on @jbowens, @petermattis, and @sumeerbhola)


db.go, line 711 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can you check that this structure isn't causing extra allocations? Sometimes that can happen with the use of closures. There is a script in the Cockroach repo which can show which allocations escape to the heap. Another technique is to find a benchmark which uses this function and to run the benchmark with -memprofile mem.prof -memprofilerate=1 which will cause every memory allocation to be sampled.

Just confirmed with a benchmark in cockroach. All the allocations in this function are either in the iterAllocPool.Get above, or down at the buf.merging.init. This closure doesn't add any.


metrics_test.go, line 173 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Why is full instantiation needed for this test?

Update: Ok, I think I understand the need here. Instantiation didn't capture the need for me. How about something like: Position the iterator so that levelIters will have loaded an sstable.

Done. Yeah, that's basically it. If I didn't add this part, all the cache hit stats would be 0.


internal/manifest/version.go, line 209 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/organized/are organized/g

Done.


internal/manifest/version.go, line 217 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should probably also mention that L0Sublevels.Files contains every table in Files[0].

Done.


internal/manifest/version.go, line 218 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Making this a pointer implies that the zero value for Version needs some initialization. Is being able to copy the pointer when L0 isn't changing saving a lot vs making this a value?

I think suggesting initialization in general is a good idea - member variables like Compare and Formatter would make less sense at their zero values as well. The fact that we're able to reuse the relatively hefty struct in a no-change case is an added bonus.


internal/manifest/version.go, line 437 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Passing 1 will cause the error messages to be prefixed with L1 which is incorrect. Perhaps we should define a level type here which includes both level and sublevel components. Then if the sublevel component is some magic value we'd engage in the legacy L0 ordering check. This type could also provide the String() method which formats L<level>[.<sublevel>] correctly. I'm imagining something like:

const invalidSublevel = int32(-1)

type level struct {
  level int32
  sublevel int32
}

func makeSublevel(l, s int32) level {
  return level{
    level: l,
    sublevel: s,
  }
}

func makeLevel(level int32) level {
  return level{
    level: l,
    sublevel: invalidSublevel,
  }
}

func (l *level) String() string {
  if l.sublevel == invalidSublevel {
    return fmt.Sprintf("L%d", l.level)
  }
  return fmt.Sprintf("L%d.%d", l.level, l.sublevel)
}

The name level isn't create for this type. levelAndSublevel is more explicit, but also a bit of a mouthful to speak and type. Perhaps you have a better suggestion.

That would involve exporting the type, and for a little helper struct, it seemed too overkill. I just added a sublevel argument and exported InvalidSublevel so callers can use it. PTAL?

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 31 files reviewed, 3 unresolved discussions (waiting on @itsbilal, @jbowens, and @sumeerbhola)


db.go, line 711 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Just confirmed with a benchmark in cockroach. All the allocations in this function are either in the iterAllocPool.Get above, or down at the buf.merging.init. This closure doesn't add any.

Great. Thanks for checking.


internal/manifest/version.go, line 218 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I think suggesting initialization in general is a good idea - member variables like Compare and Formatter would make less sense at their zero values as well. The fact that we're able to reuse the relatively hefty struct in a no-change case is an added bonus.

Ack.


internal/manifest/version.go, line 437 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

That would involve exporting the type, and for a little helper struct, it seemed too overkill. I just added a sublevel argument and exported InvalidSublevel so callers can use it. PTAL?

Ok. I think this is good for now. There is an argument to be made that we should expert a Level type so that we can ensure consistent formatting of L<level>[.<sublevel>] everywhere, but it doesn't have to be done in this PR.


internal/manifest/testdata/version_check_ordering, line 19 at r4 (raw file):

  a.SET.1-b.SET.2
----
L0 files 000001 and 000002 are not properly ordered: <#3-#4> vs <#1-#2>

I don't see any test cases which have an error referencing L0.<sublevel>. Are those possible to create with the current test setup? I suspect we'll need to modify the test harness so that we can explicitly specify the files in a sublevel.

@itsbilal itsbilal force-pushed the l0-sublevels-iters-and-version branch 2 times, most recently from 170c602 to 3ca6ed7 Compare April 21, 2020 16:44
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 31 files reviewed, 3 unresolved discussions (waiting on @jbowens, @petermattis, and @sumeerbhola)


internal/manifest/testdata/version_check_ordering, line 19 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I don't see any test cases which have an error referencing L0.<sublevel>. Are those possible to create with the current test setup? I suspect we'll need to modify the test harness so that we can explicitly specify the files in a sublevel.

Yup, that's what I did - updated the test harness to be able to work around the initialization code and specifically specify sublevels. Thanks!

This change effectively lowers read amplification in L0 by taking
advantage of knowledge about sublevels contained in
Version.L0SubLevels and using it to create levelIters for each sublevel
instead of directly adding one iterator per file to the mergingIter.
The get_iter is also updated to take advantage of this information.

The Version struct is also updated, as a result of this change, to
hold a reference to an L0SubLevels instance. That reference is initialized
during Version initialization in version_edit.

Most of the test changes are a result of always printing the sublevel
alongside the level for L0.
@itsbilal itsbilal force-pushed the l0-sublevels-iters-and-version branch from 3ca6ed7 to 089202d Compare April 21, 2020 18:58
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 0 of 29 files reviewed, all discussions resolved (waiting on @jbowens and @sumeerbhola)

@itsbilal
Copy link
Contributor Author

TFTR!

@itsbilal itsbilal merged commit 43f8d50 into cockroachdb:master Apr 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants