
kvserver: Clean up empty range directories after snapshots #84100

Merged 1 commit on Jul 11, 2022

Conversation

@itsbilal (Member) commented Jul 8, 2022

Previously, we were creating subdirectories for ranges and
range snapshots in the auxiliary directory every time we
accepted a snapshot, but only cleaning up the snapshot
subdirectories after a snapshot scratch space closed. This
left empty parent range directories around on the filesystem,
slowing down future calls to Pebble.Capacity() and, indirectly,
AddSSTable.

This change adds code to clean up empty range directories
in the aux directory if they're not being used. Some coordination
and synchronization code had to be added to ensure we wouldn't
remove a directory that was just created by a concurrent snapshot.

Fixes #83137

Release note (bug fix, performance improvement): Addresses an issue where
imports and rebalances were slowed down by the accumulation of empty
directories from range snapshot applications.
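
To make the coordination concrete, here is a minimal Go sketch of the scheme described above, using hypothetical names and plain os calls in place of the storage engine; it is not the actual kvserver code. Each scratch space takes a reference on its range's directory, and the directory is only removed when the last reference is dropped, so a concurrent snapshot cannot have its freshly created directory deleted out from under it.

package snapdirsketch

import (
	"os"
	"path/filepath"
	"strconv"
	"sync"
)

// scratchDirs tracks how many snapshot scratch spaces are using each
// range's subdirectory under the auxiliary directory.
type scratchDirs struct {
	dir string // auxiliary directory root
	mu  struct {
		sync.Mutex
		rangeRefCount map[int64]int // rangeID -> open scratch spaces
	}
}

func newScratchDirs(dir string) *scratchDirs {
	s := &scratchDirs{dir: dir}
	s.mu.rangeRefCount = make(map[int64]int)
	return s
}

// NewScratchSpace takes a reference on rangeID's directory and returns its
// path; holding the reference prevents a concurrent cleanup from removing a
// directory that was just created.
func (s *scratchDirs) NewScratchSpace(rangeID int64) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.mu.rangeRefCount[rangeID]++
	return filepath.Join(s.dir, strconv.FormatInt(rangeID, 10))
}

// CloseScratchSpace drops the reference and, if it was the last one, removes
// the now-unused range directory. Removal is best effort: a leftover empty
// directory only slows later directory walks (e.g. Pebble.Capacity); it is
// not a correctness issue.
func (s *scratchDirs) CloseScratchSpace(rangeID int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.mu.rangeRefCount[rangeID]--
	if s.mu.rangeRefCount[rangeID] <= 0 {
		delete(s.mu.rangeRefCount, rangeID)
		_ = os.RemoveAll(filepath.Join(s.dir, strconv.FormatInt(rangeID, 10)))
	}
}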

@itsbilal itsbilal requested a review from a team July 8, 2022 20:44
@itsbilal itsbilal requested a review from a team as a code owner July 8, 2022 20:44
@itsbilal itsbilal self-assigned this Jul 8, 2022
@cockroach-teamcity (Member) commented:

This change is Reviewable

@itsbilal itsbilal force-pushed the auxiliary-clean-rangedir branch from b1856a3 to cd76112 on July 8, 2022 20:44
@erikgrinaker (Contributor) left a comment:

LGTM, thanks!

@@ -42,14 +47,26 @@ func NewSSTSnapshotStorage(engine storage.Engine, limiter *rate.Limiter) SSTSnap
}
}

// Init initializes the SSTSnapshotStorage struct.
func (s *SSTSnapshotStorage) Init() {
erikgrinaker (Contributor):

nit: any reason we're not just doing this in NewSSTSnapshotStorage()?

itsbilal (Member, Author):

Good question. The goal is to avoid a Mutex copy as much as possible. The syntax to do it in one line without copying the return value in NewSSTSnapshotStorage is somewhat ugly but not as bad as it was in one of my earlier iterations (which had more stuff in mu), so I just went ahead and made that change.

erikgrinaker (Contributor):

Thanks! Why are we worried about the mutex copy? We're only calling this once on server startup, and there is no concurrency until it's constructed, but maybe I'm missing something.

pkg/kv/kvserver/replica_sst_snapshot_storage.go (outdated review thread, resolved)
pkg/kv/kvserver/replica_sst_snapshot_storage.go (outdated review thread, resolved)
@itsbilal itsbilal force-pushed the auxiliary-clean-rangedir branch from cd76112 to 05ee29e on July 11, 2022 15:23
@itsbilal (Member, Author) commented:

TFTR!

@nicktrav (Collaborator) left a comment:

Thanks for cleaning this up! :lgtm:

Reviewed 1 of 4 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @erikgrinaker and @itsbilal)


-- commits line 19 at r2:
Do we not consider this to be a bug fix too? i.e. fixing the "leak"? No strong feelings either way, just wanted to check.


pkg/kv/kvserver/replica_sst_snapshot_storage.go line 96 at r2 (raw file):

		// a performance issue when we later walk directories in pebble.Capacity()
		// but not a correctness issue.
		_ = s.engine.RemoveAll(filepath.Join(s.dir, strconv.Itoa(int(rangeID))))

Are there any invariants worth checking in this deletion codepath? That is, are there circumstances in which we should not delete a directory (or its children) that we should be checking for before proceeding? Or is the reference count enough?

@itsbilal itsbilal force-pushed the auxiliary-clean-rangedir branch from 05ee29e to 73c5980 on July 11, 2022 20:03
@itsbilal (Member, Author) left a comment:

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker and @nicktrav)


-- commits line 19 at r2:

Previously, nicktrav (Nick Travers) wrote…

Do we not consider this to be a bug fix too? i.e. fixing the "leak"? No strong feelings either way, just wanted to check.

Good point, fixed.


pkg/kv/kvserver/replica_sst_snapshot_storage.go line 51 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Thanks! Why are we worried about the mutex copy? We're only calling this once on server startup, and there is no concurrency until it's constructed, but maybe I'm missing something.

It just threw a warning in my IDE and I was guessing it'd throw a warning in the linter as well, but I never confirmed it. Either way, this syntax sidesteps that question.
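
For context on the warning being discussed, here is a minimal, hypothetical sketch, not the cockroach code, of why returning a mutex-containing struct by value from a constructor can draw a copylocks-style warning, along with two ways around it: the separate Init step shown in the diff above, and a composite literal that spells out the anonymous mu struct (possibly the kind of "somewhat ugly" one-liner mentioned earlier; the exact syntax used in the PR is not shown here).

package mutexcopysketch

import "sync"

type storage struct {
	dir string
	mu  struct {
		sync.Mutex
		ranges map[int64]int
	}
}

// newStorageCopies builds the value in a local and returns it by value. The
// return copies the embedded mutex, which go vet's copylocks check (and many
// IDEs) reports as copying a lock value.
func newStorageCopies(dir string) storage {
	var s storage
	s.dir = dir
	s.mu.ranges = make(map[int64]int)
	return s
}

// newStorage returns a composite literal directly (no existing lock value is
// copied) and leaves the map to a separate Init step, the pattern the review
// diff above shows.
func newStorage(dir string) storage {
	return storage{dir: dir}
}

func (s *storage) Init() {
	s.mu.ranges = make(map[int64]int)
}

// newStorageOneLiner populates mu inside the literal itself by repeating the
// anonymous struct type; verbose, but it keeps construction in one place
// without ever copying a lock.
func newStorageOneLiner(dir string) storage {
	return storage{
		dir: dir,
		mu: struct {
			sync.Mutex
			ranges map[int64]int
		}{ranges: make(map[int64]int)},
	}
}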


pkg/kv/kvserver/replica_sst_snapshot_storage.go line 96 at r2 (raw file):

Previously, nicktrav (Nick Travers) wrote…

Are there any invariants worth checking in this deletion codepath? That is, are there circumstances in which we should not delete a directory (or its children) that we should be checking for before proceeding? Or is the reference count enough?

Seeing as this scratch is the only thing that creates and uses that directory, and we delete the entire aux directory on node restart, it should be pretty safe.

@nicktrav (Collaborator) left a comment:

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker)


pkg/kv/kvserver/replica_sst_snapshot_storage.go line 96 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Seeing as this scratch is the only thing that creates and uses that directory, and we delete the entire aux directory on node restart, it should be pretty safe.

sgtm

@itsbilal (Member, Author) commented:

bors r=nicktrav

@craig (craig bot, Contributor) commented Jul 11, 2022:

Build succeeded:

@sumeerbhola (Collaborator) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/kv/kvserver/replica_sst_snapshot_storage.go line 62 at r3 (raw file):

	rangeStorage := s.mu.ranges[rangeID]
	if rangeStorage == 0 {
		s.mu.ranges[rangeID] = 1

The use of map[roachpb.RangeID]int suggests this is a refcount, but it doesn't seem to be. What is the semantics of this value? What is the case where this if-condition will not be true?
nit: if this is a refcount, I'd suggest renaming ranges to rangeRefCount.


pkg/kv/kvserver/replica_sst_snapshot_storage.go line 89 at r3 (raw file):

		panic("inconsistent scratch ref count")
	}
	val--

looks like a refcount now. But we didn't ++ it in NewScratchSpace. I am confused.

itsbilal added a commit to itsbilal/cockroach that referenced this pull request Jul 13, 2022
Previously, in rare cases where we had two SSTSnapshotStorageScratch
instances for a given range, we would not increment the refcount for the
second scratch creation, leading to an inconsistent refcount panic down
the line. This change fixes that and adds a test to exercise the
concurrent case.

Bug popped up in cockroachdb#84100.

Release note: None.
@itsbilal (Member, Author) commented:

pkg/kv/kvserver/replica_sst_snapshot_storage.go line 62 at r3 (raw file):

Previously, sumeerbhola wrote…

The use of map[roachpb.RangeID]int suggests this is a refcount, but it doesn't seem to be. What is the semantics of this value? What is the case where this if-condition will not be true?
nit: if this is a refcount, I'd suggest renaming ranges to rangeRefCount.

Oops, yes it's a refcount and it's missing a ++ here in the non-empty case. Good catch. I'm pushing a PR for this.
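
As a concrete illustration of the bug and the fix (hypothetical names, not the actual SSTSnapshotStorage code): if two scratch spaces are opened for the same range, the original pattern leaves the count at 1, so the second release trips the inconsistent-refcount check.

package refcountsketch

import "sync"

type tracker struct {
	mu     sync.Mutex
	ranges map[int64]int
}

func newTracker() *tracker {
	return &tracker{ranges: make(map[int64]int)}
}

// acquireBuggy mirrors the missing-increment pattern quoted above: the count
// is only set when it is zero, so a second scratch space for the same range
// does not bump it.
func (t *tracker) acquireBuggy(rangeID int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.ranges[rangeID] == 0 {
		t.ranges[rangeID] = 1
	}
}

// acquireFixed always increments, so every release has a matching acquire.
func (t *tracker) acquireFixed(rangeID int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.ranges[rangeID]++
}

// release decrements and panics on an impossible count, mirroring the
// "inconsistent scratch ref count" check in the quoted snippet. With
// acquireBuggy called twice for one range, the second release panics here.
func (t *tracker) release(rangeID int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	n := t.ranges[rangeID]
	if n <= 0 {
		panic("inconsistent scratch ref count")
	}
	if n == 1 {
		delete(t.ranges, rangeID)
		return
	}
	t.ranges[rangeID] = n - 1
}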

craig bot pushed a commit that referenced this pull request Jul 13, 2022
84282: sql: new cluster setting for index recommendation r=maryliag a=maryliag

As part of #83782, we will generate index recommendations
per statement fingerprint ID. As a safety measure, this creates the
cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`,
which can be disabled in case index recommendation collection causes
performance issues.

Release note (sql change): Creation of cluster setting
`sql.metrics.statement_details.index_recommendation_collection.enabled`
that can be disabled if index recommendation generation is
causing performance issues.

84316: authors: add Jake to authors r=j82w a=j82w

Release note: None

84367: kvserver: Fix missing increment in SSTSnapshotStorage refcount r=erikgrinaker a=itsbilal

Previously, in rare cases where we had two SSTSnapshotStorageScratch
instances for a given range, we would not increment the refcount for the
second scratch creation, leading to an inconsistent refcount panic down
the line. This change fixes that and adds a test to exercise the
concurrent case.

Bug popped up in #84100.

Release note: None.

Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: j82w <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
itsbilal added a commit that referenced this pull request Jul 13, 2022
itsbilal added a commit that referenced this pull request Jul 18, 2022
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Sep 19, 2022

Successfully merging this pull request may close these issues.

storage: EngineCapacity call is progressively expensive on an import in a large cluster
5 participants