Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: add setting to seed up consistency checks in tests #116892

Merged
merged 1 commit into from
Dec 21, 2023

Conversation

itsbilal
Copy link
Member

Previously, we'd fall back to the 3 second consistency checker EventuallyFileOnlySnapshot (EFOS) wait in roachtests, which was slowing all of
them down when we ran every replica through the consistency checker in post-test assertions. This change speeds up that consistency check in roachtest post-test assertions by flipping a new cluster setting to speed up EFOS waits for consistency checks after a roachtest finishes.

Epic: none
Unblocks #116330.

Release note: None

@itsbilal itsbilal requested review from a team as code owners December 20, 2023 21:19
@itsbilal itsbilal requested review from a team, srosenberg, DarrylWong and aadityasondhi and removed request for a team December 20, 2023 21:19
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor

Does this imply that running crdb_internal.check_consistency() is going to be generally slow?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi, @DarrylWong, @itsbilal, and @srosenberg)


pkg/storage/engine.go line 638 at r1 (raw file):

	// argument specifies how long to wait for before attempting a flush to
	// force a transition to a file-only snapshot.
	WaitForFileOnly(context.Context, time.Duration) error

[nit] I'd name give the parameter a useful name, maybe gracePeriodBeforeFlush. Ideally we'd plumb that through the implementation - dur is not useful, and in fact is misleading: a generic duration argument to a method like Wait usually means that we'll give up waiting after that duration.

@itsbilal itsbilal force-pushed the efos-speed-up-testing branch from bfb9b74 to ce26047 Compare December 21, 2023 16:34
Previously, we'd fall back to the 3 second consistency
checker EventuallyFileOnlySnapshot (EFOS) wait in roachtests,
which was slowing all of
them down when we ran every replica through the consistency
checker in post-test assertions. This change speeds up that
consistency check in roachtest post-test assertions by
flipping a new cluster setting to speed up EFOS waits for
consistency checks after a roachtest finishes.

Epic: none
Unblocks cockroachdb#116330.

Release note: None
@itsbilal itsbilal force-pushed the efos-speed-up-testing branch from ce26047 to f2ae78b Compare December 21, 2023 18:18
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

@erikgrinaker that's correct but only if we're running that command with little/no background writes happening. Flushes will result in forward progress on WaitForFileOnly without the 3 second wait.

In general when we were making this change, we were under the impression that consistency checks are usually a background operation. If this is an undesirable knock-on effect of this change we can look into ways to speed up consistency checks too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aadityasondhi, @DarrylWong, and @srosenberg)


pkg/storage/engine.go line 638 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] I'd name give the parameter a useful name, maybe gracePeriodBeforeFlush. Ideally we'd plumb that through the implementation - dur is not useful, and in fact is misleading: a generic duration argument to a method like Wait usually means that we'll give up waiting after that duration.

Done.

@itsbilal
Copy link
Member Author

bors r=RaduBerinde

@craig
Copy link
Contributor

craig bot commented Dec 21, 2023

Build succeeded:

@craig craig bot merged commit 7059101 into cockroachdb:master Dec 21, 2023
9 checks passed
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were under the impression that consistency checks are usually a background operation. If this is an undesirable knock-on effect of this change we can look into ways to speed up consistency checks too.

That's generally true, except for crdb_internal.check_consistency(). I doubt it's used much outside of tests. We could consider disabling EFOS for foreground consistency checks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

craig bot pushed a commit that referenced this pull request Jan 3, 2024
117236: roachtest: fix consistency check settings error r=erikgrinaker a=erikgrinaker

Fixes this error, which effectively disabled post-test consistency checks as we have to ignore errors:

```
SET CLUSTER SETTING cannot be used inside a multi-statement transaction
```

Follows #116892.
Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
@nvanbenschoten
Copy link
Member

That's generally true, except for crdb_internal.check_consistency(). I doubt it's used much outside of tests. We could consider disabling EFOS for foreground consistency checks.

We do know at this level whether we're running on behalf of the consistency queue or on behalf of the crdb_internal.check_consistency() builtin. The mode param will have a value of ChecksumMode_CHECK_VIA_QUEUE when from the queue and either ChecksumMode_CHECK_FULL or ChecksumMode_CHECK_STATS when manually run.

It would probably make sense to disable EFOS for both. If not, we should at least do it for ChecksumMode_CHECK_STATS, which doesn't iterate over the range and should be quick.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jan 3, 2024
…checks

See discussion in cockroachdb#116892.

Epic: None
Release note: None
@nvanbenschoten
Copy link
Member

I'll make the less contentious change for now. #117297.

@erikgrinaker
Copy link
Contributor

blathers backport 23.2

Copy link

blathers-crl bot commented Jan 10, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from f2ae78b to blathers/backport-release-23.2-116892: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@erikgrinaker
Copy link
Contributor

Nevermind, I guess we didn't enable this by default for 23.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants