Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: retry sql stats compaction job #107306

Closed

Conversation

zachlite
Copy link
Contributor

If the sql stats compaction job fails, the error is returned to the job system, where an hour must elapse (by default) before compaction can try again.

This commit introduces a compaction job level retry, where compaction can retry up to 10 times before the error is returned to the jobs system. Retries are idempotent.

The error semantics are preserved. If all 10 attempts fail, the most recent error is returned.

Resolves #107108
Epic: none
Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Jul 20, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@j82w
Copy link
Contributor

j82w commented Jul 20, 2023

pkg/sql/compact_sql_stats.go line 85 at r1 (raw file):

	}

	err = retry.WithMaxAttempts(ctx, retryOpts, 10 /* maxAttempts */, func() error {

Please add a test.

@maryliag maryliag requested a review from a team July 20, 2023 20:57
@maryliag maryliag added backport-22.2.x backport-23.1.x Flags PRs that need to be backported to 23.1 labels Jul 20, 2023
@THardy98 THardy98 requested review from a team as code owners July 24, 2023 18:12
@THardy98 THardy98 requested review from a team July 24, 2023 18:12
@THardy98 THardy98 requested review from a team as code owners July 24, 2023 18:12
@THardy98 THardy98 requested review from rhu713, mgartner, RaduBerinde and ericharmeling and removed request for a team July 24, 2023 18:12
@THardy98 THardy98 force-pushed the 230720.sql-stats-compaction-retry branch from c2dc039 to c5ee7c2 Compare July 24, 2023 18:18
@THardy98 THardy98 requested a review from a team as a code owner July 24, 2023 18:18
If the sql stats compaction job fails, the error is returned
to the job system, where an hour must elapse (by default) before
compaction can try again.

This commit introduces a compaction job level retry, where compaction
can retry up to 10 times before the error is returned to the jobs system.
Retries are idempotent.

The error semantics are preserved. If all 10 attempts fail,
the most recent error is returned.

Resolves cockroachdb#107108
Epic: none
Release note: None
@THardy98 THardy98 force-pushed the 230720.sql-stats-compaction-retry branch from c5ee7c2 to 5ad1747 Compare July 24, 2023 18:19
@THardy98 THardy98 force-pushed the 230720.sql-stats-compaction-retry branch 2 times, most recently from a4a3a80 to 99002f7 Compare July 24, 2023 21:30
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @zachlite)


-- commits line 28 at r3:
nit: currently this setting isn't marked as "public" (i.e. it's missing WithPublic() suffix), yet we mention it in the release note. Release notes should contain things that are intended to be exposed to users, and in almost all cases we'd expect the setting to be public if we wanted users to know about. In short, there is a mismatch - we either don't mention the cluster setting in the release note (which I would do) or we mark the setting as public so that it's properly documented.


pkg/sql/compact_sql_stats.go line 86 at r3 (raw file):

	maxRetryAttempts := int(persistedsqlstats.CompactionNumRetryAttempts.Get(&r.st.SV))

	if p.ExecCfg().SQLStatsTestingKnobs != nil && p.ExecCfg().SQLStatsTestingKnobs.OverrideCompactionRetryOptions != nil {

nit: p.ExecCfg().SQLStatsTestingKnobs could be extracted into a local variable.

@THardy98 THardy98 force-pushed the 230720.sql-stats-compaction-retry branch from 99002f7 to 79c53d5 Compare July 25, 2023 13:38
@THardy98
Copy link

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @zachlite)

-- commits line 28 at r3: nit: currently this setting isn't marked as "public" (i.e. it's missing WithPublic() suffix), yet we mention it in the release note. Release notes should contain things that are intended to be exposed to users, and in almost all cases we'd expect the setting to be public if we wanted users to know about. In short, there is a mismatch - we either don't mention the cluster setting in the release note (which I would do) or we mark the setting as public so that it's properly documented.

pkg/sql/compact_sql_stats.go line 86 at r3 (raw file):

	maxRetryAttempts := int(persistedsqlstats.CompactionNumRetryAttempts.Get(&r.st.SV))

	if p.ExecCfg().SQLStatsTestingKnobs != nil && p.ExecCfg().SQLStatsTestingKnobs.OverrideCompactionRetryOptions != nil {

nit: p.ExecCfg().SQLStatsTestingKnobs could be extracted into a local variable.

Added WithPublic and addressed the nit

retry logic

This change adds a test for the compaction retry logic as well as a
cluster setting `sql.stats.cleanup.num_retries` to configure the number
of retry attempts before the compaction job fails.

Release note(sql change): add a cluster setting
`sql.stats.cleanup.num_retries` to configure the number of retry
attempts for the sql stats compaction job before it fails
@THardy98 THardy98 force-pushed the 230720.sql-stats-compaction-retry branch from 79c53d5 to 5ea7958 Compare July 25, 2023 13:57
@THardy98 THardy98 closed this Jul 27, 2023
@THardy98
Copy link

Closing after contention fix: #107549

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

persistedsqlstats: add retries to StatsCompactor
6 participants