Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: automatically delete stats for dropped tables #105364

Merged
merged 1 commit into from
Jun 24, 2023

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jun 22, 2023

This commit adds another task to the stats refresher that periodically runs a query to delete stats for dropped tables from the system table. By default, this query runs once an hour, but this can be configured via a cluster setting (which also exposes a way to disable this new "stats garbage collector" when it is set to 0). The query also limits the number of dropped tables to process at once to 1000 by default (controled via another cluster setting). The rationale for introducing the limit is to prevent a huge DELETE when the cluster that has been running for long time with many dropped tables has just upgraded to the binary with this fix.

Fixes: #94195.

Release note (bug fix): CockroachDB now automatically deletes statistics for dropped tables from system.table_statistics table.

@yuzefovich yuzefovich added backport-22.2.x backport-23.1.x Flags PRs that need to be backported to 23.1 labels Jun 22, 2023
@yuzefovich yuzefovich requested review from rytaft, michae2 and a team June 22, 2023 17:33
@blathers-crl
Copy link

blathers-crl bot commented Jun 22, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

we currently don't have a way to easily force schema GC on the dropped tables

Is there no way to make the GC interval smaller as a testing knob? It would be nice to have some sort of non-manual test so we don't break this feature in the future...

Fixes: #94915.

This doesn't look like the right issue

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)


pkg/sql/stats/automatic_stats.go line 601 at r1 (raw file):

				select {
				case <-intervalChangedCh:
					continue

Is it worth adding some logging here (and for the same case below)?

@yuzefovich yuzefovich force-pushed the delete-old-stats branch 2 times, most recently from ca1a01c to 25cc72b Compare June 22, 2023 21:58
@yuzefovich yuzefovich requested a review from a team as a code owner June 22, 2023 21:58
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no way to make the GC interval smaller as a testing knob? It would be nice to have some sort of non-manual test so we don't break this feature in the future...

I asked about this on slack, and my conclusion from that thread is that it is currently not easy to speed up GC of the dropped table. To go around it, I've just introduced a testing knob and added the corresponding test. It'd be good to get sign off from @chengxiong-ruan on this.

This doesn't look like the right issue

Indeed, nice catch, thanks!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)


pkg/sql/stats/automatic_stats.go line 601 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Is it worth adding some logging here (and for the same case below)?

Done.

Copy link
Contributor

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gc jobs change lgtm.


// If true, disables the usage of DelRange (i.e. falls back to the legacy
// way).
DisableDelRange bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add a TODO (to me) to remove this when we figure out how to control the mvcc gc queue? Thanks.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)


pkg/sql/schema_changer.go line 2332 at r2 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Could add a TODO (to me) to remove this when we figure out how to control the mvcc gc queue? Thanks.

Done.

@erikgrinaker
Copy link
Contributor

The DisableDelRange testing knob can be replaced with this:

diff --git a/pkg/sql/stats/delete_stats_dropped_table_test.go b/pkg/sql/stats/delete_stats_dropped_table_test.go
index 7208606f8a5..ae723b572bb 100644
--- a/pkg/sql/stats/delete_stats_dropped_table_test.go
+++ b/pkg/sql/stats/delete_stats_dropped_table_test.go
@@ -31,11 +31,7 @@ func TestStatsAreDeletedForDroppedTables(t *testing.T) {
        defer log.Scope(t).Close(t)
 
        params, _ := tests.CreateTestServerParams()
-       params.Knobs.GCJob = &sql.GCJobTestingKnobs{
-               // We disable the usage of DeleteRange primitive in order to speed up
-               // the test.
-               DisableDelRange: true,
-       }
+       params.ScanMaxIdleTime = time.Millisecond // speed up MVCC GC queue scans
        s, sqlDB, _ := serverutils.StartServer(t, params)
        defer s.Stopper().Stop(context.Background())
        runner := sqlutils.MakeSQLRunner(sqlDB)
@@ -44,6 +40,10 @@ func TestStatsAreDeletedForDroppedTables(t *testing.T) {
        runner.Exec(t, "SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;")
        // Lower the garbage collection interval to speed up the test.
        runner.Exec(t, "SET CLUSTER SETTING sql.stats.garbage_collection_interval = '1s';")
+       // Poll for MVCC GC more frequently.
+       runner.Exec(t, "SET CLUSTER SETTING sql.gc_job.wait_for_gc.interval = '1s';")
+       // Cached protected timestamp state delays MVCC GC, update it every second.
+       runner.Exec(t, "SET CLUSTER SETTING kv.protectedts.poll_interval = '1s';")
 
        // Create a table with short TTL and collect stats on it.
        runner.Exec(t, "CREATE TABLE t (k PRIMARY KEY) AS SELECT 1;")

This commit adds another task to the stats refresher that periodically
runs a query to delete stats for dropped tables from the system table.
By default, this query runs once an hour, but this can be configured via
a cluster setting (which also exposes a way to disable this new "stats
garbage collector" when it is set to 0). The query also limits the
number of dropped tables to process at once to 1000 by default
(controled via another cluster setting). The rationale for introducing
the limit is to prevent a huge DELETE when the cluster that has been
running for long time with many dropped tables has just upgraded to the
binary with this fix.

Release note (bug fix): CockroachDB now automatically deletes statistics
for dropped tables from `system.table_statistics` table.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Erik for figuring this out! Updated the PR with the suggestion.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks for adding the test!

Reviewed 2 of 5 files at r2, 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @yuzefovich)


pkg/sql/stats/automatic_stats.go line 601 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Done.

How about for the interval changed? Or do we already log changes to cluster settings?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)


pkg/sql/stats/automatic_stats.go line 601 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

How about for the interval changed? Or do we already log changes to cluster settings?

We don't, but I don't think we log setting changes in other similar situations (e.g. statement diagnostics registry has "poling interval", but we don't explicitly log when the interval is changed via the cluster setting), so I'll keep it out here too.

@craig
Copy link
Contributor

craig bot commented Jun 24, 2023

Build succeeded:

@craig craig bot merged commit 7fd4c21 into cockroachdb:master Jun 24, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jun 24, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 00388a1 to blathers/backport-release-22.2-105364: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Statistics for deleted tables in system.table_statistics do not get removed
5 participants