-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: automatically delete stats for dropped tables #105364
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
33b81ff
to
c276738
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this!
we currently don't have a way to easily force schema GC on the dropped tables
Is there no way to make the GC interval smaller as a testing knob? It would be nice to have some sort of non-manual test so we don't break this feature in the future...
Fixes: #94915.
This doesn't look like the right issue
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/stats/automatic_stats.go
line 601 at r1 (raw file):
select { case <-intervalChangedCh: continue
Is it worth adding some logging here (and for the same case below)?
ca1a01c
to
25cc72b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no way to make the GC interval smaller as a testing knob? It would be nice to have some sort of non-manual test so we don't break this feature in the future...
I asked about this on slack, and my conclusion from that thread is that it is currently not easy to speed up GC of the dropped table. To go around it, I've just introduced a testing knob and added the corresponding test. It'd be good to get sign off from @chengxiong-ruan on this.
This doesn't look like the right issue
Indeed, nice catch, thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)
pkg/sql/stats/automatic_stats.go
line 601 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Is it worth adding some logging here (and for the same case below)?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gc jobs change lgtm.
pkg/sql/schema_changer.go
Outdated
|
||
// If true, disables the usage of DelRange (i.e. falls back to the legacy | ||
// way). | ||
DisableDelRange bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a TODO (to me) to remove this when we figure out how to control the mvcc gc queue? Thanks.
25cc72b
to
874a00e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)
pkg/sql/schema_changer.go
line 2332 at r2 (raw file):
Previously, chengxiong-ruan (Chengxiong Ruan) wrote…
Could add a TODO (to me) to remove this when we figure out how to control the mvcc gc queue? Thanks.
Done.
The diff --git a/pkg/sql/stats/delete_stats_dropped_table_test.go b/pkg/sql/stats/delete_stats_dropped_table_test.go
index 7208606f8a5..ae723b572bb 100644
--- a/pkg/sql/stats/delete_stats_dropped_table_test.go
+++ b/pkg/sql/stats/delete_stats_dropped_table_test.go
@@ -31,11 +31,7 @@ func TestStatsAreDeletedForDroppedTables(t *testing.T) {
defer log.Scope(t).Close(t)
params, _ := tests.CreateTestServerParams()
- params.Knobs.GCJob = &sql.GCJobTestingKnobs{
- // We disable the usage of DeleteRange primitive in order to speed up
- // the test.
- DisableDelRange: true,
- }
+ params.ScanMaxIdleTime = time.Millisecond // speed up MVCC GC queue scans
s, sqlDB, _ := serverutils.StartServer(t, params)
defer s.Stopper().Stop(context.Background())
runner := sqlutils.MakeSQLRunner(sqlDB)
@@ -44,6 +40,10 @@ func TestStatsAreDeletedForDroppedTables(t *testing.T) {
runner.Exec(t, "SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;")
// Lower the garbage collection interval to speed up the test.
runner.Exec(t, "SET CLUSTER SETTING sql.stats.garbage_collection_interval = '1s';")
+ // Poll for MVCC GC more frequently.
+ runner.Exec(t, "SET CLUSTER SETTING sql.gc_job.wait_for_gc.interval = '1s';")
+ // Cached protected timestamp state delays MVCC GC, update it every second.
+ runner.Exec(t, "SET CLUSTER SETTING kv.protectedts.poll_interval = '1s';")
// Create a table with short TTL and collect stats on it.
runner.Exec(t, "CREATE TABLE t (k PRIMARY KEY) AS SELECT 1;") |
874a00e
to
323a9f5
Compare
This commit adds another task to the stats refresher that periodically runs a query to delete stats for dropped tables from the system table. By default, this query runs once an hour, but this can be configured via a cluster setting (which also exposes a way to disable this new "stats garbage collector" when it is set to 0). The query also limits the number of dropped tables to process at once to 1000 by default (controled via another cluster setting). The rationale for introducing the limit is to prevent a huge DELETE when the cluster that has been running for long time with many dropped tables has just upgraded to the binary with this fix. Release note (bug fix): CockroachDB now automatically deletes statistics for dropped tables from `system.table_statistics` table.
323a9f5
to
00388a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Erik for figuring this out! Updated the PR with the suggestion.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r2, 4 of 4 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @yuzefovich)
pkg/sql/stats/automatic_stats.go
line 601 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Done.
How about for the interval changed? Or do we already log changes to cluster settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @chengxiong-ruan, @michae2, and @rytaft)
pkg/sql/stats/automatic_stats.go
line 601 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
How about for the interval changed? Or do we already log changes to cluster settings?
We don't, but I don't think we log setting changes in other similar situations (e.g. statement diagnostics registry has "poling interval", but we don't explicitly log when the interval is changed via the cluster setting), so I'll keep it out here too.
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 00388a1 to blathers/backport-release-22.2-105364: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This commit adds another task to the stats refresher that periodically runs a query to delete stats for dropped tables from the system table. By default, this query runs once an hour, but this can be configured via a cluster setting (which also exposes a way to disable this new "stats garbage collector" when it is set to 0). The query also limits the number of dropped tables to process at once to 1000 by default (controled via another cluster setting). The rationale for introducing the limit is to prevent a huge DELETE when the cluster that has been running for long time with many dropped tables has just upgraded to the binary with this fix.
Fixes: #94195.
Release note (bug fix): CockroachDB now automatically deletes statistics for dropped tables from
system.table_statistics
table.