Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Statistics for deleted tables in system.table_statistics do not get removed #94195

Closed
jonstjohn opened this issue Dec 23, 2022 · 2 comments · Fixed by #105364
Closed

Statistics for deleted tables in system.table_statistics do not get removed #94195

jonstjohn opened this issue Dec 23, 2022 · 2 comments · Fixed by #105364
Assignees
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. docs-done docs-known-limitation E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team

Comments

@jonstjohn
Copy link
Collaborator

jonstjohn commented Dec 23, 2022

Describe the problem

The problem was first noticed when performance of SHOW TABLES on a specific cluster slowed to > 60 seconds. A statement trace showed that most time was spent on the following query to the system.table_statistics table:

SELECT
  s."tableID", max(s."rowCount")
FROM
  system.table_statistics AS s
  JOIN (
      SELECT "tableID", max("createdAt") AS last_dt FROM system.table_statistics GROUP BY "tableID"
    )
      AS l ON l."tableID" = s."tableID" AND l.last_dt = s."createdAt"
    AS OF SYSTEM TIME '-10s'
GROUP BY
  s."tableID"

The system.table_statistics table had grown quite large (> 5 million rows) and appeared to retain statistics on deleted tables.

To Reproduce

Start a single node.

cockroach start-single-node --insecure

Query for the count of rows in system.table_statistics by database name. A watch provides repeated output of the command.

./cockroach sql --execute 'select t.database_name, count(ts."tableID") from system.table_statistics ts left join crdb_internal.tables t on ts."tableID" = t.table_id group by t.database_name;' --watch 10s --insecure

(it may take a couple of minutes for the statistics to show for system tables)

Set the gc.ttlseconds on the default range to 500 seconds to force garbage collection of the table:

ALTER RANGE DEFAULT CONFIGURE ZONE USING gc.ttlseconds = 300;

Create a database and table:

CREATE DATABASE d;
CREATE TABLE d.t (k uuid, v string);

Observe that rows in system.table_statistics are created for the table.
(it may take a couple of minutes for the statistics to show for the newly created table)

Drop the database:

SET sql_safe_updates=false;
DROP DATABASE d;

Observe that the rows for the table are not deleted, even after waiting until garbage collection completes (5 minutes).

Expected behavior
When a table is dropped, either directly or via dropping a database, the related rows in system.table_statistics are deleted. This prevents accumulation of orphaned rows that can slow down the performance of SHOW TABLES.

Additional data / screenshots
None

Environment:

  • CockroachDB version: reproduced in 21.2.12 and 22.1.1
  • Server OS: reproduced on MacOS but also observed on Linux
  • Client app: cockroach sql

Additional context
May impact other queries or cluster operations.

Jira issue: CRDB-22733

@jonstjohn jonstjohn added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Dec 23, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Dec 23, 2022
@mgartner mgartner self-assigned this Dec 27, 2022
@mgartner
Copy link
Collaborator

This is a known limitation - we don't delete old stats. Moving to backlog.

@mgartner mgartner removed their assignment Dec 27, 2022
@jhaynie
Copy link

jhaynie commented Dec 29, 2022

As a workaround, we're now using this in a periodic cleanup script:

DELETE FROM system.table_statistics WHERE "tableID" NOT IN (SELECT table_id FROM crdb_internal.tables);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. docs-done docs-known-limitation E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

7 participants