Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: stats collection doesn't get re-tried after a server shutdown #100482

Open
knz opened this issue Apr 3, 2023 · 4 comments
Open

sql: stats collection doesn't get re-tried after a server shutdown #100482

knz opened this issue Apr 3, 2023 · 4 comments
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@knz
Copy link
Contributor

knz commented Apr 3, 2023

Discovered by @rytaft

Describe the problem

The automatic stats collection (used by SQL query planning) is triggered lazily when tables encounter mutations.

This lazy mechanism is based on an in-memory FIFO (a Go channel).

If a server gets shut down after a SQL mutation completes but before the mutation was popped from this FIFO (and thus before the job to compute the stats has been created), there will never be any re-computation.

This can result in bad query plans for tables that had a bunch of changes just before a node shuts down and then never change again.

To Reproduce

Issue a large number of mutations to a table and issue a server shutdown in the middle of these mutations.
Then restart the server and do not modify the table again. Notice that the stats remain out of date forever.

Expected behavior

  • Either persist the in-flight FIFO queue of mutated tables so they get a chance to see their jobs created when the server starts again.
  • Wait during graceful shutdown until all the jobs are created, but without running them. (This could delay the drain in an unacceptable way if there were many tables. Testing should evaluate this overhead)

Environment:

crdb v23.1 and master branch

Jira issue: CRDB-26459

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-table-stats Table statistics (and their automatic refresh). T-sql-queries SQL Queries Team labels Apr 3, 2023
@rytaft
Copy link
Collaborator

rytaft commented Apr 3, 2023

Thanks for filing, @knz!

I think this potential problem has existed since auto stats were first introduced, but we haven't seen it become a major problem in practice. Putting this on the backlog, but we should keep an eye out in case this does become a bigger problem.

@knz
Copy link
Contributor Author

knz commented Aug 15, 2023

One concrete example of this bug is the risk to leave stale entries for dropped tables in system.table_statistics.
See #108813.

@michae2
Copy link
Collaborator

michae2 commented Aug 15, 2023

One concrete example of this bug is the risk to leave stale entries for dropped tables in system.table_statistics. See #108813.

Cleaning up after dropped tables was recently fixed by #105364.

@yuzefovich
Copy link
Member

One concrete example of this bug is the risk to leave stale entries for dropped tables in system.table_statistics.

This seems orthogonal to the original issue. Mutations don't currently drive whether the stats for dropped tables are removed (and that has been fixed by #105364 as Michael mentioned). The original issue indicating that FIFO of mutations' notifications are not persisted across node restarts, and as a result we might not collect new table stats even though we should, that issue is still present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

4 participants