sqlstats: fix reset in-memory sql stats on flush #121156

Merged (1 commit) on Mar 27, 2024

@xinhaoz xinhaoz commented Mar 26, 2024

After flushing in-memory sql stats to disk, we reset and prep each app container for reuse by:

  • Decrementing the per-node fingerprint counter by the size of the app container. This counter prevents us from writing more sql stats once we reach the maximum number of fingerprints stored in memory.
  • Clearing the container and reducing its capacity to half.

When introducing atomic flushing, we swapped the 2 ops above in the reset step, so the decrement ran against an already-cleared container and became a no-op. The counter never resets, and eventually every attempt to write sql stats is throttled, which in turn also signals the sql stats flush worker.
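
To illustrate why the order matters, here is a minimal, self-contained sketch. It is not the code in ss_mem_storage.go: the `container` type, its fields, and the `record`, `free`, `clear`, `resetBuggy`, and `resetFixed` names are all hypothetical stand-ins, and a plain `*int64` is assumed in place of whatever the real per-node counter is.

```go
package main

import "fmt"

// container is a hypothetical stand-in for a per-app stats container.
type container struct {
	stmts     map[string]int64 // fingerprint -> execution count
	nodeCount *int64           // shared per-node fingerprint counter (assumed shape)
}

func newContainer(nodeCount *int64) *container {
	return &container{stmts: make(map[string]int64), nodeCount: nodeCount}
}

// record adds one execution of fp. A new fingerprint is rejected
// (throttled) once the node-wide counter has reached limit.
func (c *container) record(fp string, limit int64) bool {
	if _, ok := c.stmts[fp]; !ok {
		if *c.nodeCount >= limit {
			return false
		}
		*c.nodeCount++
	}
	c.stmts[fp]++
	return true
}

// free gives this container's share of the node-wide counter back.
// It relies on len(c.stmts), so it must run before clear.
func (c *container) free() {
	*c.nodeCount -= int64(len(c.stmts))
}

// clear empties the container and halves its capacity hint.
func (c *container) clear() {
	c.stmts = make(map[string]int64, len(c.stmts)/2)
}

// resetBuggy clears first, so free subtracts len == 0: a no-op.
func resetBuggy(c *container) {
	c.clear()
	c.free()
}

// resetFixed frees first, while the container still knows its size.
func resetFixed(c *container) {
	c.free()
	c.clear()
}

func main() {
	const limit = 2
	var buggyCount, fixedCount int64
	b, f := newContainer(&buggyCount), newContainer(&fixedCount)
	for _, fp := range []string{"SELECT 1", "SELECT 2"} {
		b.record(fp, limit)
		f.record(fp, limit)
	}

	resetBuggy(b)
	resetFixed(f)

	// Read the counters before attempting another write.
	afterBuggy, afterFixed := buggyCount, fixedCount
	okBuggy := b.record("SELECT 3", limit)
	okFixed := f.record("SELECT 3", limit)
	fmt.Println("buggy reset: counter", afterBuggy, "next write accepted:", okBuggy)
	fmt.Println("fixed reset: counter", afterFixed, "next write accepted:", okFixed)
}
```

In this sketch the buggy ordering leaves the counter pinned at the limit after a flush, so every later write of a new fingerprint is rejected, which mirrors the throttling described above.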

Epic: none
Fixes: #121134

Release note: None

@xinhaoz xinhaoz requested review from a team and dhartunian and removed request for a team March 26, 2024 21:28

@dhartunian dhartunian left a comment

:lgtm: just add some docs before merging.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @xinhaoz)


pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go line 663 at r1 (raw file):

}

func (s *Container) clearLocked(ctx context.Context) {

can you add a docstring here to explain that freeLocked needs to happen first and why.

xinhaoz commented Mar 27, 2024

bors r+

@craig craig bot merged commit 66517cd into cockroachdb:master Mar 27, 2024
16 of 22 checks passed
@xinhaoz xinhaoz deleted the fix-reset-flush branch April 1, 2024 17:03
Successfully merging this pull request may close these issues.

sqlstats: fix reset stats container on flush