sqlstats: fix reset stats container on flush #121134

xinhaoz · 2024-03-26T17:57:59Z

cockroach/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go

Lines 607 to 613 in a44e43c

    
           s.mu.stmts = make(map[stmtKey]*stmtStats, len(s.mu.stmts)/2) 
        
           s.mu.txns = make(map[appstatspb.TransactionFingerprintID]*txnStats, len(s.mu.txns)/2) 
        
           s.mu.sampledPlanMetadataCache = make(map[sampledPlanKey]time.Time, len(s.mu.sampledPlanMetadataCache)/2) 
        
           s.freeLocked(ctx) 
        
           if s.knobs != nil && s.knobs.OnAfterClear != nil { 
        
           	s.knobs.OnAfterClear() 
        
           }

In the above, freeLocked is called after clearing the containers. Within freeLocked, the size of each container cleared is used to decrement uniqueServerCount which is used to track the total number of in-memory fingerprints tracked by sql stats. Due to these steps we never actually decrement the counter after each flush, causing the uniqueServerCount to grow erroneously. This can cause us to incorrectly spam sending the memory pressure signal to the sql stats flush worker, causing rapid flushing of empty containers.

Jira issue: CRDB-37092

The text was updated successfully, but these errors were encountered:

blathers-crl · 2024-03-26T17:58:02Z

Hi @xinhaoz, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

121086: kv: use response keys of ranged writes for lock tracking r=nvanbenschoten a=nvanbenschoten Informs #117978. This PR updates the `txnPipeliner` to use the response keys from `Get`, `Scan`, `ReverseScan`, and `DeleteRange` requests to track pipelined and non-pipelined lock acquisitions / intent writes, instead of assuming that the requests could have left intents anywhere in their request span. This more precise tracking avoid broad ranged intent resolution when more narrow intent resolution is possible. It will also be used by #117978 to track in-flight replicated lock acquisition. This is a WIP. Some tests need to be updated. Release note: None 121152: roachtest: adjust WAL failover disk stall roachtest's logging config r=sumeerbhola a=jbowens The previous logging config outputted all logs to stderr too, which is not buffered and could become blocked on the stalled disk. Epic: none Release note: none 121156: sqlstats: fix reset in-memory sql stats on flush r=xinhaoz a=xinhaoz After flushing in-memory sql stats to disk, we reset and prep each app container for reuse by: - Decrementing the per-node fingerprint counter by the size of the app container. This counter prevents us from writing more sql stats when we reach the maximum amount of fingerprints stored in memory. - Clearing the container and reducing its capacity to 1/2. When introducing atomic flushing, we swapped the 2 ops above in the reset step, resulting in the decrement step being a noop. The counter never resets and eventually results in each attempt at writing sql stats to be throttled which then also signals the sql stats flush worker. Epic: none Fixes: #121134 Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]>

xinhaoz added the T-observability label Mar 26, 2024

xinhaoz self-assigned this Mar 26, 2024

xinhaoz mentioned this issue Mar 26, 2024

roachtest: weekly/tpcc/headroom failed #120978

Closed

xinhaoz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 26, 2024

xinhaoz mentioned this issue Mar 26, 2024

sqlstats: fix reset in-memory sql stats on flush #121156

Merged

craig bot closed this as completed in ee97259 Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sqlstats: fix reset stats container on flush #121134

sqlstats: fix reset stats container on flush #121134

xinhaoz commented Mar 26, 2024 •

edited by cockroach-jira-scripts

Loading

blathers-crl bot commented Mar 26, 2024

sqlstats: fix reset stats container on flush #121134

sqlstats: fix reset stats container on flush #121134

Comments

xinhaoz commented Mar 26, 2024 • edited by cockroach-jira-scripts Loading

blathers-crl bot commented Mar 26, 2024

xinhaoz commented Mar 26, 2024 •

edited by cockroach-jira-scripts

Loading