sql: atomic flushing of sql stats #116444
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @maryliag, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sql_stats.go
line 164 at r1 (raw file):
// MaybeDumpStatsToLog flushes stats into target If it is not nil.
func (s *SQLStats) MaybeDumpStatsToLog(
Refactored to a standalone function without any changes. The main reason is to reuse only a specific part of this functionality.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 151 at r1 (raw file):
app := app
container := s.GetApplicationStats(app, true).(*ssmemstorage.Container)
if err := s.MaybeDumpStatsToLog(ctx, app, container, s.flushTarget); err != nil {
Dump stats to logs before we reset all stats from in-memory storage
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 599 at r1 (raw file):
// PopAllStatementsStats returns all collected statement stats in memory to the caller and clears SQL stats
// make sure that new arriving stats won't be interfering with existing one.
func (s *Container) PopAllStatementsStats() []*appstatspb.CollectedStatementStatistics {
The logic of constructing CollectedStatementStatistics and CollectedTransactionStatistics in the PopAllStatementsStats() and PopAllTransactionsStats() functions is borrowed from the StmtStatsIterator.Next function; nothing new is added. I couldn't reuse it because everything here needs to happen under a single Lock.
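For context, a minimal sketch of the pop-under-a-single-lock pattern being described (the map field, key type, and conversion helper here are illustrative assumptions, not the actual CockroachDB definitions):

// PopAllStatementsStats snapshots and clears the in-memory statement stats
// while holding the container lock for the whole operation, so concurrently
// arriving stats land in the fresh map and are neither lost nor read twice.
func (s *Container) PopAllStatementsStats() []*appstatspb.CollectedStatementStatistics {
	s.mu.Lock()
	defer s.mu.Unlock()
	stats := make([]*appstatspb.CollectedStatementStatistics, 0, len(s.mu.stmts))
	for key, stmt := range s.mu.stmts {
		// toCollected is a hypothetical helper standing in for the
		// construction logic borrowed from StmtStatsIterator.Next.
		stats = append(stats, stmt.toCollected(key))
	}
	// Replace the map so newly arriving stats accumulate separately.
	s.mu.stmts = make(map[stmtKey]*stmtStats)
	return stats
}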
Force-pushed from 2afef14 to 478a1a3.
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Force-pushed from 478a1a3 to 986ca15.
I like the direction this PR is headed, and I think these are important changes to make!
I know @maryliag had some memory concerns, so we should outline those here and consider how to address them. IIRC, when we dump the cache to flush, the concern is that the new cache continues to grow before we've released the one we're flushing (e.g. 2x the current memory limit). I'm sure we could address that, for example by halving the mem limit - although admittedly I'm not sure of the implications of doing so.
I had a different idea in my mind on how I expected this to look re: the atomic flushing. I tried to explain below, let me know what you think.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @koorosh, @maryliag, and @xinhaoz)
-- commits
line 24 at r3:
Good callout. What would this look like to a user? Would the user see the in-memory stats in the UI, refresh the page, and suddenly they're missing until they're finally persisted?
Code quote:
New implementation does have disadvantage, it might cause glitches when we pop
stats from in-memory storage and before persisting them - user might not see
up to date information. It is assumed that this is better than having missing
statistics permanently.
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 100 at r3 (raw file):
})
if s.cfg.Knobs != nil && s.cfg.Knobs.OnStmtStatsFlushFinished != nil {
Do we need to maintain OnTxnStatsFlushFinished() as well? e.g.:
if s.cfg.Knobs != nil && s.cfg.Knobs.OnTxnStatsFlushFinished != nil {
s.cfg.Knobs.OnTxnStatsFlushFinished()
}
I noticed it from the previous diff.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 151 at r1 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
Dump stats to logs before we reset all stats from in-memory storage
Underneath, this causes an entirely separate iteration of the items in the ssmemstorage.Container. I wonder if we can somehow group this logic together with the existing iterations we do below to avoid iterating twice?
Seems like it's only done for stmtStats, not txnStats... So maybe it can be included as part of the 1st goroutine that iterates the stmtStats? LMK what you think, maybe it's a premature optimization 🤷♂️.
cockroach/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, lines 575 to 595 in 986ca15:
func (s *Container) SaveToLog(ctx context.Context, appName string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.mu.stmts) == 0 {
		return
	}
	var buf bytes.Buffer
	for key, stats := range s.mu.stmts {
		json, err := func() ([]byte, error) {
			stats.mu.Lock()
			defer stats.mu.Unlock()
			return json.Marshal(stats.mu.data)
		}()
		if err != nil {
			log.Errorf(ctx, "error while marshaling stats for %q // %q: %v", appName, key.String(), err)
			continue
		}
		fmt.Fprintf(&buf, "%q: %s\n", key.String(), json)
	}
	log.Infof(ctx, "statistics for %q:\n%s", appName, buf.String())
}
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 152 at r3 (raw file):
) {
	apps := s.getAppNames(false)
	for _, app := range apps {
I imagined this to work a bit differently.
In SQLStats.mu we have apps map[string]*ssmemstorage.Container. I was thinking we could just pull a local reference to the entire map into memory here, and replace it with a new empty map in SQLStats. For example:
func (s *SQLStats) dumpAllStatsByAppName() map[string]*ssmemstorage.Container {
s.mu.Lock()
defer s.mu.Unlock()
toReturn := s.mu.apps
s.mu.apps = make(map[string]*ssmemstorage.Container)
// Maybe do some stuff with s.mu.mon as well to reset it?
// Or maybe that's handled already in `container.Free(ctx)`...
return toReturn
}
This way, the "cutoff" for all apps in this map is the same. As we're doing it today, that time window is "sliding" because we iterate the apps individually, meaning that the point in time where an app's cache is "cleared" is not consistent from one app to the next.
We can do so, and then iterate the map k, v instead of the app names. Overall, this approach feels much more "atomic" to me, since the entire map is cleared all at once in the eyes of clients writing new SQL stats to the cache, instead of the containers clearing one-at-a-time. It also feels easier to reason about, because the caches we're iterating are no longer changing.
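For illustration, the flush loop could then consume that snapshot roughly like this (a sketch under the same assumptions as the snippet above; the persistence step is elided):

for appName, container := range s.dumpAllStatsByAppName() {
	// Writers are already populating the fresh map, so the cutoff is
	// identical for every app in this snapshot.
	if err := s.MaybeDumpStatsToLog(ctx, appName, container, s.flushTarget); err != nil {
		log.Warningf(ctx, "failed to dump stats to log for app %q: %v", appName, err)
	}
	// ... persist the container's stmt/txn stats, then release it ...
	container.Free(ctx)
}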
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 614 at r3 (raw file):
database := stmt.mu.database
querySummary := stmt.mu.querySummary
stmt.mu.Unlock() // nolint:deferunlockcheck
nit: can we wrap this in an anon function so we can use defer? e.g.:
// Declare vars outside (I omitted types, but you get the idea!)
var data, distSQLUsed, vectorized, fullScan, database, querySummary ...
// Assign within an anon function call, so we can still use defer for the unlock.
func() {
	stmt.mu.Lock()
	defer stmt.mu.Unlock()
	data = stmt.mu.data
	distSQLUsed = stmt.mu.distSQLUsed
	vectorized = stmt.mu.vectorized
	fullScan = stmt.mu.fullScan
	database = stmt.mu.database
	querySummary = stmt.mu.querySummary
}()
We've had some nasty deadlock bugs from undeferred unlocks, so if possible I'd like to use defer here.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @maryliag, and @xinhaoz)
Previously, abarganier (Alex Barganier) wrote…
Good callout. What would this look like to a user? Would the user see the in-memory stats in the UI, refresh the page, and suddenly they're missing until they're finally persisted?
In general, yes. Here's what it looks like, if I understand the process of showing combined (in-memory and persisted) stats correctly.
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 100 at r3 (raw file):
Previously, abarganier (Alex Barganier) wrote…
Do we need to maintain OnTxnStatsFlushFinished() as well? e.g.:
if s.cfg.Knobs != nil && s.cfg.Knobs.OnTxnStatsFlushFinished != nil {
	s.cfg.Knobs.OnTxnStatsFlushFinished()
}
I noticed it from the previous diff.
Definitely it should be handled here as well! Thanks!
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 151 at r1 (raw file):
Previously, abarganier (Alex Barganier) wrote…
Underneath, this causes an entirely separate iteration of the items in the ssmemstorage.Container. I wonder if we can somehow group this logic together with the existing iterations we do below to avoid iterating twice? Seems like it's only done for stmtStats, not txnStats... So maybe it can be included as part of the 1st goroutine that iterates the stmtStats? LMK what you think, maybe it's a premature optimization 🤷♂️.
cockroach/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, lines 575 to 595 in 986ca15 (SaveToLog, quoted above)
All of your suggestions are reasonable and make sense to me; the main point of pushing back on them is to avoid a total refactoring of existing code in the scope of this fix.
Every bit of code looks very coupled here, and I would suggest keeping that as a separate task, tbh, to avoid unintended regressions.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 152 at r3 (raw file):
My primary concern with this suggestion is that other places like MaybeDumpStatsToLog (and internal calls) still refer to s.mu.apps, or call container.Free. It would require refactoring this code even more. I'm not feeling confident doing this in the scope of this change.
Maybe do some stuff with s.mu.mon as well to reset it?
I didn't find how mon is cleared in the current implementation, it is only initialised once 🤔. container.Free doesn't handle this.
@koorosh thanks for the discussion.
Looping in @kevin-v-ngo for opinions on the brief impact to users in the UI (see individual comment thread).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @kevin-v-ngo, @koorosh, @maryliag, and @xinhaoz)
Previously, koorosh (Andrii Vorobiov) wrote…
In general, yes. Here's what it looks like if I understand the process of showing combined (in memory and persisted stats)
I see, thank you for the helpful diagram! I can imagine this being something that users eventually notice, so let's be sure we check with @kevin-v-ngo to understand if this is an acceptable trade-off.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 151 at r1 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
All of your suggestions are reasonable and make sense to me; the main point of pushing back on them is to avoid a total refactoring of existing code in the scope of this fix.
Every bit of code looks very coupled here, and I would suggest keeping that as a separate task, tbh, to avoid unintended regressions.
Understood - like I said in my other comment, I'm open to limiting the scope of the changes here. Let's do our best to outline the follow-up work, though, so that we understand the direction this code should move in.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 152 at r3 (raw file):
Just wondering, is the plan to backport this? I'm open to minimizing the initial scope of these changes. I'd like for us to outline future plans for this code, though, since these changes are complex & sensitive. We should understand the roadmap for follow-up changes.
I suppose one benefit of dumping them one app at a time is that the scope of the issue where stats briefly "disappear" from the UI is minimized to just one app at a time. I suppose the cost is that we have more contention around the SQLStats mutex, and it's more difficult to reason about the operations because of the "sliding window" I mentioned in my previous comment.
I didn't find how mon is cleared in the current implementation, it is only initialised once 🤔. container.Free doesn't handle this.
It looks like the SQLStats.mon is the parent monitor, from which we create child monitors in each Container for the apps. When a container is Free()'d or Clear()'d, it looks like the monitor is cleared: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go#L627
It seems like the impact is only when we show both in-memory and persisted stats, and the 'glitches' surface as a delay, correct? Meaning that it may take longer than 10 minutes to see statistics? And in-memory only (when there are no persisted stats) is not impacted, correct?
Correct, it might affect only the last 1-hour interval, when a request is dispatched exactly during the flushing process and both in-memory and persisted stats are requested.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @kevin-v-ngo, @maryliag, and @xinhaoz)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @kevin-v-ngo, @maryliag, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 152 at r3 (raw file):
Just wondering, is the plan to backport this?
I suppose this fix should be backported.
I suppose the cost is that we have more contention around the SQLStats mutex
Agree, and the entire process of persisting stats has more of an "eventual" nature than an "atomic" one (i.e. we assume that all stats will be persisted at some point in time, but even the insertion of every stats record is performed sequentially). I assume the "sliding window" effect is not something that drastically affects how we show stats in the UI, but the increased contention around the mutex has to be resolved for sure.
One more thing that bothers me is that glitching stats in the UI should be resolved in the code that is responsible for displaying data in the UI, not here while flushing stats to disk 🙂. I mean, the UI should be aware of this eventual nature of data flushing and handle it properly.
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 614 at r3 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: can we wrap this in an anon function so we can use defer? e.g.: (code quoted above)
We've had some nasty deadlock bugs from undeferred unlocks, so if possible I'd like to use defer here.
Done.
Force-pushed from 986ca15 to 0b1342e.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @kevin-v-ngo, @maryliag, and @xinhaoz)
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go
line 603 at r4 (raw file):
var stmts map[stmtKey]*stmtStats
func() {
@abarganier, I reused your suggestion with a small adjustment: I reassign the map under the mutex at the app level. It doesn't fix the "sliding window" behavior but reduces the time the lock is held.
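Spelled out, that adjustment looks roughly like this (a sketch; the field and type names follow the snippets quoted earlier in this review):

// Snapshot and replace the stmts map while holding the lock, then do the
// expensive conversion to CollectedStatementStatistics on the local copy
// with the lock already released.
var stmts map[stmtKey]*stmtStats
func() {
	s.mu.Lock()
	defer s.mu.Unlock()
	stmts = s.mu.stmts
	s.mu.stmts = make(map[stmtKey]*stmtStats)
}()
// ... iterate the local stmts map outside the lock ...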
I think this PR is a step in the right direction, but will need more iteration.
This PR contains no tests. We should add them if they're missing. The diagram provided in the discussion is actually a great candidate for some examples that we should test with.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @kevin-v-ngo, @koorosh, @maryliag, and @xinhaoz)
Previously, abarganier (Alex Barganier) wrote…
I see, thank you for the helpful diagram! I can imagine this being something that users eventually notice, so let's be sure we check with @kevin-v-ngo to understand if this is an acceptable trade-off.
Thanks for the helpful diagram. I don't think this is behavior we should accept because it makes it impossible to rely on consistent outputs.
@koorosh is it possible to "copy to buffer" instead of dump, and then delete the in-memory batch as part of the UPSERT transaction?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @kevin-v-ngo, @maryliag, and @xinhaoz)
-- commits
line 24 at r3:
To make sure that the result is consistent, we would need to revise the entire logic of how stats are accumulated and then flushed.
The initial logic works in a way that stats are flushed to disk and then cleaned from the in-memory batch, which creates the opposite problem of having the same stats both in memory and persisted on disk.
I would try to distinguish two problems that we have with the current implementation:
- missing stats (that this PR is aimed to resolve);
- atomic operation of flushing stats;
is it possible to "copy to buffer"
New stats can be added to the in-memory buffer after we have copied values from it, which would lead to having both "copied" and new values at the same time. How to distinguish one from the other later is a problem.
Force-pushed from 0b1342e to 5f089f0.
Force-pushed from 5f089f0 to f3a1d34.
- Added tests to validate the correctness of pulling stats from the in-memory cache, plus another test that simultaneously adds new stats and flushes them, to make sure that no stats are missed. With additional testing knobs there's no interaction with the DB, and random test data is generated to make stats distinct (we don't care about their content).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @kevin-v-ngo, @maryliag, and @xinhaoz)
Getting closer, sorry for the terrible review delay.
I made a comment about how we might be able to improve the test's determinism. Let me know what you think.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @kevin-v-ngo, @koorosh, @maryliag, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 161 at r5 (raw file):
apps := s.getAppNames(false)
for _, app := range apps {
	app := app
nit: why do we reassign app here to itself?
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 167 at r5 (raw file):
}
stmtStats := container.PopAllStatementsStats()
txnStats := container.PopAllTransactionStats()
Can we combine these two functions so that both txn and stmt stats are popped together atomically? Something like:
stmtStats, txnStats := container.PopAllStats()
Currently, we acquire the container lock, dump the stmt stats, and then release the lock.
Then, afterwards, we acquire the container lock again, dump the txn stats, and then release the lock.
This makes the "popping" inconsistent across txn and statements. Maybe it's not a big deal, but doing both at the same time would be easy to implement (I think) and make things easier to reason about. This way, the container is "cleared" of everything all at once, instead of doing so in a staggered manner.
If we do this, we could probably also move the container.Free(ctx) call into PopAllStats().
Code quote:
stmtStats := container.PopAllStatementsStats()
txnStats := container.PopAllTransactionStats()
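A sketch of what the combined function could look like (PopAllStats is the proposed name, not an existing function; the txn map's key type and the conversion helpers are assumptions):

// PopAllStats snapshots and clears both statement and transaction stats
// under a single lock acquisition, so the container is cleared all at once
// rather than in a staggered manner.
func (s *Container) PopAllStats(
	ctx context.Context,
) ([]*appstatspb.CollectedStatementStatistics, []*appstatspb.CollectedTransactionStatistics) {
	s.mu.Lock()
	stmts, txns := s.mu.stmts, s.mu.txns
	s.mu.stmts = make(map[stmtKey]*stmtStats)
	s.mu.txns = make(map[txnKey]*txnStats)
	s.mu.Unlock()
	// Conversion happens outside the lock, on the local snapshots.
	return collectStmts(stmts), collectTxns(txns) // hypothetical helpers
}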
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 855 at r5 (raw file):
// This test aims to validate that stats popped from in-memory atomically and don't cause
// conditions when some stats are lost or consumed multiple times.
t.Run("concurrently add stats and flush them", func(t *testing.T) {
To clarify, the previous problem was something like:
- The SQL Stats flush process begins
- While the stats within an app container are being iterated+flushed, new stats are added to the same container.
The iterator doesn't pick up on the newly added stats from step # 2.
- The app container is "cleared"
- The stats added in step # 2 are lost.
If so, my concern is that this test doesn't deterministically recreate this scenario.
IIUC, the problem in the old code occurred when new stats were added while the two flush goroutines were executing, is that right?
If so, could we use something like a lock or WaitGroup in the test interceptor functions to pause the stmt/txn flush goroutines to recreate this scenario? Something like:
- Add a bunch of app statistics.
- Trigger the flush, but make sure the interceptors block (for now). This causes the flush goroutines to hang.
- With the interceptors blocking, we now add more app stats. My understanding is that previously, this would trigger the bug since we're adding more statistics to an app before the flush has completed for that same app.
- Unblock the interceptors, let them complete the flush.
- Now, flush once more, and verify that the stats we added in step # 3 aren't lost.
Let me know what you think about this idea.
Code quote:
t.Run("concurrently add stats and flush them", func(t *testing.T) {
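A sketch of how that test could be structured, using a channel to block the flush goroutines inside the interceptors (the knob name and the helper functions here are illustrative, not the real testing-knob API):

unblock := make(chan struct{})
knobs := &sqlstats.TestingKnobs{
	// Hypothetical interceptor: each flush goroutine blocks here until
	// the test closes the channel.
	StmtStatsFlushInterceptor: func() { <-unblock },
}
// 1. Add a bunch of app statistics.
runStatements(t, sqlDB)
// 2. Trigger the flush; its goroutines hang inside the interceptor.
var flushDone sync.WaitGroup
flushDone.Add(1)
go func() {
	defer flushDone.Done()
	sqlStats.Flush(ctx, stopper)
}()
// 3. With the flush in progress, add more stats. Previously this could
//    trigger the bug: stats added here were cleared along with the batch
//    being flushed.
runMoreStatements(t, sqlDB)
// 4. Unblock the interceptors and let the first flush complete.
close(unblock)
flushDone.Wait()
// 5. Flush once more and verify the stats from step 3 were not lost.
sqlStats.Flush(ctx, stopper)
verifyStatsPersisted(t, sqlDB)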
Reviewed 10 of 16 files at r10.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @kevin-v-ngo, @koorosh, and @xinhaoz)
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 16 at r10 (raw file):
"context" "fmt" "github.com/cockroachdb/cockroach/pkg/util/stop"
nit: order imports
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 145 at r9 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
docstring needs more details about the "guarantees"
Done.
implement the sqlstats.Reader interface
I'm not sure that this interface is what we need in this case. The sqlstats.Reader interface only allows iterating over the available instances, and it is not clear why the Reader interface should also be responsible for clearing in-memory statistics after iteration.
This method makes things confusing because now there's yet another way to iterate.
Agree that having multiple implementations doesn't make the code easy to read/maintain, but at the same time I don't see that the current implementations of the sqlstats.Reader interface make it easier. Here are the existing structs that implement it:
sslocal.SQLStats
ssmemstorage.Container
All of them introduce overhead like instantiating a memory monitor or other unrelated fields (or are left half-initialised, which doesn't feel correct and doesn't make it easy to reason about why some fields aren't initialised).
👍 thanks for taking a look.
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 861 at r9 (raw file):
// ^
// should be flushed with |
// next Flush --------------|
nit: what's going on with this comment? does the diagram need to be adjusted? I don't follow what's happening on the right side.
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 934 at r9 (raw file):
init.Store(true)
// Flush all available stats.
sqlStats.Flush(ctx)
can we do a check here to verify that 1 stmt and 1 txn were flushed? That's what should happen, right?
pkg/sql/sqlstats/persistedsqlstats/scheduled_sql_stats_compaction_test.go
line 142 at r10 (raw file):
// the system table.
helper.sqlDB.Exec(t, "SELECT 1; SELECT 1, 1")
helper.server.ApplicationLayer().SQLServer().(*sql.Server).GetSQLStatsProvider().(*persistedsqlstats.PersistedSQLStats).Flush(ctx, helper.server.Stopper())
I think what you want here is helper.server.AppStopper() or something like that. No need to pull the whole server in.
Force-pushed from 0dce983 to a61a221.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @kevin-v-ngo, and @xinhaoz)
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 16 at r10 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: order imports
Done.
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 861 at r9 (raw file):
Previously, dhartunian (David Hartunian) wrote…
nit: what's going on with this comment? does the diagram need to be adjusted? I don't follow what's happening on the right side.
😱 that's what happens when TABs and spaces are used together to make offsets. Reviewable interprets TABs in a different way than Goland.
Fixed.
pkg/sql/sqlstats/persistedsqlstats/flush_test.go
line 934 at r9 (raw file):
Previously, dhartunian (David Hartunian) wrote…
can we do a check here to verify that 1 stmt and 1 txn was flushed. that's what should happen, right?
Good point, done!
pkg/sql/sqlstats/persistedsqlstats/scheduled_sql_stats_compaction_test.go
line 142 at r10 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I think what you want here is helper.server.AppStopper() or something like that. No need to pull the whole server in.
Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @kevin-v-ngo, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 176 at r12 (raw file):
wg.Add(2)
err := stopper.RunAsyncTask(ctx, "sql-stmt-stats-flush", func(ctx context.Context) {
Using stopper.RunAsyncTask instead of go funcs causes the following error:
panic: attempting to start span with parent from different Tracer.
parent operation: sql-txn-stats-flush, tracer created at:
goroutine 13 [running]:
runtime/debug.Stack()
/opt/homebrew/opt/go/libexec/src/runtime/debug/stack.go:24 +0x64
github.com/cockroachdb/cockroach/pkg/util/tracing.NewTracer()
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:597 +0x3c
github.com/cockroachdb/cockroach/pkg/util/stop.NewStopper({0x0, 0x0, 0x0?})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:216 +0xe4
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.NewTestCluster({_, _}, _, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:265 +0x28c
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.testClusterFactoryImpl.NewTestCluster(...)
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1963
github.com/cockroachdb/cockroach/pkg/testutils/serverutils.NewCluster(...)
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_cluster_shim.go:303
github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartCluster({_, _}, _, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_cluster_shim.go:287 +0x84
github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats_test.createCluster(0x10a970910?)
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/reader_test.go:170 +0x15c
github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats_test.TestPersistedSQLStatsReadHybrid(0x14000781380)
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/reader_test.go:72 +0xd8
testing.tRunner(0x14000781380, 0x10a93c740)
/opt/homebrew/opt/go/libexec/src/testing/testing.go:1595 +0xe8
created by testing.(*T).Run in goroutine 1
/opt/homebrew/opt/go/libexec/src/testing/testing.go:1648 +0x33c
child operation: insert-txn-stats, tracer created at:
goroutine 13 [running]:
runtime/debug.Stack()
/opt/homebrew/opt/go/libexec/src/runtime/debug/stack.go:24 +0x64
github.com/cockroachdb/cockroach/pkg/util/tracing.NewTracer()
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:597 +0x3c
github.com/cockroachdb/cockroach/pkg/util/tracing.NewTracerWithOpt({0x10a992b10, 0x10db35f00}, {0x14005ac5dd0, 0x2, 0x0?})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/tracer.go:639 +0x8c
github.com/cockroachdb/cockroach/pkg/server.(*testServer).StartTenant(_, {_, _}, {{0x0, 0x0}, {0xa}, 0x0, 0x140090ee000, 0x0, {{0x0, ...}, ...}, ...})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:1654 +0xa8c
github.com/cockroachdb/cockroach/pkg/server.(*testServer).startDefaultTestTenant(0x14000ef7500, {0x10a992b10, 0x10db35f00})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:638 +0x3e0
github.com/cockroachdb/cockroach/pkg/server.(*testServer).maybeStartDefaultTestTenant(0x14000ef7500, {0x10a992b10?, 0x10db35f00?})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:710 +0x24c
github.com/cockroachdb/cockroach/pkg/server.(*testServer).Activate(0x14000ef7500, {0x10a992b10, 0x10db35f00})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/server/testserver.go:836 +0xbc
github.com/cockroachdb/cockroach/pkg/testutils/serverutils.(*wrap).Activate(0x10a970940?, {0x10a992b10, 0x10db35f00})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/ts_control_forwarder_generated.go:25 +0x54
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Start(0x14000ef6e00, {0x10a9c8ea0, 0x14000781380?})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:484 +0x634
github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartCluster({_, _}, _, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...})
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_cluster_shim.go:288 +0xa0
github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats_test.createCluster(0x10a970910?)
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/reader_test.go:170 +0x15c
github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats_test.TestPersistedSQLStatsReadHybrid(0x14000781380)
/Users/avor/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/reader_test.go:72 +0xd8
testing.tRunner(0x14000781380, 0x10a93c740)
/opt/homebrew/opt/go/libexec/src/testing/testing.go:1595 +0xe8
created by testing.(*T).Run in goroutine 1
/opt/homebrew/opt/go/libexec/src/testing/testing.go:1648 +0x33c
[recovered]
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @kevin-v-ngo, @koorosh, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 176 at r12 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
Using stopper.RunAsyncTask instead of go funcs causes the following error: panic: attempting to start span with parent from different Tracer (full stack trace quoted above).
I think it's because the test passes a context.Background() directly to the IterateTransactionStats calls in the TestPersistedSQLStatsReadHybrid test. The context contains the spans used for tracing operations. Modify the test to pass in the ctx created with the cluster and I think it will work.
Force-pushed from a61a221 to 96d3331.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @kevin-v-ngo, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 176 at r12 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I think it's because the test passes a context.Background() directly to the IterateTransactionStats calls in the TestPersistedSQLStatsReadHybrid test. The context contains the spans used for tracing operations. Modify the test to pass in the ctx created with the cluster and I think it will work.
Just figured out: I passed cluster.Stopper() instead of AppStopper(). Updated all affected tests.
Force-pushed from 6d8a475 to 3b8c941.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @kevin-v-ngo, @koorosh, and @xinhaoz)
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 50 at r14 (raw file):
if err := s.SQLStats.Reset(ctx); err != nil {
	log.Warningf(ctx, "fail to reset in-memory SQL Stats: %s", err)
}
Would it make more sense to call s.knobs.OnBeforeReset() in this deferred function, instead of in each individual Container.Clear() call?
Code quote:
if err := s.SQLStats.Reset(ctx); err != nil {
log.Warningf(ctx, "fail to reset in-memory SQL Stats: %s", err)
}
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 176 at r12 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
Just figured out: I passed cluster.Stopper() instead of AppStopper(). Updated all affected tests.
Nice, thanks for using the Stopper library here! I was wondering if we should be aware of context cancellation here, but I think these tasks are short-lived enough where it's okay to just let them finish in the event of a shutdown.
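For reference, the resulting pattern looks roughly like this (a sketch; the task bodies are elided, and the error handling mirrors the warning logs suggested later in this review):

var wg sync.WaitGroup
wg.Add(2)
if err := stopper.RunAsyncTask(ctx, "sql-stmt-stats-flush", func(ctx context.Context) {
	defer wg.Done()
	// ... consume and persist the popped statement stats ...
}); err != nil {
	wg.Done() // the task never started, so release its slot
	log.Warningf(ctx, "failed to start statement stats flush task: %v", err)
}
if err := stopper.RunAsyncTask(ctx, "sql-txn-stats-flush", func(ctx context.Context) {
	defer wg.Done()
	// ... consume and persist the popped transaction stats ...
}); err != nil {
	wg.Done()
	log.Warningf(ctx, "failed to start transaction stats flush task: %v", err)
}
wg.Wait()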
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 144 at r14 (raw file):
}
// ConsumeStats leverages the process of atomic pulling stats from in-memory storage, cleaning in-memory stats, and
nit: clearing (I think)
Code quote:
cleaning
Force-pushed from 3b8c941 to 954d9d9.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @kevin-v-ngo, and @xinhaoz)
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 50 at r14 (raw file):
Previously, abarganier (Alex Barganier) wrote…
Would it make more sense to call s.knobs.OnBeforeReset() in this deferred function, instead of in each individual Container.Clear() call?
This deferred function is guarded by a condition and will be called only when stats flushing is disabled, so it won't handle the default scenario where stats flushing is called periodically.
We reset stats per app, so I felt it is more natural to call this hook right after each app's container reset.
It is also possible to use this hook in addition for the entire in-memory stats, but it is not clear what to do when only some app containers are flushed (and others aren't, due to an error)?
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 144 at r14 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: clearing (I think)
Done. Thanks, that's my usual mistake (among many others) 😅
Things look good, I just want to make sure the OnBeforeReset is well documented and its usage is clear.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @kevin-v-ngo, @koorosh, and @xinhaoz)
pkg/sql/sqlstats/test_utils.go
line 69 at r15 (raw file):
// OnBeforeReset is invoked right before reset SQLStats from in-memory cache
OnBeforeReset func()
nit: Let's expand the comment/documentation on this a bit more. I think explaining more about when specifically this gets called, as well as what it's useful for (e.g. an example test scenario where this is helpful) would help make the purpose of the callback clear.
Code quote:
// OnBeforeReset is invoked right before reset SQLStats from in-memory cache
OnBeforeReset func()
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 50 at r14 (raw file):
I see, that makes sense.
We reset stats per app, so I felt it is more natural to call this hook right after each app's container reset.
Is the name OnBeforeReset a bit misleading in that case? Just because it seems like we invoke this after we invoke Clear(). Also, if we're invoking Clear(), then should it be OnBeforeClear?
I just want to make sure that the testing callback's usage is clear.
pkg/sql/sqlstats/sslocal/sql_stats.go
line 155 at r15 (raw file):
for appName, statsContainer := range s.mu.apps {
	lastErr := s.MaybeDumpStatsToLog(ctx, appName, statsContainer, target)
	statsContainer.Clear(ctx)
If we're going to have something like OnBeforeReset, then should we also call it here? I think the callback should consistently be invoked any time we're going to clear the memory stats.
Code quote:
statsContainer.Clear(ctx)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 144 at r14 (raw file):
Previously, koorosh (Andrii Vorobiov) wrote…
Done. Thanks, that's my usual mistake (among many others) 😅
All good! We all do this 😆
Force-pushed from 954d9d9 to 6ff0bca.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @kevin-v-ngo, and @xinhaoz)
pkg/sql/sqlstats/test_utils.go
line 69 at r15 (raw file):
Previously, abarganier (Alex Barganier) wrote…
nit: Let's expand the comment/documentation on this a bit more. I think explaining more about when specifically this gets called, as well as what it's useful for (e.g. an example test scenario where this is helpful) would help make the purpose of the callback clear.
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go
line 50 at r14 (raw file):
Previously, abarganier (Alex Barganier) wrote…
I see, that makes sense.
We reset stats per app, so I felt it is more natural to call this hook right after each app's container reset.
Is the name OnBeforeReset a bit misleading in that case? Just because it seems like we invoke this after we invoke Clear(). Also, if we're invoking Clear(), then should it be OnBeforeClear?
I just want to make sure that the testing callback's usage is clear.
Agree, naming is just confusing. Renamed to OnAfterClear.
pkg/sql/sqlstats/sslocal/sql_stats.go
line 155 at r15 (raw file):
Previously, abarganier (Alex Barganier) wrote…
If we're going to have something like OnBeforeReset, then should we also call it here? I think the callback should consistently be invoked any time we're going to clear the memory stats.
It is called within the Clear function.
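i.e. the invocation lives inside Clear itself, roughly like this (a sketch of the knob plumbing, assuming the container holds a reference to the testing knobs):

func (s *Container) Clear(ctx context.Context) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// ... release accounted memory and reset the stmt/txn maps ...
	if s.knobs != nil && s.knobs.OnAfterClear != nil {
		s.knobs.OnAfterClear()
	}
}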
Reviewed 2 of 6 files at r9, 17 of 17 files at r15.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @abarganier, @kevin-v-ngo, @koorosh, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 185 at r15 (raw file):
}
})
if err != nil {
log warning here with error
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 199 at r15 (raw file):
}
})
if err != nil {
log warning here with error
Force-pushed from 6ff0bca to 62b6c08.
This change refactors the existing logic of flushing SQL stats. The main goal of this change is to fix an issue where flushing and clearing in-memory stats aren't an "atomic" operation, which may cause cases where not-yet-persisted stats are cleared unintentionally. This change introduces the following:
1. Introduces a `PopAllStats` function that prepares stats to be persisted and then clears in-memory stats as an atomic operation.
2. The process of flushing stats is now:
- pop all stats from in-memory storage
- reset in-memory stats
- use the local copy of stats and persist them
3. Before, this process was:
- iterate in-memory stats, which could be updated during iteration;
- persisting stats could take some time, and iteration over stats is slow;
- after flushing all stats, the in-memory storage is cleared, but there are no guarantees that nothing has been added to SQL stats in the meantime.
The new implementation does have a disadvantage: it might cause glitches when we pop stats from in-memory storage and before persisting them - the user might not see up-to-date information. It is assumed that this is better than having missing statistics permanently.
Release note: None
This change refactors the `SQLStats.ConsumeStats` function to rely on the `stopper.RunAsyncTask` function instead of calling `go` directly in code that doesn't handle context cancellation properly. Most of the changes here are done in tests to comply with the new argument that is required by the `ConsumeStats` func.
Release note: None
Force-pushed from 62b6c08 to 046641b.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @abarganier, @dhartunian, @kevin-v-ngo, and @xinhaoz)
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 185 at r15 (raw file):
Previously, dhartunian (David Hartunian) wrote…
log warning here with error
Done.
pkg/sql/sqlstats/sslocal/sslocal_provider.go
line 199 at r15 (raw file):
Previously, dhartunian (David Hartunian) wrote…
log warning here with error
Done.
bors r+
Build succeeded.
This change refactors the existing logic of flushing SQL stats. The main goal of this change is to fix an issue where flushing and clearing in-memory stats aren't an "atomic" operation, which may cause cases where not-yet-persisted stats are cleared unintentionally.
This change introduces a PopAllStatementsStats function that prepares stats to be persisted and then clears in-memory stats as an atomic operation.
The new implementation does have a disadvantage: it might cause glitches when we pop stats from in-memory storage and before persisting them - the user might not see up-to-date information. It is assumed that this is better than having missing statistics permanently.
Release note: None
Resolves: #115168