-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduledlogging: shorten TestCaptureIndexUsageStats run time #89319
scheduledlogging: shorten TestCaptureIndexUsageStats run time #89319
Conversation
cd40aa4
to
2ebdca3
Compare
// Verify that a second schedule has run after the enabled interval has passed. | ||
// Wait for channel value from end of 1st schedule. | ||
<-scheduleCompleteChan | ||
time.Sleep(timeBuffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? Can't you have the main code signal to the test when it's finished writing to logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main code signals to the channel here:
after the logging has completed here:
https://github.com/cockroachdb/cockroach/pull/89319/files#diff-567a23559822ff60b8d9d63b7fda0e82a78d11fa6dced531e3f6d24966282362R153.
For some reason - that I have not been able to figure out why - the logs don't appear if you check immediately, despite logging being completed. The 1 second buffer seems to consistently allow for the logs to appear in the log file, including under stress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you're missing log.Flush
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, was missing log.Flush
, will remove the time.Sleep
calls.
// Verify that a third schedule has run after the overlap duration has passed. | ||
// Wait for channel value from end of 2nd schedule. | ||
<-scheduleCompleteChan | ||
time.Sleep(timeBuffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed time.Sleep
}, sd.getOverlapDuration()+timeBuffer) | ||
// Wait for channel value from end of 3rd schedule. | ||
<-scheduleCompleteChan | ||
time.Sleep(timeBuffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed time.Sleep
2ebdca3
to
2f0dc3f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection from me, but please get a review from your own team and teach them what's going on here.
Reviewed 3 of 3 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)
2f0dc3f
to
4a3faf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reasoning behind using a channel directly in the knobs over a callback function (that then does the same thing, but keeps the channel local to the test). Just wondering if a callback might offer some more flexibility in the future.
4a3faf8
to
127b34f
Compare
None, I hadn't considered a callback function though I think it's a better idea for the reasons you mentioned. Changed to use a callback instead. |
Can we save the result of Code quote: if s.getLoggingDuration() >= telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) {
return s.durationOnOverlap()
}
// Otherwise, schedule the next interval normally.
return telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) - s.getLoggingDuration() |
Previously, TestCaptureIndexUsageStats ran through 4 iterations of 20 seconds for a run time of over a minute. This change reduces the run time of the test to under 10 seconds. Release note: None
127b34f
to
bb18504
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @xinhaoz)
pkg/sql/scheduledlogging/captured_index_usage_stats.go
line 116 at r3 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
Can we save the result of
s.getLoggingDuration()
prior to this block? I guess there is an off chance that the second call here will actually result in a negative value otherwise.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just wondering if anyone knows the answer to the question below.
@@ -146,22 +146,24 @@ func (s *CaptureIndexUsageStatsLoggingScheduler) start(ctx context.Context, stop | |||
case <-stopper.ShouldQuiesce(): | |||
return | |||
case <-timer.C: | |||
if !telemetryCaptureIndexUsageStatsEnabled.Get(&s.st.SV) { | |||
timer.Reset(telemetryCaptureIndexUsageStatsStatusCheckEnabledInterval.Get(&s.st.SV)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering out loud here, but I noticed this timer was created directly from the std lib over the one wrapped in timeutil pkg. I recall reading that the timeutil one had a fix relating to Reset or something. Does it matter which one we're using when the return value of Reset isn't being used? 🤔
TYFR :) |
bors r+ |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from bb18504 to blathers/backport-release-22.1-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above. error creating merge commit from bb18504 to blathers/backport-release-22.2-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from bb18504 to blathers/backport-release-22.2.0-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.0 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
looks like your backports failed to be created, can you create them? (you don't need one for 22.2.0) |
Resolves: #87772
Previously, TestCaptureIndexUsageStats ran through 4 iterations of 20 seconds for a run time of over a minute. This change reduces the run time of the test to under 10 seconds.
Release note: None