Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduledlogging: shorten TestCaptureIndexUsageStats run time #89319

Merged

Conversation

THardy98
Copy link

@THardy98 THardy98 commented Oct 4, 2022

Resolves: #87772

Previously, TestCaptureIndexUsageStats ran through 4 iterations of 20 seconds for a run time of over a minute. This change reduces the run time of the test to under 10 seconds.

Release note: None

@THardy98 THardy98 requested review from knz and a team October 4, 2022 18:54
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from cd40aa4 to 2ebdca3 Compare October 4, 2022 19:11
// Verify that a second schedule has run after the enabled interval has passed.
// Wait for channel value from end of 1st schedule.
<-scheduleCompleteChan
time.Sleep(timeBuffer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? Can't you have the main code signal to the test when it's finished writing to logs?

Copy link
Author

@THardy98 THardy98 Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main code signals to the channel here:

https://github.com/cockroachdb/cockroach/pull/89319/files#diff-567a23559822ff60b8d9d63b7fda0e82a78d11fa6dced531e3f6d24966282362R165

after the logging has completed here:
https://github.com/cockroachdb/cockroach/pull/89319/files#diff-567a23559822ff60b8d9d63b7fda0e82a78d11fa6dced531e3f6d24966282362R153.

For some reason - that I have not been able to figure out why - the logs don't appear if you check immediately, despite logging being completed. The 1 second buffer seems to consistently allow for the logs to appear in the log file, including under stress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you're missing log.Flush?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, was missing log.Flush, will remove the time.Sleep calls.

// Verify that a third schedule has run after the overlap duration has passed.
// Wait for channel value from end of 2nd schedule.
<-scheduleCompleteChan
time.Sleep(timeBuffer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed time.Sleep

}, sd.getOverlapDuration()+timeBuffer)
// Wait for channel value from end of 3rd schedule.
<-scheduleCompleteChan
time.Sleep(timeBuffer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed time.Sleep

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 2ebdca3 to 2f0dc3f Compare October 5, 2022 15:11
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection from me, but please get a review from your own team and teach them what's going on here.

Reviewed 3 of 3 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 2f0dc3f to 4a3faf8 Compare October 5, 2022 15:49
Copy link
Member

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reasoning behind using a channel directly in the knobs over a callback function (that then does the same thing, but keeps the channel local to the test). Just wondering if a callback might offer some more flexibility in the future.

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 4a3faf8 to 127b34f Compare October 5, 2022 19:19
@THardy98
Copy link
Author

THardy98 commented Oct 5, 2022

What was the reasoning behind using a channel directly in the knobs over a callback function (that then does the same thing, but keeps the channel local to the test). Just wondering if a callback might offer some more flexibility in the future.

None, I hadn't considered a callback function though I think it's a better idea for the reasons you mentioned. Changed to use a callback instead.

@xinhaoz
Copy link
Member

xinhaoz commented Oct 5, 2022

pkg/sql/scheduledlogging/captured_index_usage_stats.go line 116 at r3 (raw file):

	}
	// Otherwise, schedule the next interval normally.
	return telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) - s.getLoggingDuration()

Can we save the result of s.getLoggingDuration() prior to this block? I guess there is an off chance that the second call here will actually result in a negative value otherwise.

Code quote:

	if s.getLoggingDuration() >= telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) {
		return s.durationOnOverlap()
	}
	// Otherwise, schedule the next interval normally.
	return telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) - s.getLoggingDuration()

Previously, TestCaptureIndexUsageStats ran through 4 iterations of 20
seconds for a run time of over a minute. This change reduces the run
time of the test to under 10 seconds.

Release note: None
@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 127b34f to bb18504 Compare October 6, 2022 14:02
Copy link
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @xinhaoz)


pkg/sql/scheduledlogging/captured_index_usage_stats.go line 116 at r3 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Can we save the result of s.getLoggingDuration() prior to this block? I guess there is an off chance that the second call here will actually result in a negative value otherwise.

Done

Copy link
Member

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just wondering if anyone knows the answer to the question below.

@@ -146,22 +146,24 @@ func (s *CaptureIndexUsageStatsLoggingScheduler) start(ctx context.Context, stop
case <-stopper.ShouldQuiesce():
return
case <-timer.C:
if !telemetryCaptureIndexUsageStatsEnabled.Get(&s.st.SV) {
timer.Reset(telemetryCaptureIndexUsageStatsStatusCheckEnabledInterval.Get(&s.st.SV))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering out loud here, but I noticed this timer was created directly from the std lib over the one wrapped in timeutil pkg. I recall reading that the timeutil one had a fix relating to Reset or something. Does it matter which one we're using when the return value of Reset isn't being used? 🤔

@THardy98
Copy link
Author

THardy98 commented Oct 6, 2022

TYFR :)

@THardy98
Copy link
Author

THardy98 commented Oct 6, 2022

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 6, 2022

Build succeeded:

@craig craig bot merged commit ea43f89 into cockroachdb:master Oct 6, 2022
@blathers-crl
Copy link

blathers-crl bot commented Oct 6, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from bb18504 to blathers/backport-release-22.1-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from bb18504 to blathers/backport-release-22.2-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


error creating merge commit from bb18504 to blathers/backport-release-22.2.0-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.0 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@maryliag
Copy link
Contributor

looks like your backports failed to be created, can you create them? (you don't need one for 22.2.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scheduledlogging: TestCaptureIndexUsageStats is unreasonably long to run
5 participants