profiler: goroutineswait saw-stop mechanism #942
I liked these changes. Especially the addition of a test to check this setting.
Thanks for the suggestions.
Will ping you on slack about this.
Not sure how you'd split it? One file per test? The way it's currently done is pretty idiomatic in Go, i.e. the tests for code implement in |
Just a couple questions/suggestions.
Looks good in general.
profiler/profile_test.go
Outdated
		launched <- struct{}{}
		stopped <- struct{}{}
	}()
	<-launched
What is the point of `launched`? It looks to me like if you removed it, this would function the same way, since the goroutines would all block on `stopped`.
The intent behind `launched` is to ensure that the goroutines have started executing before `spawnGoroutines()` returns. Without `launched`, this invariant would be subject to a race condition.
In practice, `launched` could be removed without repercussions because the Go runtime immediately registers newly created goroutines as `_Grunnable`. That said, this behavior seems to be an implementation detail that isn't covered by the language specification. For example, a future version of Go might decide to put the goroutines into a new `_Glaunchable` state that isn't included in `runtime.NumGoroutine()`. So `launched` could prevent our test from breaking in the future.
Still, I don't expect the runtime details of this to change anytime soon, so I don't feel strongly about it. I can remove `launched` if you like.
Thanks for the explanation. That makes sense and seems to be a good solution.
My only comment is that it is not obvious why the code is written this way unless you know the Go runtime internals. It might be good to add a small comment here about `launched` and how we need the goroutines to be running before `spawnGoroutines` returns.
Sure, PTAL at 2e95053 to see if this does a good enough job of clarifying the intent.
This is a safety mechanism for the new goroutineWaitProfile that automatically disables the profile if the number of goroutines is too high, in order to avoid excessive stop-the-world (STW) pauses.
The default limit is 1000, which is somewhat arbitrary and should limit STW pauses to ~30ms. It can be overridden with DD_PROFILING_WAIT_PROFILE_MAX_GOROUTINES.
WARNING: The goroutineWaitProfile is still experimental and not meant to be used by users outside of Datadog.
Fixes PROF-3234
Force-pushed from d87eb26 to 768a8b3.
@knusbaum thanks for the review, I applied 2/3 changes you suggested. PTAL.
@knusbaum I'm struggling to get CI to pass, it seems like |
Looks good to me.
@felixge Yeah, I think it's because consul just released a version that's incompatible with Go versions before 1.16. I might pin the version in CI until they fix this.
The new name is technically more accurate as the goroutine may not have stopped yet when a message appears in the channel.