-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make EventPipeProviderCallbackData own the filter data #42307
Conversation
@tommcdon should I attempt to port this to 5.0 at this point? The hit rate for the failed test is extremely low, it would only fail once every week or two. |
b03d466
to
e9985fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we aren't using smart pointer? Using bare pointers leads to unclear ownership semantics and is currently not considered best practice.
I don't have a great answer other than we aren't currently using them. We have a couple things like NewArrayHolder and ReleaseHolder, but they are not really the same thing as a shared_ptr. They don't do ref counting and don't support copying, only construction or moving. They are typically used either in a method body or as a field on a class that never gets copied. In EventPipe we don't use them for fields, they are currently all manually managed. My assumption (for what it's worth) is that this is intentional, in the runtime we prefer to have clear ownership over memory lifetime. I.e. a piece of memory is owned by one object and when that object goes away so does all the memory it uses, and a shared_ptr style smart pointer would break that. The thing that bothers me about this particular problem is that the session provider conceptually owns the filter data string, and having to copy it is a sign that the session provider is being freed before it's really out of scope. The original code where it takes a raw pointer to the string is the same coding pattern used predominantly in the runtime, it's just that the session providers are freed too early. On the other side of the argument is the shutdown path is already complicated and hard to reason about, and adding another place where we partially defer work is only making that worse. |
@davmason does this issue repro when starting /stopping an event pipe session, or only at CLR shutdown? If the latter I suggest that we fix for 6.0, otherwise we should consider for 5.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing that bothers me about this particular problem is that the session provider conceptually owns the filter data string, and having to copy it is a sign that the session provider is being freed before it's really out of scope.
Agreed
On the other side of the argument is the shutdown path is already complicated and hard to reason about, and adding another place where we partially defer work is only making that worse.
Also agreed.
The perf impact of one extra copy should be negligible, and it makes the code much simpler than the other alternatives.
I imagine this will have effectively no perf impact in the grand scheme of things. Session shutdown is not a high frequency code path and the time it takes to disable a session is dominated by rundown and other work.
This patch looks good to me. I'm surprised this failure doesn't hit more often, especially in CI. There is a clear ownership/lifetime issue though, so we should try to address that as well.
I agree with @tommcdon for backporting. We should at least see if meets bar. If the patch is too large, we could scale back to just doing the wcspy_s
call with the existing code without too much effort.
Thanks for fixing this @davmason!
This will happen when stopping an EventPipe session. IMO, the fix is simple enough to be considered for 5.0 fix. |
@@ -205,12 +205,12 @@ void EventPipeProvider::AddEvent(EventPipeEvent &event) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe add a precondition that the input pointer parameter is not null.
/backport to release/5.0-rc2 |
Started backporting to release/5.0-rc2: https://github.com/dotnet/runtime/actions/runs/258967226 |
CC @lateralusX |
Fixes #39669
There is a use after free bug in EventPipe shutdown. The important steps in shutdown are:
EventPipe::DisableInternal
we calls_config.Disable(*pSession, pEventPipeProviderCallbackDataQueue)
to generate a list ofEventPipeProviderCallbackData
to make callbacks to the providers after the session is closed. This copies some data from theEventPipeSessionProvider
like keywords, level, filter data, etc.pSession->Disable()
which tears the session down and, importantly, deletes the list ofEventPipeSessionProvider
for the sessionpFilterData
field onEventPipeProviderCallbackData
is pointing to freed memory since theEventPipeSessionProvider
is what used to own the memory. It works for all the other fields because they are simple types, but the filter data is a dynamically allocated string.This would cause a crash every once in a while when the memory would be overwritten and we would try to convert garbage in to a UTF-8 string to parse it.
This change fixes the issue by making EventPipeProviderCallbackData own the filter data string (it is copied from the session provider). I made sure to make the copy constructors as deleted so it can't be copied more then the initial time. The only other alternatives I see would be to mess with the order of shutdown and try to keep the session partially alive until the provider callback can happen, or to change the ownership of the filter data string to some other data structure.
The perf impact of one extra copy should be negligible, and it makes the code much simpler than the other alternatives.