Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does locking for EventSource.DoCommand need to be a global lock? #45059

Open
Tracked by #44598
benaadams opened this issue Nov 21, 2020 · 27 comments
Open
Tracked by #44598

Does locking for EventSource.DoCommand need to be a global lock? #45059

benaadams opened this issue Nov 21, 2020 · 27 comments
Labels
Milestone

Comments

@benaadams
Copy link
Member

benaadams commented Nov 21, 2020

On startup FrameworkEventSource and RuntimeEventSource; while running on different threads, contend on EventSource.Initialize waiting for the DoCommand of the other to complete, so run sequentially (intermixed), also PortableThreadPoolEventSource and ArrayPoolEventSource:

image

image

/cc @stephentoub

Related to #44598

@benaadams benaadams added the tenet-performance Performance related issue label Nov 21, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Nov 21, 2020
@benaadams
Copy link
Member Author

Also PortableThreadPoolEventSource and ArrayPoolEventSource can collide forcing sequential initialization and waits on alternative threads

@benaadams
Copy link
Member Author

image

@benaadams
Copy link
Member Author

benaadams commented Nov 22, 2020

InitializeProviderMetadata and CreateManifestAndDescriptors look to be the problematic methods called under a global lock as they are very reflection heavy

@josalem
Copy link
Contributor

josalem commented Nov 23, 2020

To be clear, you're talking about the EventListenerLock?

// Register the provider with EventPipe
var eventPipeProvider = new OverideEventProvider(this, EventProviderType.EventPipe);
lock (EventListener.EventListenersLock)
{
eventPipeProvider.Register(this);
}

lock (EventListener.EventListenersLock)
{
// If there are any deferred commands, we can do them now.
// This is the most likely place for exceptions to happen.
// Note that we are NOT resetting m_deferredCommands to NULL here,
// We are giving for EventHandler<EventCommandEventArgs> that will be attached later
EventCommandEventArgs? deferredCommands = m_deferredCommands;
while (deferredCommands != null)
{
DoCommand(deferredCommands); // This can never throw, it catches them and reports the errors.
deferredCommands = deferredCommands.nextCommand;
}
}

The lock is used for registration of providers with the EventPipe, EventListener, and EventSource infrastructure. The generation of the manifest and descriptors isn't concurrency sensitive as far as I can recall. Theoretically, we could always separate that generation step from the registration to minimize time under the lock. The issue is that we shouldn't attempt to do that generation until we know the source is going to be used, which is what leads to it being done on registration.

The generation logic seems like a prime candidate for source generation at compile time.

That being said, there should only be deferred commands for an EventSource (the trigger for calling DoCommand under a lock in Initialize) if it is requested to be enabled before Initialize in its constructor is called. If you don't enable those providers, you shouldn't pay the penalty for generating their manifest/descriptors on construction or get into this serialization situation. Are you enabling those providers when you collect this data?

CC @sywhang @noahfalk

@benaadams
Copy link
Member Author

PortableThreadPoolEventSource always seems to have a couple updates to send (on Windows) even if I'm just running at command line (added a Console.WriteLine to check) /cc @kouvel

That causes the manifest to be generated; and its generation uses ArrayPool, which calls ArrayPoolEventSource.Log to check if its enabled which then causes initialization of the ArrayPoolEventSource which blocks because it needs the same EventListenersLock lock that the DoCommands in the other Initalize is using which is what you see here:

image

EventSource.Initalize does grab and release the global EventListenersLock lock 3 times:

  1. For eventPipeProvider.Register(this);
  2. Inside EventListener.AddEventSource(this);
  3. For handling the deferredCommands (which then triggers the lengthy Manifest generation)

@josalem
Copy link
Contributor

josalem commented Nov 23, 2020

Cases 1 and 2 shouldn't be too expensive, but 3 certainly can be due to the reflection required to generate the data.

I find it curious that these providers have deferred commands without being explicitly enabled. Do you observe this on non-Windows platforms? I'm wondering if ETW requests the manifest for all providers that register themselves. That would cause case 3 to fire every time for every EventSource on Windows even if the source isn't enabled by the user.

@tommcdon tommcdon removed the untriaged New issue has not been triaged by the area owner label Nov 23, 2020
@tommcdon tommcdon added this to the 6.0.0 milestone Nov 23, 2020
@josalem
Copy link
Contributor

josalem commented Nov 23, 2020

Another thought: That UI looks like dotTrace. It's possible that dotTrace specifically uses the ThreadPool event source.

@benaadams
Copy link
Member Author

EtwEventRegister is triggering the command

image

Not sure why it only seems to be PortableThreadPoolEventSource and not RuntimeEventSource or NativeRuntimeEventSource

image

Perhaps because it shares the same Guid and Name as NativeRuntimeEventSource; so its the second registration?

@tommcdon
Copy link
Member

Related to #45518

@kouvel
Copy link
Member

kouvel commented Dec 11, 2020

After #45901 the PortableThreadPoolEventSource would not be used during startup unless the thread pool is used by the user's startup code. It was being used by tiered compilation.

I'm not sure if NativeRuntimeEventSource is being used on this path. There is a work item to move events from PortableThreadPoolEventSource into NativeRuntimeEventSource and use the latter instead, but I suspect the manifest generation issue would not go away just from that.

@benaadams
Copy link
Member Author

I haven't worked out why PortableThreadPoolEventSource gets triggered for the manifest generation and NativeRuntimeEventSource doesn't as they both seem to do the same startup and same flags etc

@kouvel
Copy link
Member

kouvel commented Dec 11, 2020

Can you share a code sample to repro this? I'm curious how NativeRuntimeEventSource is being used during startup. Last I checked (which was a while back) it didn't have any events of its own, so I don't yet see how it would be used.

@benaadams
Copy link
Member Author

Can you share a code sample to repro this? I'm curious how NativeRuntimeEventSource is being used during startup.

Happens with a Console.WriteLine("Hello world") app on Windows

NativeRuntimeEventSource gets created by EventSource .cctor

static EventListener()
{
#if FEATURE_PERFTRACING
// Ensure that NativeRuntimeEventSource is initialized so that EventListeners get an opportunity to subscribe to its events.
// This is required because NativeRuntimeEventSource never emit events on its own, and thus will never be initialized
// in the normal way that EventSources are initialized.
GC.KeepAlive(NativeRuntimeEventSource.Log);
#endif // FEATURE_PERFTRACING

Which is triggered by RuntimeEventSource being initialized in StartupHookProvider.ProcessStartupHooks

private static void ProcessStartupHooks()
{
if (!IsSupported)
return;
// Initialize tracing before any user code can be called.
System.Diagnostics.Tracing.RuntimeEventSource.Initialize();

image

@benaadams
Copy link
Member Author

Doesn't explain why the PortableThreadPoolEventSource Etw registration triggers an immediate OnControllerCommand callback which then causes the manifest to be generated; but the RuntimeEventSource and NativeRuntimeEventSource Etw registrations don't

image

@benaadams
Copy link
Member Author

The only difference I can see is Etw SetInformation is called after the Etw EventRegister

fixed (byte* providerMetadata = this.providerMetadata)
{
m_etwProvider.SetInformation(
Interop.Advapi32.EVENT_INFO_CLASS.SetTraits,
providerMetadata,
(uint)this.providerMetadata.Length);

So when NativeRuntimeEventSource does its register; SetInformation has not been called for that name/guid; however when PortableThreadPoolEventSource calls the EventRegister NativeRuntimeEventSource has already called Etw SetInformation for that name/guid (as they share the same one), so it triggers the callback?

@kouvel
Copy link
Member

kouvel commented Dec 11, 2020

I'll defer to the experts on this, perhaps it has something to do with the double-registration, which wouldn't happen after the consolidation work item is done. I don't know the details about what kind of side effects may occur as a result of the current situation. It does look troubling though.

@benaadams
Copy link
Member Author

Added PR to merge the two #45974 does look to stop the manifest generation (though may also be the change to not use it for tiering)

@benaadams
Copy link
Member Author

Looks like the ThreadPool is still being used; whether or not user code uses it, so avoiding double registration does look like the fix for the manifest generation

image

Gets set up by a call to UnsafeQueueUnmanagedWorkItem from ThreadpoolMgr::TimerThreadStart

image

@kouvel
Copy link
Member

kouvel commented Dec 11, 2020

Yea that's the path that tiering uses. I haven't looked at the PR yet, but my understanding was that there were some other issues that needed to be resolved in order to be able to merge those event sources, otherwise we would not get thread pool events, or there would be other issues.

@benaadams
Copy link
Member Author

Hmm... I've clearly got something on my machine that's pre-registered for the portable threadpool (native eventsource)'s Guid; when I switch of all the Jetbrains stuff including the ETW Host Service its still triggering; so not sure what it is

@kouvel
Copy link
Member

kouvel commented Dec 12, 2020

I noticed before that VS enables some providers when it's open

@josalem
Copy link
Contributor

josalem commented Dec 12, 2020

You can check what's creating ETW sessions using tracelog -l -lp > etw.log (-l -> list sessions, -lp -> list providers on those sessions). That ought to tell you if there are any sessions open.

docs for tracelog

@benaadams
Copy link
Member Author

I noticed before that VS enables some providers when it's open

Yep; looks like just having VS open on a project will register it for verbose

Logger Name:            Microsoft-VisualStudio-Telemetry-PerfWatson2-16356
Logger Id:              0x1d
Logger Thread Id:       000000000000421C
Guid:                   522e1583-3cbe-11eb-ad29-982cbc21eb61
Session Security:       ...
Buffer Size:            512 Kb
Maximum Buffers:        160
Minimum Buffers:        40
Number of Buffers:      40
Free Buffers:           34
Buffers Written:        162
Events Lost:            0
Log Buffers Lost:       0
Real Time Buffers Lost: 0
Real Time Consumers:    1
ClockType:              PerfCounter
Log Mode:               Sequential Realtime 
Hybrid Shutdown:        Stop
Maximum File Size:      not set
Buffer Flush Timer:     1 secs
Log Filename:           

Enabled Providers:
Guid                                  Level  Flags               Enable Properties
----------------------------------------------------------------------------------
641d7f6c-481c-42e8-ab7e-d18dc5e5cb9e      5  0xffffffffffffffff  EnableKeywordZero 
4e17e413-3c0c-4c2a-a531-c1799a05ad7c      5  0x0000000000000067  EnableKeywordZero 
e13c0d23-ccbc-4e12-931b-d9cc2eee27e4      5  0x0000000000000099  EnableKeywordZero 
589491ba-4f15-53fe-c376-db7f020f5204      5  0xffffffffffffffff  EnableKeywordZero 
a669021c-c450-4609-a035-5af59af4df18      5  0x0000000000000098  EnableKeywordZero 
ee328c6f-4c94-45f7-acaf-640c6a447654      5  0x0000000000000004  EnableKeywordZero 
Total of 6 providers

3rd provider e13c0d23-ccbc-4e12-931b-d9cc2eee27e4

@benaadams
Copy link
Member Author

Suppose that still counts as "part of a developer's inner loop development process" for #44598 😅

#amirite

@josalem
Copy link
Contributor

josalem commented Dec 15, 2020

That makes sense. I'm curious if we observe the slowdown without VS open or on non-Windows platforms where we don't register with a system-wide tracing service.

CC @karpinsn

@benaadams
Copy link
Member Author

Note this is from just having VS open; running an app in VS also attaches 2 more listeners to the EventSource; so 3 listeners in total from VS

Logger Name:            DiagHubEtwSession.16.0492B62A-A2B9-4B93-BCC6-ADA193F84587.1.user_paged

Guid                                  Level  Flags               Enable Properties
----------------------------------------------------------------------------------
e13c0d23-ccbc-4e12-931b-d9cc2eee27e4      4  0x0000000000000001  EnableKeywordZero 
Logger Name:            DiagHubEtwSession.16.A471A57C-135D-4BEA-B550-F9A1768249CF.1.user_paged

Guid                                  Level  Flags               Enable Properties
----------------------------------------------------------------------------------
e13c0d23-ccbc-4e12-931b-d9cc2eee27e4      4  0x0000000100000000  EnableKeywordZero 

I'm curious if we observe the slowdown without VS open

No; hence my inconsistent results on "fixing" the issue, it was more to do with if I had VS open when profiling or using Console.WriteLine to see if it was triggered by profiler.

or on non-Windows platforms

Is only triggered by default from the EtwEventProvider.EventRegister causing the callback (due to pre-registration by VS) so don't think so unless you are subscribing to the events; however it does always trigger manifest generation on the first DoCommand regardless of whether than is an Enable or SendManifest event as its tied up with the EnsureDescriptorsInitialized code that happens at the start.

OTOH InitializeProviderMetadata does always get triggered when checking .Log.Enabled() which then Jits and initializes all Encoding.UTF8's stuff; but that's resolved by the source generator PR #45699

Using the EventSources #43390 would still trigger the issue with manifest generation unless they were changed to SelfDescribing (though the ManifestBuilder would still be populated in EnsureDescriptorsInitialized); but what the different types of event source modes are and what impact changing them would have is beyond my knowledge.

The reason why consolidating NativeRuntimeEventSource and PortableThreadPoolEventSource stopped the Manifest generation is because NativeRuntimeEventSource already has an on disk manifest it uses (and probably sends instead?), so merging them isn't as straightforward #45974 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants