Support EventPipe high core count CPU scalability #1412

Closed
noahfalk opened this issue Apr 17, 2019 · 12 comments
@noahfalk (Member)

We had a report from a user on a machine with a 20-core CPU group that a hot lock in EventPipe (EventPipeBufferManager::m_lock) was causing excessive spinning in EventPipeBufferManager::AllocateBufferForThread. There was conflicting evidence about whether the system was sustaining a high event load. Regardless of the specifics of that case, we need to validate that EventPipe scales well.

We should create a performance test that exercises EventPipe from a large number of cores (>= 20), reading events with both EventListener and IPC. We should then compare a low-load case (1 event/sec/thread) with a high-load case (the maximum sustainable event rate on all threads) by looking at the latency distribution of calls to WriteEvent. A first stab at a goal is that P50 latency increases by no more than 10% and P99 latency by no more than 50%.
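A rough sketch of the shape such a test could take is below. This is only an illustration, not an existing test in the repo: the EventSource/EventListener names, event counts, and buffer of latency samples are placeholders, and the IPC variant would attach an out-of-process EventPipe session (e.g. via dotnet-trace) instead of the in-process listener.

```csharp
// Hypothetical sketch of a WriteEvent latency scalability test; not an existing test in this repo.
using System;
using System.Diagnostics;
using System.Diagnostics.Tracing;
using System.Linq;
using System.Threading;

[EventSource(Name = "ScalabilityTestSource")]
sealed class ScalabilityTestSource : EventSource
{
    public static readonly ScalabilityTestSource Log = new ScalabilityTestSource();

    [Event(1)]
    public void TestEvent(int payload) => WriteEvent(1, payload);
}

// In-process reader; the IPC variant would instead attach an out-of-process EventPipe session.
sealed class EnablingListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "ScalabilityTestSource")
            EnableEvents(source, EventLevel.Informational);
    }
}

class Program
{
    static void Main()
    {
        using var listener = new EnablingListener();
        int threadCount = Environment.ProcessorCount;   // the scenario of interest is >= 20 cores
        int eventsPerThread = 100_000;
        var latenciesTicks = new long[threadCount][];

        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int idx = t;
            threads[t] = new Thread(() =>
            {
                var samples = new long[eventsPerThread];
                for (int i = 0; i < eventsPerThread; i++)
                {
                    long start = Stopwatch.GetTimestamp();
                    ScalabilityTestSource.Log.TestEvent(i);     // the call whose latency we care about
                    samples[i] = Stopwatch.GetTimestamp() - start;
                    // Low-load case: sleep here to throttle to ~1 event/sec/thread.
                }
                latenciesTicks[idx] = samples;
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();

        var all = latenciesTicks.SelectMany(x => x).OrderBy(x => x).ToArray();
        double ticksToUs = 1_000_000.0 / Stopwatch.Frequency;
        Console.WriteLine($"P50: {all[all.Length / 2] * ticksToUs:F2} us");
        Console.WriteLine($"P99: {all[(int)(all.Length * 0.99)] * ticksToUs:F2} us");
    }
}
```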

@brianrob (Member)

One thing to consider when investigating this: if there is in fact high contention on EventPipeBufferManager::m_lock in EventPipeBufferManager::AllocateBufferForThread, it's likely because the event load is high enough that many threads are trying to allocate buffers at the same time and getting stuck behind this lock. The simplest way to address this is to increase the buffer size or modify the buffer size growth algorithm, so that threads come back through this slow path less often.

@msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@noahfalk (Member, Author)

We had another report of this issue come in. I suspected it would be a case where the buffer was saturated, but looking over the traces, events aren't being dropped. Our best guess is a lock convoy as the number of threads rises, and because this is a spin lock, waiting is very CPU intensive. We either need to decrease contention on the lock, eliminate it completely using lockless strategies, or replace it with a lock that has bounded spinning.
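For illustration, the third option (a lock that spins only a bounded amount before blocking) is roughly the pattern below. This is a conceptual C# sketch with made-up names; the actual EventPipe lock is native code.

```csharp
// Conceptual sketch of the "bounded spinning" option; the actual EventPipe lock is native code.
using System.Threading;

sealed class BoundedSpinLock
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public void Enter()
    {
        var spinner = new SpinWait();
        // Spin only a bounded number of times; SpinWait yields the CPU as it progresses.
        for (int i = 0; i < 100; i++)
        {
            if (_gate.Wait(0))
                return;              // acquired without blocking
            spinner.SpinOnce();
        }
        _gate.Wait();                // give up spinning and block instead of burning CPU
    }

    public void Exit() => _gate.Release();
}
```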

@gleocadie commented Aug 26, 2020

@noahfalk @brianrob
I experienced the same issue, though the cause may be specific to us: the number of threads varies between low- and high-traffic periods.

In our setup, we have a companion application that sets up the EventPipe session and retrieves events from the stream/socket. The companion application configures the EventPipe session to receive informational events for the GC, Contention, and Exception keywords. The main application runs in a container with 30 CPUs allocated.
The application can live several days before a restart, a new version deployment, etc. After a few days, application latency gets worse and worse.
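For reference, an EventPipe session with those keywords can be enabled out of process roughly like this. It is an illustrative sketch using the Microsoft.Diagnostics.NETCore.Client library, not the companion application's actual code.

```csharp
// Illustrative sketch only (not the companion application's actual code).
// Requires the Microsoft.Diagnostics.NETCore.Client NuGet package.
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using Microsoft.Diagnostics.NETCore.Client;

static class SessionStarter
{
    public static EventPipeSession Start(int targetPid)
    {
        // Standard CLR runtime keywords: GC = 0x1, Contention = 0x4000, Exception = 0x8000.
        const long keywords = 0x1 | 0x4000 | 0x8000;
        var providers = new List<EventPipeProvider>
        {
            new EventPipeProvider("Microsoft-Windows-DotNETRuntime",
                                  EventLevel.Informational, keywords)
        };
        var client = new DiagnosticsClient(targetPid);
        // The returned session exposes an EventStream that the companion process reads from.
        return client.StartEventPipeSession(providers);
    }
}
```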

When event emission was deactivated, the issue did not occur. We took CPU samples and many of them looked like this:

dotnet    84/2403  1747745.955015:    1001001 cpu-clock:u:
            7f764b9e73b7 SpinLock::GetLock
            7f764bbab656 EventPipeBufferManager::AllocateBufferForThread
            7f764bbac055 EventPipeBufferManager::WriteEvent
            7f764ba5041c EventPipeSession::WriteEventBuffered
            7f764ba4a41d EventPipe::WriteEventInternal
            7f764ba4a162 EventPipe::WriteEvent
            7f764bdb9b8a EventPipeWriteEventContentionStart_V1
            7f764b9f13f2 AwareLock::EnterEpilogHelper
            7f764b9f10de AwareLock::Enter
            7f764ba66286 JIT_MonEnter_Helper
            7f764ba666a8 JIT_MonReliableEnter_Portable
            7f75d6ef4ee9 instance void [System.Private.CoreLib] System.Threading.TimerQueueTimer::Fire(bool)[OptimizedTier1]
            7f75d6e8125c bool [System.Private.CoreLib] System.Threading.ThreadPoolWorkQueue::Dispatch()[OptimizedTier1]
            7f764baf51d3 CallDescrWorkerInternal
            7f764ba26d8b MethodDescCallSite::CallTargetWorker
            7f764bb9e580 QueueUserWorkItemManagedCallback
            7f764b9fa61a ManagedThreadBase_DispatchOuter
            7f764b9fac7d ManagedThreadBase::ThreadPool
            7f764bb88a15 ManagedPerAppDomainTPCount::DispatchWorkItem
            7f764ba16559 ThreadpoolMgr::WorkerThreadStart
            7f764bd9b312 CorUnix::CPalThread::ThreadEntry
            7f764d4edea5 start_thread

This means that every event emission was impacted by this lock. We measure the time between when an event is emitted and when it is actually processed by the companion application, and we can see that this delay grows over time.

I exposed the m_pThreadSessionStateList size and saw that this list was continuously growing: new threads were added, but dead threads were never removed from the list.
As a workaround, our companion application, which reads the events remotely, restarts the session every 24h. This seems to fix the issue for now.

Hope it helps.

@noahfalk (Member, Author)

Thanks @gleocadie! The intended operation is that a thread reuses a thread-local buffer for ~99.9% of events, and only in the remaining ~0.1% of cases does it exhaust the local buffer and need to acquire the global lock to get a new block of memory. Your report is part of growing evidence that either that 0.1% still causes too much contention, or we have some bad behavior causing the lock to be acquired more frequently than intended.

I exposed the m_pThreadSessionStateList size and I saw that this list was continuously growing

That matches my understanding too. As far as I can tell, the time the lock is held shouldn't vary with the size of that list, so I wasn't expecting it to be the root cause here. However, it's always possible I've overlooked a connection. Even if it isn't the root cause, that growing list could cause other issues and deserves to be fixed.
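For readers following the thread, the intended write path described above has roughly the following shape. This is a simplified C# sketch of the pattern only; the real implementation is native code in EventPipeBufferManager, and the 64 KB buffer size is an arbitrary placeholder.

```csharp
// Simplified sketch of the pattern only; the real implementation is native code
// in EventPipeBufferManager, and 64 KB is an arbitrary placeholder size.
using System;

sealed class BufferManagerSketch
{
    [ThreadStatic] private static byte[] t_localBuffer;
    [ThreadStatic] private static int t_offset;

    private readonly object _globalLock = new object();   // stands in for EventPipeBufferManager::m_lock

    public void WriteEvent(byte[] payload)
    {
        // Fast path (~99.9% of events): copy into the thread-local buffer without taking any lock.
        if (t_localBuffer != null && t_offset + payload.Length <= t_localBuffer.Length)
        {
            payload.CopyTo(t_localBuffer, t_offset);
            t_offset += payload.Length;
            return;
        }

        // Slow path (~0.1% of events): the local buffer is exhausted, so take the global
        // lock to get a new block of memory. Under high core counts this is the lock that
        // shows up as SpinLock::GetLock in the stacks above.
        lock (_globalLock)
        {
            t_localBuffer = new byte[64 * 1024];
            t_offset = 0;
        }
        payload.CopyTo(t_localBuffer, 0);
        t_offset = payload.Length;
    }
}
```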

@gleocadie commented Aug 27, 2020

@noahfalk
Thanks for the explanation, it helps confirm the intent I read in the code.

That matches my understanding too. As far as I can tell, the time the lock is held shouldn't vary with the size of that list, so I wasn't expecting it to be the root cause here. However, it's always possible I've overlooked a connection. Even if it isn't the root cause, that growing list could cause other issues and deserves to be fixed.

I agree it's not the root cause and does deserve to be fixed.
I read that when events are written to the wire, a call is made to MoveNextEventAnyThread. In a first pass, the lock is taken and a loop goes over the whole list to find the next buffer to send. This makes the time the lock is held dependent on the size of the list, which will eventually impact the thread(s) waiting on the lock if the list keeps growing.
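Roughly, the pattern being described looks like the simplified sketch below (not the actual native code; the type names are placeholders): the reader scans every per-thread entry while holding the same lock that writers need on their slow path, so the hold time grows with the size of the list.

```csharp
// Simplified sketch of the pattern only; the real MoveNextEventAnyThread is native code,
// and these type names are placeholders.
using System.Collections.Generic;

sealed class PendingEvent
{
    public long Timestamp;
}

sealed class ThreadSessionState
{
    public PendingEvent PeekOldestEvent() => null;   // placeholder for the sketch
}

sealed class ReaderSketch
{
    private readonly object _lock = new object();    // the same lock writers need on their slow path
    private readonly List<ThreadSessionState> _threadStates = new List<ThreadSessionState>();

    public PendingEvent MoveNextEventAnyThread()
    {
        lock (_lock)
        {
            PendingEvent oldest = null;
            // O(number of thread states) work while the lock is held; entries for dead
            // threads that are never pruned keep making this loop longer.
            foreach (var state in _threadStates)
            {
                var candidate = state.PeekOldestEvent();
                if (candidate != null && (oldest == null || candidate.Timestamp < oldest.Timestamp))
                    oldest = candidate;
            }
            return oldest;
        }
    }
}
```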

@noahfalk (Member, Author)

@gleocadie - Yep, I stand corrected! That definitely looks like a spot where the duration the lock is held is proportional to the number of threads on that list : ) Even though that acquisition isn't on the event-writing path, it still contends with the lock acquisitions that are. Nice code spelunking!

@gleocadie

@noahfalk - I'm currently testing a fix where I filter out the session state associated with dead threads. I was thinking that, after validating the test, I would push my fix upstream as a PR. What do you think?
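Illustratively, such a fix would prune entries whose owning thread has exited, for example whenever the list is walked. The sketch below is a placeholder with made-up names, not the actual change that was eventually submitted.

```csharp
// Illustrative sketch of pruning session state for threads that have exited; not the actual fix.
using System.Collections.Generic;
using System.Threading;

sealed class SessionStateList
{
    private sealed class Entry
    {
        public Thread OwningThread;
        public bool HasUnreadEvents;
    }

    private readonly object _lock = new object();
    private readonly List<Entry> _entries = new List<Entry>();

    // Called under the session lock (e.g. when the list is walked) so that entries
    // for dead threads stop inflating the list that the lock holder iterates.
    public void PruneDeadThreads()
    {
        lock (_lock)
        {
            _entries.RemoveAll(e => !e.OwningThread.IsAlive && !e.HasUnreadEvents);
        }
    }
}
```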

@noahfalk (Member, Author)

Please do : ) I can't evaluate the merits of a specific implementation without seeing it, of course, but in principle this is the kind of thing we want to fix. Feel free to push a PR before you have validated it as well; just mark it as a draft and note that you are still testing it.

FYI, I might be heads down for a few days here but looping in some other folks on my team who can take a look - @sywhang @davmason @josalem

@gleocadie

@noahfalk - Sorry, I have not pushed the PR yet; my current fix is missing a few things (the test was inconclusive) and I'm working on it. I might push it anyway so we can discuss it (maybe next week). I will ping you and your team when it's done.

@noahfalk (Member, Author)

No worries @gleocadie, we can do it whenever you are ready. After next week I'll be on vacation for the remainder of the year but some of my teammates should still be around.

@gleocadie

Thanks, no problem.

@tommcdon added this to the 6.0 milestone Dec 18, 2020
@tommcdon added the "testing" (Test work item) label Dec 18, 2020
@tommcdon changed the title from "Verify EventPipe scalability" to "Support EventPipe high core count CPU scalability" Jan 23, 2021
@tommcdon removed the "testing" (Test work item) label Jan 23, 2021
@josalem (Contributor) commented May 24, 2021

solved via dotnet/runtime#48435

@josalem closed this as completed May 24, 2021
@ghost locked as resolved and limited conversation to collaborators Jun 27, 2023