-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error using dotnet-gcdump
on Android device
#4348
Comments
This appears to be working fine for an Android emulator:
So must be related to device? |
@jonathanpeppers Yes, we seen similar issue where the diagnostics tooling made incorrect assumption of sizes on some fields returned back from runtime in IPC messages, in this case its process id handled as an int, and it is typed as a pid_t that should map int, but maybe Android running on device uses the upper bit in pid_t that cause side effects. The pid field in the trace event is parsed as a 4 byte int, but it is parsed using unsafe code that doesn't use overflow checks. Most of the diagnostic tooling from dotnet/diagnostics on the other hand build with overflow checks as default, so I assume it will return an int with high bit set and then when it is used in code with overflow check this is detected and you get an overflow exception. The assert you get is something that I fixed end of last week and its caused by incorrect size of session id, #4345, but it didn't have any runtime consequences, just firring an assert in debug builds. Android running on device tends to put allocated memory on higher addresses and is probably related to this, https://source.android.com/docs/security/test/tagged-pointers. It is default since level 30 on arm64 hardware, but it can be disabled if it cause issues. Will look a little deeper to see how we best can mitigate this issue. |
@jonathanpeppers Would it be possible for you to do a gc dump through dotnet-trace targeting the device that you get the error and attach the trace to this issue? Then I will be able to load that exact trace into dotnet-gcdump and debug it "live". You collect the same trace as dotnet-gcdump does by running:
You will see that dotnet-trace writes content into the file and then it will wait for you to stop the session by pressing enter. You can validate that the generated .nettrace file give you the same error by running:
That should give you the same error and if it does, please share the .nettrace file and I can debug it to detect more details about the exact format of the process id. |
Yes, it appears to be the same error:
|
Thanks, looks like the first exception trace line numbers was a little of, so its not the process id, but the module id, its typed as uint64 in the event but dotnet trace library handles it as a long that in turn is casted to a ulong and since the diagnostic tooling is build using overflow checking by default, it will fail to do the explicit cast. Unfortunately we can't change Microsoft.Diagnostics.Tracing.Parsers.Clr.ModuleLoadUnloadTraceData.ModuleID since its a public property, so we will need to temporary disable the checked cast where this property is used. |
@jonathanpeppers Possible to take #4354 for a test drive? |
I tried using # Just to compare, the released, 7.0.447801 version
> dotnet-gcdump collect -p 6432
Writing gcdump to 'D:\src\20231023_104524_6432.gcdump'...
[ERROR] System.ApplicationException: RootIndex not set.
at Graphs.Graph.AllowReading() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/Graph.cs:line 241
at Graphs.MemoryGraph.AllowReading() in /_/src/Tools/dotnet-gcdump/DotNetHeapDump/MemoryGraph.cs:line 100
at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.TryCollectMemoryGraph(CancellationToken ct, Int32 processId, String diagnosticPort, Int32 timeout, Boolean verbose, MemoryGraph& memoryGraph) in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 130
at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.<>c__DisplayClass1_0.<Collect>b__0() in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 83
at System.Threading.Tasks.Task`1.InnerInvoke()
at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
at Microsoft.Diagnostics.Tools.GCDump.CollectCommandHandler.Collect(CancellationToken ct, IConsole console, Int32 processId, String output, Int32 timeout, Boolean verbose, String name, String diagnosticPort) in /_/src/Tools/dotnet-gcdump/CommandLine/CollectCommandHandler.cs:line 92 # Local build
D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump [lateralusX/fix-moduleid-cast]> dotnet run -- collect -p 6432
Writing gcdump to 'D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\20231023_104605_6432.gcdump'...
Process terminated. Assertion failed.
at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.TryCreateStopMessage(IpcMessage& stopMessage) in D:\src\dotnet\diagnostics\src\Microsoft.Diagnostics.NETCore.Client\DiagnosticsClient\EventPipeSession.cs:line 110
at Microsoft.Diagnostics.NETCore.Client.EventPipeSession.Stop() in D:\src\dotnet\diagnostics\src\Microsoft.Diagnostics.NETCore.Client\DiagnosticsClient\EventPipeSession.cs:line 50
at Microsoft.Diagnostics.Tools.GCDump.EventPipeSessionController.EndSession() in D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\DotNetHeapDump\EventPipeDotNetHeapDumper.cs:line 369
at Microsoft.Diagnostics.Tools.GCDump.EventPipeDotNetHeapDumper.<>c__DisplayClass3_2.<DumpFromEventPipe>b__7() in D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump\DotNetHeapDump\EventPipeDotNetHeapDumper.cs:line 145
at System.Threading.Tasks.Task.InnerInvoke()
at System.Threading.Tasks.Task.<>c.<.cctor>b__272_0(Object obj)
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
at System.Threading.Tasks.Task.ExecuteEntryUnsafe(Thread threadPoolThread)
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback() The convert step works, though: D:\src\dotnet\diagnostics\src\Tools\dotnet-gcdump [lateralusX/fix-moduleid-cast]> dotnet run -- convert D:\src\dotnet-dsrouter.exe_20231023_084435.nettrace
Writing gcdump to 'D:\src\dotnet-dsrouter.exe_20231023_084435.gcdump'... So, this gets partway there, but maybe there is one further problem? |
@jonathanpeppers. that should be a debug build only issue and fixed by #4345. So either build and run in release or apply that patch to fix the assert in debug build. |
Ok, yes adding
|
… really a ulong member. (#4354) The ModuleLoadUnload events ModuleID is typed as a uint64 in the EventPipe manifest and emitted as a uint64 in the event payload. However, the parsing logic in ModuleLoadUnloadTraceDataevent in trace event: https://github.com/microsoft/perfview/blob/a6c87911fe1aef8f59c9ce54aa4e16a1be6db91e/src/TraceEvent/Parsers/ClrEtwAll.cs.base#L4963 handles it as a long. Android devices running arm64 can use pointer tagging meaning that high bits can be set in 64-bit addresses and since module id is a memory address, it will be returned as a negative long from ModuleLoadUnload event and since all diagnostic tooling is build with overflow checking enabled by default, casting it to a ulong will trigger and overflow exception when high bit is set. Fix is to do an unchecked cast in this case since the value should be threated as a ulong in first place. Fixes #4348.
Description
This is working for some apps, but I'm trying to grab a GC dump of a test application where I've reproduced a memory leak in MAUI:
dotnet/maui@main...jonathanpeppers:maui:CarouselViewLeaks
When I run
dotnet-gcdump
I get:Using a local, debug build of
dotnet-gcdump
, I get a different message:I believe I have
dotnet-dsrouter
setup correctly, asdotnet-trace
can connect successfully:Configuration
Regression?
No, this is newly implemented in .NET 8.
/cc @lateralusX
The text was updated successfully, but these errors were encountered: