-
Notifications
You must be signed in to change notification settings - Fork 534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
msquic crashing on ARM32 machines #3958
Comments
cc: @ManickaP @CarnaViire |
I'm not sure why there is a problem. As far as I can tell we aren't packing any structs or doing anything that should mess with alignment. The struct in question: typedef struct DATAPATH_RX_IO_BLOCK {
//
// The pool owning this recv block.
//
CXPLAT_POOL* OwningPool;
//
// Represents the network route.
//
CXPLAT_ROUTE Route;
//
// Ref count of receive data/packets that are using this block.
//
long RefCount; And the code that is crashing: void
RecvDataReturn(
_In_ CXPLAT_RECV_DATA* RecvDataChain
)
{
CXPLAT_RECV_DATA* Datagram;
while ((Datagram = RecvDataChain) != NULL) {
RecvDataChain = RecvDataChain->Next;
DATAPATH_RX_PACKET* Packet =
CXPLAT_CONTAINING_RECORD(Datagram, DATAPATH_RX_PACKET, Data);
if (InterlockedDecrement(&Packet->IoBlock->RefCount) == 0) { <==== HERE
CxPlatPoolFree(Packet->IoBlock->OwningPool, Packet->IoBlock);
}
}
}
But this opens a bigger question about the importance of 32-bit ARM support (perhaps 32-bit support in general). For instance, Windows has completely dropped support. What is the practical expectation on Posix operating systems? |
According to ChatGPT adding typedef struct DATAPATH_RX_IO_BLOCK {
//
// The pool owning this recv block.
//
CXPLAT_POOL* OwningPool;
//
// Represents the network route.
//
CXPLAT_ROUTE Route;
//
// Ref count of receive data/packets that are using this block.
//
long __attribute__((aligned(4))) RefCount; |
I'll give it try. It more looked like some null pointer to me. the |
added assert
when --- a/src/platform/datapath_epoll.c
+++ b/src/platform/datapath_epoll.c
@@ -2064,6 +2064,9 @@ RecvDataReturn(
RecvDataChain = RecvDataChain->Next;
DATAPATH_RX_PACKET* Packet =
CXPLAT_CONTAINING_RECORD(Datagram, DATAPATH_RX_PACKET, Data);
+
+ assert(Packet != NULL);
+ assert(Packet->IoBlock != NULL);
if (InterlockedDecrement(&Packet->IoBlock->RefCount) == 0) {
CxPlatPoolFree(Packet->IoBlock->OwningPool, Packet->IoBlock);
} |
By shortening the Watchdog to Timeout/2 can avoid the "Aborted". msquic/src/test/bin/quic_gtest.cpp Line 47 in 55830d3
|
What about the assert Tomas was hitting in datapath_epoll.c? Does that not repro any more? |
Describe the bug
It seems like regression from #3826 @nibanks.
I did binary search on changes and previous commit (cd49fbb) works fine.
This is likely root cause for dotnet/runtime#91757 causing .NET CI failures.
While reported on Alpine, I can reproduce on my Raspberry PI with Debian 10.
Affected OS
Additional OS information
I tried main and commit 972e677
Comment cd49fbb seems ok.
MsQuic version
main
Steps taken to reproduce bug
I can reproduce it running .NET Quic test suite.
Expected behavior
no crash and everything runs as before
Actual outcome
Additional details
I don't tank if this is because of speed tf the machine or if this is because ARM has weak memory model. (or something completely else)
The text was updated successfully, but these errors were encountered: