I noticed recently that with a very modern Dell Vostro laptop we have at work, chaos crashes completely on startup, which is kind of interesting. I debugged this a bit and concluded that it's actually the pci server that crashes when setting up the SMBus device.
I disabled this device in eab58d6, since "not detected" is much better than "crashing the machine". Someone with a strong love for the PCI hardware (...) would be very welcome to dig in and do a proper fix for this. I can volunteer to test any fix you make on this hardware (I have only seen it manifest on one single PC, ever.)
Steps taken to find this
While I debugged this, I first tried this patch, which makes the scanning and setup skip the higher-numbered functions (4 to 7) of every PCI device.
diff --git a/servers/system/pci/pci.c b/servers/system/pci/pci.c
index 53c1de8..7fe5210 100644
--- a/servers/system/pci/pci.c
+++ b/servers/system/pci/pci.c
@@ -528,7 +528,7 @@ static pci_device_type *pci_scan_slot(pci_device_type *input_device)
     bool is_multi = FALSE;
     uint8_t header_type;
-    for (function = 0; function < 8; function++, input_device->device_function++)
+    for (function = 0; function < 4 /*8*/; function++, input_device->device_function++)
     {
         if (function != 0 && !is_multi)
         {
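To make the effect of the changed loop bound concrete: the loop increments `device_function` once per function, which suggests a combined slot/function byte. Assuming the common packing (slot in bits 7..3, function in bits 2..0, as in Linux's `PCI_DEVFN`; this has not been checked against the chaos sources), capping the loop at 4 means functions 4 to 7 of every slot are never touched. The helper names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: device_function packs the slot in bits 7..3 and the
 * function in bits 2..0, like PCI_DEVFN in Linux. Not verified
 * against the chaos pci server. */
static uint8_t devfn_pack(uint8_t slot, uint8_t function)
{
    assert(slot < 32 && function < 8);
    return (uint8_t)((slot << 3) | function);
}

/* Extract the slot (device) number, 0..31. */
static uint8_t devfn_slot(uint8_t devfn)
{
    return (uint8_t)((devfn >> 3) & 0x1F);
}

/* Extract the function number, 0..7. */
static uint8_t devfn_function(uint8_t devfn)
{
    return (uint8_t)(devfn & 0x07);
}
```

Under this packing, starting from `devfn_pack(slot, 0)` and incrementing stays within the same slot for exactly eight steps, which is why the original loop bound was 8.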
This is just a thought, but maybe it's wrong to assume that all PCI hosts support 8 functions per device, and that assumption is what causes the problem? There could be a flag we can read that determines how many functions should be scanned per device; by not honoring that flag, we would be using the hardware incorrectly, which it doesn't like, so it crashes in our face. Just a thought, but maybe worth investigating.
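For what it's worth, the PCI specification does define one such flag: bit 7 of the header type register (offset 0x0E) of function 0 marks a multi-function device, and functions 1 to 7 only need probing when it is set. The `is_multi` check visible in the patch context suggests the scan already looks at something like this, so it may well not be the root cause. A minimal sketch of the rule, run against a fake in-memory slot rather than real hardware (the `fake_slot_*` helpers and `functions_to_scan` are made up for illustration and are not chaos code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FUNCTIONS_PER_SLOT     8
#define HEADER_BYTES           16
#define PCI_VENDOR_ID_OFFSET   0x00
#define PCI_HEADER_TYPE_OFFSET 0x0E
#define PCI_HEADER_MULTI_FUNC  0x80   /* bit 7 of the header type register */
#define PCI_VENDOR_NONE        0xFFFF /* reads as all ones if nothing responds */

/* Stand-in for configuration space: the first 16 header bytes of each
 * of the 8 possible functions in one slot. */
typedef struct
{
    uint8_t header[FUNCTIONS_PER_SLOT][HEADER_BYTES];
} fake_slot_type;

/* Mark every function as absent (all ones, like an empty slot). */
static void fake_slot_clear(fake_slot_type *slot)
{
    memset(slot, 0xFF, sizeof *slot);
}

/* Populate one function with a vendor ID and a header type. */
static void fake_slot_add(fake_slot_type *slot, unsigned function,
                          uint16_t vendor, uint8_t header_type)
{
    assert(function < FUNCTIONS_PER_SLOT);
    slot->header[function][PCI_VENDOR_ID_OFFSET] = (uint8_t)(vendor & 0xFF);
    slot->header[function][PCI_VENDOR_ID_OFFSET + 1] = (uint8_t)(vendor >> 8);
    slot->header[function][PCI_HEADER_TYPE_OFFSET] = header_type;
}

static uint16_t vendor_id(const fake_slot_type *slot, unsigned function)
{
    return (uint16_t)(slot->header[function][PCI_VENDOR_ID_OFFSET] |
                      (slot->header[function][PCI_VENDOR_ID_OFFSET + 1] << 8));
}

/* Returns a bitmask of the functions (bit n = function n) to set up. */
static uint8_t functions_to_scan(const fake_slot_type *slot)
{
    uint8_t present;

    if (vendor_id(slot, 0) == PCI_VENDOR_NONE)
        return 0; /* no function 0 means the slot is empty */
    present = 1;

    /* Functions 1..7 are only meaningful on multi-function devices. */
    if (!(slot->header[0][PCI_HEADER_TYPE_OFFSET] & PCI_HEADER_MULTI_FUNC))
        return present;

    for (unsigned function = 1; function < FUNCTIONS_PER_SLOT; function++)
        if (vendor_id(slot, function) != PCI_VENDOR_NONE)
            present |= (uint8_t)(1u << function);

    return present;
}
```

In a real scanner the reads would go through the pci server's own configuration space accessors instead of the fake struct; the decision logic is the part being illustrated.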
Finding the failing device
I continued the investigation and, interestingly enough, it seems to be an SMBus device that doesn't like the way we probe its PCI slot:
This is a known "bad device" in terms of our PCI setup code. Or rather, our PCI setup code breaks with this device, likely because it hasn't caught up with the last 20 years of development in the PC world. :-)
This will do for now; I have verified on the machine in question that we don't reboot on startup when this device is exempt from the PCI setup.
Issue for fixing this long-term: #134.
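As a rough illustration of what exempting a known-bad device from PCI setup can look like, here is a small skip list keyed on vendor and device ID. This is only a sketch and not necessarily how eab58d6 does it; the type and function names are made up, and the IDs are placeholders that would have to be replaced with the real values from `lspci` on the affected machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint16_t vendor_id;
    uint16_t device_id;
} pci_skip_entry_type;

/* HYPOTHETICAL placeholder IDs, not the real SMBus controller. */
static const pci_skip_entry_type skip_list[] =
{
    { 0x1234, 0x5678 },
};

/* Returns true if the device must be left alone during PCI setup. */
static bool pci_should_skip(uint16_t vendor_id, uint16_t device_id)
{
    /* Callers are expected to have filtered out empty slots already. */
    assert(vendor_id != 0xFFFF);

    for (size_t i = 0; i < sizeof skip_list / sizeof skip_list[0]; i++)
    {
        if (skip_list[i].vendor_id == vendor_id &&
            skip_list[i].device_id == device_id)
        {
            return true;
        }
    }

    return false;
}
```

Matching on vendor and device ID keeps the exemption narrow; matching on the SMBus class code (base class 0x0C, subclass 0x05) would instead skip every SMBus controller, including ones that probe fine.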
(We could do what MINIX3 did (it was written after chaos had its peak years) and borrow the PCI scanning code from NetBSD instead of trying to write it on our own. The MINIX3 implementation, which is based on the NetBSD code, can be found here: https://github.com/Stichting-MINIX-Research-Foundation/minix/blame/master/sys/dev/pci/pci_subr.c)
The code above excludes this device/function from the scanning.
Does this SMBus device need to be probed in some special way or what's the deal here?
More details about the PCI subsystem on this machine
For reference, here is the full output of lspci: