AST1300 PCIe device produces freeze/fence #257
Comments
Parsing the EEH register dump:
So the freeze is due to a config write timeout, which is... unusual. My best guess is that the AST firmware is slow to enable the VGA component, so it's not ready to handle the CFG write when we start scanning the bus. If you're ok with building your own firmware, can you try this patch?
Reporting this to Raptor is probably a good idea, but I think this is going to be unique to that specific adapter, so unless Raptor has one I'm not sure they can help much. If you want more detailed instructions about how to patch the Talos' firmware, let me know and I'll do a writeup.
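The patch mentioned above is not reproduced in this thread. As a rough illustration of the general idea (giving a slow device time to come up before touching its config space), here is a minimal C sketch that polls the vendor ID until the device answers or a timeout expires. Everything in it is an assumption made for illustration: `wait_for_device`, the callback types, and the simulated `fake_dev` are hypothetical names, not skiboot APIs, and this is not the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback types for illustration; not skiboot APIs. */
typedef uint16_t (*read_vendor_fn)(void *ctx);
typedef void (*delay_ms_fn)(void *ctx, unsigned int ms);

#define VENDOR_NO_RESPONSE 0xffffu /* all-ones: nothing answered the read */
#define VENDOR_CRS         0x0001u /* seen while the device signals CRS */

/*
 * Poll the vendor ID until the device answers or timeout_ms elapses.
 * Returns true if the device became ready, false on timeout.
 */
static bool wait_for_device(read_vendor_fn read_vendor, delay_ms_fn delay_ms,
                            void *ctx, unsigned int timeout_ms,
                            unsigned int step_ms)
{
    unsigned int waited = 0;

    if (step_ms == 0)
        step_ms = 1; /* avoid spinning forever without advancing time */

    for (;;) {
        uint16_t vendor = read_vendor(ctx);

        if (vendor != VENDOR_NO_RESPONSE && vendor != VENDOR_CRS)
            return true;
        if (waited >= timeout_ms)
            return false;
        delay_ms(ctx, step_ms);
        waited += step_ms;
    }
}

/* Simulated device: starts answering once ready_after_ms has elapsed. */
struct fake_dev {
    unsigned int elapsed_ms;
    unsigned int ready_after_ms;
};

static uint16_t fake_read_vendor(void *ctx)
{
    struct fake_dev *d = ctx;

    /* 0x1a03 is the ASPEED PCI vendor ID. */
    return d->elapsed_ms >= d->ready_after_ms ? 0x1a03 : VENDOR_NO_RESPONSE;
}

static void fake_delay_ms(void *ctx, unsigned int ms)
{
    struct fake_dev *d = ctx;

    d->elapsed_ms += ms; /* advance simulated time instead of sleeping */
}
```

With the simulated device set to become ready after 3000 ms, a 1000 ms timeout gives up while a 10000 ms timeout succeeds. A longer wait only helps if slowness really is the cause.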
@oohal Thanks for the patch. I was able to get it to build, though I increased the wait from 1 second to 10 seconds on the advice of someone in #talos-workstation IRC (basically to increase the safety margin). Unfortunately, no change in behavior: the VGA device still doesn't show up in lspci output.
Thanks for the logs. I noticed there's some odd stuff in there:
That last write definitely isn't supposed to be happening. I think those writes are coming from phb4_endpoint_init(), which enables error reporting for the device. The AST1300 doesn't appear to implement the Advanced Error Reporting capability, so when we initialise that we end up trashing config offsets 0x18..0x1b since the saved
Can you try this patch (keep the LOG_CFG change too):
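The patch itself is not reproduced in this thread, and the truncated sentence above is left as it is. To illustrate the failure mode being described, here is a minimal C sketch over a simulated config space. It assumes the capability lookup returns 0 when the Advanced Error Reporting (AER) extended capability is absent; in that case an unguarded write to "AER offset + 0x18" (the Advanced Error Capabilities and Control register) lands on offsets 0x18..0x1b of the standard header, which in a type 1 (bridge) header hold the bus numbers. The names `find_ext_cap` and `aer_write_ctrl` are hypothetical and are not skiboot functions.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE  4096
#define EXT_CAP_START   0x100   /* extended capabilities begin here */
#define EXT_CAP_ID_AER  0x0001  /* Advanced Error Reporting */
#define AER_CAP_CTRL    0x18    /* Advanced Error Capabilities and Control */

/* Config space is little-endian; assemble the dword byte by byte. */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

static void cfg_write32(uint8_t *cfg, uint16_t off, uint32_t val)
{
    cfg[off]     = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)((val >> 8) & 0xff);
    cfg[off + 2] = (uint8_t)((val >> 16) & 0xff);
    cfg[off + 3] = (uint8_t)((val >> 24) & 0xff);
}

/*
 * Walk the extended capability list and return the offset of cap_id,
 * or 0 if the device does not implement it.
 */
static uint16_t find_ext_cap(const uint8_t *cfg, uint16_t cap_id)
{
    uint16_t off = EXT_CAP_START;
    int guard = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* bound bad lists */

    while (off >= EXT_CAP_START && off <= CFG_SPACE_SIZE - 4 && guard-- > 0) {
        uint32_t hdr = cfg_read32(cfg, off);

        if (hdr == 0 || hdr == 0xffffffffu)
            break;                      /* empty or unimplemented list */
        if ((hdr & 0xffff) == cap_id)
            return off;                 /* bits 15:0 hold the capability ID */
        off = (hdr >> 20) & 0xffc;      /* bits 31:20 hold the next pointer */
    }
    return 0;
}

/*
 * Write the AER control register only if the capability exists.
 * Without the check, a lookup result of 0 would turn this into a write
 * to header offset 0x18. Returns 0 on success, -1 if AER is absent.
 */
static int aer_write_ctrl(uint8_t *cfg, uint32_t val)
{
    uint16_t aercap = find_ext_cap(cfg, EXT_CAP_ID_AER);

    if (!aercap)
        return -1;
    cfg_write32(cfg, (uint16_t)(aercap + AER_CAP_CTRL), val);
    return 0;
}
```

On a device without AER, `aer_write_ctrl` returns -1 and leaves bytes 0x18..0x1b untouched. On a device with the capability at 0x100, the write goes to 0x118 as intended.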
@oohal Thanks for the new patch. Applying it on top of the previous patch yields the following log (unfortunately no change in visible behavior; the VGA device still doesn't show up in lspci output):
skiboot-with-unrecognized-ast1300-with-10s-wait-and-AERcap.txt
Adding a data point: I tried using a different MiniPCIe device (a WiFi card) with the StarTech PEX2MPEX in my Talos II, and it worked fine. So that at least confirms that the PEX2MPEX isn't responsible for the issue (although we already guessed that).
I have a Talos II workstation (latest 2.00 firmware from Raptor), and am trying to use an IGCME-1300-R10 GPU (chipset is AST1300) with it (in conjunction with a StarTech PEX2MPEX to attach the MiniPCIe GPU to a standard PCIe slot on the Talos). Unfortunately, while the bridge device component of the IGCME-1300-R10 is detected successfully, the VGA controller component produces a freeze/fence in Skiboot logs, and does not subsequently show up in lspci output.
I've tried connecting the PEX2MPEX with IGCME-1300-R10 to an x86 machine (running Windows) and the VGA device does show up as a PCIe device in Windows, which indicates that there is something POWER-specific about this problem.
Curiously, on the older firmware that the Talos II shipped with (not sure of the firmware version, but it's whatever the first batch of Talos II machines shipped with, as it was a pre-order), the bridge device component of the IGCME-1300-R10 isn't present in lspci output either, which suggests that the situation has at least improved between those firmware versions.
I'm attaching Skiboot and lspci output from both the latest 2.00 firmware and the firmware that the Talos II shipped with. (Lest anyone get confused, please note that these logs refer to 2 different AST GPU devices: the AST2500 that's part of the built-in BMC (which works fine), and the AST1300 that this issue is about.) Let me know if there's anything I can do to help debug it. (Or, if you think I should be reporting this to Raptor rather than to you, let me know and I'll do so.)
lspci-with-unrecognized-ast1300-firmware-2.00.txt
lspci-with-unrecognized-ast1300-stock-firmware.txt
skiboot-with-unrecognized-ast1300-firmware-2.00.txt
skiboot-with-unrecognized-ast1300-stock-firmware.txt