Blind fix of raspberry halt due to unhandled int #195

spiliot · 2013-01-18T15:11:20Z

Following this
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=23544&sid=5e54cf47d02ac7f344957b19d4f944b2&start=132
(first message should come up as by spiliot, 6th page)

I blind edited the int handler to do something with the interrupt instead of just exiting. This now works in my case without any implications.

The first time I send something to /dev/usb/lp0 (debug enabled) I get a
usblp0: nonzero read bulk status received: -5
which probably explains why the dwc_otg driver was entering the INT loop in the first place, but I lack the knowledge to pursue this further.

Following this http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=23544&sid=5e54cf47d02ac7f344957b19d4f944b2&start=132 (first message should come up as by spiliot, 6th page) I blind edited the int handler to do something with the interrupt instead of just exiting. This now works in my case without any implications. The first time I send something to /dev/usb/lp0 (debug enabled) I get a usblp0: nonzero read bulk status received: -5 which probably explains why the dwc_otg driver was entering the INT loop in the first place, but I lack the knowledge to pursue this further.

P33M · 2013-01-18T22:26:06Z

Ahoy there.

I think we went down this route previously when figuring out why USB-serial was so broken on r-pi.

From http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=16280&start=25

The gist of it is, a data toggle error interrupt can occur at the same time as a channel halted interrupt. The hc_chltd handler doesn't check for anything to do with data toggle errors, does nothing to reset the interrupt and therefore loops forever. According to "the silicon documentation" which I have yet to see, a data toggle error should not create a halt in the host channel.

As your case is reproducible on your setup, could you apply the patch

diff --git a/drivers/usb/host/dwc_otg/dwc_otg_hcd_intr.c b/drivers/usb/host/dwc_otg/dwc_otg_hcd_intr.c
index 3e762e2..df03c3f 100644
--- a/drivers/usb/host/dwc_otg/dwc_otg_hcd_intr.c
+++ b/drivers/usb/host/dwc_otg/dwc_otg_hcd_intr.c
@@ -1918,8 +1918,32 @@ static int32_t handle_hc_datatglerr_intr(dwc_otg_hcd_t * hcd,
                                         dwc_otg_hc_regs_t * hc_regs,
                                         dwc_otg_qtd_t * qtd)
 {
+       /* A data toggle error in a BULK or INTR transaction is benign and continuing
+        * the transaction will (as per USB spec) result in resynchronisation.
+        * In DMA mode the channel may also be halted automatically by the host -
+         * Therefore there is nothing to do here but cleanup host-side and try again
+        */
+       // FIXME: This code is for TEST PURPOSES to solve infinite looping in an interrupt
+       char * eptype;
        DWC_DEBUGPL(DBG_HCDI, "--Host Channel %d Interrupt: "
                    "Data Toggle Error--\n", hc->hc_num);
+       switch (hc->ep_type) {
+               case DWC_OTG_EP_TYPE_BULK:
+                       eptype = "BULK";
+                       break;
+               case DWC_OTG_EP_TYPE_INTR:
+                       eptype = "INTERRUPT";
+                       break;
+               case DWC_OTG_EP_TYPE_ISOC:
+                       eptype = "ISOCHRONOUS";
+                       break;
+               case DWC_OTG_EP_TYPE_CONTROL:
+                       eptype = "CONTROL";
+                       break;
+               default:
+                       eptype = "NO IDEA";
+       }
+       DWC_ERROR("Data Toggle Error - on endpoint type %s\n", eptype);

        if (hc->ep_is_in) {
                qtd->error_count = 0;
@@ -1927,9 +1951,10 @@ static int32_t handle_hc_datatglerr_intr(dwc_otg_hcd_t * hcd,
                DWC_ERROR("Data Toggle Error on OUT transfer,"
                          "channel %d\n", hc->hc_num);
        }
-
+
+       /* No choice but to disable and restart DMA channel as core has halted */
        disable_hc_int(hc_regs, datatglerr);
-
+       halt_channel(hcd, hc, qtd, DWC_OTG_HC_XFER_NO_HALT_STATUS);
        return 1;
 }

@@ -2078,6 +2103,8 @@ static void handle_hc_chhltd_intr_dma(dwc_otg_hcd_t * hcd,
                handle_hc_babble_intr(hcd, hc, hc_regs, qtd);
        } else if (hcint.b.frmovrun) {
                handle_hc_frmovrun_intr(hcd, hc, hc_regs, qtd);
+       } else if (hcint.b.datatglerr) {
+               handle_hc_datatglerr_intr(hcd, hc, hc_regs, qtd);
        } else if (!out_nak_enh) {
                if (hcint.b.nyet) {
                        /*
@@ -2120,6 +2147,7 @@ static void handle_hc_chhltd_intr_dma(dwc_otg_hcd_t * hcd,
                                halt_channel(hcd, hc, qtd,
                                             DWC_OTG_HC_XFER_PERIODIC_INCOMPLETE);
                        } else {
+                               /* BULK or CONTROL */
                                DWC_ERROR
                                    ("%s: Channel %d, DMA Mode -- ChHltd set, but reason "
                                     "for halting is unknown, hcint 0x%08x, intsts 0x%08x\n",
@@ -2127,6 +2155,8 @@ static void handle_hc_chhltd_intr_dma(dwc_otg_hcd_t * hcd,
                                     DWC_READ_REG32(&hcd->
                                                    core_if->core_global_regs->
                                                    gintsts));
+                               dump_stack();
+                               halt_channel(hcd, hc, qtd, DWC_OTG_HC_XFER_NO_HALT_STATUS);
                        }

                }

Note that this was vs. a very old version of the pi kernel. Some mix and match may be required to reproduce the same result.

spiliot · 2013-01-19T16:53:44Z

Indeed looks like a data toggle error is occurring.

After incorporating the debug code I get an endless loop of ERROR::handle_hc_datatglerr_intr:1948 Data Toggle Error - on endpoint type BULK

ghollingworth · 2013-07-17T20:18:45Z

This should be closed, this was fixed with a bunch of error handling

test_bpf tail call tests end up as: test_bpf: #0 Tail call leaf jited:1 85 PASS test_bpf: #1 Tail call 2 jited:1 111 PASS test_bpf: #2 Tail call 3 jited:1 145 PASS test_bpf: #3 Tail call 4 jited:1 170 PASS test_bpf: #4 Tail call load/store leaf jited:1 190 PASS test_bpf: #5 Tail call load/store jited:1 BUG: Unable to handle kernel data access on write at 0xf1b4e000 Faulting instruction address: 0xbe86b710 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash PowerMac Modules linked in: test_bpf(+) CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195 Hardware name: PowerMac3,1 750CL 0x87210 PowerMac NIP: be86b710 LR: be857e88 CTR: be86b704 REGS: f1b4df20 TRAP: 0300 Not tainted (6.1.0-rc4+) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28008242 XER: 00000000 DAR: f1b4e000 DSISR: 42000000 GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000 GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8 GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000 GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00 NIP [be86b710] 0xbe86b710 LR [be857e88] __run_one+0xec/0x264 [test_bpf] Call Trace: [f1b4dfe0] [00000002] 0x2 (unreliable) Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 0000000000000000 ]--- This is a tentative to write above the stack. The problem is encoutered with tests added by commit 38608ee ("bpf, tests: Add load store test case for tail call") This happens because tail call is done to a BPF prog with a different stack_depth. At the time being, the stack is kept as is when the caller tail calls its callee. But at exit, the callee restores the stack based on its own properties. Therefore here, at each run, r1 is erroneously increased by 32 - 16 = 16 bytes. This was done that way in order to pass the tail call count from caller to callee through the stack. As powerpc32 doesn't have a red zone in the stack, it was necessary the maintain the stack as is for the tail call. But it was not anticipated that the BPF frame size could be different. Let's take a new approach. Use register r4 to carry the tail call count during the tail call, and save it into the stack at function entry if required. This means the input parameter must be in r3, which is more correct as it is a 32 bits parameter, then tail call better match with normal BPF function entry, the down side being that we move that input parameter back and forth between r3 and r4. That can be optimised later. Doing that also has the advantage of maximising the common parts between tail calls and a normal function exit. With the fix, tail call tests are now successfull: test_bpf: #0 Tail call leaf jited:1 53 PASS test_bpf: #1 Tail call 2 jited:1 115 PASS test_bpf: #2 Tail call 3 jited:1 154 PASS test_bpf: #3 Tail call 4 jited:1 165 PASS test_bpf: #4 Tail call load/store leaf jited:1 101 PASS test_bpf: #5 Tail call load/store jited:1 141 PASS test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] Suggested-by: Naveen N. Rao <[email protected]> Fixes: 51c66ad ("powerpc/bpf: Implement extended BPF on PPC32") Cc: [email protected] Signed-off-by: Christophe Leroy <[email protected]> Tested-by: Naveen N. Rao <[email protected] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.1669278887.git.christophe.leroy@csgroup.eu

commit 89d21e2 upstream. test_bpf tail call tests end up as: test_bpf: #0 Tail call leaf jited:1 85 PASS test_bpf: #1 Tail call 2 jited:1 111 PASS test_bpf: #2 Tail call 3 jited:1 145 PASS test_bpf: #3 Tail call 4 jited:1 170 PASS test_bpf: #4 Tail call load/store leaf jited:1 190 PASS test_bpf: #5 Tail call load/store jited:1 BUG: Unable to handle kernel data access on write at 0xf1b4e000 Faulting instruction address: 0xbe86b710 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=4K MMU=Hash PowerMac Modules linked in: test_bpf(+) CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195 Hardware name: PowerMac3,1 750CL 0x87210 PowerMac NIP: be86b710 LR: be857e88 CTR: be86b704 REGS: f1b4df20 TRAP: 0300 Not tainted (6.1.0-rc4+) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28008242 XER: 00000000 DAR: f1b4e000 DSISR: 42000000 GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000 GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8 GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000 GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00 NIP [be86b710] 0xbe86b710 LR [be857e88] __run_one+0xec/0x264 [test_bpf] Call Trace: [f1b4dfe0] [00000002] 0x2 (unreliable) Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 0000000000000000 ]--- This is a tentative to write above the stack. The problem is encoutered with tests added by commit 38608ee ("bpf, tests: Add load store test case for tail call") This happens because tail call is done to a BPF prog with a different stack_depth. At the time being, the stack is kept as is when the caller tail calls its callee. But at exit, the callee restores the stack based on its own properties. Therefore here, at each run, r1 is erroneously increased by 32 - 16 = 16 bytes. This was done that way in order to pass the tail call count from caller to callee through the stack. As powerpc32 doesn't have a red zone in the stack, it was necessary the maintain the stack as is for the tail call. But it was not anticipated that the BPF frame size could be different. Let's take a new approach. Use register r4 to carry the tail call count during the tail call, and save it into the stack at function entry if required. This means the input parameter must be in r3, which is more correct as it is a 32 bits parameter, then tail call better match with normal BPF function entry, the down side being that we move that input parameter back and forth between r3 and r4. That can be optimised later. Doing that also has the advantage of maximising the common parts between tail calls and a normal function exit. With the fix, tail call tests are now successfull: test_bpf: #0 Tail call leaf jited:1 53 PASS test_bpf: #1 Tail call 2 jited:1 115 PASS test_bpf: #2 Tail call 3 jited:1 154 PASS test_bpf: #3 Tail call 4 jited:1 165 PASS test_bpf: #4 Tail call load/store leaf jited:1 101 PASS test_bpf: #5 Tail call load/store jited:1 141 PASS test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] Suggested-by: Naveen N. Rao <[email protected]> Fixes: 51c66ad ("powerpc/bpf: Implement extended BPF on PPC32") Cc: [email protected] Signed-off-by: Christophe Leroy <[email protected]> Tested-by: Naveen N. Rao <[email protected] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.1669278887.git.christophe.leroy@csgroup.eu Signed-off-by: Greg Kroah-Hartman <[email protected]>

…nslate the same key [ Upstream commit 5476fcf ] The hid-apple driver does not support chaining translations or dependencies on other translations. This creates two problems: 1 - In Non-English keyboards of Macs, KEY_102ND and KEY_GRAVE are swapped and the APPLE_ISO_TILDE_QUIRK is used to work around this problem. The quirk is not set for the Macs where these bugs happen yet (see the 2nd patch for that), but this can be forced by setting the iso_layout parameter. Unfortunately, this only partially works. KEY_102ND gets translated to KEY_GRAVE, but KEY_GRAVE does not get translated to KEY_102ND, so both of them end up functioning as KEY_GRAVE. This is because the driver translates the keys as if Fn was pressed and the original is sent if it is not pressed, without any further translations happening on the key[raspberrypi#463]. KEY_GRAVE is present at macbookpro_no_esc_fn_keys[raspberrypi#195], so this is what happens: - KEY_GRAVE -> KEY_ESC (as if Fn is pressed) - KEY_GRAVE is returned (Fn isn't pressed, so translation is discarded) - KEY_GRAVE -> KEY_102ND (this part is not reached!) ... 2 - In case the touchbar does not work, the driver supports sending Escape when Fn+KEY_GRAVE is pressed. As mentioned previously, KEY_102ND is actually KEY_GRAVE and needs to be translated before this happens. Normally, these are the steps that should happen: - KEY_102ND -> KEY_GRAVE - KEY_GRAVE -> KEY_ESC (Fn is pressed) - KEY_ESC is returned Though this is what happens instead, as dependencies on other translations are not supported: - KEY_102ND -> KEY_ESC (Fn is pressed) - KEY_ESC is returned This patch fixes both bugs by ordering the translations correctly and by making the translations continue and not return immediately after translating a key so that chained translations work and translations can depend on other ones. This patch also simplifies the implementation of the swap_fn_leftctrl option a little bit, as it makes it simply use a normal translation instead adding extra code to translate a key to KEY_FN[raspberrypi#381]. This change wasn't put in another patch as the code that translates the Fn key needs to be changed because of the changes in the patch, and those changes would be discarded with the next patch anyway (the part that originally translates KEY_FN to KEY_LEFTCTRL needs to be made an else-if branch of the part that transltes KEY_LEFTCTRL to KEY_FN). Note: Line numbers (#XYZ) are for drivers/hid/hid-apple.c at commit 20afcc4 ("HID: apple: Add "GANSS" to the non-Apple list"). Note: These bugs are only present on Macs with a keyboard with no dedicated escape key and a non-English layout. Signed-off-by: Kerem Karabay <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

…nslate the same key [ Upstream commit 5476fcf ] The hid-apple driver does not support chaining translations or dependencies on other translations. This creates two problems: 1 - In Non-English keyboards of Macs, KEY_102ND and KEY_GRAVE are swapped and the APPLE_ISO_TILDE_QUIRK is used to work around this problem. The quirk is not set for the Macs where these bugs happen yet (see the 2nd patch for that), but this can be forced by setting the iso_layout parameter. Unfortunately, this only partially works. KEY_102ND gets translated to KEY_GRAVE, but KEY_GRAVE does not get translated to KEY_102ND, so both of them end up functioning as KEY_GRAVE. This is because the driver translates the keys as if Fn was pressed and the original is sent if it is not pressed, without any further translations happening on the key[#463]. KEY_GRAVE is present at macbookpro_no_esc_fn_keys[#195], so this is what happens: - KEY_GRAVE -> KEY_ESC (as if Fn is pressed) - KEY_GRAVE is returned (Fn isn't pressed, so translation is discarded) - KEY_GRAVE -> KEY_102ND (this part is not reached!) ... 2 - In case the touchbar does not work, the driver supports sending Escape when Fn+KEY_GRAVE is pressed. As mentioned previously, KEY_102ND is actually KEY_GRAVE and needs to be translated before this happens. Normally, these are the steps that should happen: - KEY_102ND -> KEY_GRAVE - KEY_GRAVE -> KEY_ESC (Fn is pressed) - KEY_ESC is returned Though this is what happens instead, as dependencies on other translations are not supported: - KEY_102ND -> KEY_ESC (Fn is pressed) - KEY_ESC is returned This patch fixes both bugs by ordering the translations correctly and by making the translations continue and not return immediately after translating a key so that chained translations work and translations can depend on other ones. This patch also simplifies the implementation of the swap_fn_leftctrl option a little bit, as it makes it simply use a normal translation instead adding extra code to translate a key to KEY_FN[#381]. This change wasn't put in another patch as the code that translates the Fn key needs to be changed because of the changes in the patch, and those changes would be discarded with the next patch anyway (the part that originally translates KEY_FN to KEY_LEFTCTRL needs to be made an else-if branch of the part that transltes KEY_LEFTCTRL to KEY_FN). Note: Line numbers (#XYZ) are for drivers/hid/hid-apple.c at commit 20afcc4 ("HID: apple: Add "GANSS" to the non-Apple list"). Note: These bugs are only present on Macs with a keyboard with no dedicated escape key and a non-English layout. Signed-off-by: Kerem Karabay <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

P33M mentioned this pull request Mar 2, 2013

dwc_otg: driver does not handle data toggle errors on SPLIT transactions #241

Closed

ghollingworth closed this Jul 17, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blind fix of raspberry halt due to unhandled int #195

Blind fix of raspberry halt due to unhandled int #195

spiliot commented Jan 18, 2013

P33M commented Jan 18, 2013

spiliot commented Jan 19, 2013

ghollingworth commented Jul 17, 2013

Blind fix of raspberry halt due to unhandled int #195

Blind fix of raspberry halt due to unhandled int #195

Conversation

spiliot commented Jan 18, 2013

P33M commented Jan 18, 2013

spiliot commented Jan 19, 2013

ghollingworth commented Jul 17, 2013