STM32 CAN attach RxIrq unusable. #9495

TobinHall · 2019-01-24T19:22:17Z

Description of defect

In the CAN implementation for STM32 microcontrollers the CAN::read() function is responsible for clearing the flag that generates the Rx interrupt. This is because the flag is reset by a register that also controls the FIFO buffer. If the data is not pulled from the buffer before the flag is reset then the data will not be available.

The issue is that CAN::read() has uses a mutex which can't be used in an interrupt context.
Because you need to read the data before resetting the flag which is generating the interrupt and you cant read the data in an interrupt because of the mutex in the read function, it is impossible to use CAN with an interrupt.

As far as I can tell, this will affect all STM32 microcontrollers. It may also effect other platforms depending on their implementation.

Either the mutex needs to be removed or the data must be read and cached by a lower level ISR.

Target(s) affected by this defect ?

STM32

Toolchain(s) (name and version) displaying this defect ?

n/a

What version of Mbed-os are you using (tag or sha) ?

n/a

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

n/a

How is this defect reproduced ?

n/a

TobinHall · 2019-01-24T19:24:11Z

#5267 is likely a related issue.

cmonr · 2019-01-30T20:18:59Z

@ARMmbed/mbed-os-hal @ARMmbed/team-st-mcd

ARMmbed#9495

TacoGrandeTX · 2019-03-11T17:05:17Z

@ARMmbed/team-st-mcd Could you please review the commit Flowm@c8a37ff?

Flowm · 2019-03-11T18:28:46Z

Unfortunately this commit is just a workaround, not a proper fix.

It simply disables the mutex in CAN::read() instead of fixing the underlying issue which was properly described by #9495 (comment).

Afterwards CAN communication works properly with can.attach(callback(), IrqType::RxIrq); and can.read() in the callback.

bikeNomad · 2019-12-14T00:30:38Z

You could subclass mbed::CAN and override lock() and unlock() to do nothing.

pilotak · 2019-12-14T08:46:19Z

https://github.com/pilotak/NoMutexCAN

jeromecoutant · 2019-12-16T09:05:08Z

Hi
I advice you to push some pull request in order to get a full review by HAL experts.
regards,

pilotak · 2019-12-16T09:56:12Z

@jeromecoutant is mutex really needed in CAN driver?

jeromecoutant · 2019-12-16T10:01:18Z

@kjbracey-arm

kjbracey · 2019-12-16T10:30:39Z

I was recently discussing CAN and interrupts here: #6714

I suggest reading through that, as it covers pretty much everything about interrupts and locks.

I was not aware of this issue at the time, and I'm not quite sure how to read it. Is it the case that the example code over in that issue won't work on STM? (Signal event in IRQ handler, do read from thread woken by that event?)

Locally overriding CAN to make an UnlockedCAN that can do a read from interrupt is valid for an application to do, but an application should not have to do that - an application that wakes a thread to do the read should be valid.

The HAL CAN API does seem to be woefully underspecified, so this is debatable.

I'm not familiar with CAN, or typical hardware implementations, but it seems to be close analog to the UART. There we let the HAL be "thin" for typical UARTs - we don't require there to be buffering in the HAL, but because the HW buffers are small we in practice do need data to be read from interrupt.

This is then covered by the separate classes RawSerial, which lets you do an interrupt read, and UARTSerial, which does the interrupt read for the application, and provides buffering.

We don't have such a split for CAN - maybe it is a bit like my nemesis Serial - something that has to read from interrupt but can't. Hopefully there's enough hardware buffering that doing read from a woken thread is sufficient.

pilotak · 2019-12-16T11:37:04Z

something that has to read from interrupt but can't

that's exactly where the problem is - with current implementation.

It would be nice if mbed would use buffering supported by HW (3 messages on STM platform) or maybe SW queue and split CAN as Serial - UnbufferedCAN and BufferedCAN

kjbracey · 2019-12-16T11:40:22Z

Yes, it needs to do one or the other. If hardware normally has sufficient buffers, and you're saying STM does, then apps relying on buffering to do reads from threads should be fine.

The STM HAL should be fixed so that the example code here works, if it doesn't.

#6714 (comment)

Mixerito · 2020-06-02T14:52:37Z

It is hard to believe, that this issue still exists and nobody offered a fix or some working solution except disablling and reenabling CAN IRQ flag.

AGlass0fMilk · 2020-06-19T11:53:04Z

Wow, just ran into this issue myself on a NUCLEO_F413ZH

I am surprised the history of this is so long, detailed, and unsolved! Mbed certainly needs a refactor of its CAN HAL API... It's workable for now but there are many improvements that could be made.

It would be nice if mbed would use buffering supported by HW (3 messages on STM platform) or maybe SW queue and split CAN as Serial - UnbufferedCAN and BufferedCAN

From what I can tell Mbed does use the 3 message FIFO on STM -- the RX FIFOs are managed entirely by hardware. The STM can_read implementation does, in theory, support reading from either 3-message FIFO (FIFO0 or FIFO1) based on the given handle argument. However, it is not currently possible for the application to enable storing messages in FIFO1 because of the way can_filter is implemented -- every filter is configured to place matching messages into FIFO0. This would be easy to fix but I think STM is using handle incorrectly in their driver.

My understanding of the intent behind the handle parameter in the can_api.h is that it is a unique identifier to a certain filter the user has previously configured. STM is using inconsistently in can_read and can_filter. I can't say I blame them too much: It seems the can_api.h file does not even have comments explaining what each function/argument is supposed to do...

@kjbracey-arm @maclobdell I think it's about time the CAN API in Mbed gets some love. Maybe it could be worked onto the back end of the mysterious Mbed schedule? I would be willing to participate in defining an updated HAL spec.

Mixerito · 2020-06-19T12:36:46Z

I have also noticed more unfinished or badly implemented CAN features. For example it is impossible to attach interrupt on Error IRQs. (EwIrq, EpIrq, BeIrq), because of always disabled Global ERRIE flag. Moreover LEC interrupt is not even defined. And IRQ on TX Overrun is also not possible to catch really. STM CAN really needs to be reimplemented from scratch.

pilotak · 2020-06-22T09:14:53Z

I'm happy to test new CAN things

ciarmcom · 2020-10-13T13:39:35Z

@TobinHall thank you for raising this issue.Please take a look at the following comments:

Could you add some more detail to the description? A good description should be at least 25 words.
What target(s) are you using?
What toolchain(s) are you using?
What version of Mbed OS are you using (tag or sha)?
It would help if you could also specify the versions of any tools you are using?
How can we reproduce your issue?

NOTE: If there are fields which are not applicable then please just add 'n/a' or 'None'.This indicates to us that at least all the fields have been considered.
Please update the issue header with the missing information, the issue will not be mirroredto our internal defect tracking system or investigated until this has been fully resolved.

adbridge · 2020-10-15T12:07:49Z

We've updated our automation, I will fix the requirements .

ciarmcom · 2020-10-15T13:24:24Z

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2465

haarkon · 2020-11-25T16:12:50Z

I believe there won't be any fix for this problem (as it's linked with STM32 hardware and Mbed "one for all" library) except a buffered/unbuffered solution.

As Mutex prevent using read method in ISR context, but the only way that STM32 FDCAN have chosen to lower the IRQ flag is a reading. So, using attach method brings you to a deadlock.

Thus, this deadlock cannot be solved as it’s may be impossible to remove the mutex from Mbed lib because some others CAN controllers may have the use of it, and STM32 won’t change their hardware (and I don’t wish to have a flag that can be cleared while message is still unread).

The best solution at that time (that will remove any portability of Mbed code) is to remove the useless mutex for STM32 (because mutex is just useless with FDCAN, as, you just write or read in memory, it’s thread safe).

I know it’s not “clean” but it’s the only solution that ensure a valid CAN use of IRQ for STM32 at least before Mbed create a buffered class.

jeromecoutant · 2021-04-06T09:50:20Z

The best solution at that time (that will remove any portability of Mbed code) is to remove the useless mutex

@donatieng please comment
Thx

MubeenHCLite · 2021-04-07T07:12:15Z

@donatieng to add to @jeromecoutant I have tested CAN using RxIRQ after removing mutex from CAN::read(). It works fine.

pilotak · 2021-04-07T07:48:02Z

I can confirm it works fine, i created a lib which remove mutex https://github.com/pilotak/NoMutexCAN

donatieng · 2021-04-23T10:42:16Z

Hi folks,

Reading back on this thread - the only "proper" solutions I can see are:

Stick with the current API but have the application use a "bottom-half" pattern to call the read() function in a thread that get notified by the IRQ. I'd be keen to understand if this wouldn't work for timing reasons (FIFO gets overflowed if the bottom-half is not processed fast enough)
Alternatively we implement a similar architecture to the Serial API as suggested by @kjbracey-arm in STM32 CAN attach RxIrq unusable. #9495 (comment)

Removing the mutex will of course work fine until you start using the instance concurrently. Not sure how much of a use case this is for CAN though, but definitely the current Mbed OS pattern for drivers.

kjbracey · 2021-04-23T13:52:32Z

Reading back myself, I'm still not 100% clear. I'm potentially reading it differently to Donatien - we don't have to do an IRQ read for latency (which is UART's problem), but rather the STM HAL continuously asserts a level-based IRQ until a read? So we deadlock unless the app reads from the IRQ?

I would say that that behaviour is contrary to any other attach method implementation I can think of in the Mbed HAL. It's generally assumed that HAL interrupts act as "edge-based", simulated in HAL software if necessary. IRQs should be free to defer reads to threads. Interrupts should be delatched in the HAL before making the attach upcall. (Not after, as that's racy if the attachee is reading directly).

Can that not be done in the STM CAN implementation? Alternatively, mask the interrupt until the read happens?

MubeenHCLite · 2021-05-10T07:58:24Z

@kjbracey-arm
The problem here is the receive interrupt flags (FMP0[1:0]) are read only and are only cleared on reading a mailbox.

The reference manual states that
FMP is increased each time the hardware stores a new message in to the FIFO. FMP is
decreased each time the software releases the output mailbox by setting the RFOM0 bit.

Hence delatching the interrupt is not possible until a read happens, not possible in HAL.

kjbracey · 2021-05-10T08:16:45Z

You may not be able to suppress the interrupt source, but can it not be masked from the interrupt output, either in the FDCAN itself, or the NVIC? Mask it when asserted, and unmask after a read?

I'm looking at the STM32G4 reference manual, which doesn't seem to tally with your text above, but the FDCAN_IE register?

ned-pcs · 2021-05-11T13:31:12Z

Masking might be the best bet. Looking at the STM32F072 reference manual (RM0091) (which has the bxCAN controller), the CAN_IER interrupt enable register has an interrupt enable bit for each interrupt source, including FMP[1:0] which signal one or more RX message(s) pending.

Of course, the Mbed CAN API's documentation (and intent, I think) about the RX filters doesn't agree with the STM32 HAL implementation, in which the filters don't work, and one FIFO is unavailable.

AGlass0fMilk · 2021-05-12T14:13:11Z

Sorry if I'm a bit naive, I've been following this thread for some time now though.

Why is it not possible to cache the received CAN messages in a FIFO and clear the flag in the interrupt?

haarkon · 2021-05-12T16:26:42Z

Because :

There is a mutex in the read

The read is the only way to clear the flag

The ISR context doesn't allow Mutex.

Remove the mutex, and problem is solved.

But you are not using the mbed lib anymore.

kjbracey · 2021-05-14T06:04:37Z

But what AGlassOfMilk suggests would be possible inside the HAL implementation - it could unload messages from the hardware into a FIFO for later delivery via the read call.

Although what do you do when your FIFO's full? Maybe just throw stuff away, but not sure if that makes sense in the protocol. I'm unfamiliar with CAN.

(I've seen things doing that for TCP - which is Very Bad. Turning a reliable protocol with end-to-end flow control forcing the sender to wait until you read, into something that just chucks data away if it's not read fast enough.)

AGlass0fMilk · 2021-05-14T14:35:47Z

CAN is typically used in a periodic fashion (eg: broadcast temperature reading every 1s).

Most protocols treat CAN as a lossy network and have retry/ACK logic if delivery must be guaranteed. There's also filters so you can route critical messages to a separate FIFO.

Missing a packet typically means you will just get the same information next time it is sent.

At some point there has to be a constraint on memory -- no machine has infinite RAM (though desktop applications sometimes seem like they assume so).

The network speeds are also relatively slow (1Mbps max in classic CAN) so your application should have plenty of time to empty the FIFO and process messages.

I think the FIFO approach would be appropriate in this case.

pilotak · 2021-05-14T18:49:59Z

I actually think that it will be good for me/user of mbed to decide if you want to catch all the messages ie.: generate IRQ that has to be cleared (and cached by circular buffer for example) or if you don't mind some looseness. On the other side @AGlass0fMilk is right - if it's a critical message there is alwas some kind of ACK mechanism so i can agree with him that FIFO approach should the way to go.
1Mbps is really maximum of classic CAN you can find, most likely 500k of 250k is used. If you connect to a car or truck CAN bus you can see there are even messages with 20ms send rate. There is also an FDCAN growing up in popularity which can go 5Mbps or even higher.

MubeenHCLite · 2021-05-20T07:52:52Z

Interrupt handling in STM HAL is not functional. Interrupt enabling and IRQ handling are done in STM CAN API itself.

I have tried masking interrupts for legacy/bxCAN in STM CAN API IRQhandler, unmasking in can_read and deferring reads to thread context. This works fine. I have tried this on NUCLEO_F207ZG and NUCLEO-F767ZI platforms. It works without any problems.

FDCAN does not have issues deferring reads to thread context, as masking and unmasking of interrupt is already been taken care in CAN API. I have verified this on NUCLEO-G474RE platform.

MubeenHCLite · 2021-05-21T10:49:57Z

Raised a PR for the implementation where can_read can be deferred to thread context as mentioned in the comment just above
#14688

MubeenHCLite · 2021-07-13T05:13:02Z

PR is now merged.
This is to be closed.

ciarmcom added type: bug Jira status: OPEN labels Jan 24, 2019

TacoGrandeTX added the devices: st label Jan 25, 2019

Flowm added a commit to Flowm/mbed-os that referenced this issue Mar 7, 2019

Remove mutex in CAN driver to fix read in CAN RxIrq

c8a37ff

ARMmbed#9495

ARMmbed deleted a comment from ciarmcom Oct 5, 2020

adbridge removed Jira status: OPEN labels Oct 5, 2020

ciarmcom added Jira status: OPEN labels Oct 15, 2020

ciarmcom added the IOTOSM-2465 label Oct 15, 2020

jeromecoutant mentioned this issue Apr 12, 2021

STM32G474 - CAN TxIRq callback never called #14528

Closed

ciarmcom added component: untriaged priority: untriaged and removed IOTOSM-2465 labels May 21, 2021

adbridge removed the component: untriaged label May 26, 2021

edward-noe mentioned this issue Jun 3, 2021

Configure CAN RX Triggered Interrupt to Replace Current Polling Receive Behavior solarcaratuva/Rivanna2#18

Closed

0xc0170 closed this as completed Jul 13, 2021

PeterBowman mentioned this issue Aug 23, 2023

New Mbed/CAN-based JR3 driver roboticslab-uc3m/yarp-devices#263

Closed

STM32 CAN attach RxIrq unusable. #9495

STM32 CAN attach RxIrq unusable. #9495

Comments

TobinHall commented Jan 24, 2019 • edited by adbridge Loading

Description of defect

Target(s) affected by this defect ?

Toolchain(s) (name and version) displaying this defect ?

What version of Mbed-os are you using (tag or sha) ?

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

How is this defect reproduced ?

TobinHall commented Jan 24, 2019

cmonr commented Jan 30, 2019

TacoGrandeTX commented Mar 11, 2019

Flowm commented Mar 11, 2019

bikeNomad commented Dec 14, 2019

pilotak commented Dec 14, 2019

jeromecoutant commented Dec 16, 2019

pilotak commented Dec 16, 2019

jeromecoutant commented Dec 16, 2019

kjbracey commented Dec 16, 2019 • edited Loading

pilotak commented Dec 16, 2019

kjbracey commented Dec 16, 2019

Mixerito commented Jun 2, 2020

AGlass0fMilk commented Jun 19, 2020

Mixerito commented Jun 19, 2020

pilotak commented Jun 22, 2020

ciarmcom commented Oct 13, 2020

adbridge commented Oct 15, 2020

ciarmcom commented Oct 15, 2020

haarkon commented Nov 25, 2020 • edited Loading

jeromecoutant commented Apr 6, 2021

MubeenHCLite commented Apr 7, 2021

pilotak commented Apr 7, 2021

donatieng commented Apr 23, 2021

kjbracey commented Apr 23, 2021

MubeenHCLite commented May 10, 2021

kjbracey commented May 10, 2021

ned-pcs commented May 11, 2021

AGlass0fMilk commented May 12, 2021

haarkon commented May 12, 2021

kjbracey commented May 14, 2021

AGlass0fMilk commented May 14, 2021

pilotak commented May 14, 2021

MubeenHCLite commented May 20, 2021

MubeenHCLite commented May 21, 2021

MubeenHCLite commented Jul 13, 2021

TobinHall commented Jan 24, 2019 •

edited by adbridge

Loading

kjbracey commented Dec 16, 2019 •

edited

Loading

haarkon commented Nov 25, 2020 •

edited

Loading