Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32 CAN attach RxIrq unusable. #9495

Closed
TobinHall opened this issue Jan 24, 2019 · 37 comments
Closed

STM32 CAN attach RxIrq unusable. #9495

TobinHall opened this issue Jan 24, 2019 · 37 comments

Comments

@TobinHall
Copy link

TobinHall commented Jan 24, 2019

Description of defect

In the CAN implementation for STM32 microcontrollers the CAN::read() function is responsible for clearing the flag that generates the Rx interrupt. This is because the flag is reset by a register that also controls the FIFO buffer. If the data is not pulled from the buffer before the flag is reset then the data will not be available.

The issue is that CAN::read() has uses a mutex which can't be used in an interrupt context.
Because you need to read the data before resetting the flag which is generating the interrupt and you cant read the data in an interrupt because of the mutex in the read function, it is impossible to use CAN with an interrupt.

As far as I can tell, this will affect all STM32 microcontrollers. It may also effect other platforms depending on their implementation.

Either the mutex needs to be removed or the data must be read and cached by a lower level ISR.

Target(s) affected by this defect ?

STM32

Toolchain(s) (name and version) displaying this defect ?

n/a

What version of Mbed-os are you using (tag or sha) ?

n/a

What version(s) of tools are you using. List all that apply (E.g. mbed-cli)

n/a

How is this defect reproduced ?

n/a

@TobinHall
Copy link
Author

#5267 is likely a related issue.

@cmonr
Copy link
Contributor

cmonr commented Jan 30, 2019

@ARMmbed/mbed-os-hal @ARMmbed/team-st-mcd

Flowm added a commit to Flowm/mbed-os that referenced this issue Mar 7, 2019
@TacoGrandeTX
Copy link
Contributor

@ARMmbed/team-st-mcd Could you please review the commit Flowm@c8a37ff?

@Flowm
Copy link

Flowm commented Mar 11, 2019

Unfortunately this commit is just a workaround, not a proper fix.

It simply disables the mutex in CAN::read() instead of fixing the underlying issue which was properly described by #9495 (comment).

Afterwards CAN communication works properly with can.attach(callback(), IrqType::RxIrq); and can.read() in the callback.

@bikeNomad
Copy link
Contributor

You could subclass mbed::CAN and override lock() and unlock() to do nothing.

@pilotak
Copy link
Contributor

pilotak commented Dec 14, 2019

@jeromecoutant
Copy link
Collaborator

Hi
I advice you to push some pull request in order to get a full review by HAL experts.
regards,

@pilotak
Copy link
Contributor

pilotak commented Dec 16, 2019

@jeromecoutant is mutex really needed in CAN driver?

@jeromecoutant
Copy link
Collaborator

@kjbracey-arm

@kjbracey
Copy link
Contributor

kjbracey commented Dec 16, 2019

I was recently discussing CAN and interrupts here: #6714

I suggest reading through that, as it covers pretty much everything about interrupts and locks.

I was not aware of this issue at the time, and I'm not quite sure how to read it. Is it the case that the example code over in that issue won't work on STM? (Signal event in IRQ handler, do read from thread woken by that event?)

Locally overriding CAN to make an UnlockedCAN that can do a read from interrupt is valid for an application to do, but an application should not have to do that - an application that wakes a thread to do the read should be valid.

The HAL CAN API does seem to be woefully underspecified, so this is debatable.

I'm not familiar with CAN, or typical hardware implementations, but it seems to be close analog to the UART. There we let the HAL be "thin" for typical UARTs - we don't require there to be buffering in the HAL, but because the HW buffers are small we in practice do need data to be read from interrupt.

This is then covered by the separate classes RawSerial, which lets you do an interrupt read, and UARTSerial, which does the interrupt read for the application, and provides buffering.

We don't have such a split for CAN - maybe it is a bit like my nemesis Serial - something that has to read from interrupt but can't. Hopefully there's enough hardware buffering that doing read from a woken thread is sufficient.

@pilotak
Copy link
Contributor

pilotak commented Dec 16, 2019

something that has to read from interrupt but can't

that's exactly where the problem is - with current implementation.

It would be nice if mbed would use buffering supported by HW (3 messages on STM platform) or maybe SW queue and split CAN as Serial - UnbufferedCAN and BufferedCAN

@kjbracey
Copy link
Contributor

Yes, it needs to do one or the other. If hardware normally has sufficient buffers, and you're saying STM does, then apps relying on buffering to do reads from threads should be fine.

The STM HAL should be fixed so that the example code here works, if it doesn't.

#6714 (comment)

@Mixerito
Copy link

Mixerito commented Jun 2, 2020

It is hard to believe, that this issue still exists and nobody offered a fix or some working solution except disablling and reenabling CAN IRQ flag.

@AGlass0fMilk
Copy link
Member

Wow, just ran into this issue myself on a NUCLEO_F413ZH

I am surprised the history of this is so long, detailed, and unsolved! Mbed certainly needs a refactor of its CAN HAL API... It's workable for now but there are many improvements that could be made.

It would be nice if mbed would use buffering supported by HW (3 messages on STM platform) or maybe SW queue and split CAN as Serial - UnbufferedCAN and BufferedCAN

From what I can tell Mbed does use the 3 message FIFO on STM -- the RX FIFOs are managed entirely by hardware. The STM can_read implementation does, in theory, support reading from either 3-message FIFO (FIFO0 or FIFO1) based on the given handle argument. However, it is not currently possible for the application to enable storing messages in FIFO1 because of the way can_filter is implemented -- every filter is configured to place matching messages into FIFO0. This would be easy to fix but I think STM is using handle incorrectly in their driver.

My understanding of the intent behind the handle parameter in the can_api.h is that it is a unique identifier to a certain filter the user has previously configured. STM is using inconsistently in can_read and can_filter. I can't say I blame them too much: It seems the can_api.h file does not even have comments explaining what each function/argument is supposed to do...

@kjbracey-arm @maclobdell I think it's about time the CAN API in Mbed gets some love. Maybe it could be worked onto the back end of the mysterious Mbed schedule? I would be willing to participate in defining an updated HAL spec.

@Mixerito
Copy link

I have also noticed more unfinished or badly implemented CAN features. For example it is impossible to attach interrupt on Error IRQs. (EwIrq, EpIrq, BeIrq), because of always disabled Global ERRIE flag. Moreover LEC interrupt is not even defined. And IRQ on TX Overrun is also not possible to catch really. STM CAN really needs to be reimplemented from scratch.

@pilotak
Copy link
Contributor

pilotak commented Jun 22, 2020

I'm happy to test new CAN things

@ARMmbed ARMmbed deleted a comment from ciarmcom Oct 5, 2020
@ciarmcom
Copy link
Member

@TobinHall thank you for raising this issue.Please take a look at the following comments:

Could you add some more detail to the description? A good description should be at least 25 words.
What target(s) are you using?
What toolchain(s) are you using?
What version of Mbed OS are you using (tag or sha)?
It would help if you could also specify the versions of any tools you are using?
How can we reproduce your issue?

NOTE: If there are fields which are not applicable then please just add 'n/a' or 'None'.This indicates to us that at least all the fields have been considered.
Please update the issue header with the missing information, the issue will not be mirroredto our internal defect tracking system or investigated until this has been fully resolved.

@adbridge
Copy link
Contributor

We've updated our automation, I will fix the requirements .

@ciarmcom
Copy link
Member

Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers.
Internal Jira reference: https://jira.arm.com/browse/IOTOSM-2465

@haarkon
Copy link

haarkon commented Nov 25, 2020

I believe there won't be any fix for this problem (as it's linked with STM32 hardware and Mbed "one for all" library) except a buffered/unbuffered solution.

As Mutex prevent using read method in ISR context, but the only way that STM32 FDCAN have chosen to lower the IRQ flag is a reading. So, using attach method brings you to a deadlock.

Thus, this deadlock cannot be solved as it’s may be impossible to remove the mutex from Mbed lib because some others CAN controllers may have the use of it, and STM32 won’t change their hardware (and I don’t wish to have a flag that can be cleared while message is still unread).

The best solution at that time (that will remove any portability of Mbed code) is to remove the useless mutex for STM32 (because mutex is just useless with FDCAN, as, you just write or read in memory, it’s thread safe).

I know it’s not “clean” but it’s the only solution that ensure a valid CAN use of IRQ for STM32 at least before Mbed create a buffered class.

@jeromecoutant
Copy link
Collaborator

The best solution at that time (that will remove any portability of Mbed code) is to remove the useless mutex

@donatieng please comment
Thx

@MubeenHCLite
Copy link
Contributor

@donatieng to add to @jeromecoutant I have tested CAN using RxIRQ after removing mutex from CAN::read(). It works fine.

@pilotak
Copy link
Contributor

pilotak commented Apr 7, 2021

I can confirm it works fine, i created a lib which remove mutex https://github.com/pilotak/NoMutexCAN

@donatieng
Copy link
Contributor

Hi folks,

Reading back on this thread - the only "proper" solutions I can see are:

  • Stick with the current API but have the application use a "bottom-half" pattern to call the read() function in a thread that get notified by the IRQ. I'd be keen to understand if this wouldn't work for timing reasons (FIFO gets overflowed if the bottom-half is not processed fast enough)
  • Alternatively we implement a similar architecture to the Serial API as suggested by @kjbracey-arm in STM32 CAN attach RxIrq unusable.  #9495 (comment)

Removing the mutex will of course work fine until you start using the instance concurrently. Not sure how much of a use case this is for CAN though, but definitely the current Mbed OS pattern for drivers.

@kjbracey
Copy link
Contributor

Reading back myself, I'm still not 100% clear. I'm potentially reading it differently to Donatien - we don't have to do an IRQ read for latency (which is UART's problem), but rather the STM HAL continuously asserts a level-based IRQ until a read? So we deadlock unless the app reads from the IRQ?

I would say that that behaviour is contrary to any other attach method implementation I can think of in the Mbed HAL. It's generally assumed that HAL interrupts act as "edge-based", simulated in HAL software if necessary. IRQs should be free to defer reads to threads. Interrupts should be delatched in the HAL before making the attach upcall. (Not after, as that's racy if the attachee is reading directly).

Can that not be done in the STM CAN implementation? Alternatively, mask the interrupt until the read happens?

@MubeenHCLite
Copy link
Contributor

@kjbracey-arm
The problem here is the receive interrupt flags (FMP0[1:0]) are read only and are only cleared on reading a mailbox.

The reference manual states that
FMP is increased each time the hardware stores a new message in to the FIFO. FMP is
decreased each time the software releases the output mailbox by setting the RFOM0 bit.

Hence delatching the interrupt is not possible until a read happens, not possible in HAL.

@kjbracey
Copy link
Contributor

You may not be able to suppress the interrupt source, but can it not be masked from the interrupt output, either in the FDCAN itself, or the NVIC? Mask it when asserted, and unmask after a read?

I'm looking at the STM32G4 reference manual, which doesn't seem to tally with your text above, but the FDCAN_IE register?

@ned-pcs
Copy link

ned-pcs commented May 11, 2021

Masking might be the best bet. Looking at the STM32F072 reference manual (RM0091) (which has the bxCAN controller), the CAN_IER interrupt enable register has an interrupt enable bit for each interrupt source, including FMP[1:0] which signal one or more RX message(s) pending.

Of course, the Mbed CAN API's documentation (and intent, I think) about the RX filters doesn't agree with the STM32 HAL implementation, in which the filters don't work, and one FIFO is unavailable.

@AGlass0fMilk
Copy link
Member

Sorry if I'm a bit naive, I've been following this thread for some time now though.

Why is it not possible to cache the received CAN messages in a FIFO and clear the flag in the interrupt?

@haarkon
Copy link

haarkon commented May 12, 2021

Because :

There is a mutex in the read

The read is the only way to clear the flag

The ISR context doesn't allow Mutex.

Remove the mutex, and problem is solved.

But you are not using the mbed lib anymore.

@kjbracey
Copy link
Contributor

But what AGlassOfMilk suggests would be possible inside the HAL implementation - it could unload messages from the hardware into a FIFO for later delivery via the read call.

Although what do you do when your FIFO's full? Maybe just throw stuff away, but not sure if that makes sense in the protocol. I'm unfamiliar with CAN.

(I've seen things doing that for TCP - which is Very Bad. Turning a reliable protocol with end-to-end flow control forcing the sender to wait until you read, into something that just chucks data away if it's not read fast enough.)

@AGlass0fMilk
Copy link
Member

CAN is typically used in a periodic fashion (eg: broadcast temperature reading every 1s).

Most protocols treat CAN as a lossy network and have retry/ACK logic if delivery must be guaranteed. There's also filters so you can route critical messages to a separate FIFO.

Missing a packet typically means you will just get the same information next time it is sent.

At some point there has to be a constraint on memory -- no machine has infinite RAM (though desktop applications sometimes seem like they assume so).

The network speeds are also relatively slow (1Mbps max in classic CAN) so your application should have plenty of time to empty the FIFO and process messages.

I think the FIFO approach would be appropriate in this case.

@pilotak
Copy link
Contributor

pilotak commented May 14, 2021

I actually think that it will be good for me/user of mbed to decide if you want to catch all the messages ie.: generate IRQ that has to be cleared (and cached by circular buffer for example) or if you don't mind some looseness. On the other side @AGlass0fMilk is right - if it's a critical message there is alwas some kind of ACK mechanism so i can agree with him that FIFO approach should the way to go.
1Mbps is really maximum of classic CAN you can find, most likely 500k of 250k is used. If you connect to a car or truck CAN bus you can see there are even messages with 20ms send rate. There is also an FDCAN growing up in popularity which can go 5Mbps or even higher.

@MubeenHCLite
Copy link
Contributor

Interrupt handling in STM HAL is not functional. Interrupt enabling and IRQ handling are done in STM CAN API itself.

I have tried masking interrupts for legacy/bxCAN in STM CAN API IRQhandler, unmasking in can_read and deferring reads to thread context. This works fine. I have tried this on NUCLEO_F207ZG and NUCLEO-F767ZI platforms. It works without any problems.

FDCAN does not have issues deferring reads to thread context, as masking and unmasking of interrupt is already been taken care in CAN API. I have verified this on NUCLEO-G474RE platform.

@MubeenHCLite
Copy link
Contributor

Raised a PR for the implementation where can_read can be deferred to thread context as mentioned in the comment just above
#14688

@MubeenHCLite
Copy link
Contributor

PR is now merged.
This is to be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests