Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solution for mutex problem in CAN read ISR #14688

Merged
merged 12 commits into from
Jul 12, 2021

Conversation

MubeenHCLite
Copy link
Contributor

@MubeenHCLite MubeenHCLite commented May 21, 2021

Summary of changes

Implementation for can_read to be deferred to thread context when rx interrupt is enabled.

As can_read is protected by mutex(mutex is not allowed in ISRs in Mbed) and read is only possible way to clear interrupts in bxCAN in STM controllers, reads are to be deferred to thread context.

In addition RawCAN (unlocked apis) is added, which can be used in case if CAN object is accessed by a single thread.

Impact of changes

Migration actions required

Documentation


Pull request type

[] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[x] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[] Tests / results supplied as part of this PR

Reviewers


@ciarmcom ciarmcom added the release-type: patch Indentifies a PR as containing just a patch label May 21, 2021
@ciarmcom ciarmcom requested a review from a team May 21, 2021 11:00
@ciarmcom
Copy link
Member

@MubeenHCLite, thank you for your changes.
@ARMmbed/mbed-os-maintainers please review.

static uint32_t can_irq_ids[CAN_NUM] = {0};
static can_irq_handler irq_handler;
static uint32_t rx_irq_status = DISABLED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not need to be either an array [CAN_NUM] or a flag inside the can_t object? Probably the former, but I'm not quite sure how this is handling multiple instances.

@@ -646,8 +646,12 @@ void can_irq_set(can_t *obj, CanIrqType type, uint32_t enable)
#include <string.h>
#include <inttypes.h>

#define ENABLED true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than inventing your own values, just name the variable so booleans work:

static bool rx_irq_enabled = false;

@@ -998,6 +1003,10 @@ int can_read(can_t *obj, CAN_Message *msg, int handle)
can->RF1R |= CAN_RF1R_RFOM1;
}

if(rx_irq_status == ENABLED) {
__HAL_CAN_ENABLE_IT(&obj->CanHandle, CAN_IT_FMP0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we believe this is race clear? If new data has already arrived before this line, will this generate an interrupt?

Hopefully this is solid, but it's possible you may have to do

enter critical section
enable FMP0 interrupt
release FIFO
disable critical section

to make sure the thing is ready to generate interrupts as the FIFO is released. I don't know if it's worth doing that just in case.

@haarkon
Copy link

haarkon commented May 21, 2021

Disabling IRQ, your serious ?

You brings it to a thread that has a latency of at least 1ms and give it the mission to treat frames that can be sent at 1Mbit/s with between 45 et 129 bits per frame (so 45 to 129µs per frame) and a fifo of only 3 frames inside the chips... This will never works if bus is loaded at more than 10%.

That is not a good idea !

@0xc0170
Copy link
Contributor

0xc0170 commented May 21, 2021

@haarkon Counter proposal for this pull request?

@haarkon
Copy link

haarkon commented May 21, 2021

Not sure to be good enough to be able to create it, but the solution is :
Create a circular buffer feeded by the CAN RxISR in background and setting a flag or activating a callback (like it's done in SPI).
And also not sure to have time for it

@kjbracey
Copy link
Contributor

kjbracey commented May 24, 2021

You brings it to a thread that has a latency of at least 1ms

"At least"? It shouldn't be anywhere near that bad. If it's a high priority thread, can wake a thread with only tens of microseconds of latency. IRQ latency due to critical sections still dominates in practice (in Mbed OS), not the thread switching/waking time. People tend to disable IRQs globally for many tens of microseconds or worse.

But yes, you would want it to be a high priority thread. This is the technique used in various SPI and Ethernet drivers. For example, we've got a number of IEEE 802.15.4 radio drivers that have to be able to do a TX->RX turnaround within 192us reliably to catch an ack, and they do this from high-priority thread context without issue.

A normal priority thread might end up having to wait for round-robin scheduling, which is maybe the scenario you're thinking of for "1ms".

It may be that it still isn't possible to achieve performance from thread reliably, but you need to recheck the numbers on your estimates.

If that is the case, then this RX API becomes like Serial was - paradoxical in having to be used from interrupt to reliably get data, but not usable from interrupt due to the mutex. We dealt with that by splitting it into UnbufferedSerial for interrupt use and BufferedSerial with the built-in FIFO. BufferedSerial centralises the FIFO - neither targets nor apps need to worry about it, it's common code.

A BufferedCAN may make sense via the same logic.

@haarkon
Copy link

haarkon commented May 25, 2021

Ok maybe less than 1ms if I increase the thread priority, but:

  • Still there is no solution to put this thread in wait state (if IRQ is disabled, you need to poll to see if something is readable, so this thread must be sent in wait state with a sleep_for unless you find a way to activate a eventFlag, a semaphore or anything else witch require an IRQ). So this thread will starve all the others or will have to sleep for 1ms (minimum quantum for sleep_for), maybe there is another way but it doesn't look obvious to me.

  • CAN is not SPI or I2C it's more "fire and forget". While in SPI or I2C you need to wait for the whole transmission, in CAN, sending is just writing to memory. The CAN controler automaticaly perform the transmition without any help from processor. Thus there is no wait in CAN bus. To transmit, you write in memory, to receive, you read from memory.
    Using a Hi-priority thread brings you to the same prerequisite (create a thread with hi-priority or it won't work) than using an IRQ with no Mutex: "You have to take care". Read method should be a "only in ISR context" (like this no need for a MUTEX!).

  • Mbed is becoming a software HAL for computer programmers, not for embeded electronic developer. They remove ISR access to ensure that dummy are able to create programs regardless of the goal of time controled systems (tell me for example how to create a Z-transform filter with Mbed). Now ISR must activate semaphores to allow a thread to perform an operation, etc. adding uncontrolable latency to uncontrolable latency. This might be something normal when the OS treat data in background, but it's inefficient for realtime low power systems (witch are the world for embeded electronics). Removing ISR access, adding MUTEX is something made by people that doesn't understand embeded electronics. Say a different way: if your too stupid to unsecure your application, then go play with Scratch on Arduino or RASPBIAN, Mbed is not for general purpose, Risk has to be understood and avoided by design, not avoided as a prerequisite.

For me all MUTEX should be removed and user have to add them or not. End of the story!

@kjbracey
Copy link
Contributor

Your IRQ handler should indeed be waking your thread from a wait state using a semaphore or event flag. In this patch I believe the IRQ is only disabled between IRQ signalling and the thread doing the read.

I agree that needing to make a high-priority "data-pump" thread is awkward, and apps shouldn't have to worry about that level of complexity. Just as they shouldn't have to worry about needing interrupt handlers at all - blocking until data arrives is easier. Which is why the BufferedSerial is the way it is - providing FIFO to permit reads from normal-priority threads. It seems like CAN is more analogous to UART than SPI or I2C.

You've always been able to attempt direct ISR HAL access if you want by simply overriding the mutex - all(?) of the HAL objects have their lock and unlock methods virtual to let you make a derived RawCAN where these are no-ops for direct IRQ use. If you believe you have a safe use case. We don't usually provide them by default - RawSerial is an exception, and that's because Serial RX requires interrupt response latency.

Radio drivers wanting to do SPI direct from IRQ used to have an UnlockedSPI to do that, but now do it from a high-priority thread just to be nicer to the rest of the system - not lose UART traffic.

The "mutex by default" on the objects is a first line of defense against multithreaded programs performing re-entrant access on the single-threaded HAL. For some people it's too much, and in some cases it's not enough.

If there are cases where an API is only viable from IRQ, but still has the mutex fitted, then that is indeed paradoxical.

It sounds like you would rather have RawCAN and do everything from IRQ - the originator of this patch probably wants BufferedCAN so he can just do reads from a thread. That split may be the best thing to do, as for serial. At least providing the simple RawCAN would be straightforward.

@haarkon
Copy link

haarkon commented May 25, 2021

Your IRQ handler should indeed be waking your thread from a wait state using a semaphore or event flag. In this patch I believe the IRQ is only disabled between IRQ signalling and the thread doing the read.

Yes, it could be a good idea to activate a flag in the ISR (to bring a thread from Wait to Ready) and disabling the IRQ then to be able to exit the ISR... But it's not what has been proposed here. Just to disable the IRQ... I haven't seen any flag raised, neither the creation of an ISR that does all the things you talk about, that are to me, a good point of view.

But, if your proposal may work, it's completly crazy to do such things when the problem is comming from the almost useless mutex... So yes a RawCAN / BufferedCAN should be better than this solution whitch is swipping dust under the carpet (IRQ causes problem? Then action is: remove the IRQ)...

Yes, I'm for the removal of most of the Mutex, I beleive that if you decide to play with Mbed you need to deal with your own Mutex (and add them or not, depending of your code : do you share this ressource? Does it interfere with an ISR? etc.), not to deal with mutex that has been widely put everywhere, even if not needed, just "in case of".

@jeromecoutant
Copy link
Collaborator

@ARMmbed/team-st-mcd

@kjbracey
Copy link
Contributor

But it's not what has been proposed here. Just to disable the IRQ... I haven't seen any flag raised, neither the creation of an ISR that does all the things you talk about, that are to me, a good point of view.

Yes, this is just a change to the HAL, temporarily disabling the IRQ before calling the user IRQ handler. The originator of the patch is creating it to make their own thread-waking IRQ handler work, I believe.

I think you've been somewhat misinterpreting the effect of the patch. This shouldn't stop anyone's existing IRQ handlers from working, I hope. (People already using unlocked objects or driving the HAL API otherwise...)

Yes, I'm for the removal of most of the Mutex, I beleive that if you decide to play with Mbed you need to deal with your own Mutex (and add them or not, depending of your code : do you share this ressource? Does it interfere with an ISR? etc.), not to deal with mutex that has been widely put everywhere, even if not needed, just "in case of".

I'm somewhat sympathetic to this view. The HAL object mutexes were strapped in very early in Mbed OS 5, and there's not been a lot of follow-up or review. However things like "do you share this resource" requires system-level knowledge - when assembling libraries and subsystem drivers, they won't know that.

The existing system where most things get the default mutex and no-one has to worry about conflicts, but 1 time-critical driver has a dedicated bus and unlocked HAL object has worked out well in practice.

But API consistency and backwards compatibility is a thing, so just stripping mutexes from 1 class in isolation isn't really an option. Whereas adding RawCAN along the existing scheme fits in fine.

@ciarmcom
Copy link
Member

ciarmcom commented Jun 2, 2021

This pull request has automatically been marked as stale because it has had no recent activity. @MubeenHCLite, please carry out any necessary work to get the changes merged. Thank you for your contributions.

@ciarmcom ciarmcom added the stale Stale Pull Request label Jun 2, 2021
@ciarmcom ciarmcom added stale Stale Pull Request and removed stale Stale Pull Request labels Jun 4, 2021
@ciarmcom ciarmcom added stale Stale Pull Request and removed stale Stale Pull Request labels Jun 11, 2021
@0xc0170
Copy link
Contributor

0xc0170 commented Jun 14, 2021

Besides there is fundamental issue with CAN (tracked in the issue tracker).

@MubeenHCLite Can you respond t o reviewers comment (from @kjbracey-arm to the changes in this pull request) ?

@MubeenHCLite
Copy link
Contributor Author

The implementation of suggested changes are in progress, will be pushed soon

@ciarmcom ciarmcom added stale Stale Pull Request and removed stale Stale Pull Request labels Jun 18, 2021
0xc0170
0xc0170 previously approved these changes Jul 8, 2021
@mergify mergify bot added needs: CI and removed needs: review labels Jul 8, 2021
@0xc0170
Copy link
Contributor

0xc0170 commented Jul 8, 2021

CI started

@mergify mergify bot added needs: work and removed needs: CI labels Jul 8, 2021
@mbed-ci
Copy link

mbed-ci commented Jul 8, 2021

Jenkins CI Test : ❌ FAILED

Build Number: 1 | 🔒 Jenkins CI Job | 🌐 Logs & Artifacts

CLICK for Detailed Summary

jobs Status
jenkins-ci/mbed-os-ci_unittests ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-GCC_ARM
jenkins-ci/mbed-os-ci_cmake-example-ARM
jenkins-ci/mbed-os-ci_build-example-ARM
jenkins-ci/mbed-os-ci_build-example-GCC_ARM

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 8, 2021

There are errors in building related to RawCAN

In file included from ../../../../mbed-os/mbed.h:71:
../../../../mbed-os/drivers/./include/drivers/RawCAN.h:32:22: error: base 'CAN' is marked 'final'
class RawCAN: public CAN {
                     ^
../../../../mbed-os/drivers/./include/drivers/CAN.h:39:7: note: 'CAN' declared here
class CAN
      ^
In file included from ../../../../mbed-os/platform/FEATURE_EXPERIMENTAL_API/FEATURE_PSA/TARGET_MBED_PSA_SRV/src/default_random_seed.cpp:1:
In file included from ../../../../mbed-os/mbed.h:71:
../../../../mbed-os/drivers/./include/drivers/RawCAN.h:81:11: error: using declaration refers into 'CAN::', which is not a base class of 'RawCAN'
    using CAN::CAN;
          ^~~~~
../../../../mbed-os/drivers/./include/drivers/RawCAN.h:85:17: error: only virtual member functions can be marked 'override'
    void lock() override {};
                ^~~~~~~~~
../../../../mbed-os/drivers/./include/drivers/RawCAN.h:86:19: error: only virtual member functions can be marked 'override'
    void unlock() override {};

@mergify mergify bot dismissed 0xc0170’s stale review July 9, 2021 06:06

Pull request has been modified.

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 9, 2021

CI restarted

@mbed-ci
Copy link

mbed-ci commented Jul 9, 2021

Jenkins CI Test : ✔️ SUCCESS

Build Number: 2 | 🔒 Jenkins CI Job | 🌐 Logs & Artifacts

CLICK for Detailed Summary

jobs Status
jenkins-ci/mbed-os-ci_unittests ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_greentea-test ✔️

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 9, 2021

As @kjbracey-arm also approved previously, only one remaining action to take: documentation update for this new RawCAN: https://github.com/ARMmbed/mbed-os-5-docs/tree/development/docs/api/drivers - new .md file should be created. @MubeenHCLite Can you please

@MubeenHCLite
Copy link
Contributor Author

MubeenHCLite commented Jul 9, 2021

@0xc0170

documentation update for this new RawCAN: https://github.com/ARMmbed/mbed-os-5-docs/tree/development/docs/api/drivers - new .md file should be created. @MubeenHCLite Can you please

The documentation update mentioned here is in a different repository. Cannot be updated as a part of this PR
I will post a request for with a document in that repo.

This can be merged.

@0xc0170
Copy link
Contributor

0xc0170 commented Jul 15, 2021

I've just noticed the merge commit in the history, please rebase the next time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants