Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flow deck and Loco deck in TWR mode makes Loco deck hang #368

Closed
tobbeanton opened this issue Sep 14, 2018 · 27 comments
Closed

Flow deck and Loco deck in TWR mode makes Loco deck hang #368

tobbeanton opened this issue Sep 14, 2018 · 27 comments
Milestone

Comments

@tobbeanton
Copy link
Member

If the Flow deck and the Loco deck are mounted together after a while ~1min the Loco deck hangs and the leds turn off.

Guesses of this behavior is collision on the SPI bus, interrupts not services fast enough, or stat machine bug. Note also that flow deck continue to work as it should so SPI bus is not blocked.

@miracatici
Copy link

We're currently use same setup. Do you confirm that you're using latest firmware (not released)? You can clone the repo and compile yourself.

@krichardsson
Copy link
Contributor

@miracatici This was probably on the codebase of Sep 14 (or there about).

@ataffanel
Copy link
Member

ataffanel commented Dec 6, 2018

I think I have verified that this is a timing problem (so a state machine bug) and not an SPI problem: I have had the problem when developing another deck that was only using the UART.

@dastoqc
Copy link

dastoqc commented Apr 15, 2019

using the latest firmware on all our units, we also experienced the issue with our Crazyflies 2.0+Flow deck 1.0. We need to reboot 2 to 4 times the Crazyflie in order to get the UWB deck not to crash. The issue is not present with our newer Crazyflies 2.1+ Flow deck 2.0.

Any solution?

@krichardsson
Copy link
Contributor

@dastoqc that is interesting. Are you running the same firmware on the CF 2.0 and 2.1?
Have you tried CF 2.0 + Flow 2.0 and CF2.1 + Flow 1.0?

There is a difference in how the IMU is read between CF 2.0 and 2.1 that could change the timing and support what @ataffanel found.

Unfortunately there is no solution available at the moment.

@dastoqc
Copy link

dastoqc commented Apr 17, 2019

Hi @krichardsson

We made some more tests today, not sure what is related to this issue:

  • we have a hard time writing to the anchors using the CF (neither positions, nor change mode), it works well only with CF2.1+Flow2 (or no flow deck)
  • all anchors and CF are running the latest firmware (pulled this morning), compiled from source
  • all CF are running on 2Mhz band and the UWB deck crashes happen in both TWR and TDoA2 modes
  • CF2.1+Flow1: still experience crashes and unable to write position to anchors
  • CF2.0+Flow2: seems to be all good...

@krichardsson
Copy link
Contributor

Thanks @dastoqc for the testing!
We'll digest it and see what we can do

@shushuai3
Copy link

By adding a task delay in uwbTask function, the timing problem can be solved such that flow deck and loco deck can be used together.

static void uwbTask(void* parameters)
{
...
while(1) {
vTaskDelay(5); // the line to be added
...

@dastoqc
Copy link

dastoqc commented Jun 26, 2019

thanks @shushuai3 it seems to fix the issue indeed! However, now the 'kalman.resetEstimation' always converge to a wrong position.... will look into it.

@dastoqc
Copy link

dastoqc commented Jun 26, 2019

looks like a delay of 3 works to solve the LPS+Flow issue and still have the kalman reset working!

@krichardsson
Copy link
Contributor

Thanks @dastoqc

@shushuai3
Copy link

Wow! The latest firmware solves this issue, the timing problem when using UWB and Flow decks at the same time. Tests are done with CF2.0+UWB+Flow2.0

@krichardsson
Copy link
Contributor

@shushuai3 Thanks for the report! I'm happy it works for you :-)

@knmcguire
Copy link
Contributor

I guess we need to test this out on a 2.1 and see if this has been completely fixed.

@youngbin-song
Copy link

youngbin-song commented Jul 6, 2020

I'm working on a Crazyflie 2.1 and it doesn't seem to be working for it. My current setup is CF2.1+Crazyradio+Flow2.0.

While waiting for the fix, where should I implement shushuai3's workaround for this issue?

@knmcguire
Copy link
Contributor

Hi @youngbin-song . This Github issue is about the combination of the flow deck and the LPS deck, not only the Flowdeck. Could you please ask your question on forum.bitcraze.io, with an output of the console tab of the cfclient and a more detailed description of your problem (which LEDS on the crazyflie, when it happens etc etc).

@youngbin-song
Copy link

Apologies for being unclear. Yes, my issue also includes the LPS deck as well.

@knmcguire
Copy link
Contributor

ahh oke yes then you are at the right place! Yes go for it, try out Shushuais fix, but it is good to know that apperently there is still thing to be done in this github issue. I will try to investigate this week a well.

@youngbin-song
Copy link

youngbin-song commented Jul 6, 2020

Sorry to ask but where should I integrate with Shushuai's fix, specifically which code should I add it into?

Edit: I've figured it out. Thanks!

@knmcguire
Copy link
Contributor

Great! the fix still seems to work?

@knmcguire
Copy link
Contributor

I also checked it out and it seems to work for me as well.

Although I'm not quite sure why this hasn't been implemented yet, I will probably still wait a bit. There are several issues to fix for LPS I think, so it is good to do it when the rest is back at the office as I haven't worked or developed for the LPS system myself.

@youngbin-song
Copy link

Sorry, I wanted to give my reply after I've tested and made sure that it works.

It seems to work fine for me. There are some occasions that it the flowdeck turns off after a crash/sequence but I assume that it's because of some connection issue.

@knmcguire
Copy link
Contributor

thanks for letting us know. I was able to contact my colleagues about this and the reason that this hasn't been implemented is that this fix is just a patch on the symptom of the real issue that needs to be fixed. So we need to investigate it further.

@shushuai3
Copy link

Just a clarification: the fix of adding a task delay (vTaskDelay(5)) in the uwbTask function will decrease the ranging frequency roughly from 400Hz to 120Hz. Luckily, the latest firmware has no issue on CF2.0+Flow2.0+LocoDeck such that the ranging frequency is not influenced. But I am not sure if the latest firmware works on CF2.1.

@knmcguire
Copy link
Contributor

Yeah indeed figured that it would slow it down. It does work on the CF2.1 but I will need to wait until the others are back to the office in August to go into this again, since it is believed that there is more to it.

I might also test out if the same happens with the zranger, to see if the motion sensor has anything to do with it or not.

@krichardsson
Copy link
Contributor

I have put some time into this issue and have found some interesting information:

First some basics about the implementation:
There is a semaphore protecting access to the SPI bus to make sure it is only used to communicate with one chip at a time. While one SPI transaction is ongoing, other tasks have to wait for the semaphore to be given before they can access the SPI bus (tasks are blocked).
The LPS deck is implemented using an interrupt to indicate that there is new data from the DWM1000 module. When the interrupt is triggered, the service routine gives a binary semaphore, which in turn releases the uwb task that will handle the interrupt. The task does all the SPI communication, the interrupt service routine simply gives the semaphore.

It seems as the flow deck hogs the SPI bus for a fair amount of time when reading data from the flow sensor. The current theory is that this blocks the uwb task from accessing the SPI buss (and thus takes longer to complete), and that during this time a second interrupt from the DWM1000 chip is triggered, which is not handled properly.
The LPS TWR algorithm relies heavily on a continuous flow of events from the DWM1000 chip, and since no transmission is scheduled (due to the missed interrupt) the process stops.

If this is actually the problem, the best solution is probably to use task notification with counting functionality instead of the binary semaphore.

krichardsson added a commit that referenced this issue Aug 18, 2020
There was a mixup of pin definitions which caused the wrong pin to be read. This in turn caused the interrupt handling task to exit the handling function, when the status register was updated during handing.
krichardsson added a commit that referenced this issue Aug 18, 2020
@krichardsson
Copy link
Contributor

krichardsson commented Aug 18, 2020

The previous comment was not too far off, but not correct.

There was already a mechanism for checking if new events arrived while the interrupt is handled, the problem was that it did not work.
The interrupt mechanism in the DWM1000 is based on a status register and a mask. A pin on the chip is set high whenever there is one or more bits in status register that is set and the corresponding bits are set in the mask. The pin is used to assert an interrupt in the STM CPU and start the interrupt handling routine. In the handling routine various events are detected and the appropriate status bits are cleared. At the end of the handling routine the interrupt pin is read and if it is still set the handler is run once more.
The problem was that the pin that was read at the end of the handler was the wrong pin, always set to 0. There was a mixup of pin definitions: there are definitions GPIO pins in the STM library, and a second definition in the deck pin API (deck_constants.h). A GPIO definition was used together with the deck pin API function which caused the wrong pin to be read.

We should possibly consider a check to make sure the correct definition is used to avoid similar problems in the future?

@krichardsson krichardsson added this to the next-release milestone Aug 18, 2020
krichardsson added a commit that referenced this issue Aug 19, 2020
…el 5 when interrupt handling works correctly
krichardsson added a commit that referenced this issue Aug 19, 2020
… with pin definitions for the CPU. This change will catch errors at compile time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants