Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a TXBnCTRL bit checker before sending the next CAN frame #54

Closed
wants to merge 2 commits into from
Closed

Added a TXBnCTRL bit checker before sending the next CAN frame #54

wants to merge 2 commits into from

Conversation

pepeRossRobotics
Copy link

This PR addresses the use of the library while transmitting UAVCAN frames at high speed (single and multi frame messages).

The issue was found when tryint to transmit IMU data at +50Hz and it was observed that many frames where propped on the transport layer. Investigation with candump showed that the error was in the message itself. by adding a small delay between frame transmission, they number of error layers dropeed significantly.

I did this the simplest way I could think of, so if there are any mods that are needed for more compatibility, or better ways to acheive this, please let me know and I am happy to make the changes.

Before the fix 1152 Transport Layer errors in less than a minute of up time
image

After applying the fix
image

@pepeRossRobotics
Copy link
Author

For context, this library is being used with an RP2040 using the Arduino RP2040 library https://github.com/earlephilhower/arduino-pico.

@aentinger
Copy link
Member

Hi 👋

Thank you for the PR, however the change does not make sense to me. With this changing you are waiting every time until all CAN transmit buffers are empty. This code checks if a transmit buffer is empty and only then proceeds to load the corresponding buffer. Possibly there are problems with frequency stability for your MCP2515 board (i.e. bad crystal), if the CAN timings are off things can go bad real fast. The other thing is, that maybe the status register contains not up-to-date data so maybe its better to check the individual TxBnCTRL instead of status in afore-linked code?

@pepeRossRobotics
Copy link
Author

Hello, thanks for coming back to this PR so fast :)

I am new to this stack and maybe that is why I took this approach.

I found that just adding a wait between consecutive CAN frames reduced the errors, and the longer the wait the more errors were reduced. Because the frames were being transmitted with errors (usinf candump to visualise this), it made me think that the actual frame data was being corrupted somehow. And even on low frequency messages, that were multi frame, this was happening. so decided to slow down the actual frame transmission.

As everything is called from a LockGuard, I just made a small time wasting routine to test this assumption.
The way that I added the prototype wait was:

.
.
.
  /* Write to transmit buffer */
  _io.loadTxBuffer(txb, tx_buffer.buf);

  /* Request transmission */
  _io.requestTx(txb)
  for(uint32_t i = 0;i<25000;i++)
  {
    __asm__("nop\n\t");
  }

Of course having a random wait in the code was far from ideal, I check what I could check for signs that the CAN frame was transmitted and the next can frame was ready to be loaded.

My logic was that based on the flowchart on page 17 of the MCP2515 datasheet (copied below) that checking which was the state of the TXREQ bit when errors happened. From checking this bit I found out that the errors where mostly happening when one or more TXREQ bits (TXBnCTRL[3]) were set. So decided not to go to the next frame unil the bit is not set and transport layer errors plummeted. As you can see from the image above, there are still errors, but my calculation is that there is only error on 0.03% of the transmission, which is good enough for my application. The fact that there are still transport layer errors might mean that the root of he issue has not been addressed or that there is a compounding issue.
image

Would love to hear your thoughts.

Best,
Pepe

@pepeRossRobotics
Copy link
Author

Just as extra information, the erros on the transport layer were present both with a COTS MCP2515 board, and a custom one. which might reduce the chances for it to be due to hardware issues, but would not rule them out.

@aentinger
Copy link
Member

Superseded by #55.

@aentinger aentinger closed this Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants