
Improving the efficiency of the ROS side read_message #89

Closed
P3TE opened this issue Aug 16, 2021 · 4 comments

P3TE commented Aug 16, 2021

Is your feature request related to a problem? Please describe.

As part of the changes in Unity-Technologies/ROS-TCP-Connector#152 I've been testing scenarios where the queue fills and messages are dropped.
So I've tested the code on two machines, with Unity publishing two 720p stereo pairs as fast as it can (this is typical of one of the robots I'm aiming to simulate, and if I can get it efficient enough, I'm aiming for possibly multiple robots simultaneously). That roughly equates to one 1440p image every frame (currently 30-40 fps). The data is being published to the ROS network over loopback (localhost / 127.0.0.1).
PC1 can handle full frame rate, and PC2 drops messages as the queue fills up.

  • PC1: i7 9700K, 32GB 2133MHz DDR4, RTX 2080.
  • PC2: i7 3770K, 32GB 1600MHz DDR3, GTX 1080.

I added some timing logs to the output of the ROS-TCP-Endpoint for more information, and they revealed that on the older, less powerful machine it was taking too long to read the data, which surprised me since this is all running over loopback.

I was thinking about ways to mitigate this issue and was considering:

  1. Setting it up to have a thread for each publisher on both the send (Unity) and receive (ROS Python) sides. The downside is that the struggling machine is already running out of processing headroom, and implementing this would require a fair amount of code changes.
  2. Seeing if I can tweak the reading thread on the ROS-TCP-Endpoint to be more efficient. The downside is that, looking at the code, it already looks pretty efficient. However, there's an existing note on line 101 of client.py that reads "# Only grabs max of 1024 bytes TODO: change to TCPServer's buffer_size", so maybe implementing that would fix my issues.
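For reference, the buffer-size change in option 2 can be sketched as follows. This is a hypothetical, self-contained read loop, not the actual client.py code; `read_payload` and its parameters are illustrative:

```python
import socket

def read_payload(conn: socket.socket, length: int, buffer_size: int = 65536) -> bytes:
    """Read exactly `length` bytes, requesting up to `buffer_size` per recv call.

    With buffer_size=1024, a ~2.7 MB image needs roughly 2,700 recv calls;
    with a 64 KB buffer it needs around 43, cutting per-call Python overhead.
    """
    data = bytearray(length)
    view = memoryview(data)          # recv_into a preallocated buffer, no per-chunk copies
    received = 0
    while received < length:
        n = conn.recv_into(view[received:], min(buffer_size, length - received))
        if n == 0:
            raise ConnectionError("socket closed mid-message")
        received += n
    return bytes(data)
```

Using `recv_into` on a preallocated bytearray also avoids concatenating many small byte strings, which is a second source of overhead in a 1024-byte read loop.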

Anyhow, I'm considering implementing the second option, as it is by far the easier of the two, but I thought I'd ask about the first. What are the dev team's thoughts on having a separate thread for each publisher? Have you considered implementing this? Also, if you have any other suggestions, I'm all ears!

@mrpropellers

Tagging @peifeng-unity and @LaurieCheers-unity on this one, since I think they've both looked into this before. Getting more efficient with image read/writes is definitely a high-value improvement right now, but I'm not sure what's already been tried and what the low-hanging fruit might be.


P3TE commented Aug 19, 2021

I can say on the Unity side I've tried:

  1. Async Readback: this really improves performance but introduces a frame of delay; I think it's worth it in most cases.
  2. Render Texture Packing: set up all the cameras to render into different parts of the same render texture so that you only need one readback.
  3. Reusing byte buffers by pooling messages: good for reducing GC pressure.
  4. Even had a crack at writing an XR plugin to allow Single Pass Stereo rendering. I ran into a lot of issues with the XR render texture being different from, and incompatible with, a normal render texture, and so far I've been unable to copy the data across from GPU to CPU. It was taking too much time, so it's been shelved for now; I may come back to it in the future.
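The buffer-pooling idea in item 3 is language-agnostic; here is a minimal Python sketch of the same pattern (the class and method names are illustrative, not actual project code):

```python
from collections import deque

class BufferPool:
    """Reuse fixed-size byte buffers instead of allocating one per message,
    so steady-state publishing creates no new garbage."""

    def __init__(self, buffer_size: int, max_buffers: int = 8):
        self.buffer_size = buffer_size
        self._free = deque(bytearray(buffer_size) for _ in range(max_buffers))

    def acquire(self) -> bytearray:
        # Fall back to a fresh allocation only if the pool is exhausted.
        return self._free.popleft() if self._free else bytearray(self.buffer_size)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)
```

The caller acquires a buffer per frame, fills it with pixel data, publishes, and releases it once the send completes, which is essentially what message pooling does on the Unity side.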

On the ROS connection side, it seems in some cases I'm generating more data in Unity than the connection can handle (on the low-powered machine, anyway), but yeah, I'm always keen to hear about your experience and anything you've found that improves performance with image simulation!

@LaurieCheers-unity

Yeah, we have seen similar performance limitations. Improving this further hasn't been our priority so far (we're focusing on feature work right now, since we're already getting speeds up to 100x faster than Ros Sharp for large messages), but we'd be very interested in whatever you find. I'd definitely start by just seeing what happens if you change that 1024 constant in the Ros-Tcp-Endpoint; it seems like an obvious starting point to see how much the buffer size matters.

One other idea to try: to my surprise, some users have reported speedups when they simply used more than one TCP connection talking to more than one Endpoint. (Just place two RosConnections in your scene and send different topics via different connections; presumably you'd get similar results if you sent images to alternating connections.) It's not obvious what bottleneck that technique is overcoming, but I'd love to know. Possibly it's doing exactly what you wanted to try with the multi-threaded publisher.

From your description it's not 100% clear what your setup is, but I think you're saying you have one native Ubuntu box (well, two separate boxes for two separate tests) that's running both Unity and Ros-TCP-Endpoint connected to each other, and no other ROS stuff going on?
And... I see, you really do mean you're sending two pairs of 720p images, so four 720p images, which is why they're the size of one 1440p image. Ok. Presumably it's 32 bits per pixel, and uncompressed?
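For a rough sense of the data rate that setup implies, assuming 24-bit uncompressed RGB (3 bytes per pixel; the measurement later in the thread of 2,764,858 bytes per 720p message suggests this is the actual format):

```python
# Back-of-envelope bandwidth for two uncompressed 720p stereo pairs.
width, height, bytes_per_pixel = 1280, 720, 3
image_bytes = width * height * bytes_per_pixel   # 2,764,800 bytes per 720p frame
frame_bytes = 4 * image_bytes                    # two stereo pairs = four images
fps = 35                                         # midpoint of the reported 30-40 fps
print(frame_bytes * fps / 1e6, "MB/s")           # roughly 387 MB/s over loopback
```

At that rate, a read loop spending even a few milliseconds per message is already a meaningful fraction of the frame budget, which matches the drops seen on the slower machine.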


P3TE commented Aug 25, 2021

So, some good news: I finally got around to trying this, and there was a quick and easy way to improve performance substantially.
Here's my testing:
Test machine: i7 9700K, 32GB 2133MHz DDR4 RAM.
Before the change, it took between 3.5ms and 4.9ms to read a message of length 2,764,858 bytes (a 24-bit 720p uncompressed image message).
After the change, it took between 0.5ms and 1.1ms.
Nice.
So I've made a pull request: #90
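For reference, read timings like those quoted above can be captured with a simple `time.perf_counter` wrapper; this is a sketch, not the actual timing logs added to the endpoint:

```python
import time

def timed_read(read_fn, *args):
    """Call a read function and report its duration in milliseconds."""
    start = time.perf_counter()
    result = read_fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"read {len(result)} bytes in {elapsed_ms:.2f} ms")
    return result
```

`perf_counter` is preferable to `time.time` here because it is monotonic and has sub-millisecond resolution, which matters when the intervals being compared are under 5 ms.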
