
Improving the efficiency of the ROS side read_message #89

Closed
P3TE opened this issue Aug 16, 2021 · 4 comments

P3TE commented Aug 16, 2021

Is your feature request related to a problem? Please describe.

As part of the changes in Unity-Technologies/ROS-TCP-Connector#152 I've been testing scenarios where the queue fills and messages are dropped.
So I've tested the code on two machines, with Unity publishing two 720p stereo pairs as fast as it can (this is typical of one of the robots I'm aiming to simulate, and if I can get it efficient enough, I'm aiming for possibly multiple robots simultaneously). That roughly equates to one 1440p image every frame (currently 30-40 fps). The data is being published to the ROS network over loopback (localhost / 127.0.0.1).
PC1 can handle full frame rate, and PC2 drops messages as the queue fills up.

  • PC1: i7 9700K, 32GB 2133MHz DDR4, RTX 2080.
  • PC2: i7 3770K, 32GB 1600MHz DDR3, GTX 1080.

I added some timing logs to the output of the ROS-TCP-Endpoint for more information, and they revealed that on the older, less powerful machine it was taking too long to read the data, which surprised me since this is all running over loopback.

I was thinking about ways to mitigate this issue and was considering:

  1. Setting it up to have a thread for each publisher on both the send (Unity) and receive (ROS Python) sides. The downside is that the struggling machine is already running out of processing headroom, and implementing this would require a fair amount of code changes.
  2. Seeing if I can tweak the reading thread on the ROS-TCP-Endpoint to be more efficient. The downside is that, looking at the code, it already looks pretty efficient. However, there's an existing note on line 101 of client.py that reads "# Only grabs max of 1024 bytes TODO: change to TCPServer's buffer_size", so maybe implementing that would fix my issues.
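For reference, the buffer-size change in option 2 can be sketched as follows. This is a hypothetical, self-contained read loop, not the actual client.py code; `read_payload` and its parameters are illustrative:

```python
import socket

def read_payload(conn: socket.socket, length: int, buffer_size: int = 65536) -> bytes:
    """Read exactly `length` bytes, requesting up to `buffer_size` per recv call.

    With buffer_size=1024, a ~2.7 MB image needs roughly 2,700 recv calls;
    with a 64 KB buffer it needs around 43, cutting per-call Python overhead.
    """
    data = bytearray(length)
    view = memoryview(data)          # recv_into a preallocated buffer, no per-chunk copies
    received = 0
    while received < length:
        n = conn.recv_into(view[received:], min(buffer_size, length - received))
        if n == 0:
            raise ConnectionError("socket closed mid-message")
        received += n
    return bytes(data)
```

Using `recv_into` on a preallocated bytearray also avoids concatenating many small byte strings, which is a second source of overhead in a 1024-byte read loop.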

Anyhow, I'm considering implementing the second option, as it is by far the easier of the two, but I thought I'd ask about the first. What are the dev team's thoughts on having a separate thread for each publisher? Have you considered implementing this? Also, if you have any other suggestions, I'm all ears!

@mrpropellers

Tagging @peifeng-unity and @LaurieCheers-unity on this one, since I think they've both looked into this before. Getting more efficient with image read/writes is definitely a high-value improvement right now, but I'm not sure what's already been tried and what the low-hanging fruit might be.


P3TE commented Aug 19, 2021

I can say on the Unity side I've tried:

  1. Async Readback: this really improves performance but introduces a frame of delay; I think it's worth it in most cases.
  2. Render Texture Packing: set up all the cameras to render into different parts of the same render texture so that you only need one readback.
  3. Reusing byte buffers by pooling messages: good for reducing GC pressure.
  4. Even had a crack at writing an XR plugin to allow Single Pass Stereo rendering. I ran into a lot of issues with the XR render texture being different from, and incompatible with, a normal render texture, and so far I've been unable to copy the data across from GPU to CPU. It was taking too much time, so it's been shelved for now; I may come back to it in the future.
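The buffer-pooling idea in item 3 is language-agnostic; here is a minimal Python sketch of the same pattern (the class and method names are illustrative, not actual project code):

```python
from collections import deque

class BufferPool:
    """Reuse fixed-size byte buffers instead of allocating one per message,
    so steady-state publishing creates no new garbage."""

    def __init__(self, buffer_size: int, max_buffers: int = 8):
        self.buffer_size = buffer_size
        self._free = deque(bytearray(buffer_size) for _ in range(max_buffers))

    def acquire(self) -> bytearray:
        # Fall back to a fresh allocation only if the pool is exhausted.
        return self._free.popleft() if self._free else bytearray(self.buffer_size)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)
```

The caller acquires a buffer per frame, fills it with pixel data, publishes, and releases it once the send completes, which is essentially what message pooling does on the Unity side.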

On the ROS connection side, it seems in some cases I'm generating more data in Unity than the connection can handle (on the low-powered machine, anyway), but yeah, I'm always keen to hear about your experience and anything you've found that improves performance with image simulation!

@LaurieCheers-unity

Yeah, we have seen similar performance limitations. Improving this further hasn't been our priority so far (we're focusing on feature work right now, since we're already getting speeds up to 100x faster than Ros Sharp for large messages), but we'd be very interested in whatever you find. I'd definitely start by just seeing what happens if you change that 1024 constant in the Ros-Tcp-Endpoint; it seems like an obvious starting point to see how much the buffer size matters.

One other idea to try: to my surprise, some users have reported speedups when they simply used more than one TCP connection talking to more than one Endpoint. (Just place two RosConnections in your scene and send different topics via different connections; presumably you'd get similar results if you sent images to alternating connections.) It's not obvious what bottleneck that technique is overcoming, but I'd love to know. Possibly it's doing exactly what you wanted to try with the multi-threaded publisher.

From your description it's not 100% clear what your setup is, but I think you're saying you have one native Ubuntu box (well, two separate boxes for two separate tests) that's running both Unity and Ros-TCP-Endpoint connected to each other, and no other ROS stuff going on?
And... I see, you really do mean you're sending two pairs of 720p images, so four 720p images, which is why they're the size of one 1440p image. Ok. Presumably it's 32 bits per pixel, and uncompressed?
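For a rough sense of the data rate that setup implies, assuming 24-bit uncompressed RGB (3 bytes per pixel; the measurement later in the thread of 2,764,858 bytes per 720p message suggests this is the actual format):

```python
# Back-of-envelope bandwidth for two uncompressed 720p stereo pairs.
width, height, bytes_per_pixel = 1280, 720, 3
image_bytes = width * height * bytes_per_pixel   # 2,764,800 bytes per 720p frame
frame_bytes = 4 * image_bytes                    # two stereo pairs = four images
fps = 35                                         # midpoint of the reported 30-40 fps
print(frame_bytes * fps / 1e6, "MB/s")           # roughly 387 MB/s over loopback
```

At that rate, a read loop spending even a few milliseconds per message is already a meaningful fraction of the frame budget, which matches the drops seen on the slower machine.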


P3TE commented Aug 25, 2021

So, some good news: I finally got around to trying this, and there was a quick and easy way to improve performance substantially.
Here's my testing:
Test machine: i7 9700K, 32GB 2133MHz DDR4 RAM.
Before the change, it took between 3.5ms and 4.9ms to read a message of length 2,764,858 bytes (a 24-bit 720p uncompressed image message).
After the change, it took between 0.5ms and 1.1ms.
Nice.
So I've made a pull request: #90
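For reference, read timings like those quoted above can be captured with a simple `time.perf_counter` wrapper; this is a sketch, not the actual timing logs added to the endpoint:

```python
import time

def timed_read(read_fn, *args):
    """Call a read function and report its duration in milliseconds."""
    start = time.perf_counter()
    result = read_fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"read {len(result)} bytes in {elapsed_ms:.2f} ms")
    return result
```

`perf_counter` is preferable to `time.time` here because it is monotonic and has sub-millisecond resolution, which matters when the intervals being compared are under 5 ms.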
