
Message sizes greater than around 262 kB drop out and don't get received #3053

Closed
calvertdw opened this issue Oct 28, 2022 · 14 comments

@calvertdw

calvertdw commented Oct 28, 2022

Hello there, we are having trouble with large message sizes. We've tried increasing the socket buffer sizes but it doesn't seem to have any effect.

pqos.transport().send_socket_buffer_size
pqos.transport().listen_socket_buffer_size

wqos.reliability().kind = BEST_EFFORT_RELIABILITY_QOS;
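
For context, a minimal sketch of where these settings get applied, assuming the standard Fast DDS 2.x DDS-layer API (the helper function names and the 12 MB buffer values are just examples):

#include <fastdds/dds/domain/DomainParticipant.hpp>
#include <fastdds/dds/domain/DomainParticipantFactory.hpp>
#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>

using namespace eprosima::fastdds::dds;

// Participant with enlarged socket buffers (example values).
DomainParticipant* create_participant_with_big_buffers()
{
    DomainParticipantQos pqos = PARTICIPANT_QOS_DEFAULT;
    pqos.transport().send_socket_buffer_size = 12582912;    // 12 MB
    pqos.transport().listen_socket_buffer_size = 12582912;  // 12 MB
    return DomainParticipantFactory::get_instance()->create_participant(0, pqos);
}

// Writer QoS matching the snippet above.
DataWriterQos make_writer_qos()
{
    DataWriterQos wqos = DATAWRITER_QOS_DEFAULT;
    wqos.reliability().kind = BEST_EFFORT_RELIABILITY_QOS;
    return wqos;
}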

Expected behavior

Large messages get across the network. For instance, 4K video messages of about 1 MB, or colored point cloud data from a RealSense L515 at about 4 MB.

Current behavior

Messages larger than ~262 kB may get through for a second or two, but then they stop arriving.

Steps to reproduce

Modify the HelloWorldPublisher and HelloWorldSubscriber examples, adding a RawCharMessage.idl:

struct RawCharMessage
{
	sequence<char, 10000000> data;
};

Double the message data size every 1 second.

// epoch time (in seconds) of the last size increase
static std::time_t lastDataSizeIncreaseTime = -1;
static int dataSize = 1;

bool HelloWorldPublisher::publish(
        bool waitForListener)
{
    if (listener_.firstConnected_ || !waitForListener || listener_.matched_ > 0)
    {
        // Fill the sample with dataSize bytes
        data.data().clear();
        for (int i = 0; i < dataSize; i++) {
            data.data().push_back('a');
        }
        // Double the payload size at most once per second
        if ((std::time(0) - lastDataSizeIncreaseTime) >= 1) {
            dataSize = dataSize * 2;
            lastDataSizeIncreaseTime = std::time(0);
        }
        writer_->write(&data);
        return true;
    }
    return false;
}

Fast DDS version/commit

master

Platform/Architecture

Other. Please specify in Additional context section.

Transport layer

UDPv4

Additional context

Arch Linux and Fedora, up-to-date.

@calvertdw calvertdw added the triage Issue pending classification label Oct 28, 2022
@calvertdw
Author

We changed from sending over a large company network to a direct wired connection between our machines and we were able to send up to 16 MB messages.

@contr4l

contr4l commented Nov 2, 2022

We changed from sending over a large company network to a direct wired connection between our machines and we were able to send up to 16 MB messages.

Hi, I ran into a similar issue. The IDL is:

struct HelloWorld
{
	unsigned long index;
	sequence<char> message;
};

The publish code is:

bool HelloWorldPublisher::publish(
        bool waitForListener)
{   
    std::vector<char> msg;
    for (int i=0; i<500*1024; i++)
        msg.push_back('A');
    if (listener_.firstConnected_ || !waitForListener || listener_.matched_ > 0)
    {
        hello_.index(hello_.index() + 1);
        hello_.message(msg);
        writer_->write(&hello_);
        return true;
    }
    return false;
}

When the payload is large, e.g. bigger than 50 KB, the subscriber cannot receive any data, but when it is reduced to 50 or 100 bytes, it works normally.

If I change the sequence to char message[500*1024] or sequence<char, 500*1024>, it works fine as well.

So I'm not sure whether this is a problem with using an unbounded sequence in the IDL.

@calvertdw
Author

Does anyone know what comes into play here? Reliable mode doesn't seem to mitigate this issue. It seems the message transmission quality has a steep dropoff, where trying to resend only makes things worse. On what kinds of networks have you seen reliable mode be effective? Is it just on long time scales and 99.99% reliable networks?

@ds58

ds58 commented Nov 3, 2022

[quotes @contr4l's comment above in full]

I was able to replicate this exactly. The subscriber doesn't receive the data unless you specify the sequence size in the IDL. I wouldn't expect that to be normal behavior, but maybe it is?

EDIT: after further experimentation, the limit for this specific example (with an unfixed length char vector) seems to be a length of 100. A message with 101 chars in the char vector will not be received by the subscriber.

@ds58

ds58 commented Nov 3, 2022

Looks like, by default, Fast-DDS-Gen imposes this limit.

In one case, I've set my IDL to a bounded sequence:

struct RawCharMessage
{
	unsigned long index;
	sequence<char, 1000> data;
};

In the other, I've left it unbounded:

struct RawCharMessage
{
	unsigned long index;
	sequence<char> data;
};

@JLBuenoLopez
Contributor

There are two different problems being reported in this ticket. On the one hand, the one reported by the issue opener (@calvertdw), which relates to sending large data messages over lossy networks. In this case it is necessary to understand that several layers are involved:

  1. The Fast DDS library splits the large data message into fragments that fit the maximum UDP datagram size of ~65 kB.
  2. The IP layer, in turn, fragments each UDP datagram according to the network MTU. The MTU is usually 1500 bytes, so each UDP datagram is split into ~40 IP fragments.

In order to receive a sample, every one of its UDP datagrams must be received. If BEST_EFFORT is used, the UDP datagrams are sent once and only once. If the communication is RELIABLE, the UDP datagrams are resent (unless they are overwritten in the DataWriter's History, which depends on how the HistoryQosPolicy has been configured). Depending on the publication rate and the network bandwidth, this can also flood the network with data and worsen the situation instead of improving it.

@calvertdw, you may try to increase the MTU if your network hardware allows it, or to limit maxMessageSize to below the MTU to prevent IP fragmentation, leaving all fragmentation to Fast DDS.
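
A minimal sketch of that suggestion, assuming the Fast DDS 2.x custom transport API (the function name and the 1400-byte value are examples; 1400 leaves headroom for UDP/IP headers under a 1500-byte MTU):

#include <memory>

#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/transport/UDPv4TransportDescriptor.h>

// Register a custom UDPv4 transport whose maxMessageSize stays below the MTU,
// so every RTPS fragment fits in a single IP frame.
eprosima::fastdds::dds::DomainParticipantQos make_small_datagram_qos()
{
    auto udp_transport = std::make_shared<eprosima::fastdds::rtps::UDPv4TransportDescriptor>();
    udp_transport->maxMessageSize = 1400;  // example value

    eprosima::fastdds::dds::DomainParticipantQos pqos;
    pqos.transport().user_transports.push_back(udp_transport);
    pqos.transport().use_builtin_transports = false;  // replace the default transports
    return pqos;
}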

The second issue reported here is a very common one (#2903, #2740, #2330...). By default, Fast DDS is configured with the PREALLOCATED_MEMORY_MODE MemoryManagementPolicy. This means that if your data type is unbounded, Fast DDS preallocates some memory for your data samples, but if a sample turns out to be larger, no more memory is allocated. @contr4l and @ds58, you should change the MemoryManagementPolicy to one that allows reallocations at run time. You have more information in the links above.
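
A minimal sketch of that change, assuming the Fast DDS 2.x C++ API (the helper name is hypothetical; DYNAMIC_RESERVE_MEMORY_MODE would also work):

#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>
#include <fastdds/dds/subscriber/qos/DataReaderQos.hpp>

// Let the endpoint histories reallocate when a sample of an unbounded type
// turns out to be bigger than the preallocated size.
void allow_reallocations(
        eprosima::fastdds::dds::DataWriterQos& wqos,
        eprosima::fastdds::dds::DataReaderQos& rqos)
{
    wqos.endpoint().history_memory_policy =
            eprosima::fastrtps::rtps::PREALLOCATED_WITH_REALLOC_MEMORY_MODE;
    rqos.endpoint().history_memory_policy =
            eprosima::fastrtps::rtps::PREALLOCATED_WITH_REALLOC_MEMORY_MODE;
}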

@EduPonz

EduPonz commented Nov 4, 2022

2. The IP layer, in turn, fragments each UDP datagram according to the network MTU. The MTU is usually 1500 bytes, so each UDP datagram is split into ~40 IP fragments.

To further elaborate on this: if your network experiences IP frame drops at a rate of 1 in every 40, no UDP datagram can be reconstructed upon reception, and consequently no UDP datagrams are ever handed over to Fast DDS. There is nothing Fast DDS can do in this situation, since it is a reliability problem in the lower layers, which can be caused by a myriad of reasons. However, as @JLBuenoLopez-eProsima points out, by setting maxMessageSize to something smaller than the MTU you make your UDP datagrams fit into single IP frames, so in the scenario I proposed Fast DDS will receive 39 out of every 40 of those UDP datagrams containing very small RTPS data fragments, and as a consequence there would only be resends for the 1 missing frame out of every 40.
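
To put rough numbers on this, a back-of-the-envelope sketch (assuming independent IP frame losses, which is an idealization):

#include <cmath>
#include <cstdio>

int main()
{
    // A sample fragmented into n IP frames is only delivered if all n frames arrive.
    const double loss = 1.0 / 40.0;  // 1 dropped IP frame out of every 40
    const int frames[] = {1, 10, 40};  // 1: datagram under the MTU; ~40: a 64 kB datagram
    for (int n : frames)
    {
        std::printf("n = %2d IP frames -> delivery probability %.1f%%\n",
                n, 100.0 * std::pow(1.0 - loss, n));
    }
    return 0;
}

Under these assumptions a single-frame datagram gets through about 97.5% of the time, while a 40-fragment datagram only makes it through roughly a third of the time, which is consistent with large samples rarely arriving.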

@calvertdw
Author

Thank you guys, that's extremely helpful.

you may try to increase the MTU if your network hardware allows it or limiting the maxMessageSize under the MTU to prevent IP fragmentation, leaving all fragmentation to Fast DDS.

in the scenario I proposed, 39 out of every 40 of those UDP datagrams containing very small RTPS data fragments and as a consequence there would only be resends for that 1 missing frame every 40.

These are both critical pieces of information to know when using Fast-DDS. Could we add them to the documentation on large data rates?
https://fast-dds.docs.eprosima.com/en/latest/fastdds/use_cases/large_data/large_data.html

problem of the reliability on lower layers, which can be caused by a myriad of reasons.

Is there some database or discussion of these possible reasons somewhere? Anywhere on the internet; it doesn't have to be the Fast-DDS documentation. It would be nice to have at least some kind of list of common reasons to reference.

you may try to increase the MTU

From https://en.wikipedia.org/wiki/Maximum_transmission_unit:

Larger MTU is associated with reduced overhead. Smaller MTU values can reduce network delay.

It seems as though changing the MTU doesn't address the issue of reliability, so I think we would opt for merely reducing maxMessageSize to 1500.

@MiguelCompany MiguelCompany removed the triage Issue pending classification label Jan 31, 2023
@qpc001

qpc001 commented Mar 12, 2023

The original question was never fixed. Even after doing this:

  1. setting maxMessageSize
  2. increasing the socket buffer sizes

none of it helps with sending and subscribing to large data like a 20000 * 20000 uint8_t array. The data is lost on the network or somewhere...

@calvertdw
Author

@qpc001 That is correct. We solved our problem only by reducing our message sizes. For us, this meant no longer sending point clouds and switching to JPEG- or PNG-compressed depth images.

I proposed some solutions above that should be addressed.

@Mario-DL
Member

According to our CONTRIBUTING.md guidelines, I am closing this issue for now. Please, feel free to reopen it if necessary.

@calvertdw
Author

I found some relevant advice now in the documentation. If we run into this again, we'll try these steps and reopen if it doesn't work. Thanks!
https://fast-dds.docs.eprosima.com/en/latest/fastdds/use_cases/large_data/large_data.html#example-sending-a-large-file

@calvertdw
Author

This issue should really stay open. We still don't have a working solution and can't reliably send messages over ~262 kB.

@JLBuenoLopez
Contributor

Hi @calvertdw

I can reopen it and move it to the Support discussion forum. As already explained, this is not a proper issue or bug in the library, at least for the moment. Our CI checks that messages larger than 262 kB are sent, so this can be caused by several other factors besides the Fast DDS library itself, for instance the network architecture, misconfigured QoS, or mismatched expectations...

You might consider changing to a different transport like TCP and/or modifying the discovery mechanism, as DDS relies on multicast for discovery and this is troublesome over Wi-Fi connections. The Discovery Server mechanism might be the way to go.
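
A minimal sketch of the TCP transport change, assuming the Fast DDS 2.x C++ API (the function name and port 5100 are hypothetical; the connecting participant additionally needs this machine's address in its initial peers list):

#include <memory>

#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/transport/TCPv4TransportDescriptor.h>

// Replace the built-in UDP transports with TCPv4 on the participant that accepts connections.
eprosima::fastdds::dds::DomainParticipantQos make_tcp_server_qos()
{
    auto tcp_transport = std::make_shared<eprosima::fastdds::rtps::TCPv4TransportDescriptor>();
    tcp_transport->add_listener_port(5100);

    eprosima::fastdds::dds::DomainParticipantQos pqos;
    pqos.transport().user_transports.push_back(tcp_transport);
    pqos.transport().use_builtin_transports = false;
    return pqos;
}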

Finally, eProsima offers architecture studies for users who are struggling to make Fast DDS work with their specific use case. You might consider contacting eProsima's commercial support team for more information.

@eProsima eProsima locked and limited conversation to collaborators Sep 29, 2023
@JLBuenoLopez JLBuenoLopez converted this issue into discussion #3892 Sep 29, 2023

