Add ENOBUFS handling for unsolicited messages #2

Tuetuopay · 2022-10-12T09:48:26Z

This can happen when a large burst of messages arrives all of a sudden, which happens very easily when routing protocols are involved (e.g. BGP). The current implementation incorrectly assumes that any failure to read from the socket means the socket was closed. This is not the case.

This adds handling for this specific error, which translates to a wrapper struct in the unsolicited messages stream: either a message, or an overrun. This lets applications handle such an event in the way that best fits their use case: either resync because messages were lost, or do nothing if the listening is informational only (e.g. logging).

This is a direct port of little-dude/netlink#293.

src/framed.rs

cathay4t · 2022-10-24T04:05:27Z

@Tuetuopay The kernel treats it as an error. With the changes you suggested, almost every existing rust-netlink crate would need to change, and so would their users, just for a rare use case. Rust has a saying: you do not pay for what you do not use. Clearly your patch goes against this design.

My suggestion is to raise a proper error such as OverRunError; any user who cares can handle it with a controlled retry.

Tuetuopay · 2022-10-25T14:01:38Z

> @Tuetuopay The kernel treats it as an error. With the changes you suggested, almost every existing rust-netlink crate would need to change, and so would their users, just for a rare use case. Rust has a saying: you do not pay for what you do not use. Clearly your patch goes against this design.
>
> My suggestion is to raise a proper error such as OverRunError; any user who cares can handle it with a controlled retry.

@cathay4t Indeed this is more than a simple API breaking change. What would you suggest?

- change the stream to yield Result<NetlinkPayload, Error> with the error only being an overrun
- yield NetlinkPayload::Overrun(_) which I thought about originally, but it'd conflate an actual overrun notification received from netlink in a message and an overrun from the socket itself. That's my preferred option.

However, I don't completely agree that not everyone needs to implement it. One of the strengths of Rust is exposing you to failure cases through the type system: you have to explicitly ignore the error case, while other languages make you explicitly handle the error case.
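To make the type-system argument concrete, here is a minimal sketch of the first option (a stream item that is a `Result`). The names `Notification`, `Overrun`, `Event`, `log_only` and `count_resyncs` are hypothetical stand-ins invented for this illustration; they are not the types used in the patch or in netlink-proto.

```rust
/// Hypothetical stand-in for a decoded notification (not a real netlink-proto type).
#[derive(Debug, PartialEq)]
pub struct Notification(pub String);

/// Hypothetical marker meaning "the socket buffer overflowed, messages were lost".
#[derive(Debug, PartialEq)]
pub struct Overrun;

/// What a subscriber would receive for each stream item in this sketch.
pub type Event = Result<Notification, Overrun>;

/// A consumer that only logs. It does not care about overruns, but the
/// `match` is only exhaustive if the `Err` arm is written out, so ignoring
/// the overrun is a visible, deliberate choice.
pub fn log_only(events: Vec<Event>) -> Vec<String> {
    let mut lines = Vec::new();
    for event in events {
        match event {
            Ok(Notification(text)) => lines.push(text),
            // Explicitly ignored: lost messages do not matter for logging.
            Err(Overrun) => {}
        }
    }
    lines
}

/// A consumer that mirrors kernel state. Every overrun means its view may be
/// stale, so it counts how many resyncs it would have to perform.
pub fn count_resyncs(events: &[Event]) -> usize {
    events.iter().filter(|event| event.is_err()).count()
}
```

Both consumers see the same items; the difference is only in what they do with the `Err` arm, which the compiler forces them to write.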

cathay4t · 2022-11-04T12:54:06Z

I prefer the NetlinkPayload::Overrun way.
Meanwhile, existing user-facing crates (e.g. rtnetlink) might need to amend their macros.rs to convert NetlinkPayload::Overrun to Err(rtnetlink::Error::Overrun), so that users of rtnetlink can ignore this specific error.
Even better, we could allow users to call AddressGetRequest::set_auto_retry_on_overrun(true) or AddressGetRequest::set_ignore_overrun(true) to make their life easier.

Tuetuopay · 2022-12-14T18:02:53Z

> I prefer the NetlinkPayload::Overrun way.

Sure! Implementing this.

> Meanwhile, existing user-facing crates (e.g. rtnetlink) might need to amend their macros.rs to convert NetlinkPayload::Overrun to Err(rtnetlink::Error::Overrun), so that users of rtnetlink can ignore this specific error. Even better, we could allow users to call AddressGetRequest::set_auto_retry_on_overrun(true) or AddressGetRequest::set_ignore_overrun(true) to make their life easier.

Okay, there is a deep misunderstanding of the issue here. This does not fix issues with request/reply mode (as suggested by AddressGetRequest::), but with subscription mode. In this mode, the kernel sends notifications as fast as it can, without waiting. This does not happen in regular request/response mode because the kernel waits for consumers to ack messages before sending more.

Basically we can see regular request/response mode as TCP and multicast mode as UDP, if you squint hard enough.
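The subscription-mode behaviour described above can be sketched as a toy model of a bounded receive buffer. This is only an illustration: `ReceiveBuffer`, `ReadOutcome`, `deliver` and `read` are names made up for the sketch, and reporting the overrun once on the next read is a modelling assumption rather than a description of the kernel or of netlink-sys.

```rust
use std::collections::VecDeque;

/// Toy model of a socket receive buffer for multicast notifications.
pub struct ReceiveBuffer {
    queue: VecDeque<u32>,
    capacity: usize,
    overrun: bool,
}

/// What a single read yields in this model.
#[derive(Debug, PartialEq)]
pub enum ReadOutcome {
    Message(u32),
    /// Models a read failing with ENOBUFS: something was dropped earlier.
    NoBufferSpace,
    Empty,
}

impl ReceiveBuffer {
    pub fn new(capacity: usize) -> Self {
        ReceiveBuffer { queue: VecDeque::new(), capacity, overrun: false }
    }

    /// Sender side: never waits for the reader. When the buffer is full the
    /// message is dropped and the overrun flag is raised.
    pub fn deliver(&mut self, message: u32) {
        if self.queue.len() < self.capacity {
            self.queue.push_back(message);
        } else {
            self.overrun = true;
        }
    }

    /// Reader side: a pending overrun is reported once, then cleared, and the
    /// messages that did fit are still readable afterwards.
    pub fn read(&mut self) -> ReadOutcome {
        if self.overrun {
            self.overrun = false;
            return ReadOutcome::NoBufferSpace;
        }
        match self.queue.pop_front() {
            Some(message) => ReadOutcome::Message(message),
            None => ReadOutcome::Empty,
        }
    }
}
```

The point of the model is that `NoBufferSpace` does not mean the buffer is gone. The reader can keep reading, which is why treating the error as a closed socket loses a perfectly usable subscription.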

As for allowing the user to ignore those errors, they can already be disabled when opening the stream socket with Socket::set_no_enobufs.

PS: sorry for the very late response, I did not have time to get back to this earlier.

Tuetuopay · 2022-12-15T17:28:51Z

Hi @cathay4t!

I pushed a fully new implementation. This is what was suggested: generate actual NetlinkMessages with a payload of NetlinkPayload::Overrun. This way, users can continue ignoring the error while people that mind (e.g. BGP people) can properly handle the error.
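As a rough sketch of what this looks like from the consumer side, the following uses a plain `std::sync::mpsc` channel in place of the unsolicited message channel. The `Payload` enum, `Summary` struct and `consume` function are simplified, hypothetical stand-ins; the real `NetlinkPayload` has more variants and its `Overrun` variant carries data.

```rust
use std::sync::mpsc::Receiver;

/// Simplified, hypothetical payload type for this sketch only.
#[derive(Debug)]
pub enum Payload {
    /// An ordinary notification, e.g. a route update.
    Route(String),
    /// Injected by the library when the socket reported an overrun.
    Overrun,
}

/// What the consumer did with the notifications it received.
#[derive(Debug, Default, PartialEq)]
pub struct Summary {
    pub routes: Vec<String>,
    pub resyncs: usize,
}

/// Drains the channel until every sender is dropped. An overrun arrives as
/// just another message, so the loop keeps running after it.
pub fn consume(rx: Receiver<Payload>) -> Summary {
    let mut summary = Summary::default();
    for payload in rx {
        match payload {
            Payload::Route(prefix) => summary.routes.push(prefix),
            // Messages were lost: a real consumer would re-dump state here.
            Payload::Overrun => summary.resyncs += 1,
        }
    }
    summary
}
```

A consumer that only logs could replace the `Payload::Overrun` arm with an empty block and carry on unchanged.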

On a related note, I saw rust-netlink/netlink-packet-core#7 and it can indeed be done as I do not use the struct anymore in the new version.

I hope this one is more to your taste :)

cathay4t · 2024-01-10T12:42:30Z

@Tuetuopay Yes. The new patch looks good. Please rebase it, then we are OK to merge.

Do you plan to change rtnetlink's try_rtnl! for this?

cathay4t · 2024-01-10T12:48:22Z

src/connection.rs has Overrun(_) => unimplemented!("overrun is not handled yet"). Can you also patch it?

yshui · 2024-09-23T17:15:53Z

I am also hitting the ENOBUFS problem. Has @Tuetuopay lost interest in this PR? I can try to take over, as this doesn't look too complex.

Tuetuopay · 2024-09-24T08:04:42Z

> I am also hitting the ENOBUFS problem. Has @Tuetuopay lost interest in this PR? I can try to take over, as this doesn't look too complex.

Pretty much, I stopped using it for the use case where I was hitting this. I may use this library again in the future though.

I'll try to address the comments, but if I don't find time for this, feel free to pick it up :)

This can happen when a large burst of messages arrives all of a sudden, which happens very easily when routing protocols are involved (e.g. BGP). The current implementation incorrectly assumes that any failure to read from the socket means the socket was closed. This is not the case. This commit adds handling for this specific error by generating a `NetlinkPayload::Overrun(_)` message that users receive on their unsolicited message channel. Since this is just an additional message, there is no breaking change for existing users: they are free to ignore it if they do not want to handle it, or to handle it by e.g. resyncing.

Tuetuopay · 2024-09-24T08:49:34Z

> src/connection.rs has Overrun(_) => unimplemented!("overrun is not handled yet"). Can you also patch it?

@cathay4t Well, about that, it was left as unimplemented on purpose. You will get ENOBUFS on the unsolicited message channel, not on the regular request/response channel. Why? Because when using netlink in request/response mode, there is an ack mechanism between the kernel and userspace, precisely to avoid this issue (well, unless the buffer sizes are set stupidly low, but in such a case this is a non-recoverable error).

I can set it as a fatal error, and make it a break from the forward_responses loop to exit the whole netlink system. Less violent than the unimplemented!. Your call :)
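The "fatal error instead of `unimplemented!`" alternative could look roughly like the sketch below. It is not the actual `forward_responses` code from src/connection.rs: `Payload`, `LoopExit` and `forward_responses_sketch` are hypothetical names, and the real loop deals with requests, channels and sequence numbers that are left out here.

```rust
/// Simplified, hypothetical payload type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum Payload {
    Response(u32),
    Done,
    Overrun,
}

/// Why the forwarding loop stopped.
#[derive(Debug, PartialEq)]
pub enum LoopExit {
    Finished,
    /// An overrun in request/response mode is treated as unrecoverable.
    FatalOverrun,
}

/// Forwards responses until the exchange is done. On an overrun it leaves the
/// loop with an explicit exit reason instead of panicking, so the caller can
/// shut the connection down in an orderly way.
pub fn forward_responses_sketch(incoming: Vec<Payload>, forwarded: &mut Vec<u32>) -> LoopExit {
    for payload in incoming {
        match payload {
            Payload::Response(seq) => forwarded.push(seq),
            Payload::Done => return LoopExit::Finished,
            Payload::Overrun => return LoopExit::FatalOverrun,
        }
    }
    LoopExit::Finished
}
```

Compared with `unimplemented!`, the process does not abort; the caller decides what "exit the whole netlink system" means.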

Btw, rebased on master.

Tuetuopay · 2024-09-24T19:10:06Z

> @Tuetuopay Yes. The new patch looks good. Please rebase it, then we are OK to merge.
>
> Do you plan to change rtnetlink's try_rtnl! for this?

Just remembered this. Well, no, for the same reason I left the unimplemented: this is not supposed to happen in a request/response scenario. It really is an unexpected thing to get in req/res.

cathay4t requested changes Oct 23, 2022

Tuetuopay force-pushed the handle-enobufs branch from a96da1f to 7ed06ec Compare October 25, 2022 12:07

Tuetuopay force-pushed the handle-enobufs branch from 7ed06ec to 0e6673f Compare December 15, 2022 17:26

cathay4t added the Wait_Submitter label Jan 10, 2024

Tuetuopay force-pushed the handle-enobufs branch from 0e6673f to 507a69f Compare September 24, 2024 08:49
