Timeout on MacOS after making ~16384 RPC calls #67

casey · 2022-08-05T01:32:10Z

I originally hit this while using rust-bitcoincore-rpc, but was able to narrow it down to this crate.

The error is:

Transport(SocketError(Error { kind: TimedOut, message: "connection timed out" }))

I uploaded a repo containing a self-contained reproduction here: https://github.com/casey/bitcoind-rpc-timeout/blob/master/src/main.rs

The issue was introduced some time between 0.11 and 0.12 and is still present on master. I tried to bisect it down, but there was a lot of flux in the crate, which made it impossible to track down a specific revision.

While debugging, I would sometimes get this error:

Transport(SocketError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }))

Which makes me think it's related to this issue: rust-bitcoin/rust-bitcoincore-rpc#211
And this issue: bitcoin/bitcoin#25724 (comment)

Although I wouldn't necessarily bet on it.

The text was updated successfully, but these errors were encountered:

tcharding · 2022-08-05T01:41:00Z

That repo crate should hit the bug when running using cargo run, right? Just posting here to confirm the MacOS in the title, on Ubuntu 20.04 the bug does not appear to be triggered.

casey · 2022-08-05T02:01:00Z

That repo crate should hit the bug when running using cargo run, right? Just posting here to confirm the MacOS in the title, on Ubuntu 20.04 the bug does not appear to be triggered.

Yup, that's right. I also tried it on an Ubuntu 21.10 server, and wasn't able to reproduce it there.

casey · 2022-08-05T02:14:01Z

For reference, I added a GitHub actions workflow that passes on Ubuntu and fails on MacOS.

tcharding · 2022-08-05T03:51:56Z

Good bug reporting effort bro, thanks!

apoelstra · 2022-08-05T16:54:57Z

Between 0.11 and 0.12 we completely replaced hyper with a homebrew http transport. Probably we are leaking some resource.

casey · 2022-08-05T16:55:58Z

It's always around 16384 which seems suspicious. Possibly the resource being leaked is a socket or file handle?

apoelstra · 2022-08-05T17:00:51Z

Yep -- those are my guesses. I'll try to poke at this later today, although I don't have a Mac so it might be a bit annoying.

On your Mac could you try running your test in strace and see if there's some particular syscall (like open on a socket?) which is timing out?

casey · 2022-08-05T17:24:43Z

On your Mac could you try running your test in strace and see if there's some particular syscall (like open on a socket?) which is timing out?

I actually tried to, but I'm on a recent mac OS with security features that prevent dtrace (the strace equivalent available on macos) from running, unless I reboot into recovery mode and run some scary commands. I have another mac that I can do this on, but I'd need to wipe it first.

apoelstra · 2022-08-07T15:08:12Z

After reading the code (and using strace) I'm fairly confident there are no resource leaks. I suspect that on Mac you just aren't allowed to open 2^16 sockets even if you close the old ones (at least, you can't do this in very rapid succession).

The previous code used Hyper, which just maintained a socket pool. We should probably do the same (well, we'd have a single socket rather than a pool .. but still we should cache it).

The reason we didn't do this in the past is that TCP sockets suck -- they can go stale and you can't tell they're dead until you actually try sending and then you've gotta re-try, so you better not have handed ownership over any objects to the serialization code. None of this is a big deal but we were trying to go for maximal simplicity.

I don't think this'll be too hard to fix, especially as we have the old hyper code lying around which tells us which particular io errors mean "broken pipe, just try again" and which ones are real errors.

casey · 2022-08-07T18:28:46Z

After reading the code (and using strace) I'm fairly confident there are no resource leaks. I suspect that on Mac you just aren't allowed to open 2^16 sockets even if you close the old ones (at least, you can't do this in very rapid succession).

This sounds right. It might have to do with open ports, since each outgoing socket will allocate a local ephemeral port.

The previous code used Hyper, which just maintained a socket pool. We should probably do the same (well, we'd have a single socket rather than a pool .. but still we should cache it).

Another option to consider making the back-end pluggable. My application already uses hyper, so there aren't any additional security/bloat considerations if I use it in a dependency.

All this makes me wish that Bitcoin Core had a simple binary RPC API over a local FIFO, so that if you didn't need remote access, you didn't have to think about HTTP, security, ports, sockets.

casey · 2022-08-08T17:45:30Z

More digging, and it looks like it's the ephemeral port range that's being exhausted. MacOS's ephemeral port range is 49152 to 65535, which is 16383 ports. If reusing the socket is hard and annoying, then reusing the port might be a lower impact option.

apoelstra · 2022-08-08T18:41:43Z

The back-end is actually pluggable -- we have a Transport trait that you can implement using Hyper if you want (and I think we'd accept a PR to use hyper if you wanted to upstream it here, though it would need to be feature gated).

But I think just reusing the socket would be easier/not too bad. It's on my list.

gwillen · 2022-08-09T18:06:03Z

I suggest trying to set SO_REUSEADDR on the sockets, and possibly also disable SO_LINGER -- this may help if the issue is you're trying to reuse ports too fast, while the old socket is still in TIME_WAIT (closed but not fully cleaned up yet.) I don't know why it's only seen on mac -- one possibility is that Linux only applies this to sockets bound to a specific port, and mac is also applying it to ephemeral ones. (Or maybe the mac just has a much smaller legal range for ephemeral ports.)

(Obviously just reusing the socket is better, if you're making a ton of calls in rapid succession. But this might be a lot easier to try.)

apoelstra · 2022-08-10T22:16:53Z

@casey are you able to try this locally? I don't have a Mac to play with.

casey · 2022-08-12T00:58:57Z

@apoelstra I should be able to fool around with this tomorrow .

casey · 2022-08-13T09:43:45Z

Narrator: “He didn’t have time to fool around with it tomorrow.” Definitely early next week though!

casey · 2022-08-16T22:57:19Z

We just tried setting SO_REUSEADDR, SO_REUSEPORT, and disabling SO_LINGER, and they don't seem to have helped. Reusing the socket might be the best option.

apoelstra · 2022-08-19T19:05:38Z

Surprising! But ok, we'll reuse the socket (probably wise regardless).

e867374 Reuse HTTP connection (raphjaph) Pull request description: This is a quick and dirty attempt at fixing #67 that reuses the socket for multiple requests. It has few issues: - no error handing, client becomes unusable after encountering a single error - `SimpleHttpTransport::request()` takes a `mut` reference to self, which cascades to all downstream users - `TcpStream` is not `Clone`, therefore I had to remove the derive `Clone` An alternate implementation based on `reqwest` (`hyper` seems too low-level because of the async stuff) wouldn't have these problems. So I think I'm going to start on that. Just wanted so share this draft PR if anyone has feedback or has an idea on how to do this without `reqwest`. ACKs for top commit: apoelstra: ACK e867374 Tree-SHA512: 470e740b7bff7e323bdce014e73e44228e88ed332b3cb49d01302a9d591e51a1ffb25bd3cbcb6413090d89a627e289a04685c6dea615b968190a36635450d486

apoelstra · 2022-11-28T14:56:33Z

Fixed by #72

casey mentioned this issue Aug 5, 2022

bitcoind RPC Errors ordinals/ord#273

Closed

casey mentioned this issue Aug 12, 2022

Socket or port exhaustion when making fast RPC calls to bitcoin core ordinals/ord#309

Closed

notmandatory mentioned this issue Sep 10, 2022

Error syncing wallet via RPC client on macOS system bitcoindevkit/bdk#749

Open

raphjaph mentioned this issue Oct 26, 2022

Reuse socket #72

Merged

apoelstra closed this as completed Nov 28, 2022

w0xlt mentioned this issue Jan 29, 2023

Update jsonrpc to 0.14 rust-bitcoin/rust-bitcoincore-rpc#276

Open

dannydeezy mentioned this issue Feb 7, 2023

Possible timeout to bitcoin core on inscribe ordinals/ord#1566

Open

andrewtoth mentioned this issue Feb 12, 2023

Use REST interface for calling get_raw_transaction ordinals/ord#1636

Closed

This was referenced Feb 15, 2023

Make test_esplora_syncs more robust lightningdevkit/rust-lightning#2033

Merged

RPC sync failure bitcoindevkit/bdk#859

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timeout on MacOS after making ~16384 RPC calls #67

Timeout on MacOS after making ~16384 RPC calls #67

casey commented Aug 5, 2022

tcharding commented Aug 5, 2022 •

edited

Loading

casey commented Aug 5, 2022

casey commented Aug 5, 2022

tcharding commented Aug 5, 2022

apoelstra commented Aug 5, 2022

casey commented Aug 5, 2022

apoelstra commented Aug 5, 2022

casey commented Aug 5, 2022

apoelstra commented Aug 7, 2022

casey commented Aug 7, 2022

casey commented Aug 8, 2022

apoelstra commented Aug 8, 2022

gwillen commented Aug 9, 2022 •

edited

Loading

apoelstra commented Aug 10, 2022

casey commented Aug 12, 2022

casey commented Aug 13, 2022

casey commented Aug 16, 2022

apoelstra commented Aug 19, 2022

apoelstra commented Nov 28, 2022

Timeout on MacOS after making ~16384 RPC calls #67

Timeout on MacOS after making ~16384 RPC calls #67

Comments

casey commented Aug 5, 2022

tcharding commented Aug 5, 2022 • edited Loading

casey commented Aug 5, 2022

casey commented Aug 5, 2022

tcharding commented Aug 5, 2022

apoelstra commented Aug 5, 2022

casey commented Aug 5, 2022

apoelstra commented Aug 5, 2022

casey commented Aug 5, 2022

apoelstra commented Aug 7, 2022

casey commented Aug 7, 2022

casey commented Aug 8, 2022

apoelstra commented Aug 8, 2022

gwillen commented Aug 9, 2022 • edited Loading

apoelstra commented Aug 10, 2022

casey commented Aug 12, 2022

casey commented Aug 13, 2022

casey commented Aug 16, 2022

apoelstra commented Aug 19, 2022

apoelstra commented Nov 28, 2022

tcharding commented Aug 5, 2022 •

edited

Loading

gwillen commented Aug 9, 2022 •

edited

Loading