
Request never connects on armhf #642

Closed
kinnison opened this issue Sep 18, 2019 · 17 comments
Labels
B-upstream Blocked: upstream. Depends on a dependency to make a change first.

Comments

@kinnison

Hi, this was originally discussed in tokio-rs/mio#1089 where we decided that it probably made sense to migrate the discussion to here.

In brief -- A friend (@cjwatson) and I have been diagnosing a fault in rustup on armhf in Snapcraft's build environment. It seems to sit for 30s trying to connect and then fails. This only seems to happen on armhf -- on other platforms it connects just fine.

An strace of the attempt shows:

[pid  3517] 06:37:57.516581 futex(0xf933b8, FUTEX_WAIT_PRIVATE, 0, {tv_sec=29, tv_nsec=990974355} <unfinished ...>
[pid  3518] 06:37:57.516671 <... fcntl64 resumed> ) = 0x2 (flags O_RDWR)
[pid  3518] 06:37:57.516762 fcntl64(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid  3518] 06:37:57.516894 connect(7, {sa_family=AF_INET, sin_port=htons(8222), sin_addr=inet_addr("10.10.10.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid  3518] 06:37:57.521838 epoll_ctl(4, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLET, {u32=0, u64=0}}) = 0
[pid  3517] 06:38:27.507984 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)

(Further straces show the epoll_ctl() call takes microseconds, so it's not actually stuck in it for 30s, but the thread which did the epoll_ctl() call subsequently did nothing. Trace attached to the mio bug, so I won't reattach it here.)

Interestingly in that strace we never get to epoll_wait() on armhf.

I had previously assumed mio was probably at fault, but the discussion there suggests it's more likely in the reqwest/tokio interfacing, so I brought the issue here to discuss further.

@seanmonstar
Owner

Hm, do you have easy access to an armhf machine so we can work through this together?

The futex wait is because the main thread is parking until the async runtime thread makes progress and returns a Response. If we want to eliminate that as a problem, we could try just running the async example. If that doesn't work, then I'd suspect the issue is lower in the stack, either tokio or mio.
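
For reference, a minimal sketch of the kind of async test meant here, assuming a current reqwest with the tokio runtime (the exact example shipped with the versions in use at the time may differ); the address is the one from the strace above:

// Minimal async request -- sketch only; assumes reqwest with the "tokio"
// runtime enabled. The address stands in for the endpoint from the strace.
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let resp = reqwest::get("http://10.10.10.1:8222/").await?;
    println!("status: {}", resp.status());
    let body = resp.text().await?;
    println!("fetched {} bytes", body.len());
    Ok(())
}

If this hangs in the same way, the blocking wrapper is off the hook and suspicion moves down to tokio or mio; if it connects, the blocking API's runtime handoff becomes the suspect.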

@kinnison
Author

I don't have real armhf hardware to hand to try arbitrary stuff on -- those straces came from the snapcraft build infrastructure itself. I will see if I can replicate the issue running in qemu-user-static on my laptop. If I can, I'll see if that example has similar issues.

@cjwatson

If somebody can work out how to wedge the relevant test code into snapcraft then I can also try running it on our infrastructure.

@kinnison
Author

I have failed to replicate the issue on my x86 laptop using qemu-user, so I imagine we will have to try @cjwatson's idea -- the problem is, I don't know how I'd go about doing that. I'll also see if I can fake the number of CPUs reported to rustup, in case something is spawning ncpus threads for the worker pool and that's what's going on.
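
One quick probe for the CPU-count angle -- a sketch assuming the num_cpus crate, which thread pools commonly use for sizing, is the detection that matters here:

// Hypothetical probe: print the CPU counts this process detects inside the
// build environment, assuming the num_cpus crate is what sizes the pool.
fn main() {
    println!("logical cores:  {}", num_cpus::get());
    println!("physical cores: {}", num_cpus::get_physical());
}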

@kinnison
Author

Even isolating to a 1-CPU VM, I couldn't replicate it on x86_64 with qemu-user, so we're going to have to try something else. I am firing up an armhf instance on Scaleway (or at least trying to) to see if I can replicate it there.

@seanmonstar added the B-upstream (Blocked: upstream. Depends on a dependency to make a change first.) label on Sep 27, 2019
@kinnison
Author

kinnison commented Oct 2, 2019

I failed to replicate it myself. I wonder if it has something to do with the virtualisation done for Snapcraft, combined with something else in reqwest's stack? @seanmonstar, is the upstream label suggesting you've filed another bug elsewhere?

@lnicola

lnicola commented Oct 2, 2019

I have a Raspberry Pi I can try to reproduce this on, if you think it would help.

@seanmonstar
Owner

The upstream label is a guess that it's either in mio or tokio. Neither reqwest nor hyper has conditional code per target.

@kinnison
Author

kinnison commented Oct 2, 2019

Aah, as per the original post, I first discussed this with the mio folks in tokio-rs/mio#1089 and they suggested coming here. I'm now worried that no one knows what's going on. I'm not sure it'll be platform-specific so much as perhaps an interaction between something "interesting" on armhf and the particular size of the system Snapcraft are using. The oddness was that epoll_ctl() was called but the epoll was then never checked, which points perhaps at an executor with too few threads?
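
One way to test the "too few executor threads" hypothesis is to pin the worker count explicitly instead of trusting CPU detection -- a sketch assuming the tokio 1.x Builder API (the runtime versions in use at the time expose a similar knob):

use tokio::runtime::Builder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Force a known worker count instead of whatever CPU detection reports
    // inside the container.
    let rt = Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()?;
    let body = rt.block_on(async {
        // Address taken from the strace earlier in the thread.
        reqwest::get("http://10.10.10.1:8222/").await?.text().await
    })?;
    println!("fetched {} bytes", body.len());
    Ok(())
}

If the request completes with an explicit worker count but hangs without one, that points at CPU-count detection rather than the I/O driver itself.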

@popey

popey commented Oct 9, 2019

If you have test cases, I can help wrangle them into snapcraft with whatever tracing / debugging is needed so we can run them on the infrastructure exhibiting the issue. (I am affected, as one of my snaps fails in this way -- @cjwatson sent me here and I'd like to help where I can.)

@seanmonstar
Owner

A good first step would be trying the async example, which would help determine whether the issue is the blocking API not allowing epoll to run.
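
For comparison, the blocking call path looks roughly like this -- a sketch assuming reqwest 0.10+, where the blocking client drives a background async runtime (the futex wait in the strace is the calling thread parked on that handoff):

// Blocking-API counterpart for comparison. The calling thread parks while a
// background runtime thread is supposed to drive epoll -- the pattern seen
// in the strace above.
fn main() -> Result<(), reqwest::Error> {
    let body = reqwest::blocking::get("http://10.10.10.1:8222/")?.text()?;
    println!("fetched {} bytes", body.len());
    Ok(())
}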

@tesuji
Contributor

tesuji commented Dec 10, 2019

@popey, is there any progress?
Edit: The async example runs fine on an aarch64-linux-gnu machine. I don't have armhf hardware to test it on.

@x448
Contributor

x448 commented Apr 27, 2020

EDIT: sorry, I didn't see kinnison's work using QEMU on this. Not sure if others can reproduce the issue by trying different settings like 2+ CPUs, etc.

Running Ubuntu 16.04.1 armhf on Qemu
https://gist.github.com/takeshixx/686a4b5e057deff7892913bf69bcb85a

This is a writeup about how to install Ubuntu 16.04.1 Xenial Xerus for the 32-bit hard-float ARMv7 (armhf) architecture on a Qemu VM via Ubuntu netboot.

The setup will create an Ubuntu VM with LPAE extensions (generic-lpae) enabled. However, this writeup should also work for non-LPAE (generic) kernels.

The performance of the resulting VM is quite good, and it allows VMs with >1G ram ...
...
The netboot files are available on the official Ubuntu mirror.

The first comment on this gist is from Nov 2016, but there are comments as recent as April 20, 2020 that solve networking issues some people had.

@kinnison
Author

@x448 Thanks, but the issue is in the Snapcraft builder VMs, so I'd assume Canonical configure qemu properly, and since it tends to work for everything else I remain confused as to why reqwest fails.

@cjwatson

We're also using actual hardware, not ARM-on-x86. qemu is still involved, but it's unlikely to be very much related to that gist.

@cjwatson

We may possibly have got to the bottom of this. See lxc/lxcfs#553.

@kinnison
Author

Looks like we should close this off. Thank you @cjwatson and @seanmonstar for your efforts.
