Request never connects on armhf
#642
Comments
Hm, do you have easy access to an armhf machine so we can work through this together? The futex wait is because the main thread is parking until the async runtime thread makes progress and returns a |
I don't have real armhf hardware to hand to try arbitrary stuff on -- those straces came from the snapcraft build infrastructure itself. I will see if I can replicate the issue running in qemu-user-static on my laptop. If I can, I'll see if that example has similar issues.
If somebody can work out how to wedge the relevant test code into |
I have failed to replicate the issue on my x86 laptop using qemu-user, so I imagine we will have to try @cjwatson's idea. The problem is, I don't know how I'd go about that. I'll also see if I can fake the number of CPUs reported to rustup in case there's something spawning |
Even isolating to a 1 CPU VM, I couldn't replicate it on x86_64 with qemu-user so we're going to have to try something else. I am firing up an armhf instance in scaleway (or at least trying to) to see if I can replicate on there.
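When comparing environments like this, it can help to print the CPU count a Rust process actually sees inside each one. The following is a minimal sketch using only the standard library; `visible_cpus` is a hypothetical helper name, and it is an assumption that this value resembles what rustup or its dependencies would read, since they may query the CPU count by other means.

```rust
use std::thread;

/// Hypothetical helper: the parallelism the standard library reports
/// for this process, falling back to 1 if it cannot be determined.
fn visible_cpus() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

fn main() {
    // Run this in each environment (builder VM, qemu-user, real hardware)
    // and compare the numbers.
    println!("CPUs visible to this process: {}", visible_cpus());
}
```

If the value differs between the failing builder and the machines where the problem cannot be reproduced, that would be one more variable to control for.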
I failed to replicate it myself. I wonder if it has something to do with the virtualisation that is done for Snapcraft, combined with something else in the stack of reqwest? @seanmonstar is the |
I have a Raspberry Pi I can try to reproduce this on, if you think it would help.
The |
Aah, as per the original post, I first discussed this with the mio folks in tokio-rs/mio#1089 and they suggested coming here. I'm now worried that no one knows what's going on. I'm not sure it'll be platform specific so much as perhaps an interaction between something "interesting" on armhf, and the particular size of the system snapcraft are using. The oddness was that |
If you have test cases I can help wrangle them into snapcraft with whatever tracing / debugging is needed so we can run that on the infrastructure exhibiting the issue. (I am affected as one of my snaps fails in this way - @cjwatson sent me this way and I'd like to help where I can).
A good first step would be trying the async example, which would help determine if the issue is about the blocking API not allowing epoll to run.
@popey is there any progress?
EDIT: sorry, I didn't see kinnison's work using QEMU on this. I'm not sure whether others can reproduce the issue by trying different settings, such as 2+ CPUs. Gist: Running Ubuntu 16.04.1 armhf on Qemu
The first comment on this gist is from Nov 2016, but there are comments as recent as April 20, 2020 that solve networking issues some people had.
@x448 Thanks, but the issue is in the Snapcraft builder VMs, so I'd guess Canonical are okay at configuring qemu properly, and since it tends to work for everything else I remain confused as to why reqwest fails.
We're also using actual hardware, not ARM-on-x86. qemu is still involved, but unlikely to be very much related to that gist.
We may possibly have got to the bottom of this. See lxc/lxcfs#553.
Looks like we should close this off, thank you @cjwatson and @seanmonstar for your efforts.
Hi, this was originally discussed in tokio-rs/mio#1089 where we decided that it probably made sense to migrate the discussion to here.
In brief -- a friend (@cjwatson) and I have been diagnosing a fault in `rustup` on armhf in Snapcraft's build environment. It seems to sit for 30s trying to connect and then fails. This only seems to happen on `armhf` -- on other platforms it connects just fine.
An `strace` of the attempt shows:
Further straces show the `epoll_ctl()` call takes microseconds, so it's not actually stuck in it for 30s, but the thread which did the `epoll_ctl()` call subsequently did nothing. (The trace is attached to the mio bug, so I won't reattach it here.)
Interestingly, in that strace we never get to `epoll_wait()` on armhf.
I had previously assumed it was probably mio at fault, but the discussion there suggests it's more likely in the reqwest/tokio interfacing, so I brought the issue here to discuss further.
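For readers unfamiliar with the shape of the symptom (a caller that waits while a background thread makes no progress, then gives up after a timeout), here is a minimal std-only sketch. It is not reqwest's or tokio's actual implementation; `blocking_call` and the closures passed to it are hypothetical stand-ins, and the timeouts are shortened for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a blocking client call: hands `work` to a
/// background thread and waits for its reply, giving up after `timeout`.
fn blocking_call<F>(work: F, timeout: Duration) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();

    // Background worker. In a real client this thread would drive the
    // event loop (epoll_ctl / epoll_wait); here it just runs the closure.
    thread::spawn(move || {
        // The receiver may already have timed out and gone away; ignore that.
        let _ = tx.send(work());
    });

    // The caller blocks here until the worker replies or the timeout expires.
    // On Linux, a thread blocked like this typically shows up in strace as a
    // futex wait.
    rx.recv_timeout(timeout)
        .map_err(|e| format!("no response from worker: {e}"))
}

fn main() {
    // Healthy worker: replies well within the timeout.
    let ok = blocking_call(|| "connected".to_string(), Duration::from_secs(1));
    println!("{ok:?}");

    // Stalled worker: makes no progress before the caller gives up.
    let stalled = blocking_call(
        || {
            thread::sleep(Duration::from_millis(500));
            "too late".to_string()
        },
        Duration::from_millis(50),
    );
    println!("{stalled:?}");
}
```

In this sketch the stalled case fails because the worker sleeps. In the report above the worker thread simply did nothing after `epoll_ctl()`, which would look the same from the waiting thread's point of view.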