Image upload sometimes stalls with HTTP/2 #3559
Comments
I've found that navigating around and doing other things on the console shortly before trying the upload makes it more likely to fail. HTTP/2 connection reuse might factor into that, so force-reloading the image upload page could help. Anecdotally, Chrome and the CLI haven't run into this, and the packet captures for those show Nexus sending
I took a peek at how the CLI does image uploads, compared it to what the console does, and noticed that the two use different chunk sizes (cli, console). As a quick test I changed the console to use the same chunk size:
index 879342f4..93210bc0 100644
--- a/app/forms/image-upload.tsx
+++ b/app/forms/image-upload.tsx
@@ -136,7 +136,8 @@ const ABORT_ERROR = new Error('Upload canceled')
* line breaks can introduce up 4% more overhead, but that doesn't apply here
* because we are only sending the encoded data itself.
*/
-const CHUNK_SIZE = Math.floor((512 * KiB * 3) / 4)
+const CHUNK_SIZE = 512 * KiB
With that, I've started seeing regular
It's hard to say exactly why that works, but the HTTP/2 spec does not mandate any specifics in how a sender or receiver needs to manage these windows:
It could be that increasing Regardless, I think we definitely want to think about adjusting the server-side initial window frame size instead of just relying on the default
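For reference, the arithmetic behind the two chunk sizes in the diff can be checked with a short sketch. It assumes `KiB` is 1024 and that each chunk is base64-encoded before it is sent, as the code comment in the diff suggests. `base64EncodedLength` is a hypothetical helper written for this illustration, not a function from the console. 65,535 bytes is the default initial flow-control window in the HTTP/2 spec; whether Nexus uses that default is only what the comment above implies.

```typescript
const KiB = 1024

// Length of the padded base64 encoding of `rawBytes` bytes:
// every group of 3 input bytes becomes 4 output characters.
// (Hypothetical helper for illustration only.)
function base64EncodedLength(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4
}

// Old console value from the diff: raw bytes chosen so that the
// base64-encoded payload comes out at 512 KiB.
const OLD_CHUNK_SIZE = Math.floor((512 * KiB * 3) / 4)

// New value from the diff: 512 KiB of raw bytes per chunk.
const NEW_CHUNK_SIZE = 512 * KiB

// Default HTTP/2 initial flow-control window size, in bytes.
const DEFAULT_INITIAL_WINDOW = 65535

function describe(label: string, rawBytes: number): string {
  const encoded = base64EncodedLength(rawBytes)
  const exceeds = encoded > DEFAULT_INITIAL_WINDOW
  return `${label}: raw=${rawBytes} encoded=${encoded} exceedsDefaultWindow=${exceeds}`
}

console.log(describe('old', OLD_CHUNK_SIZE))
console.log(describe('new', NEW_CHUNK_SIZE))
```

Either way a single encoded chunk is several times larger than one default window, so every chunk upload depends on the receiver sending window updates while the request body is still in flight.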
We received a report of a user hitting this on Windows 11 + Chrome (latest versions as of this update) when uploading to the colo rack release 5 (stuck at 100% upload). Specifically, they were attempting to upload this image: https://download.fedoraproject.org/pub/fedora/linux/releases/39/Cloud/x86_64/images/Fedora-Cloud-Base-39-1.5.x86_64.raw.xz Going to check logs on the rack to see if I can find anything relevant.
Should we be attempting to prevent users from uploading compressed images? There's no way they can boot from a
We can certainly check the extension client-side at the very least. Actually, it would have to be client-side because the API never sees the filename. We have an issue (oxidecomputer/console#1846) to at least say something about this on the form, but I'll add a note about enforcement.
Maybe a warning instead of outright not allowing?
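A client-side check along the lines discussed above could look roughly like this. It is a minimal sketch, not the console's actual code: the extension list and the `compressedImageWarning` name are assumptions made for illustration, and it returns a warning rather than blocking the upload, per the suggestion above.

```typescript
// Hypothetical list of extensions that usually indicate a compressed file.
// The real list would need to be agreed on; this one is an assumption.
const COMPRESSED_EXTENSIONS = ['.xz', '.gz', '.bz2', '.zst', '.zip']

// Returns a warning message if the filename looks compressed, otherwise null.
// Only the filename is inspected, since the API never sees it.
function compressedImageWarning(filename: string): string | null {
  const lower = filename.toLowerCase()
  const match = COMPRESSED_EXTENSIONS.find((ext) => lower.endsWith(ext))
  if (match === undefined) return null
  return `"${filename}" ends in ${match} and looks compressed. Consider decompressing it before uploading.`
}

console.log(compressedImageWarning('Fedora-Cloud-Base-39-1.5.x86_64.raw.xz'))
console.log(compressedImageWarning('disk.raw'))
```

Checking only the extension is cheap but easy to fool; it is meant as a hint to the user, not as validation.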
I just tried uploading
This makes sense: that URL currently returns
That's a relief. That likely means one of their chunk requests errored out and the built-in retries weren't enough. We could probably stand to pull the error message out of the API response and display it there. The fact that the size still got up to 100% is weird, though. The chunk size is 512 KiB, so if one failed we should see the amount uploaded down by half a MiB. Could be a mistake in our error handling: maybe it failed, then retried, then succeeded, and we still show it as an error.
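To make the expected accounting concrete, here is a small model of the behaviour described above. It is a sketch under assumptions, not the console's implementation: `ChunkAttempts`, `summarize`, and the retry budget of 3 are hypothetical. A chunk counts toward progress only if one of its attempts within the budget succeeded, and the upload is an error only if some chunk used up its budget.

```typescript
const KiB = 1024

interface ChunkAttempts {
  size: number // bytes in this chunk
  outcomes: boolean[] // result of each attempt, in order; true means success
}

interface UploadSummary {
  uploadedBytes: number
  failedChunks: number
}

// Summarize an upload from per-chunk attempt logs.
function summarize(chunks: ChunkAttempts[], maxAttempts: number): UploadSummary {
  let uploadedBytes = 0
  let failedChunks = 0
  for (const chunk of chunks) {
    // Only attempts inside the retry budget are considered.
    const succeeded = chunk.outcomes.slice(0, maxAttempts).some((ok) => ok)
    if (succeeded) uploadedBytes += chunk.size
    else failedChunks += 1
  }
  return { uploadedBytes, failedChunks }
}

// Fail, retry, succeed: should end at 100% with no error.
const recovered = summarize(
  [
    { size: 512 * KiB, outcomes: [true] },
    { size: 512 * KiB, outcomes: [false, true] },
  ],
  3
)

// One chunk exhausts its retries: progress ends one chunk short.
const exhausted = summarize(
  [
    { size: 512 * KiB, outcomes: [true] },
    { size: 512 * KiB, outcomes: [false, false, false] },
  ],
  3
)

console.log(recovered, exhausted)
```

In this model, a report of an error at 100% cannot occur, which is why that combination points at the error handling rather than at the upload itself.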
We haven't seen this for a very long time. I think #6690 addressed it.
This is the bug with image uploads failing on Firefox/macOS that @david-crespo was running into.
I've looked into it some more and, from what I can tell, at some point the browser gets stalled while uploading some chunks.
On the console side we split the file into 384 KiB chunks, which we try to upload 6 at a time (to not hit browser concurrency limits). It doesn't happen every time, but every so often there will be a chunk or two where it seems like the browser made the request but there's no response (at least according to the Network tab in the browser's Web Developer Tools).
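The chunking scheme described above can be sketched as follows. This is an illustration under assumptions rather than the console's code: `chunkRanges` and `runWithLimit` are hypothetical names, and the actual upload request is left out.

```typescript
const KiB = 1024
const CHUNK_SIZE = 384 * KiB
const MAX_CONCURRENT = 6

interface ChunkRange {
  index: number
  start: number // inclusive byte offset
  end: number // exclusive byte offset
}

// Split a file of `fileSize` bytes into consecutive ranges of at most `chunkSize` bytes.
function chunkRanges(fileSize: number, chunkSize: number = CHUNK_SIZE): ChunkRange[] {
  const ranges: ChunkRange[] = []
  for (let start = 0, index = 0; start < fileSize; start += chunkSize, index++) {
    ranges.push({ index, start, end: Math.min(start + chunkSize, fileSize) })
  }
  return ranges
}

// Run tasks with at most `limit` in flight at once.
// Each worker pulls the next unstarted task until none are left.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number = MAX_CONCURRENT
): Promise<T[]> {
  const results: T[] = new Array(tasks.length)
  let next = 0
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++
      results[i] = await tasks[i]()
    }
  }
  const workerCount = Math.min(limit, tasks.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}

// A 1000 KiB file yields two full chunks and one partial chunk.
console.log(chunkRanges(1000 * KiB))
```

Note that in a pool like this, a request that never resolves holds one of the six slots indefinitely unless the task applies its own timeout, so a couple of stalled chunks is enough to slow the rest of the upload.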
I changed the console to add a query parameter to each individual chunk upload, and for the stalled chunks I saw no mention of such a request in the Nexus logs. Next I tried a packet capture with Wireshark, and after setting up SSLKEYLOGFILE (because this will only repro with https; we'll get back to that) I did see the browser making those requests. In fact, I could see it sending until at some point it stops, still with no response.
After not being able to repro without TLS and then seeing the packet capture, I realized we're using HTTP/2 for compatible clients. The browser maintains a single connection and uses multiple HTTP/2 streams to make the different requests. OK, so is something else blocking our image upload somehow? Cue some more reading about HTTP/2: it definitely has a concept of flow control that each peer maintains separately.
Basically, for each side, there's a connection-level flow-control window size as well as a per-stream size. Every byte sent decrements the available bytes in the window. If the sender has exhausted the window, it must not send any more until a WINDOW_UPDATE is received from the peer telling it there's more space.
I need to look into it some more, but it seems like the browser might think it has exhausted the window while there's no window update from the Nexus side. I hacked in a tracing subscriber to see anything useful from hyper, and I do see mentions of the stalled streams, but I'm not familiar enough with hyper to decode them yet. (hyperium/hyper#2899 seems relevant maybe?)
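The flow-control rules described above can be modelled in a few lines. This is a toy model for illustration, not how the browser or hyper implement it: `FlowControlledSender` is a hypothetical class, and it assumes the stream and connection windows both start at the spec default of 65,535 bytes. In HTTP/2, a WINDOW_UPDATE on stream 0 applies to the connection-level window.

```typescript
// Default initial window size from the HTTP/2 spec, in bytes.
const DEFAULT_WINDOW = 65535

// Toy model of the sender side of HTTP/2 flow control.
class FlowControlledSender {
  private initialWindow: number
  private connectionWindow: number
  private streamWindows = new Map<number, number>()

  constructor(initialWindow: number = DEFAULT_WINDOW) {
    this.initialWindow = initialWindow
    this.connectionWindow = initialWindow
  }

  openStream(streamId: number): void {
    this.streamWindows.set(streamId, this.initialWindow)
  }

  // Try to send `bytes` on a stream. Returns how many bytes may actually be
  // sent: limited by both the stream window and the connection window.
  send(streamId: number, bytes: number): number {
    const streamWindow = this.streamWindows.get(streamId)
    if (streamWindow === undefined) throw new Error(`unknown stream ${streamId}`)
    const allowed = Math.max(0, Math.min(bytes, streamWindow, this.connectionWindow))
    this.streamWindows.set(streamId, streamWindow - allowed)
    this.connectionWindow -= allowed
    return allowed
  }

  // Apply a WINDOW_UPDATE from the peer. Stream id 0 targets the connection.
  windowUpdate(streamId: number, increment: number): void {
    if (streamId === 0) {
      this.connectionWindow += increment
      return
    }
    const current = this.streamWindows.get(streamId)
    if (current === undefined) throw new Error(`unknown stream ${streamId}`)
    this.streamWindows.set(streamId, current + increment)
  }
}

// Two concurrent chunk uploads sharing one connection.
const sender = new FlowControlledSender()
sender.openStream(1)
sender.openStream(3)
console.log(sender.send(1, 384 * 1024)) // limited by the windows
console.log(sender.send(3, 384 * 1024)) // connection window already used up
```

In this model the first stream uses up the whole connection window, so the second stream cannot send anything until the peer issues a connection-level WINDOW_UPDATE. If that update never arrives, every stream on the connection stalls, which matches the symptom in the packet capture.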