Uploads don't start until a client connects, fail in multi-instance environment #3538
Comments
Uppy and Companion instances are 'married' to each other, meaning you can't do a multi-instance setup where an upload started on one Companion instance is picked up later by a different one. Companion isn't stateless; it stores a lot of data in memory. We talked about this extensively at Transloadit, but the conclusion was not to pursue a huge rewrite to make this happen. Perhaps @kvz can shed some light on how one of Transloadit's customers solved their scaling issue in a similar way.
Hey @Murderlon,
Hi
I agree, but it's not just about progress; it's also about synchronisation between the client and the server. If we did not wait here and the upload finished before the client had connected to the socket, the client would never know that the upload finished, so it would hang forever in the "uploading" state. This could probably be fixed, but it could require a new major version of both Uppy and Companion.
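To make the synchronisation problem concrete, here is a minimal sketch (the class and method names are invented for illustration, this is not Companion's actual code): if the server emitted 'success' before any client socket had attached, the message would simply be lost.

```js
const { EventEmitter } = require('node:events')

// One "channel" per upload: nothing is emitted until a client socket has
// attached, otherwise the 'success' event below would go into the void.
class UploadChannel extends EventEmitter {
  constructor () {
    super()
    this.connected = new Promise((resolve) => this.once('connect', resolve))
  }

  // Called when the client's websocket for this upload attaches.
  attachClient (send) {
    this.on('progress', send)
    this.on('success', send)
    this.emit('connect')
  }

  async runUpload (doUpload) {
    await this.connected // this is the wait being discussed
    const result = await doUpload((bytes) => this.emit('progress', { type: 'progress', bytes }))
    this.emit('success', { type: 'success', result })
  }
}

module.exports = UploadChannel
```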
For now, if you want a multi-instance setup, I think it should still be possible with sticky sessions.
Actually, by chance and unrelated to this, I fixed this just now in #3544 by implementing a timeout. As for the Redis configuration and logic, I'm not 100% sure what its purpose is, but I can see from the docs:
From the code, we can see that when Redis is enabled in the Companion config, the following happens:
However, I don't see how this helps much when uploads cannot be started on one server and continued on a different one.
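For reference, enabling Redis in Companion boils down to passing its documented `redisUrl` option. A minimal sketch (secrets, paths and hostnames are placeholders, and the exact option shape depends on your Companion version):

```js
const express = require('express')
const session = require('express-session')
const companion = require('@uppy/companion')

const app = express()
app.use(express.json())
app.use(session({ secret: 'some-secret', resave: false, saveUninitialized: false }))

const options = {
  secret: 'some-secret',
  filePath: '/tmp/companion-uploads',
  server: { host: 'companion.example.com', protocol: 'https' },
  redisUrl: 'redis://localhost:6379', // this is what enables the Redis-backed event emitter
  providerOptions: {}, // provider keys omitted here
}

app.use(companion.app(options))

const server = app.listen(3020)
companion.socket(server, options) // websocket endpoint used for upload progress
```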
Thanks @Murderlon, I'd love to hear more details about how your customers configured scaling!
Looking at uppy.io, I can see that Companion keeps trying to send the response header. I think what needs to be done to support cookies (and thereby sticky sessions in a load balancer) is to enable
I've done some more research and I've found that Companion does indeed not support true fault-tolerant, non-sticky load balancing with multiple Companion instances behind one (sub)domain (where any client request can go to any instance). Instead it supports a different type of scaling where each Companion instance needs to have its own domain name, e.g.

This means that you don't need to set up sticky sessions, Redis or anything like that, because Uppy/Companion will handle stickiness internally (the express session code isn't really in use, AFAIK). The only thing that you need to do is to set

I found this information in #1845 and #2321. I will update the documentation because this is a frequently requested feature.
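As an illustration of that "one domain per Companion instance" approach on the client side, here is a rough sketch using the standard `companionUrl` option of a provider plugin (the domain is hypothetical):

```js
import Uppy from '@uppy/core'
import GoogleDrive from '@uppy/google-drive'

const uppy = new Uppy()
// Point the client at one specific Companion instance; that instance then
// handles the whole flow for this client, so no cross-instance state is needed.
uppy.use(GoogleDrive, { companionUrl: 'https://companion1.example.com' })
```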
Actually, when I look more closely at the code (unless my brain is too tired), I can see that socket.js does indeed proxy/forward all websocket messages to redis.
This would imply that we do actually support fault-tolerant scaling between multiple Companion instances on the same domain 🤯 Consider this:
So if this theory is correct, then I think it should just work if you enable Redis in your setup, in which case we need to update our docs as well. The explanation for why express sessions are being used (even though they don't seem to work over XHR) is this:

If this really works, I think we should add an integration or e2e test that runs with Redis and a load balancer, to confirm that switching between two instances mid-upload really works.
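To make the theory concrete, here is a rough sketch of the pub/sub relay idea, written against ioredis directly. The helper functions and channel naming are invented for illustration, not what socket.js actually does, and `COMPANION_REDIS_URL` is assumed to be set:

```js
const Redis = require('ioredis')

const pub = new Redis(process.env.COMPANION_REDIS_URL)
const sub = new Redis(process.env.COMPANION_REDIS_URL)

// On the instance running the upload: publish every progress/success event
// for this upload to a Redis channel instead of only to local sockets.
function publishUploadEvent (uploadId, payload) {
  pub.publish(`companion:${uploadId}`, JSON.stringify(payload))
}

// On the instance that received the client's websocket: relay whatever
// arrives on that channel down to the locally connected socket.
function relayToSocket (uploadId, ws) {
  sub.subscribe(`companion:${uploadId}`)
  sub.on('message', (channel, message) => {
    if (channel === `companion:${uploadId}`) ws.send(message)
  })
}

module.exports = { publishUploadEvent, relayToSocket }
```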
The problem is this code, which waits for a socket connection before starting the file upload:
So even if Companion gets a socket connection on another pod, that pod is not handling the file upload. We need to add a listener on socket connect so the upload can be handled statelessly.
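As I read the suggestion, the idea is roughly the following. This is a hypothetical sketch only (invented helper and channel names, not Companion code): the instance that receives the websocket broadcasts "client connected" over Redis, and the instance that created the upload starts when it sees that message instead of waiting on a local socket.

```js
const Redis = require('ioredis')

const pub = new Redis(process.env.COMPANION_REDIS_URL)
const sub = new Redis(process.env.COMPANION_REDIS_URL)

// Runs on whichever instance happens to accept the client's websocket.
function onSocketConnect (uploadId) {
  pub.publish(`connected:${uploadId}`, '1')
}

// Runs on the instance that created the upload, replacing the local-only wait.
async function waitForClientOnAnyInstance (uploadId) {
  await sub.subscribe(`connected:${uploadId}`)
  await new Promise((resolve) => {
    sub.on('message', (channel) => {
      if (channel === `connected:${uploadId}`) resolve()
    })
  })
}

module.exports = { onSocketConnect, waitForClientOnAnyInstance }
```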
@Mosharush see my previous comment. Because other instances will broadcast a
Can we turn this issue into something actionable so we can work towards closing it? What needs to happen? Code changes? Docs?
Someone needs to test this whole setup with two Companion instances connected over Redis and a load balancer that balances requests round-robin (every new request goes to a different instance). Or we must set up an automated system test for this (a lot more work).
Perhaps we can quickly test it ourselves first and see if it works. If we decide to support it, there should be a test for it in my opinion, but that can come later.
But AFAIK two people here have already tested it and it didn't work with Redis enabled, so I suppose we can go straight to deciding whether we want to support this?
IMO it's critical to support this in order to have a scalable solution :)
I agree that it's a critical feature for being able to use Companion in real-world applications that need to scale. And because I think we are already so close to supporting it (if not already supported), we should make the small effort of testing whether it works. @gabiganam did you happen to see if Redis events were coming in at all? like
@mifi In my test environment it does not work as expected :(
Hi again. I've done some more testing around this, and I've set up a simple load balancer (reverse proxy) based on http-proxy in front of two Companion instances, so that every even request goes to one Companion and odd requests go to the other. I've observed that the '/drive/get/x' request goes to Companion 1 while the websocket

Companion 1
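For reference (and separate from the log excerpts elided above), here is a minimal sketch of the kind of alternating http-proxy script described, with websocket upgrades forwarded as well; the ports are hypothetical:

```js
const http = require('node:http')
const httpProxy = require('http-proxy')

const targets = ['http://localhost:3021', 'http://localhost:3022']
let i = 0
const next = () => targets[i++ % targets.length] // every request alternates between instances

const proxy = httpProxy.createProxyServer({ ws: true, changeOrigin: true })
proxy.on('error', (err) => console.error('proxy error', err))

// Plain HTTP requests go to whichever instance is next in the rotation.
const server = http.createServer((req, res) => proxy.web(req, res, { target: next() }))
// Websocket upgrades are forwarded too, also round-robin.
server.on('upgrade', (req, socket, head) => proxy.ws(req, socket, head, { target: next() }))
server.listen(8080)
```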
We have now deployed a multi-instance setup of Companion on uppy.io (it runs behind Heroku's built-in SSL-terminating router / load balancer). So when you upload a file there, the requests are distributed between two Companion instances. This seems to be working nicely, so I will consider it proof that multi-instance Companion works as expected. There were a couple of things I had to make sure of for it to work:
I'm going to close this, but I can reopen it if people are still having trouble.
I'm not sure I understand the logic behind this mechanism. IMO, getting progress updates is not worth holding back the start of the transfer.
From upload.js:
From Uploader.js:
I'm setting up multiple Companion pods in a k8s cluster, so it happens that an upload starts on one instance while the client connection is received by a different instance.
As a result, the first instance's "awaitReady" is stuck forever and the upload never starts.
So basically my questions are: