Uploads don't start until a client connects, fail in multi-instance environment #3538
Comments
Uppy and Companion instances are 'married' to each other, meaning you can't do a multi-instance setup where an upload started on one Companion instance is picked up later by a different one. Companion isn't stateless; it stores a lot of data in memory. We talked about this extensively at Transloadit, but the conclusion was not to pursue a huge rewrite to make this happen. Perhaps @kvz can shed some light on how one of Transloadit's customers solved their scaling issue in a similar way.
Hey @Murderlon,
Hi
I agree, but it's not just about progress; it's also about synchronisation between the client and the server. If we did not wait here and the upload finished before the client had connected to the socket, the client would never know that the upload finished, so it would hang forever in the "uploading" state. This could probably be fixed, but it could require a new major version of both Uppy and Companion.
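To make the synchronisation problem concrete, here is a minimal sketch (the class and method names are invented for illustration, this is not Companion's actual code): if the server emitted 'success' before any client socket had attached, the message would simply be lost.

```js
const { EventEmitter } = require('node:events')

// One "channel" per upload: nothing is emitted until a client socket has
// attached, otherwise the 'success' event below would go into the void.
class UploadChannel extends EventEmitter {
  constructor () {
    super()
    this.connected = new Promise((resolve) => this.once('connect', resolve))
  }

  // Called when the client's websocket for this upload attaches.
  attachClient (send) {
    this.on('progress', send)
    this.on('success', send)
    this.emit('connect')
  }

  async runUpload (doUpload) {
    await this.connected // this is the wait being discussed
    const result = await doUpload((bytes) => this.emit('progress', { type: 'progress', bytes }))
    this.emit('success', { type: 'success', result })
  }
}

module.exports = UploadChannel
```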
For now, if you want a multi-instance setup, I think it should still be possible with sticky sessions.
Actually, by chance and unrelated to this, I fixed this just now in #3544 by implementing a timeout. As for the Redis configuration and logic, I'm not 100% sure what its purpose is, but I can see from the docs:
From the code, we can see that when Redis is enabled in the Companion config, the following happens:
However, I don't see how this helps much when uploads cannot be started on one server and continued on a different one.
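For reference, enabling Redis in Companion boils down to passing its documented `redisUrl` option. A minimal sketch (secrets, paths and hostnames are placeholders, and the exact option shape depends on your Companion version):

```js
const express = require('express')
const session = require('express-session')
const companion = require('@uppy/companion')

const app = express()
app.use(express.json())
app.use(session({ secret: 'some-secret', resave: false, saveUninitialized: false }))

const options = {
  secret: 'some-secret',
  filePath: '/tmp/companion-uploads',
  server: { host: 'companion.example.com', protocol: 'https' },
  redisUrl: 'redis://localhost:6379', // this is what enables the Redis-backed event emitter
  providerOptions: {}, // provider keys omitted here
}

app.use(companion.app(options))

const server = app.listen(3020)
companion.socket(server, options) // websocket endpoint used for upload progress
```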
Thanks @Murderlon, I'd love to hear more details about how your customers configured scaling!
Looking at uppy.io, I can see that Companion keeps trying to send the response header. I think what needs to be done to support cookies (and thereby sticky sessions in a load balancer) is to enable
I've done some more research and I've found that Companion does indeed not support true fault-tolerant, non-sticky load balancing with multiple Companion instances behind one (sub)domain (where any client request can go to any instance). Instead it supports a different type of scaling where each Companion instance needs to have its own domain name, e.g.

This means that you don't need to set up sticky sessions, Redis or anything like that, because Uppy/Companion will handle stickiness internally (the express session code isn't really in use, AFAIK). The only thing that you need to do is to set

I found this information in #1845 and #2321. I will update the documentation because this is a frequently requested feature.
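As an illustration of that "one domain per Companion instance" approach on the client side, here is a rough sketch using the standard `companionUrl` option of a provider plugin (the domain is hypothetical):

```js
import Uppy from '@uppy/core'
import GoogleDrive from '@uppy/google-drive'

const uppy = new Uppy()
// Point the client at one specific Companion instance; that instance then
// handles the whole flow for this client, so no cross-instance state is needed.
uppy.use(GoogleDrive, { companionUrl: 'https://companion1.example.com' })
```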
Actually, when I look more closely at the code (unless my brain is too tired), I can see that socket.js does indeed proxy/forward all websocket messages to redis.
This would imply that we do actually support fault-tolerant scaling between multiple Companion instances on the same domain 🤯 Consider this:
So if this theory is correct, then I think it should just work if you enable Redis in your setup, in which case we need to update our docs as well. The explanation for why express sessions are being used (even though they don't seem to work over XHR) is this:

If this really works, I think we should add an integration or e2e test that runs with Redis and a load balancer, to confirm that switching between two instances mid-upload really works.
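To make the theory concrete, here is a rough sketch of the pub/sub relay idea, written against ioredis directly. The helper functions and channel naming are invented for illustration, not what socket.js actually does, and `COMPANION_REDIS_URL` is assumed to be set:

```js
const Redis = require('ioredis')

const pub = new Redis(process.env.COMPANION_REDIS_URL)
const sub = new Redis(process.env.COMPANION_REDIS_URL)

// On the instance running the upload: publish every progress/success event
// for this upload to a Redis channel instead of only to local sockets.
function publishUploadEvent (uploadId, payload) {
  pub.publish(`companion:${uploadId}`, JSON.stringify(payload))
}

// On the instance that received the client's websocket: relay whatever
// arrives on that channel down to the locally connected socket.
function relayToSocket (uploadId, ws) {
  sub.subscribe(`companion:${uploadId}`)
  sub.on('message', (channel, message) => {
    if (channel === `companion:${uploadId}`) ws.send(message)
  })
}

module.exports = { publishUploadEvent, relayToSocket }
```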
The problem is this code, which waits for a socket connection before starting the file upload:
So even if Companion gets a socket connection on another pod, that pod is not handling the file upload. We need to add a listener on socket connect so the upload can be handled statelessly.
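As I read the suggestion, the idea is roughly the following. This is a hypothetical sketch only (invented helper and channel names, not Companion code): the instance that receives the websocket broadcasts "client connected" over Redis, and the instance that created the upload starts when it sees that message instead of waiting on a local socket.

```js
const Redis = require('ioredis')

const pub = new Redis(process.env.COMPANION_REDIS_URL)
const sub = new Redis(process.env.COMPANION_REDIS_URL)

// Runs on whichever instance happens to accept the client's websocket.
function onSocketConnect (uploadId) {
  pub.publish(`connected:${uploadId}`, '1')
}

// Runs on the instance that created the upload, replacing the local-only wait.
async function waitForClientOnAnyInstance (uploadId) {
  await sub.subscribe(`connected:${uploadId}`)
  await new Promise((resolve) => {
    sub.on('message', (channel) => {
      if (channel === `connected:${uploadId}`) resolve()
    })
  })
}

module.exports = { onSocketConnect, waitForClientOnAnyInstance }
```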
@Mosharush see my previous comment. Because other instances will broadcast a
Can we turn this issue into something actionable so we can work towards closing it? What needs to happen? Code changes? Docs?
Someone needs to test this whole setup with two Companion instances connected over Redis and a load balancer that balances requests round-robin (every new request goes to a different instance). Or we must set up an automated system test for this (a lot more work).
Perhaps we can quickly test it ourselves first and see if it works. If we decide to support it, there should be a test for it in my opinion, but that can come later.
But AFAIK two people here have already tested it and it didn't work with Redis enabled, so I suppose we can go straight to deciding whether we want to support this?
IMO it's critical to support this in order to have a scalable solution :)
I agree that it's a critical feature for being able to use Companion in real-world applications that need to scale. And because I think we are already so close to supporting it (if not already supported), we should make the small effort of testing whether it works. @gabiganam did you happen to see if Redis events were coming in at all? like
@mifi In my test environment it does not work as expected :(
Hi again. I've done some more testing around this, and I've set up a simple load balancer (reverse proxy) based on http-proxy in front of two Companion instances, so that every even request goes to one Companion and odd requests go to the other. I've observed that the '/drive/get/x' request goes to Companion 1 while the websocket

Companion 1
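For reference (and separate from the log excerpts elided above), here is a minimal sketch of the kind of alternating http-proxy script described, with websocket upgrades forwarded as well; the ports are hypothetical:

```js
const http = require('node:http')
const httpProxy = require('http-proxy')

const targets = ['http://localhost:3021', 'http://localhost:3022']
let i = 0
const next = () => targets[i++ % targets.length] // every request alternates between instances

const proxy = httpProxy.createProxyServer({ ws: true, changeOrigin: true })
proxy.on('error', (err) => console.error('proxy error', err))

// Plain HTTP requests go to whichever instance is next in the rotation.
const server = http.createServer((req, res) => proxy.web(req, res, { target: next() }))
// Websocket upgrades are forwarded too, also round-robin.
server.on('upgrade', (req, socket, head) => proxy.ws(req, socket, head, { target: next() }))
server.listen(8080)
```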
We have now deployed a multi-instance setup of Companion on uppy.io (it runs behind Heroku's built-in SSL-terminating router / load balancer). So when you upload a file there, the requests are distributed between two Companion instances. This seems to be working nicely, so I will consider it proof that multi-instance Companion works as expected. There were a couple of things I had to make sure of for it to work:
I'm going to close this, but I can reopen it if people are still having trouble.
I'm not sure I understand the logic behind this mechanism. IMO, getting progress updates is not worth holding back the start of the transfer.
From upload.js:
From Uploader.js:
I'm setting up multiple Companion pods in a k8s cluster, so it happens that an upload starts on one instance while the client connection is received by a different instance.
As a result, the first instance's "awaitReady" is stuck forever and the upload never starts.
So basically my questions are: