Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

https connection times out #8107

Closed
timonson opened this issue Oct 24, 2020 · 27 comments
Closed

https connection times out #8107

timonson opened this issue Oct 24, 2020 · 27 comments
Labels
bug Something isn't working correctly cli related to cli/ dir

Comments

@timonson
Copy link
Contributor

timonson commented Oct 24, 2020

I have a problem running an https server and I can't solve it on my own. I tried various things which include:

  • switching linux distros
  • using different ssl certificates
  • changing server configurations
  • different domains
  • comparing to http server
  • and more

The problem is that after an indeterminable time, but mostly about 20 hours, the server stops working in the sense that requests just seem to timeout: curl: (28) Failed to connect to xyz.com port 443: Connection timed out. Before this time period, everything works prefectly. But the deno process is still running and the port is still used:
deno_ps_ax-2020-10-24
deno_lsof-2020-10-24

I don't use an nginx proxy server; just deno directly on a linux distro.
I execute deno with sudo and I also tried to start deno with pm2. In all cases it stops working after a couple of hours.

The server runs on AWS Lightsail and I can rule out that a huge amount of requests is causing this (maybe 10 requests per hour).

I can also say with certainty that this does not happen with a deno http server nor with a node https server, which I both tested at least once for two days.

There was no big CPU or memory load which I could see in the monitoring tools. I only respond with hello world to requests at the moment.
If necessary I would be happy to give ssh access to somebody who knows how to debug this issue.

Thanks a lot!

@lucacasonato
Copy link
Member

I have spun up a system on GCP (e2-medium, 2 vCPUs, 4 GB memory) with this code:

import { listenAndServeTLS } from "https://deno.land/[email protected]/http/server.ts";

const body = "Hello HTTPS";
const options = {
  port: 443,
  certFile: "/etc/letsencrypt/live/denohttpstest.lcas.dev/fullchain.pem",
  keyFile: "/etc/letsencrypt/live/denohttpstest.lcas.dev/privkey.pem",
};
listenAndServeTLS(options, (req) => {
  req.respond({ body });
});

I have also set up uptime checks to see if it goes down. Let's see if I can reproduce this.

@timonson
Copy link
Contributor Author

I am using a 512 MB RAM, 1 vCPU, 20 GB SSD machine. But I could not see high resources usage. Otherwise just dm me and I send you my ssh key.

@kitsonk kitsonk added the needs investigation requires further investigation before determining if it is an issue or not label Oct 24, 2020
@lucacasonato lucacasonato added bug Something isn't working correctly cli related to cli/ dir and removed needs investigation requires further investigation before determining if it is an issue or not labels Oct 25, 2020
@lucacasonato
Copy link
Member

lucacasonato commented Oct 25, 2020

I can confirm this bug. My test server went down exactly 5 hours (on the minute) after starting it. The entire server log:

lucacasonato@deno-https-test:~$ sudo /home/lucacasonato/.deno/bin/deno run -A server.ts
TLS alert received: Message {
    typ: Alert,
    version: TLSv1_3,
    payload: Alert(
        AlertMessagePayload {
            level: Fatal,
            description: CertificateUnknown,
        },
    ),
}
TLS alert received: Message {
    typ: Alert,
    version: TLSv1_3,
    payload: Alert(
        AlertMessagePayload {
            level: Fatal,
            description: CertificateUnknown,
        },
    ),
}
Illegal SNI hostname received [51, 52, 46, 49, 50, 50, 46, 49, 55, 48, 46, 49, 56, 51]
Sending fatal alert DecodeError
Sending fatal alert ProtocolVersion
Illegal SNI hostname received [51, 52, 46, 49, 50, 50, 46, 49, 55, 48, 46, 49, 56, 51]
Sending fatal alert DecodeError
error: Uncaught ConnectionReset: Connection reset by peer (os error 104)
    at processResponse (core.js:226:13)
    at Object.jsonOpAsync (core.js:244:12)
    at async TLSListener.accept (deno:cli/rt/40_tls.js:42:19)
    at async Server.acceptConnAndIterateHttpRequests (server.ts:212:14)
    at async MuxAsyncIterator.callIteratorNext (mux_async_iterator.ts:30:31)
lucacasonato@deno-https-test:~$ 

I am going to rerun the test, but add more metrics to the log so we can tell what happens when.

@rvdende
Copy link

rvdende commented Oct 25, 2020

I've been battling this one as well.. and its a bit out of my league at the moment. Hoping @lucacasonato you can help! Thank you

I don't know if the Illegal SNI hostname has anything to do with the ConnectionReset though opine might hide that from my log.

Screenshot 2020-10-25 at 19 34 39

@rvdende
Copy link

rvdende commented Oct 25, 2020

more searching found this: https://en.it1352.com/article/6b9299e70af448e3a86b63562103ae4e.html

Solution
I believe the problem is that you are not fully reading the data sent from the client, so the client never has a chance to transition to reading the response. When it tries to write more data, it notices the socket has been closed and fails.

@lucacasonato
Copy link
Member

The server crashed again after a little over 7 hours. Log of server with -Ldebug here: https://0x0.st/ikSM.txt

@rvdende
Copy link

rvdende commented Oct 28, 2020

In the mean time I am using nginx to reverse proxy https to http instead. That seems to have solved the immediate need for this fix

@rvdende
Copy link

rvdende commented Oct 29, 2020

12 hours later server crashed on http instead of https, related issue https://github.com/denoland/deno/issues/6319 Uncaught BrokenPipe: Broken pipe (os error 32)

This might mean that Deno is unable to run a web api reliably at this moment?

@tveronezi
Copy link
Contributor

tveronezi commented Oct 30, 2020

@lucacasonato One of the last lines in this log is...

DEBUG RS - deno::tsc:757 - compiled filename: "/root/.cache/deno/gen/https/deno.land/37e30d67ded9172ae43b9aaedbaeb1788b75e43c942e5ec21f0cc717b20eb0aa.js"

This is the tsc stating it compiled the same file as the initial server.ts. Why did deno compiles that file twice? Actually it also compiled https://deno.land/[email protected]/async/delay.ts before crashing. Maybe that's why we have a broken pipe: deno recompiled the server and its dependencies and didnt properly closed pending connections.

Anyway, the deno::tsc:757 - compiled filename: log message is coming from the previous version of deno. https://github.com/denoland/deno/blob/v1.4.6/cli/tsc.rs#L757

@timonson @lucacasonato @rvdende

Maybe try again with the version 1.5?

@rvdende
Copy link

rvdende commented Oct 30, 2020

Strange that it would compile so far into the process, what could be triggering this?
Even if deno is running in watch mode it should cleanly restart the process.

@ry
Copy link
Member

ry commented Oct 30, 2020

A guess is that we encounter some TLS error but don't propagate it properly.

We should try setting up a TLS server and sending it some garbage data - does it crash? If so, we can probably easily fix it.

@timonson
Copy link
Contributor Author

timonson commented Oct 30, 2020

@tveronezi I started a new https server with deno 1.5 after I read your post. It crashed after about 9 hrs again. If you need my Ldebug file, let me know, because I saved it. On the other hand, my http sever keeps running. Only https is crashing.

@bartlomieju bartlomieju mentioned this issue Nov 17, 2020
22 tasks
@mrkurt
Copy link
Contributor

mrkurt commented Nov 17, 2020

For what it's worth, this seems to happen with the HTTP server and no TLS (just like @rvdende saw). We're running about 2k requests per second through the http server code and it seems like there are a whole bunch of uncatchable errors. It is difficult to keep running. ;)

This PR is getting some of them, but the connection reset error we got still slipped through: #8365

Actually, seems like req.respond is a promise you can handle these on.

@bartlomieju
Copy link
Member

I believe this issue will be fixed in #8365

@bartlomieju
Copy link
Member

I'm gonna provisionally close this issue because #8365 has landed. Let's reopen if the problem keeps happening.

@timonson
Copy link
Contributor Author

timonson commented Dec 1, 2020

Unfortunately this issue has not been resolved with the latest deno version:

DEBUG RS - deno_core::shared_queue:187 - rust:shared_queue:pre-push: op=74, off=812, end=824, len=12, aligned_end=824
DEBUG RS - deno_core::shared_queue:207 - rust:shared_queue:push: num_records=1, num_shifted_off=0, head=824
DEBUG RS - deno_core::shared_queue:187 - rust:shared_queue:pre-push: op=74, off=812, end=824, len=12, aligned_end=824
DEBUG RS - deno_core::shared_queue:207 - rust:shared_queue:push: num_records=1, num_shifted_off=0, head=824
DEBUG RS - rustls::server::hs:666 - decided upon suite SupportedCipherSuite { suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, kx: ECDHE, bulk: AES_256_GCM, hash:
 SHA384, sign: Some([RSA_PSS_SHA512, RSA_PSS_SHA384, RSA_PSS_SHA256, RSA_PKCS1_SHA512, RSA_PKCS1_SHA384, RSA_PKCS1_SHA256]), enc_key_len: 32, fixed_iv_len: 4,
explicit_nonce_len: 8 }
DEBUG RS - rustls::server::tls12:346 - Session saved
DEBUG RS - deno_core::shared_queue:187 - rust:shared_queue:pre-push: op=108, off=812, end=984, len=172, aligned_end=984
DEBUG RS - deno_core::shared_queue:207 - rust:shared_queue:push: num_records=1, num_shifted_off=0, head=984
DEBUG RS - deno::ops::io:243 - read rid=28
DEBUG RS - deno_core::shared_queue:187 - rust:shared_queue:pre-push: op=74, off=812, end=824, len=12, aligned_end=824
DEBUG RS - deno_core::shared_queue:207 - rust:shared_queue:push: num_records=1, num_shifted_off=0, head=824
DEBUG RS - deno::ops::io:361 - write rid=28
DEBUG RS - deno_core::shared_queue:187 - rust:shared_queue:pre-push: op=75, off=812, end=824, len=12, aligned_end=824
DEBUG RS - deno_core::shared_queue:207 - rust:shared_queue:push: num_records=1, num_shifted_off=0, head=824
DEBUG RS - deno::ops::io:243 - read rid=28
DEBUG RS - deno_core::shared_queue:187 - rust:shared_queue:pre-push: op=74, off=812, end=824, len=12, aligned_end=824
DEBUG RS - deno_core::shared_queue:207 - rust:shared_queue:push: num_records=1, num_shifted_off=0, head=824
DEBUG RS - deno_core::shared_queue:187 - rust:shared_queue:pre-push: op=108, off=812, end=926, len=114, aligned_end=928
DEBUG RS - deno_core::shared_queue:207 - rust:shared_queue:push: num_records=1, num_shifted_off=0, head=928
^[[0m^[[1m^[[31merror^[[0m: Uncaught (in promise) NotConnected: Transport endpoint is not connected (os error 107)
    at ^[[0m^[[1m^[[3mprocessResponse^[[0m (^[[0m^[[36mcore.js^[[0m:^[[0m^[[33m223^[[0m:^[[0m^[[33m11^[[0m)
    at ^[[0m^[[1m^[[3mObject.jsonOpAsync^[[0m (^[[0m^[[36mcore.js^[[0m:^[[0m^[[33m240^[[0m:^[[0m^[[33m12^[[0m)
    at ^[[0m^[[38;5;8masync ^[[0m^[[0m^[[1m^[[3m^[[38;5;8mTLSListener.accept^[[0m ^[[0m^[[38;5;8m(^[[0m^[[0m^[[38;5;8mdeno:cli/rt/40_tls.js^[[0m^[[0m^[[38;5;8m
:^[[0m^[[0m^[[38;5;8m42^[[0m^[[0m^[[38;5;8m:^[[0m^[[0m^[[38;5;8m19^[[0m^[[0m^[[38;5;8m)^[[0m
    at ^[[0m^[[38;5;8masync ^[[0m^[[0m^[[1m^[[3mServer.acceptConnAndIterateHttpRequests^[[0m (^[[0m^[[36mserver.ts^[[0m:^[[0m^[[33m223^[[0m:^[[0m^[[33m14^[[0m)
    at ^[[0m^[[38;5;8masync ^[[0m^[[0m^[[1m^[[3mMuxAsyncIterator.callIteratorNext^[[0m (^[[0m^[[36mmux_async_iterator.ts^[[0m:^[[0m^[[33m30^[[0m:^[[0m^[[33m31^
[[0m)

@lideming
Copy link
Contributor

lideming commented Dec 1, 2020

@timonson #8499

@Boyceman

This comment has been minimized.

@Soremwar
Copy link
Contributor

I tested this with 1.6.0, server kept crashing after 9/10 hours. Tried it in latest canary on a 24/7 server and it's been almost 48 hours with no crashes whatsover.

@timonson
Copy link
Contributor Author

timonson commented Dec 18, 2020

In my case, 18 days ago, it took 4 days until the server crashed. So please wait another couple of days @Soremwar . I started a new server 2 days ago, too (still running).

@Soremwar
Copy link
Contributor

@timonson Will keep you informed then. This is such a tedious bug to test 🙄

@rsmirnov90
Copy link

I am running a prod api using deno 1.8.1 and it is happening for me... It is NOT crashing; rather, after a bunch of those errors show up in the console, the server just locks up and all requests to it start timing out.

It usually runs fine for 2-3 days. The protocol errors show up totally randomly btw; this is a very low-usage service and it is not being called when they are appearing.

Current workaround is to have it auto restart every 24 hours, but a more robust solution would of course be highly appreciated...

@aegooby
Copy link

aegooby commented Apr 29, 2021

Same here (Deno 1.9.2). Currently looking for a workaround but I have trouble identifying even the source of the bug itself.

@ralyodio
Copy link

Seeing this with deno 1.16.1 and oak

@cherepanov
Copy link

occurs sometimes on incoming connections without any visible reason

Deno 1.19.3
Oak 10.4.0

@kazeens
Copy link

kazeens commented Apr 24, 2023

Still seeing this issue with Deno and Oak

@ralyodio
Copy link

same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working correctly cli related to cli/ dir
Projects
None yet
Development

No branches or pull requests