
v2.7.x: panic when TLS-SNI is received for an unknown domain #5680

Closed
otbutz opened this issue Aug 3, 2023 · 86 comments
Labels
bug 🐞 Something isn't working upstream ⬆️ Relates to some dependency of this project

Comments

@otbutz

otbutz commented Aug 3, 2023

caddy v2.7.2 installed via the apt repository:

Aug 03 08:55:31 proxy caddy[1163492]: panic: runtime error: invalid memory address or nil pointer dereference
Aug 03 08:55:31 proxy caddy[1163492]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x903750]
Aug 03 08:55:31 proxy caddy[1163492]: goroutine 57 [running]:
Aug 03 08:55:31 proxy caddy[1163492]: github.com/caddyserver/certmagic.(*Config).getCertDuringHandshake(0xc000543520, {0x1f09a88, 0xc000194008}, _, _)
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:378 +0x1390
Aug 03 08:55:31 proxy caddy[1163492]: github.com/caddyserver/certmagic.(*Config).GetCertificateWithContext(0xc000543520, {0x1f09a88, 0xc000194008}, 0xc000543450)
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:84 +0xbff
Aug 03 08:55:31 proxy caddy[1163492]: github.com/caddyserver/certmagic.(*Config).GetCertificate(0xc000138ee0?, 0xc0001dc1b0?)
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:50 +0x2a
Aug 03 08:55:31 proxy caddy[1163492]: github.com/caddyserver/caddy/v2/modules/caddytls.(*ConnectionPolicy).buildStandardTLSConfig.func1(0xc000543450)
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/caddyserver/caddy/v2@v2.7.2/modules/caddytls/connpolicy.go:232 +0x14f
Aug 03 08:55:31 proxy caddy[1163492]: github.com/quic-go/qtls-go1-20.(*config).getCertificate(0xc0008d7800, 0xc000543450)
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/quic-go/qtls-go1-20@v0.3.0/common.go:1086 +0x42
Aug 03 08:55:31 proxy caddy[1163492]: github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).pickCertificate(0xc000631be8)
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:415 +0x66
Aug 03 08:55:31 proxy caddy[1163492]: github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).handshake(0xc000631be8)
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:60 +0x53
Aug 03 08:55:31 proxy caddy[1163492]: github.com/quic-go/qtls-go1-20.(*Conn).serverHandshake(0xc00053d180, {0x1f09a50, 0xc000562fa0})
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server.go:53 +0x188
Aug 03 08:55:31 proxy caddy[1163492]: github.com/quic-go/qtls-go1-20.(*Conn).handshakeContext(0xc00053d180, {0x1f09af8, 0xc0001753e0})
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1540 +0x3ce
Aug 03 08:55:31 proxy caddy[1163492]: github.com/quic-go/qtls-go1-20.(*Conn).HandshakeContext(0xc00005dfd0?, {0x1f09af8?, 0xc0001753e0?})
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1480 +0x25
Aug 03 08:55:31 proxy caddy[1163492]: created by github.com/quic-go/qtls-go1-20.(*QUICConn).Start
Aug 03 08:55:31 proxy caddy[1163492]:         github.com/quic-go/qtls-go1-20@v0.3.0/quic.go:179 +0xcf

Might be related to golang/go#61639?

@otbutz
Author

otbutz commented Aug 3, 2023

@Animosity022

I updated to and tested 2.7.2 this morning and had to roll back because of this. I use a custom build with only the Cloudflare DNS and Security plugins.

Aug 03 07:47:24 gemini systemd[1]: Started caddy.service - Caddy.
Aug 03 07:49:28 gemini caddy[27478]: panic: runtime error: invalid memory address or nil pointer dereference
Aug 03 07:49:28 gemini caddy[27478]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x907cf0]
Aug 03 07:49:28 gemini caddy[27478]: goroutine 842 [running]:
Aug 03 07:49:28 gemini caddy[27478]: github.com/caddyserver/certmagic.(*Config).getCertDuringHandshake(0xc0010ee410, {0x29e8728, 0xc000138008}, _, _)
Aug 03 07:49:28 gemini caddy[27478]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:378 +0x1390
Aug 03 07:49:28 gemini caddy[27478]: github.com/caddyserver/certmagic.(*Config).GetCertificateWithContext(0xc0010ee410, {0x29e8728, 0xc000138008}, 0xc0010ee270)
Aug 03 07:49:28 gemini caddy[27478]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:84 +0xbff
Aug 03 07:49:28 gemini caddy[27478]: github.com/caddyserver/certmagic.(*Config).GetCertificate(0xc0000f0ee0?, 0xc000578360?)
Aug 03 07:49:28 gemini caddy[27478]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:50 +0x2a
Aug 03 07:49:28 gemini caddy[27478]: github.com/caddyserver/caddy/v2/modules/caddytls.(*ConnectionPolicy).buildStandardTLSConfig.func1(0xc0010ee270)
Aug 03 07:49:28 gemini caddy[27478]:         github.com/caddyserver/caddy/v2@v2.7.2/modules/caddytls/connpolicy.go:232 +0x14f
Aug 03 07:49:28 gemini caddy[27478]: github.com/quic-go/qtls-go1-20.(*config).getCertificate(0xc001d9e480, 0xc0010ee270)
Aug 03 07:49:28 gemini caddy[27478]:         github.com/quic-go/qtls-go1-20@v0.3.0/common.go:1086 +0x42
Aug 03 07:49:28 gemini caddy[27478]: github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).pickCertificate(0xc0005fdbe8)
Aug 03 07:49:28 gemini caddy[27478]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:415 +0x66
Aug 03 07:49:28 gemini caddy[27478]: github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).handshake(0xc0005fdbe8)
Aug 03 07:49:28 gemini caddy[27478]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:60 +0x53
Aug 03 07:49:28 gemini caddy[27478]: github.com/quic-go/qtls-go1-20.(*Conn).serverHandshake(0xc000977500, {0x29e86f0, 0xc0005b24b0})
Aug 03 07:49:28 gemini caddy[27478]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server.go:53 +0x188
Aug 03 07:49:28 gemini caddy[27478]: github.com/quic-go/qtls-go1-20.(*Conn).handshakeContext(0xc000977500, {0x29e8798, 0xc000c90b40})
Aug 03 07:49:28 gemini caddy[27478]:         github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1540 +0x3ce
Aug 03 07:49:28 gemini caddy[27478]: github.com/quic-go/qtls-go1-20.(*Conn).HandshakeContext(0xbe592a?, {0x29e8798?, 0xc000c90b40?})
Aug 03 07:49:28 gemini caddy[27478]:         github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1480 +0x25
Aug 03 07:49:28 gemini caddy[27478]: created by github.com/quic-go/qtls-go1-20.(*QUICConn).Start
Aug 03 07:49:28 gemini caddy[27478]:         github.com/quic-go/qtls-go1-20@v0.3.0/quic.go:179 +0xcf
Aug 03 07:49:28 gemini systemd[1]: caddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 03 07:49:28 gemini systemd[1]: caddy.service: Failed with result 'exit-code'.
Aug 03 07:51:03 gemini systemd[1]: Starting caddy.service - Caddy...

@francislavoie francislavoie added the upstream ⬆️ Relates to some dependency of this project label Aug 3, 2023
@marten-seemann
Contributor

This is where the nil pointer dereference happens: https://github.com/caddyserver/certmagic/blob/v0.19.1/handshake.go#L378

This should have been fixed by quic-go/quic-go#4001. Did you make sure that this patch is included here (released as v0.37.1)?

@francislavoie
Member

francislavoie commented Aug 3, 2023

Yes @marten-seemann, we built and released with 0.37.1:

From caddy/go.mod, line 20 in e2fc08b:

github.com/quic-go/quic-go v0.37.1

$ caddy build-info | grep quic                                                                                     
dep	github.com/quic-go/qpack	v0.4.0	h1:Cr9BXA1sQS2SmDUWjSofMPNKmvF6IiIfDRmgU0w1ZCo=
dep	github.com/quic-go/qtls-go1-20	v0.3.0	h1:NrCXmDl8BddZwO67vlvEpBTwT89bJfKYygxv4HQvuDk=
dep	github.com/quic-go/quic-go	v0.37.1	h1:M+mcsFq9KoxVjCetIwH65TvusW1UdRBc6zmxI6pkeD0=

@francislavoie
Member

francislavoie commented Aug 3, 2023

Oh, we built with Go 1.20.6 though 🤔 did we need to build with 1.20.7?

Edit: Nevermind that shouldn't have mattered according to the changes in https://github.com/golang/go/issues?q=milestone%3AGo1.20.7

@marten-seemann
Contributor

> Oh, we built with Go 1.20.6 though 🤔 did we need to build with 1.20.7?
>
> Edit: Nevermind that shouldn't have mattered according to the changes in https://github.com/golang/go/issues?q=milestone%3AGo1.20.7

That's a different issue, for which I'm about to cut a quic-go patch release today (the RSA key size DoS vulnerability fixed in Go 1.20.7, which I need to backport to my crypto/tls fork). I suggested including this in the Caddy patch release in #5671 (comment).

@marten-seemann
Contributor

We should probably figure out why this panic is occurring before I cut v0.37.2. Any hints?

@Animosity022

I installed my first instance from a download on the Caddy website, with the extra modules selected:

[felix@phoenix caddy]$ ./caddy build-info | head -5
go	go1.20.1
path	caddy
mod	caddy	(devel)
dep	filippo.io/edwards25519	v1.0.0	h1:0wAIcmJUqRdI8IJ/3eGi5/HwXZWPujYXXlkrQogz0Ek=
dep	github.com/AndreasBriese/bbloom	v0.0.0-20190825152654-46b345b51c96	h1:cTp8I5+VIoKjsnZuH8vjyaysT/ses3EvZeaV/1UkF2M=

My other box runs a build made with xcaddy using the latest Go version installed, which was 1.20.7.

[felix@gemini caddy]$ ./caddy build-info  | head -5
go	go1.20.7
path	caddy
mod	caddy	(devel)
dep	filippo.io/edwards25519	v1.0.0	h1:0wAIcmJUqRdI8IJ/3eGi5/HwXZWPujYXXlkrQogz0Ek=
dep	github.com/AndreasBriese/bbloom	v0.0.0-20190825152654-46b345b51c96	h1:cTp8I5+VIoKjsnZuH8vjyaysT/ses3EvZeaV/1UkF2M=

It has been up for ~10 minutes now without issues, whereas before it would panic within 2-3 minutes.

@mholt
Member

mholt commented Aug 3, 2023

@animosity22 In your build-info output, what is the version of quic-go?

@mholt
Member

mholt commented Aug 3, 2023

@otbutz @animosity22 Additionally, can you provide a minimal reproducer for this? At least your config and possibly a request that triggers it?

@Animosity022

Here are the two:

[felix@gemini caddy]$ ./caddy build-info | grep quic
dep	github.com/quic-go/qpack	v0.4.0	h1:Cr9BXA1sQS2SmDUWjSofMPNKmvF6IiIfDRmgU0w1ZCo=
dep	github.com/quic-go/qtls-go1-20	v0.3.0	h1:NrCXmDl8BddZwO67vlvEpBTwT89bJfKYygxv4HQvuDk=
dep	github.com/quic-go/quic-go	v0.37.1	h1:M+mcsFq9KoxVjCetIwH65TvusW1UdRBc6zmxI6pkeD0=
[felix@gemini caddy]$ ./caddy version
v2.7.2 h1:QqThyoyUFAv1B7A2NMeaWlz7xmgKqU49PXBX08A+6xg=
[felix@gemini caddy]$

and

[felix@phoenix caddy]$ ./caddy build-info | head -5
go	go1.20.1
path	caddy
mod	caddy	(devel)
dep	filippo.io/edwards25519	v1.0.0	h1:0wAIcmJUqRdI8IJ/3eGi5/HwXZWPujYXXlkrQogz0Ek=
dep	github.com/AndreasBriese/bbloom	v0.0.0-20190825152654-46b345b51c96	h1:cTp8I5+VIoKjsnZuH8vjyaysT/ses3EvZeaV/1UkF2M=
[felix@phoenix caddy]$ ./caddy build-info | grep quic
dep	github.com/quic-go/qpack	v0.4.0	h1:Cr9BXA1sQS2SmDUWjSofMPNKmvF6IiIfDRmgU0w1ZCo=
dep	github.com/quic-go/qtls-go1-20	v0.1.0	h1:d1PK3ErFy9t7zxKsG3NXBJXZjp/kMLoIb3y/kV54oAI=
dep	github.com/quic-go/quic-go	v0.32.0	h1:lY02md31s1JgPiiyfqJijpu/UX/Iun304FI3yUqX7tA=
[felix@phoenix caddy]$ ./caddy version
v2.6.4 h1:2hwYqiRwk1tf3VruhMpLcYTg+11fCdr8S3jhNAdnPy8=

My config is fairly small, I think, as I only use the security plugin for SSO auth to protect my site.

[felix@phoenix caddy]$ cat Caddyfile
{
	email {env.EMAIL}
	storage file_system {
		root /opt/caddy/ssl
	}
	order authenticate before respond
	order authorize before basicauth

	security {
		oauth identity provider github {env.GITHUB_CLIENT_ID} {env.GITHUB_CLIENT_SECRET}

		authentication portal myportal {
			crypto default token lifetime 604800
			crypto key sign-verify {env.JWT_SHARED_KEY}
			cookie domain blah.us
			cookie lifetime 604800
			enable identity provider github
			ui {
				links {
					"My Identity" "/whoami" icon "las la-user"
				}
			}

			transform user {
				match realm github
				match sub github.com/user1234
				action add role authp/admin
			}
		}

		authorization policy mypolicy {
			set auth url https://auth.blah.us/oauth2/github
			crypto key verify {env.JWT_SHARED_KEY}
			allow roles authp/admin authp/user
			validate bearer header
			inject headers with claims
		}
	}
}

auth.blah.us {
	tls {
		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
		resolvers 1.1.1.1
	}
	authenticate with myportal
}

I wasn't doing anything specific to trigger it; the automatic SSL renewal would kick in and produce the error, since it happened right after startup.

@mholt
Member

mholt commented Aug 3, 2023

@animosity22 Thanks. The phoenix instance is clearly using too old a quic-go version for some reason: dep github.com/quic-go/quic-go v0.32.0. It needs to be on 0.37.1 or newer. (EDIT: Oh, but it's also running Caddy 2.6.4, not 2.7. So not relevant? Unless you are seeing this bug in 2.6.4 as well?)
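Since the two builds above differ mainly in which quic-go version was compiled in, a version check like this could be scripted against `caddy build-info` output. A minimal sketch, assuming the tab-separated `dep <module> <version> <hash>` layout shown in this thread; `depVersion` and `atLeast` are hypothetical helpers, and the comparison deliberately ignores pre-release and pseudo-version suffixes (real code would use `semver.Compare` from `golang.org/x/mod/semver`):

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// depVersion scans `caddy build-info` output and returns the version listed
// on the "dep <module> <version> <hash>" line for modPath.
func depVersion(buildInfo, modPath string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(buildInfo))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 3 && f[0] == "dep" && f[1] == modPath {
			return f[2], true
		}
	}
	return "", false
}

// parse turns "vMAJOR.MINOR.PATCH[-suffix]" into three integers. Simplification:
// any "-" or "+" suffix (pre-release, pseudo-version) is dropped.
func parse(v string) [3]int {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	for i, p := range strings.SplitN(v, ".", 3) {
		n, _ := strconv.Atoi(p)
		out[i] = n
	}
	return out
}

// atLeast reports whether version have >= want, comparing numerically.
func atLeast(have, want string) bool {
	h, w := parse(have), parse(want)
	for i := range h {
		if h[i] != w[i] {
			return h[i] > w[i]
		}
	}
	return true
}

func main() {
	// Sample lines taken from the phoenix build-info output above.
	info := "dep\tgithub.com/quic-go/qtls-go1-20\tv0.1.0\th1:d1PK3ErFy9t7zxKsG3NXBJXZjp/kMLoIb3y/kV54oAI=\n" +
		"dep\tgithub.com/quic-go/quic-go\tv0.32.0\th1:lY02md31s1JgPiiyfqJijpu/UX/Iun304FI3yUqX7tA=\n"
	v, ok := depVersion(info, "github.com/quic-go/quic-go")
	if !ok {
		fmt.Println("quic-go not found in build info")
		return
	}
	fmt.Printf("quic-go %s, at least v0.37.1: %v\n", v, atLeast(v, "v0.37.1"))
}
```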

Do they both have the same config? I wonder if a plugin is causing the downgrade.

But then again, @otbutz says he gets the error using Caddy from the apt repository... (meaning without plugins) 🤔 We are also deploying that Caddy on our website and I can't find this error in our logs at all.

@mholt mholt added the needs info 📭 Requires more information label Aug 3, 2023
@Animosity022

Oh sorry, one second, I forgot I downgraded my phoenix box back to the older version; let me update it and reproduce.

@m4r-v1n

m4r-v1n commented Aug 3, 2023

Just installed 2.7.2 on Debian 12 via this repo:
https://dl.cloudsmith.io/public/caddy/stable/deb/debian

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x903750]
goroutine 28 [running]:
github.com/caddyserver/certmagic.(*Config).getCertDuringHandshake(0xc0003cf6c0, {0x1f09a88, 0xc000042058}, _, _)
        github.com/caddyserver/certmagic@v0.19.1/handshake.go:378 +0x1390
github.com/caddyserver/certmagic.(*Config).GetCertificateWithContext(0xc0003cf6c0, {0x1f09a88, 0xc000042058}, 0xc0003cf5f0)
        github.com/caddyserver/certmagic@v0.19.1/handshake.go:84 +0xbff
github.com/caddyserver/certmagic.(*Config).GetCertificate(0xc0007f2ee0?, 0xc000855620?)
        github.com/caddyserver/certmagic@v0.19.1/handshake.go:50 +0x2a
github.com/caddyserver/caddy/v2/modules/caddytls.(*ConnectionPolicy).buildStandardTLSConfig.func1(0xc0003cf5f0)
        github.com/caddyserver/caddy/v2@v2.7.2/modules/caddytls/connpolicy.go:232 +0x14f
github.com/quic-go/qtls-go1-20.(*config).getCertificate(0xc000003680, 0xc0003cf5f0)
        github.com/quic-go/qtls-go1-20@v0.3.0/common.go:1086 +0x42
github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).pickCertificate(0xc000867be8)
        github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:415 +0x66
github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).handshake(0xc000867be8)
        github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:60 +0x53
github.com/quic-go/qtls-go1-20.(*Conn).serverHandshake(0xc0003e7180, {0x1f09a50, 0xc00003f130})
        github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server.go:53 +0x188
github.com/quic-go/qtls-go1-20.(*Conn).handshakeContext(0xc0003e7180, {0x1f09af8, 0xc0005c08d0})
        github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1540 +0x3ce
github.com/quic-go/qtls-go1-20.(*Conn).HandshakeContext(0xc00041f7d0?, {0x1f09af8?, 0xc0005c08d0?})
        github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1480 +0x25
created by github.com/quic-go/qtls-go1-20.(*QUICConn).Start
        github.com/quic-go/qtls-go1-20@v0.3.0/quic.go:179 +0xcf

My Caddyfile:

{
	email [email protected]
}

(cors) {
	@cors_preflight method OPTIONS
	@cors header Origin {args.0}

	handle @cors_preflight {
		header Access-Control-Allow-Origin {args.0}
		header Access-Control-Allow-Methods "GET"
		header Access-Control-Allow-Headers "X-Requested-With"
		header Access-Control-Max-Age 3600
		respond "" 204
	}

	handle @cors {
		header Access-Control-Allow-Origin {args.0}
	}
}

(global-headers) {
	header {
		-Server
		Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
		X-Content-Type-Options nosniff
		X-Frame-Options SAMEORIGIN
	}
}

(php) {
	php_fastcgi unix//run/php/php8.1-fpm.sock
}

import sites/*

and an example sites/site:

example.com {

	root * /srv/www/example/public
	file_server
	encode zstd gzip

	import global-headers

}

@Animosity022

Hmm, I've tried a few more scenarios to reproduce the issue:

  • I deleted my SSL directory on my non-critical box and let it regenerate certs; that worked without issue.
  • I swapped between the binaries from the website download and the ones built by xcaddy; that also worked without issue.

@raregems-io - if you restart a few times, does it clear up?

@mholt
Member

mholt commented Aug 3, 2023

Thanks for the info. I'll try to reproduce it.

I should point out that our production website and the forums are both using 2.7.2 with similar configs without issues.

@marten-seemann
Contributor

Let me know if there's anything you need from the quic-go side. I'll hold off on cutting the RSA patch release until we've reached a conclusion here.

@Animosity022

Yeah, I hate bugs like this: it happened twice on one box and never on my other box. Same build process, identical config apart from two different site names.

Aug  3 07:45:49 gemini caddy[27228]: panic: runtime error: invalid memory address or nil pointer dereference
Aug  3 07:49:28 gemini caddy[27478]: panic: runtime error: invalid memory address or nil pointer dereference

Since those two, I've restarted ~30 times and can't get it to reproduce.

@mholt
Member

mholt commented Aug 3, 2023

@marten-seemann Thanks. Really appreciate your collaboration and patience. Are there any other code paths that could allow that Conn() or RemoteAddr() to be nil? Given the spurious nature of the reports, it seems like there's either some nondeterminism here or some other code path that we didn't think of that causes them to not be populated.

@bt90
Contributor

bt90 commented Aug 3, 2023

Is that code path always used, or only when an on-demand certificate is created? That would explain why the error isn't more common, since the mechanism for HTTP/2 and HTTP/1.1 isn't affected.

@mholt
Member

mholt commented Aug 3, 2023

@raregems-io I am unable to reproduce the issue with that config, using HTTP/3, and trying in a loop of about a thousand requests... Tried restarting the server several times and still can't reproduce it.

@RainmakerRaw

I seem to have this issue also, but missed this existing one (sorry!). See #5681 and merge if desired. Apologies!

@RainmakerRaw

RainmakerRaw commented Aug 3, 2023

@mholt How can I help you reproduce this? It happens every single time I visit the URL without fail. Restart the caddy.service and it's fine (and serving the other sites without issue), but visit Plex or Jellyfin and boom, issue recurs again and again. I'm happy to liaise with you if you want to let me know how I can help?

e: I notice the site loads OK in Firefox, no issues. Dev tools shows some items using http 1.1 and others (over half the resources) are loaded with http3. Using Brave, the site refuses to load at all and the Caddy server crashes. I wonder if the agent you're testing from is having an impact (or lack of)?

@mholt
Member

mholt commented Aug 3, 2023

@RainmakerRaw I too have a Jellyfin installation. I can try proxying to it? But I feel like it shouldn't matter what kind of content we're serving, since the bug occurs before connections are even established.

> e: I notice the site loads OK in Firefox, no issues. Dev tools shows some items using http 1.1 and others (over half the resources) are loaded with http3. Using Brave, the site refuses to load at all and the Caddy server crashes. I wonder if the agent you're testing from is having an impact (or lack of)?

If that's the case, then why aren't we seeing this on our production caddyserver.com and caddy.community sites?

The bug does seem sporadic though. @marten-seemann I dunno if any of this info is helpful, but I'm starting to wonder if there's another code path we missed. I'm not too familiar with the quic-go library though to know.

@RainmakerRaw

@mholt Aha! I don't know what effect this has, but I can bypass the issue and make Brave work again. On my test desktop client, Firefox was using DoH by default (my own AdGuard Home installation on a VPS). Brave was set to use the local DNS server, and causes Caddy to crash. When I set Brave to also use offsite DoH the Jellyfin server loads fine again and Caddy doesn't crash. I'll be honest, I don't know what this means, but it's progress maybe?

@RainmakerRaw

RainmakerRaw commented Aug 3, 2023

Further digging: Cloudflare has enabled ECH (encrypted client hello) on my domain. The Caddy server (and Jellyfin server) are running on the same LAN as my test desktop device - i.e. I'm hairpinning back in through the router to connect.

The local DNS server and the remote DoH server are both AdGuard Home using the same encrypted upstreams, but I wonder if something's getting mangled and causing the crash? I can connect to Plex and Jellyfin (using their domain not the local address) perfectly now I have enabled DoH in the browser. Both browsers are set to use ECH where available - is yours?

In summary, ECH is enabled on both browsers. With DoH and ECH enabled, Firefox connects to Jellyfin fine and Caddy doesn't crash. With Brave, ECH being enabled but DoH not causes Caddy to crash and no page loads. Adding DoH to the ECH on Brave makes everything work OK again.

Edit: Disabling ECH in the browser allows the Jellyfin site to load over QUIC with or without DoH enabled. I reloaded dozens of times without incident and Caddy no longer crashes. This seems to be ECH related.

@marten-seemann
Contributor

> @marten-seemann I dunno if any of this info is helpful, but I'm starting to wonder if there's another code path we missed. I'm not too familiar with the quic-go library though to know.

Good thinking! In fact there is: tls.Config.GetCertificate also passes in a ClientHelloInfo. I feel really stupid for missing this 😔. For some reason I thought that tls.Config.GetConfigForClient would be the only place, and didn't even bother checking. This means we'll need to wrap GetCertificate as well, equivalent to what we do in https://github.com/quic-go/quic-go/pull/4001/files#diff-c1eebd1c52f66da524b99c57f22669500f22784ae80475fbf2b786bc0d1ba278.

Fix coming later today. This will be included in the v0.37.2 patch release.

@RainmakerRaw

@marten-seemann This ties in with my last edit above. ECH was enabled in Brave for me, and I needed to also enable DoH to not have Caddy crash. Enabling ECH + DoH meant no crash and the page loads. Disabling ECH in the browser means the page loads (and Caddy doesn't crash) regardless of DoH status.

@francislavoie
Member

francislavoie commented Aug 4, 2023

Thanks @marten-seemann, I can confirm my reproduce case from #5680 (comment) is fixed by quic-go/quic-go#4016.

I think the panic @W0n9 reported is a separate issue though. I don't know how to replicate that problem. A new issue should probably be opened for that.

@mholt
Member

mholt commented Aug 4, 2023

@RainmakerRaw @otbutz @animosity22 @raregems-io @bt90 Can some of you also confirm that the patch at quic-go/quic-go#4016 works for you? You will need to build Caddy with that version of quic-go. Here's an easy way using xcaddy: xcaddy build --with github.com/quic-go/quic-go=github.com/quic-go/quic-go@280054c -- or you can simply modify the go.mod of Caddy to refer to that commit (go get github.com/quic-go/quic-go@280054c) if you're cloning the source manually.

@RainmakerRaw

@mholt Sorry, per my last reply I fixed the crash (not the underlying cause, just its manifestation) by changing my DNS and ECH settings on Cloudflare. I had my root domain proxied by CF, but the subdomains in question were CNAMEs that were not proxied. This seems to have caused issues with DNS (tagging my Jellyfin server with an ECH HTTPS answer that wasn't valid).

Once I removed the CF proxy from the root domain, DNS answers changed to reflect the CNAME and root domain, and Caddy no longer crashed. I can replicate the original conditions and wait for DNS to propagate and caches to expire (I chain from DHCP > local DNS > my upstream DNS) to test if you wish; it'll just take me some time. I had already built the binary with that patch, and put it in a dir on the server just in case...

@mholt
Member

mholt commented Aug 4, 2023

@RainmakerRaw A confirmation would be helpful. The changes you made are unrelated -- even if correct -- except that they just happen to make it so that the clients send a known/expected SNI to Caddy, which is the real reason it "fixed" it for you, as you alluded to. The bug is exposed when the SNI is not known and Caddy doesn't have a cert. You don't have to revert your changes necessarily, just send a request to Caddy with an unknown SNI/hostname.
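The sketch below exercises the "unknown SNI" path locally with only the standard library: it starts a throwaway TLS-over-TCP listener with no certificates, sends one ClientHello with an arbitrary SNI, and reports what the `GetCertificate` callback saw. `probeUnknownSNI` is a hypothetical helper. Over TCP, `crypto/tls` fills in `ClientHelloInfo.Conn`, which fits the observation that only the HTTP/3 path crashed; actually reproducing the panic needs an HTTP/3 client, as in the `curl --http3-only` tests in this thread.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
)

// probeUnknownSNI is a hypothetical helper: it starts a loopback TLS listener
// whose GetCertificate callback has no certificates, sends one ClientHello
// with the given SNI, and reports what the callback observed.
func probeUnknownSNI(sni string) (gotSNI string, hadConn bool, err error) {
	type observation struct {
		sni     string
		hadConn bool
	}
	seen := make(chan observation, 1)

	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{
		GetCertificate: func(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
			select { // non-blocking send: record only the first call
			case seen <- observation{hello.ServerName, hello.Conn != nil}:
			default:
			}
			return nil, fmt.Errorf("no certificate available for %q", hello.ServerName)
		},
	})
	if err != nil {
		return "", false, err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		// Fails by design, because the callback returns an error.
		_ = c.(*tls.Conn).Handshake()
	}()

	conn, err := tls.Dial("tcp", ln.Addr().String(), &tls.Config{
		ServerName:         sni,
		InsecureSkipVerify: true, // the probe only cares about the server side
	})
	if err == nil {
		conn.Close()
		return "", false, errors.New("handshake unexpectedly succeeded")
	}
	obs := <-seen
	return obs.sni, obs.hadConn, nil
}

func main() {
	sni, hadConn, err := probeUnknownSNI("unknown.example")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("callback saw SNI %q; ClientHelloInfo.Conn populated: %v\n", sni, hadConn)
}
```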

Thanks for your help thus far :)

@RainmakerRaw

@mholt Ah, I understand what you were asking now. Following the earlier example with Docker and curl-http3 (substituting for justdanz/curl-http3 because I'm on arm64), I tried with Caddy v2.7.2 as a control and it did indeed crash.

I then tried with GOOS=linux GOARCH=arm64 ./xcaddy build --with github.com/caddy-dns/cloudflare --with github.com/quic-go/quic-go=github.com/quic-go/quic-go@280054c and there is no crash after several minutes of running. I hope this is helpful.

@mholt
Member

mholt commented Aug 4, 2023

Awesome, thank you. Confirmations of the fix from anyone else experiencing the issue would be valuable as we await the patch release! 💯

@AlyoshaVasilieva

Testing my server with curl -v https://google.com --connect-to google.com:443:MY_SERVER_IP:443 --http3-only

Caddy crash: 2.7.2 built with xcaddy build --with github.com/caddyserver/cache-handler --with github.com/caddyserver/replace-response
Works: 2.7.2 built with xcaddy build --with github.com/caddyserver/cache-handler --with github.com/caddyserver/replace-response --with github.com/quic-go/quic-go=github.com/quic-go/quic-go@280054c

@Animosity022

I did a quick test as well:

docker run -it --rm ymuski/curl-http3 curl -v https://google.com --connect-to google.com:443:192.168.1.25:443 --http3-only

2.7.2 fails but the updated build works.

@mholt
Member

mholt commented Aug 4, 2023

Excellent, thank you both!

@marten-seemann We now have numerous confirmations of the patch working, so I'm confident with your fix. 💯 Thank you thank you.

@wazerstar

For whatever it's worth, I tested on Windows as well and no longer get TLS handshake error spam after building with

xcaddy build master --with github.com/caddy-dns/cloudflare --with github.com/caddyserver/transform-encoder --with github.com/WeidiDeng/caddy-cloudflare-ip --with github.com/quic-go/quic-go=github.com/quic-go/quic-go@280054c

@wazerstar

Oh, I just checked again and I'm getting a handshake error now?

{"level":"debug","ts":1691191817.8557942,"logger":"http.stdlib","msg":"http: TLS handshake error from 192.168.1.26:56616: no certificate available for '192.168.1.4'"}

@bt90
Contributor

bt90 commented Aug 4, 2023

You don't have a certificate for the raw IP, so that is expected and only logged with level debug.

@wazerstar

> You don't have a certificate for the raw IP, so that is expected and only logged with level debug.

Right, I shouldn't stay up this late; I was just focused on watching the errors. That one is from the server and not Cloudflare, so all good.

@W0n9

W0n9 commented Aug 5, 2023

> Thanks @marten-seemann, I can confirm my reproduce case from #5680 (comment) is fixed by quic-go/quic-go#4016.
>
> I think the panic @W0n9 reported is a separate issue though. I don't know how to replicate that problem. A new issue should probably be opened for that.

> @RainmakerRaw @otbutz @animosity22 @raregems-io @bt90 Can some of you also confirm that the patch at quic-go/quic-go#4016 works for you? You will need to build Caddy with that version of quic-go. Here's an easy way using xcaddy: xcaddy build --with github.com/quic-go/quic-go=github.com/quic-go/quic-go@280054c -- or you can simply modify the go.mod of Caddy to refer to that commit (go get github.com/quic-go/quic-go@280054c) if you're cloning the source manually.

I tried this new commit and it panicked again. I think I should open a new issue. 😥

xcaddy build    --with github.com/caddy-dns/duckdns     --with github.com/W0n9/caddy_waf_plugin   --with github.com/quic-go/quic-go=github.com/quic-go/quic-go@280054c   --output /root/caddy
8月 05 10:24:36 NewCaddy caddy[1905066]: panic: runtime error: invalid memory address or nil pointer dereference
8月 05 10:24:36 NewCaddy caddy[1905066]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0xa6532a]
8月 05 10:24:36 NewCaddy caddy[1905066]: goroutine 31817 [running]:
8月 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go/internal/ackhandler.(*sentPacketHandler).ReceivedAck(0xc003a69600?, 0xc003a69600?, 0x5?, {0xc000fe9802?, 0x10200002c?, 0x2b87520?})
8月 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/internal/ackhandler/sent_packet_handler.go:298 +0x6a
Aug 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go.(*connection).handleAckFrame(0xc000fe9800, 0xc003a69640, 0x60?)
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/connection.go:1484 +0x4e
Aug 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go.(*connection).handleFrame(0xc000fe9800, {0x1f11860?, 0xc003a69640?}, 0x0?, {{0xd1, 0x1c, 0x94, 0x5, 0x0, 0x0, ...}, ...})
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/connection.go:1280 +0xed
Aug 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go.(*connection).handleFrames(0xc000fe9800, {0xc00169520e?, 0xc000e69860?, 0x40deca?}, {{0xd1, 0x1c, 0x94, 0x5, 0x0, 0x0, ...}, ...}, ...)
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/connection.go:1253 +0x365
Aug 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go.(*connection).handleUnpackedLongHeaderPacket(0xc000fe9800, 0xc001cae780, 0xa7?, {0x60?, 0x0?, 0x2b87520?}, 0x4a)
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/connection.go:1194 +0x6aa
Aug 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go.(*connection).handleLongHeaderPacket(0xc000fe9800, {0xc0023da880, {0x1f130e8, 0xc001cae690}, {0xc12b8a990dc0ca26, 0x1e4bfa1fc8, 0x2b87520}, {0xc001695200, 0x4a, 0x5ac}, ...}, ...)
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/connection.go:967 +0x765
Aug 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go.(*connection).handlePacketImpl(0xc000fe9800, {0xc0023da880, {0x1f130e8, 0xc001cae690}, {0xc12b8a990dc0ca26, 0x1e4bfa1fc8, 0x2b87520}, {0xc001695200, 0x4a, 0x5ac}, ...})
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/connection.go:849 +0x1a8
Aug 05 10:24:36 NewCaddy caddy[1905066]: github.com/quic-go/quic-go.(*connection).run(0xc000fe9800)
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/connection.go:560 +0x525
Aug 05 10:24:36 NewCaddy caddy[1905066]: created by github.com/quic-go/quic-go.(*baseServer).handleInitialImpl
Aug 05 10:24:36 NewCaddy caddy[1905066]:         github.com/quic-go/[email protected]/server.go:672 +0x7cd
Aug 05 10:24:36 NewCaddy systemd[1]: caddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 05 10:24:36 NewCaddy systemd[1]: caddy.service: Failed with result 'exit-code'.

@marten-seemann
Contributor

marten-seemann commented Aug 5, 2023

@W0n9 A fix for both panics is in the works, and will be included in the v0.37.2 release.

@francislavoie
Member

francislavoie commented Aug 5, 2023

@W0n9 can you try a build with quic-go/quic-go#4018? You can use --with github.com/quic-go/quic-go=github.com/quic-go/quic-go@ack-after-handshake-complete

@W0n9

W0n9 commented Aug 6, 2023

I tried Caddy 2.7.3 and it fixed the problem for me 👍

@wazerstar

Hey @mholt, @marten-seemann, I'm not kidding, it's back for me on Windows.

I built master after a8cc5d1

v2.7.3-0.20230805213002-65e33fc1ee47 h1:6PFhIFWiV7Nfez0pPrk8lnQ0IdbMPnnQUz1clDX+HxY=

Am I doing something wrong here? I'm getting massive log spam:


{"level":"debug","ts":1691309078.0863955,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.29.26:57591: EOF"}
{"level":"debug","ts":1691309078.1616666,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.71.13.71:57891: EOF"}
{"level":"debug","ts":1691309078.233539,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.29.26:13931: EOF"}
{"level":"debug","ts":1691309078.3741865,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.29.26:37805: EOF"}
{"level":"debug","ts":1691309078.3912568,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.71.13.71:65391: EOF"}
{"level":"debug","ts":1691309078.5111217,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.29.26:46309: EOF"}
{"level":"debug","ts":1691309078.621165,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.71.13.71:24263: EOF"}
{"level":"debug","ts":1691309078.661057,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.29.26:16085: EOF"}
{"level":"debug","ts":1691309078.8506064,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.71.13.71:9705: EOF"}
{"level":"debug","ts":1691309079.0801218,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.71.13.71:47291: EOF"}
{"level":"debug","ts":1691309477.3321996,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.228.70:29757: EOF"}
{"level":"debug","ts":1691309477.637015,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.228.70:37959: EOF"}
{"level":"debug","ts":1691309478.848605,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.228.70:50391: EOF"}
{"level":"debug","ts":1691309479.9939373,"logger":"http.stdlib","msg":"http: TLS handshake error from 162.158.228.70:41453: EOF"}
{"level":"debug","ts":1691310489.8182952,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.68.88.96:64407: EOF"}
{"level":"debug","ts":1691310489.950913,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.68.88.96:42685: EOF"}
{"level":"debug","ts":1691310490.0826983,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.68.88.96:13429: EOF"}
{"level":"debug","ts":1691310490.21494,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.68.88.96:43121: EOF"}
{"level":"debug","ts":1691310490.3484063,"logger":"http.stdlib","msg":"http: TLS handshake error from 172.68.88.96:28693: EOF"}

@francislavoie
Member

francislavoie commented Aug 6, 2023

@wazerstar That's normal. That's not a bug. That's just bots/crawlers trying to connect to your server without knowing the correct domain name (i.e. trying to connect by IP and no domain). Ignore those. Turn off the debug logging level to quiet it.

@wazerstar
Copy link

That's odd. Why don't I get those errors when I revert to the previous version? The "TLS handshake error ... EOF" messages only started appearing after this change.

I will let this run for another 12 hours and see what happens.

@francislavoie
Member

francislavoie commented Aug 6, 2023

It's definitely not a new error; handshake errors have always been there. If you only enabled debug logging recently, then you'd only be seeing them now. It also depends on external factors, such as whether bots are actually hitting your server. It happens in bursts as bots shift their attention from one IP address to another.

To further clarify, these are debug logs, not actionable errors. It just means "some client" failed to connect. If you can yourself connect to your site, and Caddy is not crashing, then the problem in this issue is fixed (and we've confirmed that multiple times over).

@wazerstar

No, I've had debug logging enabled for about 5-6 days now to track down some other things, and I only noticed these errors after this build. Perhaps I was just lucky not to be targeted before, I don't know. Sorry for stirring up this thread again; I'll ignore it until I've found the cause of the other issue.

@bt90
Contributor

bt90 commented Aug 6, 2023

Baader–Meinhof phenomenon 😉

@caddyserver caddyserver deleted a comment from bt90 Aug 6, 2023