Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[10.x] Add new error messages for detecting lost connections #47398

Merged
merged 1 commit into from
Jun 10, 2023

Conversation

mfn
Copy link
Contributor

@mfn mfn commented Jun 10, 2023

Summary

Ever since I moved from AWS Aurora to using AWS RDS Proxy, which (similar to pgbouncer) is essential to not overload the primary database with connection, I sporadically see these messages. This affects scheduled tasks, worker, HTTP requests; it's across the whole board.

This happened across different PHP versions (7.4, 8.2) and different
docker base OS images, as well as different Laravel version (8.x, 10.x).


The message for
SQLSTATE[08006] [7] unrecognized SSL error code:
is actually
SQLSTATE[08006] [7] unrecognized SSL error code: 6
but I figured this minor error code shouldn't be relevant.

I also saw other messages starting with SQLSTATE[08006] [7], so maybe there's a case for just adding this prefix, but I guess no one knows if this would really apply to all of such messages.

Ever since I moved from AWS Aurora to using AWS RDS Proxy, which
(similar to pgbouncer) is essential to not overload the primary database
with connection, I sporadically see these messages. This affects
scheduled tasks, worker, HTTP requests; it's across the whole board.

This happened across different PHP versions (7.4, 8.2) and different
docker base OS images, as well as different Laravel version (8.x, 10.x).
@@ -63,6 +63,8 @@ protected function causedByLostConnection(Throwable $e)
'Reason: Server is in script upgrade mode. Only administrator can connect at this time.',
'Unknown $curl_error_code: 77',
'SSL: Handshake timed out',
'SQLSTATE[08006] [7] SSL error: sslv3 alert unexpected message',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sslv3 😭 do we really want to support use of SSL v3? Do any modern databases actually still support this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS Aurora to using AWS RDS Proxy

Seems like a configuration error perhaps? Can you try setting the minimum TLS version to 1.2 in the parameter group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the error message being received here; all connections are at minimum TLS 1.2 I can proof this with a query like this:
SELECT * FROM pg_stat_activity JOIN pg_stat_ssl USING(pid);
and the output of any Laravel initiated connection looks like this:

…
ssl              | t
version          | TLSv1.2
cipher           | AES128-SHA256
bits             | 128
…

We can change it to SQLSTATE[08006] [7] SSL error: if that makes more sense; I just never had any other message then the two I added in this PR.

@taylorotwell taylorotwell merged commit 245e891 into laravel:10.x Jun 10, 2023
@mfn mfn deleted the mfn-rds-proxy-caused-by branch June 10, 2023 18:16
taylorotwell pushed a commit that referenced this pull request Dec 8, 2024
Last year I added those two via #47398
Please see details and reasoning there, it still applies; TL;DR:
> Ever since I moved from AWS Aurora to using AWS RDS Proxy, which
> (similar to pgbouncer) is essential to not overload the primary database
> with connection, I sporadically see these messages. This affects
> scheduled tasks, worker, HTTP requests; it's across the whole board.

What I didn't knew back then was that the order/format of those messages
very much dependent on the underlying libraries (libpq and or libssl
presumably) and once we started to upgrade from Debian bullseye to
bookworm, they changed and now are randomly interrupting our service
again.

The new (complete) messages look like this:
> `SQLSTATE[08006] [7] connection to server at "…" (…), port 5… failed: SSL error: sslv3 alert unexpected message (Connection: main, SQL: …`

That is, it still contains "SSL error: sslv3 alert unexpected message"
but there's now connection specific information before the SQLSTATE
prefix.

Since lost connection detection is based on exact string matching and
not regex, we're just removing the magic numbers in front and still
keep backwards compatibility.

I opened this PR against Laravel 10.x because we're still using and it
requires the fix there (we're in the progress moving to L11, but it still
takes time).

Thanks
taylorotwell pushed a commit to illuminate/database that referenced this pull request Dec 8, 2024
Last year I added those two via laravel/framework#47398
Please see details and reasoning there, it still applies; TL;DR:
> Ever since I moved from AWS Aurora to using AWS RDS Proxy, which
> (similar to pgbouncer) is essential to not overload the primary database
> with connection, I sporadically see these messages. This affects
> scheduled tasks, worker, HTTP requests; it's across the whole board.

What I didn't knew back then was that the order/format of those messages
very much dependent on the underlying libraries (libpq and or libssl
presumably) and once we started to upgrade from Debian bullseye to
bookworm, they changed and now are randomly interrupting our service
again.

The new (complete) messages look like this:
> `SQLSTATE[08006] [7] connection to server at "…" (…), port 5… failed: SSL error: sslv3 alert unexpected message (Connection: main, SQL: …`

That is, it still contains "SSL error: sslv3 alert unexpected message"
but there's now connection specific information before the SQLSTATE
prefix.

Since lost connection detection is based on exact string matching and
not regex, we're just removing the magic numbers in front and still
keep backwards compatibility.

I opened this PR against Laravel 10.x because we're still using and it
requires the fix there (we're in the progress moving to L11, but it still
takes time).

Thanks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants