UPSTREAM_TRANSPORT_FAILURE_REASON not populated when tls handshake fails with unsupported cipher suites #16991
cc @ggreenway |
Before I realized support had been dropped, I tried configuring the upstream TLS context with an explicit cipher suite:
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    common_tls_context:
      tls_params: {"cipher_suites":["ECDHE-ECDSA-AES128-SHA"]}
That's very surprising to me. Can you check if there was anything in the logs about this when you ran with that configuration? |
This is the info log including one request:
This is the cluster configuration:
|
I'm running envoy in docker with the |
Huh, not sure why it's not working as expected. I would have expected to hit this error:
|
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions. |
ECDHE-ECDSA-AES128-SHA is in the supported list if you are using BoringSSL. I think the result is the same with OpenSSL. But if TLS 1.3 is negotiated, this suite is ignored.
|
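To illustrate the TLS 1.3 point above: in Envoy, the cipher_suites list only applies when TLS 1.2 or lower is negotiated. A minimal sketch, assuming the goal is to make sure the explicit cipher list is actually exercised against the upstream, is to cap the protocol version at TLS 1.2. The choice of TLSv1_2 here is an assumption for illustration, not the configuration from this report.

```yaml
# Sketch only: caps the upstream TLS version so cipher_suites takes effect.
# The version cap is an illustrative assumption, not taken from the report.
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    '@type': type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
    common_tls_context:
      tls_params:
        # cipher_suites has no effect if TLS 1.3 is negotiated
        tls_maximum_protocol_version: TLSv1_2
        cipher_suites:
        - ECDHE-ECDSA-AES128-SHA
```

Whether the suite is accepted at all still depends on the TLS library Envoy was built against.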
Portability is decided by the underlying library, at either build or link time. If that suite is not supported, you might see an LDS or SDS NACK along with this exception. Because of that NACK, this log won't be seen during the SSL handshake. |
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions. |
Title: UPSTREAM_TRANSPORT_FAILURE_REASON not populated when tls handshake fails with unsupported cipher suites
Description:
We deployed an updated envoy image yesterday, including a version change from 1.16 to 1.18. Following the deployment, envoy started issuing local 503 replies with RESPONSE_CODE_DETAILS of upstream_reset_before_response_started{connection_failure}.
I added upstreamTransportFailureReason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%' to the access logs (json format), but the value was null (formatted for readability):
I set the envoy log-level to trace which produced the following clues:
I noticed that there is a debug message in the log which also appears to be missing a value:
upstream reset: reset reason: connection failure, transport failure reason:
It turns out that the problem was that envoy 1.17 removed weak cipher suites (ref: #5401), and the upstream had an old list of allowed ciphers, all of which were weak 🤦
I figured this out by searching for "tls" in the version history of the major version releases, finding the relevant PRs, etc.
I would have expected some stat, debug log or the UPSTREAM_TRANSPORT_FAILURE_REASON to indicate something like "TLS handshake failure: no supported ciphers".
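For anyone trying to reproduce this, a minimal sketch of the access log setup described above might look like the following. Only the upstreamTransportFailureReason field comes from this report. The logger choice, the output path, and the other JSON keys are assumptions added for illustration.

```yaml
# Sketch only: file access logger with a JSON format.
# path and all keys except upstreamTransportFailureReason are assumptions.
access_log:
- name: envoy.access_loggers.file
  typed_config:
    '@type': type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
    path: /dev/stdout
    log_format:
      json_format:
        responseCode: '%RESPONSE_CODE%'
        responseFlags: '%RESPONSE_FLAGS%'
        responseCodeDetails: '%RESPONSE_CODE_DETAILS%'
        # The field from this report; it was logged as null on handshake failure
        upstreamTransportFailureReason: '%UPSTREAM_TRANSPORT_FAILURE_REASON%'
```

Logging RESPONSE_FLAGS and RESPONSE_CODE_DETAILS next to the failure reason makes it easier to see that the 503 was generated locally even when the transport failure reason is empty.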
Repro steps:
A TLS upstream with no supported ciphers. In my case the upstream had the following cipher suites enabled:
Admin and Stats Output:
These are all the stats after making 3 requests which route to the upstream with unsupported ciphers:
Logs:
See description above.