-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlproxyccl: rework sqlproxy connection throttler #71412
Conversation
2bc96fb
to
4e3bd3f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 17 of 17 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp)
4e3bd3f
to
55ba27e
Compare
@darinpp, can you take a look as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson and @kpatron-cockroachlabs)
pkg/ccl/sqlproxyccl/authentication.go, line 35 at r2 (raw file):
return newErrorf(codeBackendReadFailed, "unable to receive message from backend: %v", err) }
I don't think we should be reporting throttle errors as such. This gives an attacker information that we don't need to give back. Reporting authentication error when there is one but also when they are throttled prevents them from knowing that password they tried is incorrect. This also eliminates the need to have throttleHook
in authenticate
so this code is not needed.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 266 at r2 (raw file):
throttleTime, err := handler.throttleService.LoginCheck(throttleTags) if err != nil { log.Errorf(ctx, "throttler refused connection: %v", err.Error())
This should be authentication error. Reporting that they are throttled let's an attacked know that the password may be correct and to try later. Making all throttler errors looks like authentication errors, hides the actual length of the throttle period and prevents the attackers to get reliable feedback on incorrect passwords.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 402 at r2 (raw file):
// Perform user authentication. if err := authenticate(conn, crdbConn, func(status throttler.AttemptStatus) *pgproto3.ErrorResponse {
If all errors are authenticate errors then we don't need this hook.
pkg/ccl/sqlproxyccl/throttler/local.go, line 61 at r2 (raw file):
maxCacheSize: 1e6, /* 1 million */ baseDelay: time.Second, maxDelay: time.Hour,
I don't think one hours works. If I run workload init tpcc
for example with 36 connections but I make a mistake in the password. That will activate the throttler and it will make me wait an hour. We should probably have more like 10,15 or 30 seconds. Another issue with one hour (or any long period in general) is if the connections don't get spread equally across sqlproxies. Imagine that 16 connections get spread over 3 proxies as 4,5,7. So one proxy will block for 16 sec while another will block for 2+min. So if I try again my workload in 20 sec, I will get 1/3 of my connections fine and 2/3 throttled. There isn't much we can do about it now but that is one more reason why we shouldn't have a very high max delay as this can lead to bad experience due to a small mistake.
pkg/ccl/sqlproxyccl/throttler/local.go, line 103 at r2 (raw file):
func (s *localService) ReportAttempt( connection ConnectionTags, throttleTime time.Time, status AttemptStatus,
If the only two status choices are OK
and Credentials
it may be simpler to not have the status and instead have ReportSuccess
and ReportFailure
methods.
pkg/ccl/sqlproxyccl/throttler/throttle.go, line 35 at r2 (raw file):
func (l *throttle) triggerThrottle(now time.Time, maxBackoff time.Duration) { l.nextTime = now.Add(l.nextBackoff) l.nextBackoff *= 2
Do we ever reset that to initialBackoff
? If I run workload init tpcc
with say 48 connections and get locked for one hour and then in one hour I try cockroach sql
making a mistake in my password - am I going to be locked for one more hour? My preference would be to keep it as is (not reset it) but have something like 10 sec maximum backoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp and @kpatron-cockroachlabs)
pkg/ccl/sqlproxyccl/authentication.go, line 35 at r2 (raw file):
Previously, darinpp wrote…
I don't think we should be reporting throttle errors as such. This gives an attacker information that we don't need to give back. Reporting authentication error when there is one but also when they are throttled prevents them from knowing that password they tried is incorrect. This also eliminates the need to have
throttleHook
inauthenticate
so this code is not needed.
The throttle hook is here to prevent leaking authentication state. authenticate does some frame level proxying back to the client. If we didn't check the throttle here, it would forward an AuthenticationOK or ErrorResponse to the client.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 266 at r2 (raw file):
error. Reporting that they are throttled let's an attacked know that the password may be correct and to try later. Making all th
This is not a behavior change. It is the same throttle error we returned in previous iterations of the throttle.
I think it would be a bad user experience to report these as authentication errors. If a user misconfigures their password, then fixes it, we should show an error message indicating we didn't actually check their password.
Even if we returned an authentication error here, it would still be obvious to an attacker that we didn't actually check the credentials. The sql server responds with a AuthenticationCleartextPassword message prompting the client to send its password. Throttling before we try to dial the backend means we would send an authentication error before the client has sent us its credentials.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 402 at r2 (raw file):
Previously, darinpp wrote…
If all errors are authenticate errors then we don't need this hook.
The hook would still be necessary to consult throttle state and suppress a potential AuthenticationOK response from the sql server.
pkg/ccl/sqlproxyccl/throttler/local.go, line 61 at r2 (raw file):
suppress
The exponential backoff doesn't trigger if the request was throttled. Only attempts let past the throttle trigger an exponential backoff. So the total time period of authentication errors is going to be at most double the time your script was retrying with a bad password.
pkg/ccl/sqlproxyccl/throttler/local.go, line 103 at r2 (raw file):
Previously, darinpp wrote…
If the only two status choices are
OK
andCredentials
it may be simpler to not have the status and instead haveReportSuccess
andReportFailure
methods.
I tried splitting them. They share all but one line of code and having a function handle both status allows us to pass a single throttle hook function to authenticate.
pkg/ccl/sqlproxyccl/throttler/throttle.go, line 35 at r2 (raw file):
Previously, darinpp wrote…
Do we ever reset that to
initialBackoff
? If I runworkload init tpcc
with say 48 connections and get locked for one hour and then in one hour I trycockroach sql
making a mistake in my password - am I going to be locked for one more hour? My preference would be to keep it as is (not reset it) but have something like 10 sec maximum backoff.
I chose a 1 hour backoff because that is the value used by the original throttle logic. In general you won't get locked out for an hour unless you leave the script retrying for at least 30 minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jeffswenson and @kpatron-cockroachlabs)
pkg/ccl/sqlproxyccl/authentication.go, line 35 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
The throttle hook is here to prevent leaking authentication state. authenticate does some frame level proxying back to the client. If we didn't check the throttle here, it would forward an AuthenticationOK or ErrorResponse to the client.
OK. Sounds good. Can you still merge this switch into the on below by moving the fe.Send(backendMsg)
into the switch?
pkg/ccl/sqlproxyccl/proxy_handler.go, line 266 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
error. Reporting that they are throttled let's an attacked know that the password may be correct and to try later. Making all th
This is not a behavior change. It is the same throttle error we returned in previous iterations of the throttle.
I think it would be a bad user experience to report these as authentication errors. If a user misconfigures their password, then fixes it, we should show an error message indicating we didn't actually check their password.
Even if we returned an authentication error here, it would still be obvious to an attacker that we didn't actually check the credentials. The sql server responds with a AuthenticationCleartextPassword message prompting the client to send its password. Throttling before we try to dial the backend means we would send an authentication error before the client has sent us its credentials.
Sounds good.
pkg/ccl/sqlproxyccl/proxy_handler.go, line 402 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
The hook would still be necessary to consult throttle state and suppress a potential AuthenticationOK response from the sql server.
OK
pkg/ccl/sqlproxyccl/throttler/local.go, line 61 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
suppress
The exponential backoff doesn't trigger if the request was throttled. Only attempts let past the throttle trigger an exponential backoff. So the total time period of authentication errors is going to be at most double the time your script was retrying with a bad password.
I see. So a quick burst of connection requests with invalid passwords won't cause a problem but a persistent stream of invalid password connections over time will push the backoff time up.
pkg/ccl/sqlproxyccl/throttler/throttle.go, line 35 at r2 (raw file):
Previously, JeffSwenson (Jeff Swenson) wrote…
I chose a 1 hour backoff because that is the value used by the original throttle logic. In general you won't get locked out for an hour unless you leave the script retrying for at least 30 minutes.
I see. It may be helpful to add a line to the log when the backoff gets to the one hour mark for a given tenant/IP. It should be very rare but may be a good signal for security.
3ec03d5
to
0c3a32b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp, @jeffswenson, and @kpatron-cockroachlabs)
pkg/ccl/sqlproxyccl/authentication.go, line 35 at r2 (raw file):
Previously, darinpp wrote…
OK. Sounds good. Can you still merge this switch into the on below by moving the
fe.Send(backendMsg)
into the switch?
Done. PTAL the refactor was non-trivial. I made one small behavior change. If there is an unexpected message type during authentication, we used to forward the message, then immediately close the connection. Now we close the connection without forwarding the unexpected message.
pkg/ccl/sqlproxyccl/throttler/throttle.go, line 35 at r2 (raw file):
Previously, darinpp wrote…
I see. It may be helpful to add a line to the log when the backoff gets to the one hour mark for a given tenant/IP. It should be very rare but may be a good signal for security.
Done.
0c3a32b
to
0cd0cc1
Compare
This change switches the sqlproxy connection throttling logic back to exponential backoff. The tokenbucket approach was introduced by PR cockroachdb#69041. There are a few behavior differences between this and the original exponential backoff implementation. 1. The throttling logic is maintained per (ip, tenant) instead of per (ip). Some platform as a service provides share a single outbound ip address between multiple clients. These users would occasionaly see throttling caused by a second user sharing their IP. 2. The throttling logic was triggered before there was an authentication failure. It takes ~100ms-1000ms to authenticate with the tenant process. Any requests that arrived after the first request, but before it was processed, would trigger the throttle. Now, we only trigger the throttle in response to an explict authorization error. Release note: None
0cd0cc1
to
2559b4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @darinpp, @jeffswenson, and @kpatron-cockroachlabs)
bors r+ |
Build failed: |
bors r+ |
Build succeeded: |
This change switches the sqlproxy connection throttling logic back to
exponential backoff. The tokenbucket approach was introduced by
PR #69041. There are a few behavior differences between this and the
original exponential backoff implementation.
(ip). Some platform as a service provides share a single outbound ip
address between multiple clients. These users would occasionaly see
throttling caused by a second user sharing their IP.
failure. It takes ~100ms-1000ms to authenticate with the tenant
process. Any requests that arrived after the first request, but
before it was processed, would trigger the throttle. Now, we only
trigger the throttle in response to an explict authorization error.
Release note: None