Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CC-3025] sqlproxy: rate limit proxy connections by IP #56192

Merged
merged 1 commit into from
Nov 13, 2020

Conversation

spaskob
Copy link
Contributor

@spaskob spaskob commented Nov 2, 2020

Add a new admitter package which performs basic connection admission
control to rate limit connection attempts. Connection attempts are
rate limited based on source IP. Admission control is currently
purely a local decision, but the package this interface defines could
easily become the client-side for connection to a centralized service.

Release note: none.

@spaskob spaskob requested a review from tbg November 2, 2020 10:35
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg tbg requested a review from asubiotto November 3, 2020 11:19
@tbg
Copy link
Member

tbg commented Nov 3, 2020

@asubiotto could you or someone else on the SQL end review this?

@tbg tbg removed the request for review from asubiotto November 3, 2020 11:20
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: for the admitter code which was forked from code in CC land. The sqlproxyccl changes look reasonable enough, though I've never looked at that code before. I think there should be a test that this rate limiting is working as expected. In CC code, we utilize the the admitter.Service mock to force various code paths during login. I can see a similar test being implemented for sqlproxyccl

Looks like c-deps/{cryptopp,googletest} were re-added accidentally.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @spaskob and @tbg)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

	if s.admitter != nil {
		// TODO(spaskob): check for previous successful connection from the same IP

I wonder if we'll have to do this soon. Connections to the SQL proxy are likely much more frequent that console logins.

@spaskob spaskob force-pushed the sqlproxy-rate-limit branch from f373eb9 to 66448bb Compare November 3, 2020 14:27
Copy link
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I will be adding the recommended test soon.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @petermattis and @tbg)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I wonder if we'll have to do this soon. Connections to the SQL proxy are likely much more frequent that console logins.

Let me try to add something even in this PR.
Some clarifying questions:

  1. Is metadata about successful connections persisted somewhere?
  2. Do we need to take into an account the tenant id as well? A hacker could connect to their own tenant and then use the free-pass to attack others.
  3. The above may already be a vulnerability of the current rate limiting approach. Perhaps we should track simultaneous connections from the same IP?

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @spaskob and @tbg)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

  1. Is metadata about successful connections persisted somewhere?

Not that I know of. I suspect we don't want to persist it as it is nice for sqlproxy to be stateless. Perhaps caching it in memory is sufficient. I'm imagining a cache keyed by tenant ID and source IP address.

  1. Do we need to take into an account the tenant id as well? A hacker could connect to their own tenant and then use the free-pass to attack others.

Yes. We definitely need to account for the tenant ID.

  1. The above may already be a vulnerability of the current rate limiting approach. Perhaps we should track simultaneous connections from the same IP?

Can you elaborate on this?

@spaskob spaskob force-pushed the sqlproxy-rate-limit branch 2 times, most recently from 2f258e1 to df9b92f Compare November 4, 2020 19:35
Copy link
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @petermattis and @tbg)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…
  1. Is metadata about successful connections persisted somewhere?

Not that I know of. I suspect we don't want to persist it as it is nice for sqlproxy to be stateless. Perhaps caching it in memory is sufficient. I'm imagining a cache keyed by tenant ID and source IP address.

  1. Do we need to take into an account the tenant id as well? A hacker could connect to their own tenant and then use the free-pass to attack others.

Yes. We definitely need to account for the tenant ID.

  1. The above may already be a vulnerability of the current rate limiting approach. Perhaps we should track simultaneous connections from the same IP?

Can you elaborate on this?

  1. Caching is being added (still WIP).
  2. Talking to Jay on how to actually parse the tenant id from the connection params. It is doable - we are figuring out the details.

As for 3:

Is this scenario possible:

  • connection attempts to tenant X from a given IP get throttled
  • so attacker successfully connects to their own tenant Y through the same IP instead
  • this clears the IP from the admitter
  • the attacker resumes the attack on tenant X

@asubiotto
Copy link
Contributor

@tbg I see you removed a request for review, do you still want me to take a look at this?

@tbg tbg requested review from asubiotto and removed request for tbg November 5, 2020 11:11
@tbg
Copy link
Member

tbg commented Nov 5, 2020

Yes, sorry! I must have fat fingered it.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to simulate the single IP multiple tenant ID scenario you point out in the test you're adding.

Reviewed 2 of 8 files at r1, 7 of 7 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 42 at r4 (raw file):

	OnSendErrToClient func(code ErrorCode, msg string) string

	admitter admitter.Service

nit: as far as I can see, it doesn't seem like this field is used anywhere (the one on the Server is)


pkg/ccl/sqlproxyccl/admitter/local.go, line 114 at r3 (raw file):

		index: len(s.mu.limiters),
	}
	s.mu.limiters[addr] = l

can you reset and reuse the limiter rather than allocating a new one? does it make sense to do so? Just realizing now this is a fork of existing code, feel free to ignore this comment.


pkg/ccl/sqlproxyccl/admitter/local.go, line 79 at r4 (raw file):

	}
	s.mu.limiters = make(map[string]*limiter)
	s.mu.knownTenants = make(map[string]map[uint64]bool)

If existence is enough, consider making this a map[string]map[uint64]struct{} to use less memory?


pkg/ccl/sqlproxyccl/admitter/local.go, line 109 at r4 (raw file):

		s.mu.knownTenants[ipAddress] = map[uint64]bool{tenID: true}
	} else {
		s.mu.knownTenants[ipAddress][tenID] = true

Same here.


pkg/ccl/sqlproxyccl/admitter/local.go, line 116 at r4 (raw file):

	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.mu.knownTenants[ipAddress]

You can avoid a second lookup by storing the result of this lookup and reusing the map in the if ok block.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 123 at r4 (raw file):

	if s.admitter != nil {
		ip := conn.RemoteAddr().String()
		tenID, clientErr = s.opts.TenantFromParams(msg.Parameters)

I believe this is the first introduction of the concept of tenants to the proxy code. I think it would be better to keep tenant logic separated. One way to do this is to define a local Admitter interface that takes the IP address and msg.Parameters:

type Admitter interface {
  AllowRequest(ip string, params map[string]string], ...) error
}

pkg/ccl/sqlproxyccl/admitter/local.go, line 52 at r4 (raw file):

		limiters map[string]*limiter
		// Map from IP address to known tenant ids for this IP.
		knownTenants map[string]map[uint64]bool

I think you're always looking up the tuple (ip-address, tenant-id). If that is the case, it would be faster and use less memory to have a single level map:

type tenantKey struct {
  ipAddress string
  tenantID uint64
}
...
knownTenants map[tenantKey]struct{}

Every map has overhead. This approach switches from a map per ip address to a single map which should be a significant reduction in memory.

Related to my other comment: this is the first introduction of the concept of tenants into the SQL proxy code. I think we should avoid it. The knownTenants stuff can be pulled out by adjusting the admitter interface used by SQL proxy and having the caller implement the known tenant logic. Specifically, my suggestion is that the known tenant handling get pulled into a separate struct and implemented in the managed-service repo.

Do we need a max size on this map? I think we can get away with not having one because the knownTenants map is limited by the number of tenants and how many IPs they are successfully be connected from. If that sounds right to you, it is worthwhile putting that explanation in a comment.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aaron-crl, @asubiotto, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Is this scenario possible:

This sounds possible, but unlikely, and not too harmful. The attacker needs to be able to spoof an IP address that someone is using log in from. How do they know that IP? Assuming they somehow do know such an IP, all they are getting is periodic resets of the rate limiter. Most of their attempts would still be throttled unless the successful connections from that IP address are happening very frequently.

Cc @aaron-crl for his thoughts.

Copy link

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably ought to have a release note as enabling this can result in users locking themselves out of a cluster in a way previously not possible.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is this scenario possible:

This sounds possible, but unlikely, and not too harmful. The attacker needs to be able to spoof an IP address that someone is using log in from. How do they know that IP? Assuming they somehow do know such an IP, all they are getting is periodic resets of the rate limiter. Most of their attempts would still be throttled unless the successful connections from that IP address are happening very frequently.

Cc @aaron-crl for his thoughts.

I looked at the local service code in this PR and I'm not sure I understand the proposed attack.

Would this be the attacker loop?

  • Attacker makes n guesses against target_tenant from attacker_IP
  • Attacker signs into attacker_tenant reseting the counter from attacker_IP

It looks like we track both the tenant and the IP in the knownTenant map so this wouldn't unblock attempts from the attacker's IP for target_tenant .

Is the concern that the attacker could spoof a valid connection from the legitimate user's IP? That should be virtually impossible if authentication takes place under TLS.

@spaskob
Copy link
Contributor Author

spaskob commented Nov 5, 2020

This probably ought to have a release note as enabling this can result in users locking themselves out of a cluster in a way previously not possible.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)

pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…
I looked at the local service code in this PR and I'm not sure I understand the proposed attack.

Would this be the attacker loop?

  • Attacker makes n guesses against target_tenant from attacker_IP
  • Attacker signs into attacker_tenant reseting the counter from attacker_IP

It looks like we track both the tenant and the IP in the knownTenant map so this wouldn't unblock attempts from the attacker's IP for target_tenant .

Is the concern that the attacker could spoof a valid connection from the legitimate user's IP? That should be virtually impossible if authentication takes place under TLS.

Yes exactly this is what I am worried about.
knownTenant map is used to bypass the admitter entirely - if (IP,tenID) has had a known successful login, we don't try to rate limit connection request.
The admitter however only tracks requests by IP, hence it is possible for the attacker to reset their exp backoff to 0 on target_tenant by successfully connecting from their same IP to their own attacker_tenant.

Copy link

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

Yes exactly this is what I am worried about. knownTenant map is used to bypass the admitter entirely - if (IP,tenID) has had a known successful login, we don't try to rate limit connection request. The admitter however only tracks requests by IP, hence it is possible for the attacker to reset their exp backoff to 0 on target_tenant by successfully connecting from their same IP to their own attacker_tenant.

Ah! Good find! Yes, we should protect against that case. I missed that on the original review.

Impact is that any user with valid credentials can circumvent rate-limiting protection for another user account (which is mercifully limited).

We still want to protect against the base case of an an attacker attempting to check for common passwords against multiple accounts, but we should also limit at the user/tenant, IP level (but not at the user/tenant level to limit DDoS channels).

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aaron-crl, @asubiotto, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, aaron-crl wrote…

Ah! Good find! Yes, we should protect against that case. I missed that on the original review.

Impact is that any user with valid credentials can circumvent rate-limiting protection for another user account (which is mercifully limited).

We still want to protect against the base case of an an attacker attempting to check for common passwords against multiple accounts, but we should also limit at the user/tenant, IP level (but not at the user/tenant level to limit DDoS channels).

Interesting. Is it sufficient to get rid of the RequestSuccess path? The knownTenant check will disable rate limiting for a particular tenant once they successfully auth, but we'd never reset the rate limit for a particular IP address.

@spaskob
Copy link
Contributor Author

spaskob commented Nov 6, 2020

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aaron-crl, @asubiotto, and @spaskob)

pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, aaron-crl wrote…
Interesting. Is it sufficient to get rid of the RequestSuccess path? The knownTenant check will disable rate limiting for a particular tenant once they successfully auth, but we'd never reset the rate limit for a particular IP address.

Yes - that should work. I checked the existing throttling code in CC and it seems that we should do the same there. WDYT?

Copy link
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 123 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I believe this is the first introduction of the concept of tenants to the proxy code. I think it would be better to keep tenant logic separated. One way to do this is to define a local Admitter interface that takes the IP address and msg.Parameters:

type Admitter interface {
  AllowRequest(ip string, params map[string]string], ...) error
}

Now that I understand better the


pkg/ccl/sqlproxyccl/admitter/local.go, line 52 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think you're always looking up the tuple (ip-address, tenant-id). If that is the case, it would be faster and use less memory to have a single level map:

type tenantKey struct {
  ipAddress string
  tenantID uint64
}
...
knownTenants map[tenantKey]struct{}

Every map has overhead. This approach switches from a map per ip address to a single map which should be a significant reduction in memory.

Related to my other comment: this is the first introduction of the concept of tenants into the SQL proxy code. I think we should avoid it. The knownTenants stuff can be pulled out by adjusting the admitter interface used by SQL proxy and having the caller implement the known tenant logic. Specifically, my suggestion is that the known tenant handling get pulled into a separate struct and implemented in the managed-service repo.

Do we need a max size on this map? I think we can get away with not having one because the knownTenants map is limited by the number of tenants and how many IPs they are successfully be connected from. If that sounds right to you, it is worthwhile putting that explanation in a comment.

Thanks for the detailed explanation, the purpose of the proxy makes much more sense to me now. Should we take your argument even further and pass the admitter and knownClient cache as options in sqlproxyccl.Options https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/sqlproxyccl/proxy.go#L26 ? Thus we won't have to fork the admitter library at all.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aaron-crl, @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

Yes - that should work. I checked the existing throttling code in CC and it seems that we should do the same there. WDYT?

Agreed, we should fix the CC code as well. Can you take care of that?


pkg/ccl/sqlproxyccl/admitter/local.go, line 52 at r4 (raw file):

Should we take your argument even further and pass the admitter and knownClient cache as options in sqlproxyccl.Options https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/sqlproxyccl/proxy.go#L26 ? Thus we won't have to fork the admitter library at all.

This is a good idea. Are you imagining defining an Admitter interface as I mentioned in another message? I think that would work. Another possibility is that we extend OutgoingAddrFromParams (which is now called BackendFromParams after Jackson's recent PR). Seems like any throttling could be hidden in that callback. We'd just have to also supply the remote IP address.

Copy link

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Agreed, we should fix the CC code as well. Can you take care of that?

Sounds good on all counts.

Copy link
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments - based on them I am going to change this PR significantly and ping you for further reviews. @asubiotto I need guidance on how to unskip this test https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/sqlproxyccl/proxy_test.go#L205 so that I can use it as template for the test testing the throttling logic.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, aaron-crl wrote…

Sounds good on all counts.

Will do


pkg/ccl/sqlproxyccl/proxy.go, line 42 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: as far as I can see, it doesn't seem like this field is used anywhere (the one on the Server is)

ack - we are taking an approach not using a explicit admitter instance - this will go away.


pkg/ccl/sqlproxyccl/admitter/local.go, line 114 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

can you reset and reuse the limiter rather than allocating a new one? does it make sense to do so? Just realizing now this is a fork of existing code, feel free to ignore this comment.

Done.


pkg/ccl/sqlproxyccl/admitter/local.go, line 52 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should we take your argument even further and pass the admitter and knownClient cache as options in sqlproxyccl.Options https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/sqlproxyccl/proxy.go#L26 ? Thus we won't have to fork the admitter library at all.

This is a good idea. Are you imagining defining an Admitter interface as I mentioned in another message? I think that would work. Another possibility is that we extend OutgoingAddrFromParams (which is now called BackendFromParams after Jackson's recent PR). Seems like any throttling could be hidden in that callback. We'd just have to also supply the remote IP address.

I like the latter better now that you mention it, so that we don't have to expose the existence of an admitter at all.


pkg/ccl/sqlproxyccl/admitter/local.go, line 79 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

If existence is enough, consider making this a map[string]map[uint64]struct{} to use less memory?

ack - we are taking an approach not using a explicit admitter instance - this will go away.


pkg/ccl/sqlproxyccl/admitter/local.go, line 109 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Same here.

ack - we are taking an approach not using a explicit admitter instance - this will go away.


pkg/ccl/sqlproxyccl/admitter/local.go, line 116 at r4 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

You can avoid a second lookup by storing the result of this lookup and reusing the map in the if ok block.

ditto

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you need a bazel regen. The more we find ourselves copying code rather than sharing and improving, the sadder I'll be. That doesn't need to be solved for this but it's something we should be mindful of.

I'd love to see some testing of

  • The KnownClient logic
  • The plumbing in the last commit, can you write something of an integration test that ensures clients get the appropriate error in the appropriate cases?

Reviewed 1 of 8 files at r1, 1 of 7 files at r3, 3 of 8 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/admitter/service.go, line 18 at r4 (raw file):

	// AllowRequest determines whether a request should be allowed to proceed. It
	// rate limits requests from IP addresses regardless of tenant id.
	AllowRequest(ipAddress string, now time.Time) error

Returning an error here seems like too big of a hammer. Given you aren't providing any guidance on the set of errors being returned even though it's just one, I think returning a bool would make for a better API.


pkg/ccl/sqlproxyccl/admitter/service.go, line 23 at r4 (raw file):

	// KnownClient checks if this client has connected successfully before.
	KnownClient(ipAddress string, tenID uint64) bool

Is it definitely better to have this as a separate method and force the caller to check this before calling AllowRequest? Should we just augment the AllowRequest interface to also take the tenID and short-circuit?

@spaskob spaskob force-pushed the sqlproxy-rate-limit branch from df9b92f to 9f079b2 Compare November 9, 2020 23:33
@spaskob spaskob force-pushed the sqlproxy-rate-limit branch 2 times, most recently from 6759a6c to d505572 Compare November 11, 2020 00:40
@spaskob
Copy link
Contributor Author

spaskob commented Nov 11, 2020

PTAL

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @petermattis, and @spaskob)


pkg/ccl/sqlproxyccl/metrics.go, line 76 at r6 (raw file):

		CurConnCount:           metric.NewGauge(metaCurConnCount),
		RoutingErrCount:        metric.NewCounter(metaRoutingErrCount),
		RefusedConnCount:       metric.NewCounter(metaBackendDisconnectCount),

s/metaBackendDisconnectCount/metaRefusedConnCount/g


pkg/ccl/sqlproxyccl/proxy.go, line 28 at r6 (raw file):

// being proxied.
type BackendConfig struct {
	OutgoingAddress     string

Let's document these fields.


pkg/ccl/sqlproxyccl/proxy.go, line 31 at r6 (raw file):

	TLSConf             *tls.Config
	RefuseConn          bool
	OnConnectionSuccess func() error

Under what circumstances would this callback want to return an error?


pkg/ccl/sqlproxyccl/proxy.go, line 133 at r6 (raw file):

		var clientErr error
		backendConfig, clientErr = s.opts.BackendConfigFromParams(msg.Parameters, ip)
		if clientErr != nil {

Rather than the special backendConfig.RefuseConn handling, perhaps we should export newErrorf and then allow inspecting it here. That is, we'd do something like:

if clientErr != nil {
  s.metrics.RoutingErrCount.Inc(1)

  var codeErr *codeError
  if !errors.As(clientErr, &codeErr) {
    codeErr = newErrorf(CodeParamsRoutingFailed, "rejected by BackendConfigFromParams: %v", clientErr)
  }
  sendErrToClient(conn, codeErr.code, clientErr.Error())
  return codeErr
}

Then if the BackendConfigFromParams wanted to return a specific error code it would do:

  return nil, sqlproxyccl.NewErrorf(sqlproxyccl.CodeProxyRefusedConnection, ....)

@spaskob spaskob force-pushed the sqlproxy-rate-limit branch from d505572 to 612690a Compare November 11, 2020 16:40
Copy link
Contributor Author

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @asubiotto, and @petermattis)


pkg/ccl/sqlproxyccl/metrics.go, line 76 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/metaBackendDisconnectCount/metaRefusedConnCount/g

Done.


pkg/ccl/sqlproxyccl/proxy.go, line 59 at r2 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

Will do

Done.


pkg/ccl/sqlproxyccl/proxy.go, line 123 at r4 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

Now that I understand better the

Done.


pkg/ccl/sqlproxyccl/proxy.go, line 112 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Yes, that approach sounds good, though I have a nit about the naming. The "known tenants" cache is caching whether a tenant connection attempt was successful. Naming it CacheConn makes me think that it is caching the connection itself. I'd name this callback something like ConnectionSuccess.

Done.


pkg/ccl/sqlproxyccl/proxy.go, line 28 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Let's document these fields.

Done.


pkg/ccl/sqlproxyccl/proxy.go, line 31 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Under what circumstances would this callback want to return an error?

I was thinking on service timeout when this becomes a service one day but I guess we can hide all this complexity in the closure and only emit errors in the log as this is not really a serious failure that requires terminating


pkg/ccl/sqlproxyccl/proxy.go, line 133 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Rather than the special backendConfig.RefuseConn handling, perhaps we should export newErrorf and then allow inspecting it here. That is, we'd do something like:

if clientErr != nil {
  s.metrics.RoutingErrCount.Inc(1)

  var codeErr *codeError
  if !errors.As(clientErr, &codeErr) {
    codeErr = newErrorf(CodeParamsRoutingFailed, "rejected by BackendConfigFromParams: %v", clientErr)
  }
  sendErrToClient(conn, codeErr.code, clientErr.Error())
  return codeErr
}

Then if the BackendConfigFromParams wanted to return a specific error code it would do:

  return nil, sqlproxyccl.NewErrorf(sqlproxyccl.CodeProxyRefusedConnection, ....)

Done.


pkg/ccl/sqlproxyccl/admitter/local.go, line 116 at r4 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

ditto

Done.


pkg/ccl/sqlproxyccl/admitter/service.go, line 23 at r4 (raw file):

Previously, ajwerner wrote…
	// KnownClient checks if this client has connected successfully before.
	KnownClient(ipAddress string, tenID uint64) bool

Is it definitely better to have this as a separate method and force the caller to check this before calling AllowRequest? Should we just augment the AllowRequest interface to also take the tenID and short-circuit?

Done.

@spaskob spaskob force-pushed the sqlproxy-rate-limit branch 5 times, most recently from 1791d32 to 1f0822c Compare November 11, 2020 20:39
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @asubiotto, and @petermattis)

@spaskob
Copy link
Contributor Author

spaskob commented Nov 11, 2020

TFTRs!

@spaskob
Copy link
Contributor Author

spaskob commented Nov 11, 2020

bors r+

1 similar comment
@spaskob
Copy link
Contributor Author

spaskob commented Nov 12, 2020

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 12, 2020

Build failed:

@spaskob
Copy link
Contributor Author

spaskob commented Nov 12, 2020

bors r+

@spaskob
Copy link
Contributor Author

spaskob commented Nov 12, 2020

bors r-

@craig
Copy link
Contributor

craig bot commented Nov 12, 2020

Canceled.

@spaskob spaskob force-pushed the sqlproxy-rate-limit branch from 1f0822c to a1a5fe1 Compare November 12, 2020 14:26
@spaskob
Copy link
Contributor Author

spaskob commented Nov 12, 2020

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 12, 2020

Build failed:

We need a mechanism for sqlproxy users to provide admission control to
clients that is enforced in the proxy. To that end we expand
`BackendConfigFromParams` to provide a signal whether a connection
should be refused and a callback to record the success of a connection.
The callback can implement caching of known connections for example so
that they are not rate limited on further connection requests.

Release note: none.
@spaskob spaskob force-pushed the sqlproxy-rate-limit branch from a1a5fe1 to 9349bd4 Compare November 13, 2020 14:24
@spaskob
Copy link
Contributor Author

spaskob commented Nov 13, 2020

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 13, 2020

Build succeeded:

@craig craig bot merged commit 5b69ef1 into cockroachdb:master Nov 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants