Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HTTP 429 when too many requests per IP calls limit is reached #6

Merged
merged 9 commits into from
Jun 22, 2020

Conversation

pierresouchay
Copy link
Contributor

Just closing HTTP connection when too many requests is reached make hard to provide good quality SDK and make troubleshouting hard.

As described in hashicorp/consul#7527, here is an implementation of it.

@pierresouchay pierresouchay force-pushed the http_429_for_too_many_requests branch from c8183de to 24bec32 Compare March 27, 2020 14:50
@pierresouchay
Copy link
Contributor Author

@schmichael @banks Might deserve a look to improve error handling with HTTP requests

After applying this patch on Consul, this gives me this error message:

curl -i localhost:8500/v1/health/service/consul
HTTP/1.1 429 Too Many Requests
Content-Type: text/plain
Content-Length: 81
Connection: close
Your IP is issuing too many concurrent connections, please rate limit your calls

connlimit.go Outdated Show resolved Hide resolved
connlimit.go Outdated Show resolved Hide resolved
connlimit.go Outdated Show resolved Hide resolved
connlimit.go Outdated Show resolved Hide resolved
@pierresouchay
Copy link
Contributor Author

@schmichael In last commit, I implemented the go routine as you suggested.
From my test in Consul, the behavior does not change I have the same exact results with httpd ab test in order to test the impact on performance in Consul (it does not break either).

@pierresouchay pierresouchay requested a review from schmichael April 1, 2020 15:09
@pierresouchay
Copy link
Contributor Author

@schmichael Are you Happy with the implementation with goroutine?

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for leading you down the goroutine path. I don't see a way to avoid it racing with the HTTP handler.

I think the ConnState limiter approach is insufficient if we want to return 529 responses, and we'll probably need to implement a limiter as http.Handler middleware.

connlimit.go Outdated Show resolved Hide resolved
connlimit.go Outdated Show resolved Hide resolved
@pierresouchay
Copy link
Contributor Author

@schmichael In the last commit, I have a proposal, where you can plug your own error handler.
Thus, it is possible for the caller to:

  • either use HTTP 429N by default (sync)
  • either flag the conn as TooMany requests + Send the response later in the HTTP Handler

What do you think?

@pierresouchay
Copy link
Contributor Author

@schmichael I made a quick & dirty demo on how the last commit it could be integrated into Consul: pierresouchay/consul@8399df9

So it basically:

  • handle HTTP 429 nicely in HTTP Mux
  • if the client is too slow (maliciously or not), auto-close the connection after a few MS

In this patch, I don't use the same mechanism to show both approach do work (the message iis different in HTTP Mux and in my hook).

So, the last approach allow either to:

  • keep same behaviour (default)
  • respond HTTP 429 on the same execution thread as Accept (might slow down the server) using the new method HTTPConnStateFuncWithDefault429Handler()
  • OR handle it in the HTTP Mux or with async behaviour as shown in pierresouchay/consul@8399df9

what do you think?

@pierresouchay
Copy link
Contributor Author

Hello,

I made a proposal in last comment that might solve the issue of HTTP mux fighting with connlimit showing how it might work (and not on same thread as accept). @schmichael do you think it might work? (with more care on the Consul patch of course)

@pierresouchay
Copy link
Contributor Author

@i0rek @schmichael @banks Any other comment on this review?

Note there is a POC to implement it without blocking the Accept thread.

Here are the possibilities offered by this :

  • keep same behavior (default)
  • respond HTTP 429 on the same execution thread as Accept (might slow down the server) using the new method HTTPConnStateFuncWithDefault429Handler() straightforward, but a bit less efficient with malicious clients playing with TCP or very slow clients
  • OR handle it in the HTTP Mux or with async behavior as shown in Proof Of Concept pierresouchay/consul@8399df9

The issue hashicorp/consul#7527 it would solve is really impacting for several people (I have been contacted several times by people from several organizations on this subject). For us, just breaking the connection make this setting unusable as our SDKs and systems cannot take good decisions and will just retry blindly.

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think this looks pretty good.

It's a little tricky to know whether the 429 handler can be turned on by default in Consul and Nomad thought. I suspect it would be fine since the response is tiny and likely fits in buffers etc. but my mental model of network stacks on different platforms shouldn't be relied on for something like this.

I think the risk is that this library is designed to mitigate DOS attempts so any amount of work done on a resource that we've chosen to deny is a risk. So we're not dealing with "will this write block in most normal situations" which I'm pretty sure is no. It's more like "can an attacker find any possible way to feed requests in/refuse to ack packets such that the OS buffers are full and the write of the response blocks and holds the goroutine open" which is much harder to answer!

On the flip side, the max write deadline does seem to put a manageable bound on the time we'd wait for it so I feel like this is OK (with the suggestion I made inline done).

I'm interested if @schmichael feels the same though as we'd not want to adopt this in Consul but not Nomad if there are still concerns there.

connlimit.go Outdated Show resolved Hide resolved
@pierresouchay
Copy link
Contributor Author

Hello @i0rek @schmichael ,

Do you have further comments on this PR?

Reminder:

Regards

@jeromegn
Copy link

This would've helped us figure out what settings to tweak when we started getting EOF from the API.

Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this @pierresouchay and thanks for stepping in with the review @banks.

We'll use HTTPConnStateFuncWithErrorHandler in Nomad to also tick a metric (and maybe log?) on rejected IPs as well, so the design is appreciated.

@banks banks merged commit cf3e679 into hashicorp:master Jun 22, 2020
@pierresouchay
Copy link
Contributor Author

@banks Would it require a new release first?

@banks
Copy link
Member

banks commented Jun 23, 2020

Done: https://github.com/hashicorp/go-connlimit/releases/tag/v0.3.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants