
Consul 1.6.3 DDos using consul-template (1.6.2 working fine) #1346

Closed
obourdon opened this issue Feb 24, 2020 · 7 comments
Labels
bug, consul (Related to the Consul integration), question, waiting-reply

Comments

@obourdon
Contributor

Please have a look at consul issue #7259

Note that running with -once works perfectly well, so I guess the "burst" occurs on refresh.

I think the issue, and the configuration values needed to solve it, may lie in both Consul AND consul-template, hence the two separate issues.

Consul Template version

0.24.1

Configuration

I have reproduced the issue outside my complete (encrypted and SSL-secured) environment using the official Consul Docker container and get the same behaviour: OK with 1.6.2, KO with 1.6.3 and later.

You can find the code to reproduce it yourself here.

Command

Same as above (see the reproducible test case).

Debug output

None to provide at this stage (way too big), but again, use the reproducible test case provided above.

Expected behavior

Either success, or some limits that can be configured properly to prevent this from happening. However, I tried many combinations of Consul and consul-template configuration parameters without success.

Actual behavior

Lots of EOF errors appear in the consul-template output, and the Consul client node loses its connection to the Consul server/cluster (consul members no longer works). However, after some time, it seems to self-recover.

Steps to reproduce

See the README.md of the reproducible test case.

References

See also consul-template issues and PRs #1279, #1066, #1065, #1107

@eikenb
Contributor

eikenb commented Mar 6, 2020

I was looking into this and, while asking around, was pointed to http_max_conns_per_client as the probable source of the problem. I already went into this in the related Consul issue. Here's a link to that comment for posterity.

hashicorp/consul#7259 (comment)
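To see why a single consul-template process can run into this limit, here is a minimal Python model of a per-client-IP cap on concurrent connections, in the spirit of http_max_conns_per_client. It is a simplified sketch, not Consul's implementation: the ConnLimiter class is hypothetical, the cap of 100 is an arbitrary illustrative value (not necessarily Consul's default), and it assumes each in-flight blocking query from consul-template holds its own HTTP connection from the same client IP.

```python
import threading
from collections import defaultdict


class ConnLimiter:
    """Hypothetical model of a per-client-IP cap on concurrent connections."""

    def __init__(self, max_conns_per_client):
        self.max_conns = max_conns_per_client
        self.active = defaultdict(int)  # client IP -> currently open connections
        self.lock = threading.Lock()

    def accept(self, client_ip):
        """Admit a new connection (True) or refuse it (False) once the cap is hit."""
        with self.lock:
            if self.active[client_ip] >= self.max_conns:
                return False
            self.active[client_ip] += 1
            return True

    def close(self, client_ip):
        """Release one connection held by client_ip."""
        with self.lock:
            if self.active[client_ip] > 0:
                self.active[client_ip] -= 1


# Assumption: one consul-template instance watching 150 dependencies keeps
# roughly 150 blocking queries (connections) open from the same IP.
limiter = ConnLimiter(max_conns_per_client=100)  # illustrative value only
admitted = sum(limiter.accept("10.0.0.5") for _ in range(150))
print(admitted)  # 100: the remaining 50 connections are refused
```

In this model the two levers are the ones discussed in this thread: raise the per-client cap on the agent, or reduce how many connections one client keeps open at once.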

@eikenb
Contributor

eikenb commented Mar 6, 2020

@obourdon see the above (and linked consul comment). I'm adding this to be sure you get messaged about it. Please let me know if this does/doesn't fix your issue. Thanks.

@obourdon
Contributor Author

@eikenb I will have a look at this, thanks. Sorry for not answering earlier. Will keep you posted.

@obourdon
Contributor Author

@eikenb and all, please have a look at my latest entry in Consul issue #7259.

@pierresouchay

@eikenb Maybe returning HTTP 429 could allow consul-template to detect that behavior and automatically apply rate limiting; see hashicorp/consul#7527 and hashicorp/go-connlimit#6.
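If the agent answered with HTTP 429 instead of dropping the connection, a client could adapt its own concurrency. One common technique for this is additive-increase/multiplicative-decrease (AIMD). The sketch below is a hypothetical illustration of that idea, not something consul-template implements; the AdaptiveConcurrency class, its bounds, and the starting value are all assumptions.

```python
class AdaptiveConcurrency:
    """Hypothetical AIMD controller for the number of in-flight Consul queries.

    Additive increase on success, multiplicative decrease when the agent
    answers HTTP 429 Too Many Requests.
    """

    def __init__(self, initial, minimum=1, maximum=64):
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_response(self, status_code):
        if status_code == 429:
            # Back off sharply: halve the number of allowed in-flight queries.
            self.limit = max(self.minimum, self.limit // 2)
        elif 200 <= status_code < 300:
            # Probe upward slowly while the agent keeps accepting requests.
            self.limit = min(self.maximum, self.limit + 1)
        # Other status codes leave the limit unchanged.
        return self.limit


ctl = AdaptiveConcurrency(initial=32)
for code in (200, 429, 429, 200):
    ctl.on_response(code)
print(ctl.limit)  # 9  (32 -> 33 -> 16 -> 8 -> 9)
```

The asymmetry is deliberate: a 429 signals the agent is already saturated, so the client sheds load quickly but only regains it one query at a time.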

@srstsavage

Just sanity checking here: based on @pierresouchay's comment above, I went looking for some config switch to manually limit concurrent requests to the Consul agent, but didn't find any in consul-template -h or in view.go:

https://github.com/hashicorp/consul-template/blob/main/watch/view.go
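For illustration, the kind of switch being looked for could amount to a semaphore that caps how many queries one client keeps in flight at once. consul-template has no such option according to the comment above; this sketch uses asyncio.Semaphore, and the fetch_service, fetch_all, and max_in_flight names are hypothetical, with a short sleep standing in for a real Consul query.

```python
import asyncio


async def fetch_service(name, sem, stats):
    """Hypothetical stand-in for one Consul query for a single service."""
    async with sem:  # wait here if max_in_flight queries are already running
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulated round trip to the agent
        stats["in_flight"] -= 1
        return name


async def fetch_all(services, max_in_flight):
    sem = asyncio.Semaphore(max_in_flight)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(fetch_service(s, sem, stats) for s in services)
    )
    return results, stats["peak"]


services = [f"svc-{i}" for i in range(200)]
results, peak = asyncio.run(fetch_all(services, max_in_flight=20))
print(len(results), peak)  # 200 20
```

All 200 services are still fetched, but never more than 20 at a time, which trades a slower first pass for staying under a per-client connection cap.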

Other than raising http_max_conns_per_client in Consul (which is a bit unfortunate, because the ceiling only needs to be raised for the initial startup/first-pass burst of a consul-template instance), it seems that the best we can currently do on the consul-template side is to tune -consul-retry-backoff and -consul-retry-max-backoff to limit the maximum backoff to a second or two.

Currently we're using consul-template to generate nginx proxy config, and due to Consul's client rate limiting and consul-template's retry backoff, we end up with a long delay when starting up a consul-template instance watching ~200 services.
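A quick model of how the max backoff bounds the startup delay. It assumes a doubling schedule that starts at the base backoff and is capped at the max backoff, with no jitter; consul-template's exact retry formula may differ, and the 0.25 s, 60 s, and 2 s values are only illustrative. The backoff_schedule function is hypothetical.

```python
def backoff_schedule(base_s, max_backoff_s, attempts):
    """Hypothetical capped exponential backoff.

    Returns the delay (seconds) before each retry: base_s doubled per
    attempt, never exceeding max_backoff_s.
    """
    return [min(base_s * 2 ** attempt, max_backoff_s) for attempt in range(attempts)]


uncapped = backoff_schedule(0.25, 60.0, 10)  # generous max backoff
capped = backoff_schedule(0.25, 2.0, 10)     # max backoff of "a second or two"

print(uncapped)  # [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
print(capped)    # [0.25, 0.5, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(sum(uncapped), sum(capped))  # 183.75 15.75
```

Under these assumptions, capping the max backoff at 2 s cuts the cumulative wait over ten failed attempts from about three minutes to about 16 seconds, at the cost of retrying more often against an agent that may still be at its connection limit.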

@pierresouchay

pierresouchay commented Nov 20, 2023

@srstsavage if you have issues with that, you might have a look at consul-templaterb, which has heuristics to avoid this: https://github.com/criteo/consul-templaterb/

Also an article about it: https://medium.com/criteo-engineering/template-based-discovery-with-consul-templaterb-8ff88434c457

4 participants