rate-limiting plugin works randomly (in redis/cluster mode) #4379

Closed
WojciechBaran-TomTom opened this issue Mar 11, 2019 · 13 comments · Fixed by #5873
Labels
task/needs-investigation Requires investigation and reproduction before classifying it as a bug or not.

Comments

@WojciechBaran-TomTom

Summary

The rate-limiting plugin fails to throttle when configured in redis/cluster mode.
It seems that under load the Kong node does not decrement the remaining limit and/or does not respond with the X-RateLimit-* headers.

Steps To Reproduce

  1. Create a test service/route pair with a rate-limiting plugin running in redis or cluster mode
  2. Perform N synchronous GET requests against that endpoint
  3. Check if all the responses contain X-RateLimit-* headers
  4. Check if the remaining limit has been decreased by N

Additional Details & Logs

  • Kong version 1.0.2

It seems that the code defined outside of RateLimitingHandler:access is sometimes just not run.
This affects both RateLimitingHandler:header_filter (which results in no rate-limiting headers being sent in the response) and RateLimitingHandler:log (which results in no limit being consumed by the request).
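
For reference, a simplified sketch of the phase layout involved (pieced together from the handler snippets quoted later in this thread; not the exact plugin source):

local RateLimitingHandler = {}   -- stand-in for the plugin handler table

function RateLimitingHandler:access(conf)
  -- checks current usage against the configured limits (may return 429 here),
  -- then defers the Redis/cluster counter increment to the log phase:
  kong.ctx.plugin.timer = function()
    -- schedules the increment in a background timer
  end
end

function RateLimitingHandler:header_filter(conf)
  -- adds the X-RateLimit-* headers to the response
end

function RateLimitingHandler:log(_)
  if kong.ctx.plugin.timer then
    kong.ctx.plugin.timer()
  end
end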

@h3xar0n
Contributor

h3xar0n commented Mar 11, 2019

Hi @wbarantt, could you please share the following:

  • the environment in which you're running Kong
  • kong.conf values
  • Kong error logs

@WojciechBaran-TomTom
Author

It's just a stock Kong with no custom configuration.
I have configured per-consumer rate-limiting on my test route and started the GET traffic.

No errors in the logs.
If I break the Redis host/port configuration, it fails with 500 or returns 200 (depending on the fault-tolerant setting), so it does not seem to be a problem with Kong<>Redis communication.
It works fine if there's no other traffic (going through another route), but once the number of requests increases, the plugin execution seems to become disordered.

@p0pr0ck5
Contributor

@wbarantt we'd still like to know a bit more about your Kong configuration :) Are you deploying a cluster of Kong nodes behind an LB? Can you please share the exact plugin config you've enabled? And how many distinct consumers are sending requests to Kong when you observe this behavior?

TBH this sounds a bit like #3329, but we'd love to learn more about your use case to get a better understanding of what's going on. Thanks!

@p0pr0ck5 p0pr0ck5 added the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Mar 12, 2019
@WojciechBaran-TomTom
Author

This is my whole test config:
kong_config.txt

This is how I test it:
for i in $(seq -w 1 20); do echo -e "RUN:\t=====\t#${i}\t====="; curl -sD - -o /dev/null -H "Host: httpbin.org" http://localhost:8000/httpbin_01/headers; done

This is what I get with no other traffic coming to this kong node:
RUN: ===== #11 =====
HTTP/1.1 429 Too Many Requests
Date: Wed, 13 Mar 2019 11:26:47 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
WWW-Authenticate: Key realm="kong"
Content-Length: 37
X-RateLimit-Remaining-second: 8
X-RateLimit-Limit-second: 10
X-RateLimit-Remaining-minute: 0
X-RateLimit-Limit-minute: 10
Server: kong/1.0.2

When I turn on the regular traffic, Kong starts to skip requests and I get something like this:
RUN: ===== #17 =====
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 281
Connection: keep-alive
WWW-Authenticate: Key realm="kong"
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Date: Wed, 13 Mar 2019 10:44:55 GMT
Server: nginx
X-RateLimit-Remaining-second: 9
X-RateLimit-Limit-second: 10
X-RateLimit-Remaining-minute: 0
X-RateLimit-Limit-minute: 10
X-Kong-Upstream-Latency: 72
X-Kong-Proxy-Latency: 1
Via: kong/1.0.2

RUN: ===== #18 =====
HTTP/1.1 429 Too Many Requests
Date: Wed, 13 Mar 2019 10:44:57 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
WWW-Authenticate: Key realm="kong"
Content-Length: 37
X-RateLimit-Remaining-second: 10
X-RateLimit-Limit-second: 10
X-RateLimit-Remaining-minute: 0
X-RateLimit-Limit-minute: 10
Server: kong/1.0.2

So 7 additional requests have passed through Kong...

Here's the full log:
kong_log.txt
Some responses are missing the X-RateLimit-* headers, and some of them were not counted.

@p0pr0ck5 p0pr0ck5 added task/needs-investigation Requires investigation and reproduction before classifying it as a bug or not. and removed pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... labels Mar 20, 2019
@anujjalan

We are facing a similar issue after an upgrade.

We were earlier on version 0.14.0, where this used to work fine, but after our upgrade to 1.0.2 the rate-limiting numbers are not right and we suspect the plugin is not even getting executed for many concurrent calls.

We did not find any errors related to Redis or the plugin, but still only 60% of requests were counted as part of rate limiting.

@qiao-meng-zefr

I am getting the same issue on Kong 1.0.3.

In one of my tests I made a total of 40 calls (with 20 workers) to Kong; the rate limit was set to 10 per hour at the consumer level. So the expected result is that only 10 calls go through and the remaining 30 get a 429 response.

However, all calls got a 200 response. 10 responses have the X-RateLimit-* headers; it looks like only the X-RateLimit-* headers are missing from the other 30 calls. All 40 calls have the other Kong headers such as X-Kong-Upstream-Latency, etc.

@ganeshs

ganeshs commented Apr 22, 2019

I'm facing a similar issue after upgrading from 0.14.0 to 1.0.2. The plugin seems to be skipped for a certain percentage of requests.

@ganeshs

ganeshs commented May 4, 2019

I did some debugging and narrowed the issue down to the piece of code below:

function RateLimitingHandler:log(_)
  if kong.ctx.plugin.timer then
    kong.ctx.plugin.timer()
  end
end

I configured the rate-limiting plugin to use Redis with a threshold of 1000 requests per minute. In the test run, I fired 2000 requests at a concurrency of 100; the Redis counter was incremented only 367 times instead of 1000.

kong.ctx.plugin.timer is nil for the rest of them. I suspect it's getting cleared somewhere else in the flow. As per the documentation, kong.ctx.plugin is accessible within the plugin instance at the request level. The same test with 0.14.1 works fine.
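
One rough way to confirm this at runtime (a hypothetical debug snippet, not part of the plugin) is to log from the log phase whenever the closure set in access is missing:

function RateLimitingHandler:log(_)
  local timer = kong.ctx.plugin.timer
  if not timer then
    -- the closure set in :access is gone; this is the symptom described above
    kong.log.warn("rate-limiting: kong.ctx.plugin.timer missing in log phase")
    return
  end
  timer()
end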

I had the key-auth, rate-limiting, prometheus and syslog plugins configured for my route. When I disabled syslog and prometheus, rate-limiting worked fine. When I enable either of them, it doesn't.

@hbagdi @thibaultcha I would really appreciate it if you could give some pointers on the changes related to kong.ctx.plugin in the 1.x release. I'd be more than happy to fix this and submit a PR. Thanks

@ganeshs

ganeshs commented May 4, 2019

Btw, this change was introduced in this commit:
0ad4aa3

The 0.14.1 version was incrementing the Redis counter in the access phase. In 1.x, it's done in the log phase.

After making the fix below, I can confirm that rate-limiting works as before:

--kong.ctx.plugin.timer = function()
    local ok, err = timer_at(0, increment, conf, limits, identifier, current_timestamp, 1)
    if not ok then
      kong.log.err("failed to create timer: ", err)
    end
--end
--function RateLimitingHandler:log(_)
--  if kong.ctx.plugin.timer then
--    kong.ctx.plugin.timer()
--  end
--end

@bungle
Member

bungle commented May 5, 2019

@ganeshs Yes, that is a good workaround. We need to check what could be overwriting the context value (or why it is being cleared, or perhaps never set in the first place).

@xuanit

xuanit commented Jul 22, 2019

I'm facing the same issue. Is there a plan to fix this in Kong?

@ordith

ordith commented Sep 26, 2019

We are also facing this issue. Is there any update on when this might be addressed?

@rainest
Contributor

rainest commented Jan 21, 2020

#5460 should likely fix this. An unknown issue clears the plugin ctx contents, and rate-limiting increments counters based on data stored in it.

#5459 may address the underlying issue with ctx, but we do not have definitive proof of that issue's exact cause yet. As such, #5460 sidesteps any problem with ctx entirely by moving all rate-limiting functionality into the access phase.
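
Roughly, that means doing the increment from the access phase instead of stashing a closure in kong.ctx.plugin for the log phase. A minimal sketch of that idea (not the actual patch), reusing the names from the snippets above, where timer_at is a local alias for ngx.timer.at:

function RateLimitingHandler:access(conf)
  -- ... usage lookup and 429 handling as before ...

  -- increment the counters right away instead of deferring to :log
  local ok, err = timer_at(0, increment, conf, limits, identifier, current_timestamp, 1)
  if not ok then
    kong.log.err("failed to create timer: ", err)
  end
end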

thibaultcha added a commit that referenced this issue May 8, 2020
* Store namespace keys in `ngx.ctx` to ensure all dynamically generated
  namespaces are isolated on a per-request basis.
* Introduce a new global API (which is a private API for use by core
  only) to explicitly delete a namespace.

A note on the benefits of this new implementation:

* By using a table to keep track of namespaces in `ngx.ctx`, we ensure
  that users of `ngx.ctx` cannot access the table namespaces, thus
  properly isolating it and avoiding polluting `ngx.ctx` for core and/or
  plugins. We can think of it as a double-pointer reference to the
  namespace. Each request gets a pre-allocated table for 4 namespaces
  (of course not limited to 4), with the assumption that most instances
  do not execute more than 4 plugins using `kong.ctx.plugin` at a time.
* We ensure that namespace key references cannot be `nil` to ensure a
  safe usage from the core.
* All namespace key references are still weak when assigned to a
  namespace, thus tying the lifetime of the namespace to that of its key
  reference. Similarly to the previous implementation, this is done to
  ensure we avoid potential memory leaks. That said, the
  `kong.ctx.plugin` namespace does not use the plugin's conf for that
  purpose anymore (See 40dc146), which
  alleviates such concerns in the current usage of this API.
* All tables allocated for namespace key references and namespaces keys
  themselves will be released when `ngx.ctx` will be GC'ed.
* We also ensure that `kong.ctx.*` returns `nil` when `ngx.ctx` is not
  supported ("init" phase).

Co-Authored-By: tdelaune <[email protected]>

Fix #4379
Fix #5853
See #5851
See #5459
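
An illustrative sketch of the bookkeeping described above (hypothetical field and function names; not the actual Kong implementation):

-- A private table inside ngx.ctx maps each namespace key to its namespace
-- table, so code that reads ngx.ctx directly cannot reach other namespaces.
local function get_namespaces(nctx)
  local namespaces = nctx.KONG_NAMESPACES   -- hypothetical field name
  if not namespaces then
    namespaces = {}                          -- pre-sized for ~4 namespaces in the description above
    nctx.KONG_NAMESPACES = namespaces
  end
  return namespaces
end

local function set_namespace(nctx, key, namespace)
  assert(key ~= nil, "namespace key cannot be nil")
  get_namespaces(nctx)[key] = namespace
end

local function del_namespace(nctx, key)
  get_namespaces(nctx)[key] = nil
end
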
thibaultcha added a commit that referenced this issue May 14, 2020
thibaultcha added a commit that referenced this issue May 14, 2020