ACL plugin cache invalidation failures #988

Closed
msmathers opened this issue Feb 11, 2016 · 3 comments

@msmathers

We've begun having problems with Kong's in-memory cache not refreshing after ACL rules are updated. What's tricky is that the failure seems to be non-deterministic.

We're performing the following sequence of events:

  1. Verify that a consumer exists (GET /consumers/id)
  2. Verify that an API exists (GET /apis/id)
  3. If not, create a new API (POST /apis)
  4. Add a new group to a consumer's ACL plugin (POST /consumers/id/acls)
  5. Add the key-auth plugin to the new API (POST /apis/id/plugins)
  6. Add the ACL plugin to the new API with the new group (POST /apis/id/plugins)
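
The sequence above can be sketched as an ordered plan of Admin API calls. This is only an illustration under stated assumptions, not the application's actual code: the function name `provisioning_plan`, the example upstream URL and group name, and the request-body fields (`name`, `upstream_url`, `request_path`, `group`, `config.whitelist`) are assumptions based on the Kong 0.6-era Admin API and should be checked against the version in use.

```python
def provisioning_plan(consumer_id, api_name, group, upstream_url):
    """Return the ordered Admin API calls as (method, path, body) tuples.

    The paths follow the steps listed in the report; the body field
    names are assumptions and are not taken from the original issue.
    """
    return [
        # 1. Verify that the consumer exists.
        ("GET", f"/consumers/{consumer_id}", None),
        # 2. Verify whether the API exists.
        ("GET", f"/apis/{api_name}", None),
        # 3. Create the API (only needed when step 2 returned 404).
        ("POST", "/apis", {
            "name": api_name,
            "upstream_url": upstream_url,
            "request_path": f"/{api_name}",
        }),
        # 4. Add the new group to the consumer's ACLs.
        ("POST", f"/consumers/{consumer_id}/acls", {"group": group}),
        # 5. Enable key-auth on the new API.
        ("POST", f"/apis/{api_name}/plugins", {"name": "key-auth"}),
        # 6. Enable the ACL plugin on the new API, whitelisting the group.
        ("POST", f"/apis/{api_name}/plugins", {
            "name": "acl",
            "config.whitelist": group,
        }),
    ]


if __name__ == "__main__":
    # Hypothetical values, chosen to resemble the log excerpt in this issue.
    plan = provisioning_plan("1", "project-21", "project-21-group",
                             "http://example.com")
    for method, path, body in plan:
        print(method, path, body if body is not None else "")
```

Keeping the plan as plain data makes it easy to replay the same calls repeatedly with any HTTP client and note at which iteration the 403 responses begin.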

The first several times we perform this sequence, we can successfully authenticate and authorize requests to our newly created APIs. However, at a certain point we begin receiving 403 errors when making requests to newly created APIs, with the message "You cannot consume this service" (returned by the ACL plugin).

Once this happens, all new APIs we create suffer from the same problem. When we retrieve the cached ACL key (GET /cache/acls:consumer_id), the payload returned doesn't include the newly created ACL groups. Hitting the live admin endpoint (GET /consumers/id/acls/) does include the newly created ACL groups. Restarting Kong or manually hitting DELETE /cache remedies the problem.
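
The cache-versus-live comparison described above can be expressed as a small check. This is a rough sketch: the helper name `stale_groups` is hypothetical, and it assumes both responses have already been fetched and reduced to lists of objects with a `group` field, which may not match the exact payload shape Kong returns.

```python
def stale_groups(cached_acls, live_acls):
    """Return group names present in the live ACL list but absent from the cache.

    Both arguments are lists of dicts with a "group" key. That shape is an
    assumption about the two payloads, not something confirmed by the report.
    """
    cached = {row["group"] for row in cached_acls}
    live = {row["group"] for row in live_acls}
    return sorted(live - cached)


if __name__ == "__main__":
    # Hypothetical data: the cache still holds the old group only.
    cached = [{"group": "project-20"}]
    live = [{"group": "project-20"}, {"group": "project-21"}]
    print(stale_groups(cached, live))
```

A non-empty result corresponds to the symptom reported here: the live endpoint knows about a group that the cached entry lacks.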

In short, all this points to a failure in Kong to invalidate its in-memory cache when ACL rules are updated on the consumer. The fact that it takes a variable number of hits to produce this behavior suggests a possible race condition?

We're using the following Kong installation:

  • CloudFormation HVM AMI us-east-1
  • Kong (v0.6.1) with Cassandra (v2.2.4)

Here are the nginx logs demonstrating the sequence of Admin requests we're performing:
10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "GET /consumers/1 HTTP/1.1" 200 99 "-" "-"
10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "GET /apis/project-21 HTTP/1.1" 404 35 "-" "-"
10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "POST /apis HTTP/1.1" 201 237 "-" "-"
10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "GET /consumers/1/acls HTTP/1.1" 200 337 "-" "-"
127.0.0.1 - - [11/Feb/2016:18:13:34 +0000] "POST /cluster/events/ HTTP/1.1" 200 5 "-" "LuaSocket 3.0-rc1"
10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "POST /consumers/1/acls HTTP/1.1" 201 163 "-" "-"
127.0.0.1 - - [11/Feb/2016:18:13:34 +0000] "POST /cluster/events/ HTTP/1.1" 200 5 "-" "LuaSocket 3.0-rc1"
10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "POST /apis/project-21/plugins HTTP/1.1" 201 232 "-" "-"
127.0.0.1 - - [11/Feb/2016:18:13:34 +0000] "POST /cluster/events/ HTTP/1.1" 200 5 "-" "LuaSocket 3.0-rc1"
10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "POST /apis/project-21/plugins HTTP/1.1" 201 203 "-" "-"
127.0.0.1 - - [11/Feb/2016:18:13:34 +0000] "POST /cluster/events/ HTTP/1.1" 200 5 "-" "LuaSocket 3.0-rc1"
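
One way to read a log like this is to count how many client writes and how many local `POST /cluster/events/` requests it contains. The sketch below does only that; the helper name `summarize` is hypothetical, and treating the `/cluster/events/` requests as Kong's internal change notifications is an assumption, not something the log itself states.

```python
import re

# Matches the access-log format shown above:
# ip - - [timestamp] "METHOD path protocol" status size ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) ')


def summarize(lines):
    """Return (writes, events): POSTs answered with 201, and /cluster/events/ posts."""
    writes = events = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        _ip, method, path, status = match.groups()
        if path.startswith("/cluster/events"):
            events += 1
        elif method == "POST" and status == "201":
            writes += 1
    return writes, events


# A few lines copied from the log excerpt above.
SAMPLE = [
    '10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "POST /apis HTTP/1.1" 201 237 "-" "-"',
    '10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "GET /consumers/1/acls HTTP/1.1" 200 337 "-" "-"',
    '127.0.0.1 - - [11/Feb/2016:18:13:34 +0000] "POST /cluster/events/ HTTP/1.1" 200 5 "-" "LuaSocket 3.0-rc1"',
    '10.61.211.206 - - [11/Feb/2016:18:13:34 +0000] "POST /consumers/1/acls HTTP/1.1" 201 163 "-" "-"',
    '127.0.0.1 - - [11/Feb/2016:18:13:34 +0000] "POST /cluster/events/ HTTP/1.1" 200 5 "-" "LuaSocket 3.0-rc1"',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Matching counts only show that a notification was sent per write; they do not show that the cached ACL entry was actually invalidated.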

subnetmarco self-assigned this Feb 11, 2016
@subnetmarco
Member

@msmathers I am trying to replicate your problem in this test case: https://github.com/Mashape/kong/blob/fix/acl/spec/plugins/acl/access_spec.lua#L144-L188

I am adding one API (with its key-auth and ACL plugins), and it works. I am adding a second API (with key-auth and ACL plugins) and it also works. What operations should I do from now on to replicate your problem?

Specifically this part:

However, at a certain point we begin receiving 403 errors when making requests to newly created APIs, with the message "You cannot consume this service" (returned by the ACL plugin).

How are you creating the new APIs and what plugins are you associating to them?

@msmathers
Author

@thefosk Thanks for looking at this. I just wrote a standalone Python script that mimics our application's interaction with the admin API:
https://gist.github.com/msmathers/94daa1069f80ee4ee134

Every time I run this script I'm able to authenticate & authorize against the first API I create, but then all requests against subsequent APIs fail with a 403 status. If I manually delete the Kong cache or restart Kong, the requests begin working.

Let me know if you need anything else, thanks!

@subnetmarco
Member

@msmathers I have found the bug and fixed it. The fix will show up in v0.7.0.
