Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3scale batcher policy #685

Merged
merged 23 commits into from
May 9, 2018
Merged

3scale batcher policy #685

merged 23 commits into from
May 9, 2018

Conversation

davidor
Copy link
Contributor

@davidor davidor commented Apr 27, 2018

This PR introduces a new policy. Its goal is to reduce latency and increase throughput by significantly reducing the number of requests made to the 3scale backend. In order to achieve that, this policy caches authorization statuses and batches reports. In order to achieve that, it makes some trade-offs regarding rate-limits accuracy. I've tried to document everything in the README included in this PR.

I've tagged this as WIP because there are some things that need to be fixed. I've documented them in a TODO section included in the first lines of the new policy module. I decided to open the PR because I'd appreciate some early feedback, @mikz

I need to perform more tests to check that the policy is indeed fast and correct, but my first tests are really promising. The performance of the policy depends heavily on the cache hit ratio. For use cases with a relatively low number of services, apps, etc. this could easily bring a big improvement over the APIcast policy by sacrificing some accuracy when applying rate limits.

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch 9 times, most recently from a0a36ab to 7fe0c01 Compare April 30, 2018 16:13
Copy link
Contributor

@mikz mikz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good stuff.

We should set some goals so we know we are there or not and can choose the right compromise between speed and safety.

@@ -168,6 +173,40 @@ function _M:authorize(...)
return call_backend_transaction(self, auth_uri, authorize_options(using_oauth), ...)
end

local function add_transaction(transactions, index, cred_type, cred, reports)
transactions['transactions[' .. index .. '][' .. cred_type .. ']'] = cred
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use string.format for this to minimize string allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right 👍
Fixed.

@@ -120,6 +121,10 @@ local function auth_path(using_oauth)
'/transactions/authorize.xml'
end

local function report_path()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be just module property instead of a function, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right 👍
Fixed.

stored in the 3scale backend database. In summary, going over the defined usage
limits is easier. The APIcast policy reports to 3scale backend every time it
receives a request. Reports are asynchronous and that means that we can go over
the limits for a brief window of time. On the other hand, this policy reports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to explore different strategy for reporting.

Right now there is a timer to report every N seconds.

I could imagine a strategy where it would report continuously.
Lets say there is a request and it triggers backend call. Then another request comes and because the backend call is still active it would be added to shmem. Then when the backend call finishes it can collect cached reports and issue new call.

So basically it would just cache the parallel calls and make them 1:1 to backend.

All these optimizations have compromises and to chose the right one we should have some target. There is plenty ways how to do this with different tradeoffs and performance characteristics. We should define some bar we want to reach and chose the correct way to get there.

Fully caching everything and then reporting to backend in one batch definitely has the best performance, but also the highest chance of going wrong.
This continuous reporting would not do 10x gain, maybe just 2x or 3x, but would be safer and more accurate.

Copy link
Member

@eguzki eguzki May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend report scheduling strategy is a hard problem. Definitely, trial an error with different approaches is the best strategy to come up with the best solution within our constraints.

I think it may depend on cache hit ratio. When hit ratio is high, the best approximation might be waiting as much as we can to achieve the highest batch size in allowed time window. On the other hand, when hit ratio is low, waiting does not make sense, keeping memory file descriptors and timers (through fd's) is expensive.

We could implement a dynamic algorithm that changes strategies depending on cache hit ratio that we could measure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On event based systems, locks are very painful with unfordable penalties in performance.

I suggest we do not use shared resources and IPC mechanisms. Each worker keeps its own state. Performance and low latency is the advantage. The drawback is increasing backend traffic since batching level is lower. Again depending on cache hit ratio. Good when cache hit ratio is high. Does not make sense on high cache miss scenarios.

local service_id = service.id
local credentials = context.credentials

ensure_report_timer_on(self, service_id, backend)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely have to expose some way to init/destroy policies so they can do stuff like this. Global policies can do this in init_worker and we need some mechanics for local policies too. Including destructors because they can get GC'd.

-- - Timeouts in locks are not handled properly. The request simply fails when
-- this happens.
--
-- - Evaluate using a local hash and semaphores in the reports batcher instead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local hash would be lost when worker crashes. Maybe using shdict but still parallelizing the collection / reporting across workers?

Using https://github.com/openresty/lua-nginx-module#ngxshareddictrpush we could immediately add reports to the list and they could be processed by several workers.

Maybe to limit congestion and improve cache locality we could have a semaphore to signal how many workers there should be running.

rpush returns the length of the list. We could simply call semaphore:push(1) when that length is over some batch size. That would wake up the timer.every loop waiting for the semaphore by semaphore:wait(t). When timer loop reaches empty list it can go back to semaphore:wait() to wait for some elements in the list.

Just food for though. There are some nice algorithms we could implement like https://elixir-lang.org/blog/2016/07/14/announcing-genstage/

end
else
if cached_auth.status == 200 then
self.reports_batcher:add(service_id, credentials, usage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could go to :log phase so client does not see the impact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea 👍
Why log instead of post_action ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do that @mikz

resty.lock.lock() calls ngx.sleep() which cannot be run in the log phase:
https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L151
https://github.com/openresty/lua-nginx-module#ngxsleep

I realized because I saw errors like this one in the logs:

[error] 7127#7127: *150 failed to run log_by_lua*: /usr/local/openresty/lualib/resty/lock.lua:151: API disabled in the context of log_by_lua*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible to do that by creating a timer.at(0) to run in the background, but that we can explore later.

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch 2 times, most recently from a368a16 to 339089f Compare May 2, 2018 09:08
@mikz
Copy link
Contributor

mikz commented May 2, 2018 via email

@andrewdavidmackenzie
Copy link
Member

Three related questions David:

  • did you have it as a goal to reduce (gateway) latency (I didn't, but that's OK :-) )?
  • do you think it actually will reduce latency?
  • will you be doing any latency measurements at scale to confirm/deny improvement?

@davidor
Copy link
Contributor Author

davidor commented May 2, 2018

@andrewdavidmackenzie The main goal was to increase throughput by reducing the backend load, but I think latencies will improve too.

Regarding measurements, I did some performance tests on an Openshift cluster with the traffic profile designed by @eguzki . The results were very good: >10x throughput with a TTL for auths of 10s (per gateway), and 10s of batching (per Apicast worker). I plan to add more tests, make some minor changes, and check that everything works as expected. After that, I'll repeat the benchmarks and paste the results here.

@andrewdavidmackenzie
Copy link
Member

andrewdavidmackenzie commented May 2, 2018 via email

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch 8 times, most recently from 61c0992 to b1185c7 Compare May 7, 2018 14:27
@davidor
Copy link
Contributor Author

davidor commented May 8, 2018

I tried this and it works quite well.

I performed the tests in an Openshift cluster with apicast and 3scale backend deployed. The cluster has 4 c4.xlarge compute nodes. I configured the policy to cache auths for 10 seconds (per apicast) and batch reports for 10 seconds (per apicast worker).

I tested with a pattern traffic designed by @eguzki . It creates 3 services, and 100 apps and 3 metrics for each of those services. All requests report 1, 2, or 3 metrics.

Using this policy, it’s possible to achieve 18k rps. Which is more than 10x what we get without using this policy. Of course, as explained in the design document added in this PR, the comparison is not fair because of the trade-offs that this policy makes regarding the accuracy of rate-limits. However, given the significant improvement in throughput, I believe this policy can be very useful in some use cases.

I think there’s room for improvement. Also, I’d like to try different strategies for batching the reports as @mikz and @eguzki suggested. However, I’d like to merge this PR first. I think the code is now in mergeable state and well tested. @mikz I’d like you to do a final review before merging to confirm. After merging this, we’ll be able to iterate on this and try other strategies, even evaluate if it would make sense to include config params to select different strategies for batching. For example, some users might prefer to perform a per-worker batching to achieve higher throughput at the risk of losing some reports if a specific worker dies for some reason.

@mikz The majority of the changes I did since your review are in the last 4 commits. I included a README explaining the trade-offs of this policy, and also added unit tests for the policy module (3scale_batcher_spec.lua) and integration tests (apicast-policy-3scale-batcher.t). I also cleaned a bit the code in other classes, added some documentation and addressed your comments.

Let me know what you think @mikz

@davidor davidor changed the title [WIP] 3scale batcher policy 3scale batcher policy May 8, 2018
@@ -159,6 +159,13 @@ http {

lua_shared_dict limiter 1m;

# This shared dictionaries are only used in the 3scale batcher policy.
# This is not ideal, but they'll need to be here until we allow policies to
# modify this template.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record: #477

end
end

local function format_transactions(reports_batch)
Copy link
Contributor

@mikz mikz May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this method should live in the backend client.
It depends on ReportsBatch.

IMO this client should be unaware of any of other objects and just work with plain tables (and possibly __tostring metamethod on them).

edit: or some API exposed by this backend client returning tables that properly serialize.

self.reports_batcher:add(to_batch.service_id, to_batch.credentials, to_batch.usage)
elseif backend_status >= 400 and backend_status < 500 then
local rejection_reason = rejection_reason_from_headers(backend_res.headers)
self.auths_cache:set(service_id, credentials, usage, backend_status, rejection_reason)
Copy link
Contributor

@mikz mikz May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If service_id, credentials and usage are always passed together to the auths_cache then they might deserve own object. And that can be reused to the reports batcher ?

This looks like clear parameter coupling and makes the parameter list really long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about that too. I think there are some refactoring opportunities there. Maybe we could extract some classes like Report or Authorization. I'd rather leave this for a future PR though.

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch from 4a25e31 to 3cf29ba Compare May 9, 2018 15:23
local add_ok, add_err = self.storage:safe_add(key, deltas[metric])

if not add_ok then
if add_err == 'no_memory' then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry 🤦‍♂️

Good catch! Should be fixed now.

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch from 3cf29ba to ecfa88e Compare May 9, 2018 15:38
Copy link
Contributor

@mikz mikz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@davidor davidor merged commit 7b06e3e into master May 9, 2018
@davidor davidor deleted the split-3scale-auth-and-rep-policy branch May 9, 2018 15:59
@davidor davidor added this to the 3.3 milestone Jun 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants