3scale batcher policy #685
Force-pushed from a0a36ab to 7fe0c01.
👍 good stuff.
We should set some goals so we know whether we have reached them and can choose the right compromise between speed and safety.
@@ -168,6 +173,40 @@ function _M:authorize(...)
  return call_backend_transaction(self, auth_uri, authorize_options(using_oauth), ...)
end

local function add_transaction(transactions, index, cred_type, cred, reports)
  transactions['transactions[' .. index .. '][' .. cred_type .. ']'] = cred
We should use string.format for this to minimize string allocation.
Right 👍
Fixed.
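A minimal sketch of the string.format suggestion. Only the key-building line comes from the diff; the `transaction_key` helper is hypothetical, and handling of `reports` is left out because its shape is not shown in this snippet.

```lua
local format = string.format

-- Hypothetical helper: builds form keys such as "transactions[0][app_id]"
local function transaction_key(index, field)
  return format('transactions[%s][%s]', index, field)
end

-- Same signature as in the diff; processing of `reports` is omitted in this sketch
local function add_transaction(transactions, index, cred_type, cred, reports)
  transactions[transaction_key(index, cred_type)] = cred
end

local transactions = {}
add_transaction(transactions, 0, 'app_id', 'abc123')
print(transactions['transactions[0][app_id]']) -- abc123
```

Whether this measurably reduces allocations compared to a single `..` chain is worth checking with a benchmark on the target Lua VM.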
@@ -120,6 +121,10 @@ local function auth_path(using_oauth)
    '/transactions/authorize.xml'
end

local function report_path()
This could just be a module property instead of a function, right?
Right 👍
Fixed.
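A sketch of the suggested split: a constant path becomes a module field, while a path that depends on an argument stays a function. The report path `/transactions.xml` and the OAuth path `/transactions/oauth_authorize.xml` are assumptions here; only `/transactions/authorize.xml` appears in the diff.

```lua
local _M = {}

-- The report endpoint never changes, so it can be a plain module field
-- (assumed path; the body of report_path() is not shown in the diff)
_M.REPORT_PATH = '/transactions.xml'

-- The authorize endpoint depends on using_oauth, so it stays a function
-- (the OAuth variant is an assumption)
local function auth_path(using_oauth)
  return using_oauth and '/transactions/oauth_authorize.xml'
    or '/transactions/authorize.xml'
end

return _M
```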
stored in the 3scale backend database. In summary, going over the defined usage
limits is easier. The APIcast policy reports to the 3scale backend every time it
receives a request. Reports are asynchronous, which means that we can go over
the limits for a brief window of time. On the other hand, this policy reports
I'd like to explore a different strategy for reporting.
Right now there is a timer to report every N seconds.
I could imagine a strategy where it would report continuously.
Let's say there is a request and it triggers a backend call. Then another request comes, and because the backend call is still active, it would be added to shmem. Then, when the backend call finishes, it can collect the cached reports and issue a new call.
So basically it would just cache the parallel calls and make them 1:1 to the backend.
All these optimizations have compromises, and to choose the right one we should have some target. There are plenty of ways to do this, with different trade-offs and performance characteristics. We should define some bar we want to reach and choose the correct way to get there.
Fully caching everything and then reporting to the backend in one batch definitely has the best performance, but also the highest chance of going wrong.
This continuous reporting would not give a 10x gain, maybe just 2x or 3x, but it would be safer and more accurate.
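A minimal sketch of the "continuous" strategy described above, assuming a single process: at most one backend call is in flight, and reports that arrive meanwhile are merged and sent as soon as that call completes. The comment proposes shmem so workers can share the pending reports; this sketch keeps them in a plain table and keys usage only by metric for brevity. `Reporter` and the `send(batch, done)` callback are hypothetical names.

```lua
local Reporter = {}
Reporter.__index = Reporter

-- send(batch, done): issues one backend call and must call done() when it finishes
function Reporter.new(send)
  return setmetatable({ send = send, in_flight = false, pending = {} }, Reporter)
end

local function merge(into, usage)
  for metric, delta in pairs(usage) do
    into[metric] = (into[metric] or 0) + delta
  end
end

function Reporter:report(usage)
  merge(self.pending, usage)
  if not self.in_flight then
    self:flush() -- idle: send right away
  end
  -- otherwise the report waits until the current call completes
end

function Reporter:flush()
  if next(self.pending) == nil then
    self.in_flight = false -- nothing queued: go idle
    return
  end
  local batch = self.pending
  self.pending = {}
  self.in_flight = true
  self.send(batch, function() self:flush() end)
end
```

With a slow backend, everything that arrives during one call is coalesced into the next one, so the batch size adapts to backend latency instead of to a fixed timer.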
Backend report scheduling strategy is a hard problem. Trial and error with different approaches is definitely the best way to come up with the best solution within our constraints.
I think it may depend on the cache hit ratio. When the hit ratio is high, the best approximation might be waiting as long as we can to achieve the largest batch size in the allowed time window. On the other hand, when the hit ratio is low, waiting does not make sense: keeping memory, file descriptors and timers (through fds) around is expensive.
We could implement a dynamic algorithm that changes strategies depending on the cache hit ratio, which we could measure.
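One way such a dynamic algorithm could pick its batching window from measured cache hits and misses. The 0.5 threshold and the linear scaling are illustrative assumptions, not values proposed in this thread; `batching_window` is a hypothetical name.

```lua
-- Hypothetical: map the observed auth-cache hit ratio to a batching window (seconds)
local function batching_window(hits, misses, max_window)
  local total = hits + misses
  if total == 0 then
    return 0 -- no data yet: report immediately
  end
  if hits / total < 0.5 then
    return 0 -- mostly misses: waiting buys little, so report right away
  end
  -- mostly hits: wait longer as the hit ratio grows, up to max_window
  return max_window * hits / total
end
```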
In event-based systems, locks are very painful, with unaffordable performance penalties.
I suggest we do not use shared resources and IPC mechanisms. Each worker keeps its own state. Performance and low latency are the advantages. The drawback is increased backend traffic, since the batching level is lower. Again, this depends on the cache hit ratio: it is good when the hit ratio is high and does not make sense in scenarios with many cache misses.
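A sketch of the per-worker alternative: each worker accumulates deltas in an ordinary Lua table, with no shared dict and no lock. Credentials are reduced to a single `app_id` string for brevity; `Batcher`, `add` and `flush` are hypothetical names. The trade-off from the comment applies: whatever is in the table is lost if the worker dies.

```lua
local Batcher = {}
Batcher.__index = Batcher

function Batcher.new()
  return setmetatable({ reports = {} }, Batcher)
end

-- No lock needed: within one worker, Lua code runs uninterrupted until it
-- yields, and add() never yields.
function Batcher:add(service_id, app_id, usage)
  local key = service_id .. ':' .. app_id
  local entry = self.reports[key]
  if not entry then
    entry = { service_id = service_id, app_id = app_id, usage = {} }
    self.reports[key] = entry
  end
  for metric, delta in pairs(usage) do
    entry.usage[metric] = (entry.usage[metric] or 0) + delta
  end
end

-- Meant to be called from a per-worker timer (for example one started with
-- ngx.timer.every in init_worker): returns the current batch and starts a new one.
function Batcher:flush()
  local batch = self.reports
  self.reports = {}
  return batch
end
```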
  local service_id = service.id
  local credentials = context.credentials

  ensure_report_timer_on(self, service_id, backend)
We definitely have to expose some way to init/destroy policies so they can do stuff like this. Global policies can do this in init_worker, and we need some mechanism for local policies too, including destructors, because local policies can be garbage-collected.
-- - Timeouts in locks are not handled properly. The request simply fails when
--   this happens.
--
-- - Evaluate using a local hash and semaphores in the reports batcher instead
A local hash would be lost when a worker crashes. Maybe use shdict but still parallelize the collection/reporting across workers?
Using https://github.com/openresty/lua-nginx-module#ngxshareddictrpush we could immediately add reports to the list, and they could be processed by several workers.
Maybe, to limit congestion and improve cache locality, we could have a semaphore to signal how many workers should be running.
rpush returns the length of the list. We could simply call semaphore:post(1) when that length is over some batch size. That would wake up the timer.every loop waiting on the semaphore with semaphore:wait(t). When the timer loop reaches an empty list, it can go back to semaphore:wait() to wait for some elements in the list.
Just food for thought. There are some nice algorithms we could implement, like https://elixir-lang.org/blog/2016/07/14/announcing-genstage/
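A sketch of the producer/consumer idea above. `dict` is expected to behave like an ngx.shared.DICT list (`rpush` returns the new length, `lpop` returns the next value) and `sema` like an ngx.semaphore (`post`); both are passed in so the logic can run outside nginx. `enqueue`, `drain` and `batch_size` are hypothetical names. Note that ngx.semaphore is documented as working within a single worker, so each worker would wait on its own semaphore while the shared list stays visible to all of them.

```lua
-- Producer: called per request. Wakes the consumer once the list reaches batch_size.
local function enqueue(dict, sema, key, report, batch_size)
  local len, err = dict:rpush(key, report)
  if not len then
    return nil, err
  end
  if len >= batch_size then
    sema:post(1)
  end
  return len
end

-- Consumer: pops up to `max` items for one backend call. In nginx this would
-- run in a timer loop that calls sema:wait(timeout) between drains.
local function drain(dict, key, max)
  local batch = {}
  for _ = 1, max do
    local item = dict:lpop(key)
    if item == nil then
      break
    end
    batch[#batch + 1] = item
  end
  return batch
end
```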
      end
    else
      if cached_auth.status == 200 then
        self.reports_batcher:add(service_id, credentials, usage)
This could go to the :log phase so the client does not see the impact.
Good idea 👍
Why log instead of post_action?
We can't do that, @mikz. resty.lock.lock() calls ngx.sleep(), which cannot be run in the log phase:
https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L151
https://github.com/openresty/lua-nginx-module#ngxsleep
I realized it because I saw errors like this one in the logs:
[error] 7127#7127: *150 failed to run log_by_lua*: /usr/local/openresty/lualib/resty/lock.lua:151: API disabled in the context of log_by_lua*
It would be possible to do that by creating a timer.at(0) to run in the background, but we can explore that later.
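A sketch of the timer.at(0) workaround. In OpenResty, `timer_at` would be ngx.timer.at; its callback receives `premature` as the first argument and runs in a context where ngx.sleep (and therefore resty.lock) is allowed. The function is injected here so the sketch runs outside nginx; `add_report_async` is a hypothetical name.

```lua
-- Defers the batcher call out of the log phase into a zero-delay timer
local function add_report_async(timer_at, batcher, service_id, credentials, usage)
  local ok, err = timer_at(0, function(premature)
    if premature then
      return -- the worker is shutting down; skip the work
    end
    batcher:add(service_id, credentials, usage)
  end)
  if not ok then
    return nil, err -- e.g. too many pending timers
  end
  return true
end
```

Creating one timer per request has its own cost, so the number of pending timers is worth watching under load.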
Force-pushed from a368a16 to 339089f.
I believe post_action delay would still be visible to the client but log would be entirely out of band.
On 2 May 2018 at 11:06, David Ortiz wrote:
@davidor commented on this pull request.
In gateway/src/apicast/policy/3scale_batcher/3scale_batcher.lua:
> + local backend_res = backend:authorize(formatted_usage, credentials)
+ local backend_status = backend_res.status
+
+ if backend_status == 200 then
+ self.auths_cache:set(service_id, credentials, usage, 200)
+ self.reports_batcher:add(service_id, credentials, usage)
+ elseif backend_status >= 400 and backend_status < 500 then
+ local rejection_reason = rejection_reason_from_headers(backend_res.headers)
+ self.auths_cache:set(service_id, credentials, usage, backend_status, rejection_reason)
+ return error(service, rejection_reason)
+ else
+ return error(service)
+ end
+ else
+ if cached_auth.status == 200 then
+ self.reports_batcher:add(service_id, credentials, usage)
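A simplified, self-contained version of the cached authorize flow quoted above, useful for reasoning about when the backend is actually hit. `auths_cache`, `backend` and `reports_batcher` are duck-typed stand-ins for the objects in the diff, and reading the rejection reason from a `3scale-rejection-reason` header is an assumption replacing `rejection_reason_from_headers`.

```lua
local function authorize(auths_cache, backend, reports_batcher, service_id, credentials, usage)
  local cached = auths_cache:get(service_id, credentials, usage)

  if not cached then
    local res = backend:authorize(usage, credentials)
    if res.status == 200 then
      auths_cache:set(service_id, credentials, usage, 200)
      reports_batcher:add(service_id, credentials, usage)
      return true
    elseif res.status >= 400 and res.status < 500 then
      -- assumed header; the diff uses rejection_reason_from_headers()
      local reason = res.headers['3scale-rejection-reason']
      auths_cache:set(service_id, credentials, usage, res.status, reason)
      return false, reason
    else
      return false -- backend error: nothing is cached
    end
  end

  -- cache hit: no backend call; authorized requests are only batched for reporting
  if cached.status == 200 then
    reports_batcher:add(service_id, credentials, usage)
    return true
  end
  return false, cached.rejection_reason
end
```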
Three related questions David:
@andrewdavidmackenzie The main goal was to increase throughput by reducing the backend load, but I think latencies will improve too. Regarding measurements, I did some performance tests on an OpenShift cluster with the traffic profile designed by @eguzki. The results were very good: >10x throughput with a TTL for auths of 10s (per gateway), and 10s of batching (per APIcast worker). I plan to add more tests, make some minor changes, and check that everything works as expected. After that, I'll repeat the benchmarks and paste the results here.
Understood, cool.
It's unclear whether the test results will also include latency measurements...
--
regards
Andrew
Andrew Mackenzie
Director of Software Engineering
3scale by Red Hat
Force-pushed from 61c0992 to b1185c7.
I tried this and it works quite well. I performed the tests in an OpenShift cluster with APIcast and 3scale backend deployed. The cluster has 4 …
I tested with a traffic pattern designed by @eguzki. It creates 3 services, and 100 apps and 3 metrics for each of those services. All requests report 1, 2, or 3 metrics. Using this policy, it's possible to achieve 18k rps, which is more than 10x what we get without using this policy. Of course, as explained in the design document added in this PR, the comparison is not fair because of the trade-offs that this policy makes regarding the accuracy of rate limits. However, given the significant improvement in throughput, I believe this policy can be very useful in some use cases.
I think there's room for improvement. Also, I'd like to try different strategies for batching the reports, as @mikz and @eguzki suggested. However, I'd like to merge this PR first. I think the code is now in a mergeable state and well tested. @mikz, I'd like you to do a final review before merging to confirm. After merging this, we'll be able to iterate on it and try other strategies, and even evaluate whether it would make sense to include config params to select different batching strategies. For example, some users might prefer per-worker batching to achieve higher throughput, at the risk of losing some reports if a specific worker dies for some reason.
@mikz The majority of the changes I made since your review are in the last 4 commits. I included a README explaining the trade-offs of this policy, and also added unit tests for the policy module (…). Let me know what you think, @mikz.
@@ -159,6 +159,13 @@ http {

  lua_shared_dict limiter 1m;

  # These shared dictionaries are only used in the 3scale batcher policy.
  # This is not ideal, but they'll need to be here until we allow policies to
  # modify this template.
Just for the record: #477
  end
end

local function format_transactions(reports_batch)
I'm not sure this method should live in the backend client.
It depends on ReportsBatch.
IMO this client should be unaware of any other objects and just work with plain tables (and possibly a __tostring metamethod on them).
edit: or some API exposed by this backend client that returns tables that serialize properly.
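A sketch of the decoupling suggested above: the backend client serializes plain tables only, and callers such as ReportsBatch would convert themselves into this shape. The transaction shape `{ credentials = {...}, usage = {...} }` and zero-based indexes are assumptions for illustration.

```lua
local format = string.format

-- Hypothetical: each transaction is a plain table such as
-- { credentials = { app_id = 'x' }, usage = { hits = 1 } }
local function format_transactions(transactions)
  local args = {}
  for i, tx in ipairs(transactions) do
    local index = i - 1 -- zero-based, as in transactions[0][...]
    for cred_type, cred in pairs(tx.credentials) do
      args[format('transactions[%d][%s]', index, cred_type)] = cred
    end
    for metric, value in pairs(tx.usage) do
      args[format('transactions[%d][usage][%s]', index, metric)] = value
    end
  end
  return args
end
```

Inside nginx, the resulting table could then be URL-encoded (for example with ngx.encode_args) to build the request body, so the client never touches ReportsBatch internals.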
      self.reports_batcher:add(to_batch.service_id, to_batch.credentials, to_batch.usage)
    elseif backend_status >= 400 and backend_status < 500 then
      local rejection_reason = rejection_reason_from_headers(backend_res.headers)
      self.auths_cache:set(service_id, credentials, usage, backend_status, rejection_reason)
If service_id, credentials and usage are always passed together to the auths_cache, then they might deserve their own object. And could that be reused in the reports batcher?
This looks like clear parameter coupling and makes the parameter list really long.
Yes, I thought about that too. I think there are some refactoring opportunities there. Maybe we could extract some classes like Report or Authorization. I'd rather leave this for a future PR though.
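A sketch of the refactoring discussed above: one value object bundling service_id, credentials and usage, with a deterministic key that both the auths cache and the reports batcher could use. `Transaction`, `cache_key` and the key format are hypothetical; credentials and usage are assumed to be flat tables with string keys.

```lua
local Transaction = {}
Transaction.__index = Transaction

function Transaction.new(service_id, credentials, usage)
  return setmetatable({ service_id = service_id, credentials = credentials, usage = usage }, Transaction)
end

-- Sorts keys so that equal tables always produce the same string
local function encode_sorted(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys)
  local parts = {}
  for i, k in ipairs(keys) do parts[i] = k .. '=' .. tostring(t[k]) end
  return table.concat(parts, '&')
end

function Transaction:cache_key()
  return table.concat({ self.service_id, encode_sorted(self.credentials), encode_sorted(self.usage) }, ':')
end
```

With this, a call like auths_cache:set(service_id, credentials, usage, status, reason) could shrink to auths_cache:set(transaction, status, reason).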
So they can be returned from other policies.
This is needed to have policies that perform authorization and reporting to 3scale. If another policy has already done that, we need a way to avoid repeating the work in the APIcast policy.
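A sketch of the general idea: a policy that has already authorized and reported marks the shared request context, and the APIcast policy checks that mark before calling the backend. The flag name `already_reported` and both functions are hypothetical, not existing APIcast fields or hooks.

```lua
-- Hypothetical: the batcher policy marks the context after doing the work
local function batcher_access(context)
  -- ... authorize using the cache / backend and batch the report ...
  context.already_reported = true
end

-- Hypothetical: the APIcast policy skips its own backend call when marked
local function apicast_access(context, call_backend)
  if context.already_reported then
    return 'skipped'
  end
  return call_backend()
end
```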
Force-pushed from 4a25e31 to 3cf29ba.
local add_ok, add_err = self.storage:safe_add(key, deltas[metric])

if not add_ok then
  if add_err == 'no_memory' then
The error is 'no memory' (with a space instead of _).
Sorry 🤦‍♂️
Good catch! Should be fixed now.
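A sketch of how the fixed error check might fit into adding a metric delta to the shared dict. `dict` is expected to behave like ngx.shared.DICT, whose safe_add fails with 'exists' for an existing key and 'no memory' (with a space) when the dict is full; `add_delta` is a hypothetical helper, not the code in this PR.

```lua
-- Adds `delta` to `key`, creating the key if it does not exist yet
local function add_delta(dict, key, delta)
  local ok, err = dict:safe_add(key, delta)
  if ok then
    return true
  end
  if err == 'exists' then
    local _, incr_err = dict:incr(key, delta)
    if incr_err then
      return nil, incr_err -- e.g. the key expired between the two calls
    end
    return true
  end
  -- 'no memory' (note the space) or any other error is passed to the caller
  return nil, err
end
```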
Force-pushed from 3cf29ba to ecfa88e.
👍
This PR introduces a new policy. Its goal is to reduce latency and increase throughput by significantly reducing the number of requests made to the 3scale backend. To achieve that, this policy caches authorization statuses and batches reports. In doing so, it makes some trade-offs regarding rate-limit accuracy. I've tried to document everything in the README included in this PR.
I've tagged this as WIP because there are some things that need to be fixed. I've documented them in a TODO section included in the first lines of the new policy module. I decided to open the PR because I'd appreciate some early feedback, @mikz.
I need to perform more tests to check that the policy is indeed fast and correct, but my first tests are really promising. The performance of the policy depends heavily on the cache hit ratio. For use cases with a relatively low number of services, apps, etc., this could easily bring a big improvement over the APIcast policy by sacrificing some accuracy when applying rate limits.