
feat(kuma-cp) Read only cache manager #634

Merged: jakubdyszkiewicz merged 2 commits on Mar 17, 2020

Summary

When Envoy connects to the CP through the XDS API, the CP starts a goroutine running a reconciliation process that pulls the Dataplane and Mesh definitions as well as lists of policies (TrafficLog, TrafficPermissions, etc.). It then computes which policies are relevant and builds the Envoy config to push.

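Each goroutine repeats this process at a fixed interval (1s by default). With, say, 1000 connected dataplanes, that means 1000 goroutines issuing their requests every second. Many of these requests are identical across goroutines (e.g. listing all TrafficLogs or TrafficPermissions), so we can cache them. If there are 10 such common requests, the goroutines generate 1000 × 10 = 10,000 requests per second; with a shared cache that expires every second, ideally only about 10 of them per second reach the DB, saving roughly 9,990 requests per second.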

The cache not only saves the time spent accessing the DB, but also spares us the conversion from the store's native model to the Kuma model.

The cache is enabled by default with an expiration time of 1s.

Performance tests

I ran the performance tests on my 2019 MacBook with a 6-core i7 and 16 GB of RAM.
I applied 50 TrafficPermissions and TrafficLogs and used the test client available here:
https://github.com/Kong/kuma/blob/master/pkg/test/xds/client/app/main.go

Before the change:
200 dataplanes - Kuma CP ~350% CPU, Postgres ~4% CPU
250 dataplanes - Kuma CP ~800% CPU, Postgres ~8% CPU
300 dataplanes - Kuma CP ~1000% CPU, Postgres ~1% CPU

After the change:
300 dataplanes - Kuma CP ~50% CPU, Postgres ~0.1% CPU
1000 dataplanes - Kuma CP ~700% CPU, Postgres ~4% CPU
1500 dataplanes - Kuma CP ~900% CPU, Postgres ~1% CPU

In the last case, my guess is that the CP spends so much time generating config that it does not hit the DB as often. Finding the exact bottlenecks will take more profiling and performance tuning, but the results clearly show a gain of roughly 5-10x in Kuma CP and Postgres CPU usage, depending on the case.

jakubdyszkiewicz requested review from a team and lobkovilya on March 17, 2020, 11:43
jakubdyszkiewicz merged commit fe9d6da into master on March 17, 2020
jakubdyszkiewicz deleted the feat/cache-v3 branch on March 17, 2020, 15:46