add internal cache for /api/v1/services with scheduled update #1554
Conversation
Here are two related issues on this topic to review before we talk about
new code and where:
cache control:
#718 (comment)
elasticsearch performance: #1526
Do you mind commenting inside each? In particular, in #1526 note how much you
are ingesting and whether you are doing any sampling.
Sure, I've seen these issues before, but I'll look through them again and leave comments.
One last thing to verify: we recently compensated for this in #1538.
Yes, we're already using it, it's a great thing and helps a lot! But I
suppose if we increase our workload significantly (and yes, we will do it),
we will again face the same problem (and we won't be able to decrease
QUERY_LOOPBACK because it's not affordable for our users).
cool. just covering the bases..
I think we'll end up with a decision about where to introduce a cache if we
do (in UI code vs in the ES impl).
Also, whether that cache needs to be managed like it is here (the response is
so slow that you can't rely on users to ever succeed). I kind of prefer not
having machinery in-process; regardless of the cache impl, it would be handy
for it to be user-scheduled. Maybe there's something we can do in the UI to
not block, but also not make repeated calls, when a user first asks for
service and span names?
Then there's also the chance we toy with the data format. I forget which issue
is tracking that one, but flattening service+span names inside ES might end
up being a way out.
Moved my last comment to #1526, which was the issue I thought I was on!
@semyonslepov assuming w/ current perf we might be able to close this?
So, the results of the feature with the new indexes are really good on response times. For our current setup and number of users, it's good enough. We will try to live with the upstream version anyway (obviously that's much better than keeping our local patches and synchronizing them on every new release). In the bad case (I don't expect it now, but our setup is very young and we don't really know the number of our future users yet), we'll have to return to these patches or some similar machinery, because that approach doesn't depend on user count/RPS. (Or maybe there will be another, better way to avoid such problems; for example, we thought about trying Cassandra instead of ES, though we haven't used it with Zipkin yet.)
I would love to hear if you end up with hundreds of users who all refresh
browser cache/expire these headers :) would be a sign of a very effective
deployment.
I do think we will get better, and probably the simplest option would be a
caching intermediary, as it could be applied to everything, and invalidation
isn't terribly important with service names. It's simpler than changing
storage (plus I'm not positive about the perf differences here or with
Cassandra either). We could also consider an optional caching decorator for
StorageComponent.
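As an illustration of the caching-intermediary idea, a reverse proxy in front of zipkin-server could cache just the slow name endpoints. This is a hypothetical sketch, not something shipped with Zipkin; the paths, TTL, and backend address are assumptions:

```nginx
# Hypothetical nginx config: cache the expensive name endpoint for a short
# TTL so repeated UI loads don't each hit Elasticsearch.
proxy_cache_path /var/cache/nginx/zipkin keys_zone=zipkin_names:1m max_size=64m;

server {
  listen 80;

  # Cache only the endpoint that is expensive but rarely changes.
  location /api/v1/services {
    proxy_pass http://127.0.0.1:9411;               # assumed zipkin-server address
    proxy_cache zipkin_names;
    proxy_cache_valid 200 5m;                       # serve cached names for 5 minutes
    proxy_cache_use_stale error timeout updating;   # stale names are acceptable
  }

  location / {
    proxy_pass http://127.0.0.1:9411;               # everything else passes through
  }
}
```

Invalidation is just the TTL here, which is fine for service names since they change slowly.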
I do think we will change the format stored in ES by year end, though this
will likely only help with trace queries, as service and span names are now
as simple as possible.
Meanwhile thanks for all the feedback. You have been very helpful in this!
We use Zipkin with Elasticsearch in AWS and have the following problem: with a big amount of data, a request to /api/v1/services takes too long, and we often see timeouts in the UI.
This is an attempt to implement opt-in internal caching of the getServiceNames() result with scheduled updates. We are already using this patch in our own environment, and it improves our users' experience a bit. Hope it can help upstream (maybe with a different implementation).
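The patch itself isn't shown in this thread, but the idea can be sketched as a small wrapper that refreshes the slow query on a schedule and always serves the latest snapshot. This is a minimal, hypothetical sketch (names and structure are assumptions, not the actual patch):

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of an internal cache with scheduled updates: the slow
 * loader (e.g. getServiceNames() against ES) runs on a timer, never on the
 * user's request path.
 */
final class ScheduledCache<T> {
  private final AtomicReference<T> latest = new AtomicReference<>();
  private final Supplier<T> loader;

  ScheduledCache(Supplier<T> loader, ScheduledExecutorService scheduler, long refreshSeconds) {
    this.loader = loader;
    latest.set(loader.get()); // prime synchronously so the first read never misses
    scheduler.scheduleAtFixedRate(this::refresh, refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
  }

  private void refresh() {
    try {
      latest.set(loader.get()); // replace the snapshot atomically
    } catch (RuntimeException e) {
      // backend slow or down: keep serving the previous (stale) snapshot
    }
  }

  T get() {
    return latest.get(); // always fast: no backend call on the read path
  }
}
```

Reads are lock-free and constant-time regardless of user count/RPS, which matches the motivation above: the cost moves to a fixed background schedule instead of growing with traffic.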