Improve performance with large Grafana installations #2

Closed
jangaraj opened this issue May 6, 2019 · 7 comments

Comments

@jangaraj

jangaraj commented May 6, 2019

1.) Dashboard search runs serially
This is a slow approach for Grafana instances with many dashboards, e.g.:

2019-05-06 18:12:11,809 [grafana_wtf.core    ] INFO   : Found 1000 dashboards
2019-05-06 18:12:11,809 [grafana_wtf.core    ] INFO   : Fetching dashboards
 20%|███████████████████████▏                  | 198/1000 [00:56<05:12,  2.57it/s

Python multiprocessing (parallel runs) can increase search speed in this case. There should be a configuration option to limit the number of search processes, otherwise it can turn into a "DDoS attack" on the Grafana server.

2.) Actually, the Grafana instance from the first example contains more than 1k dashboards, but the API returns at most 1k results by default. There should be a config/env variable to configure this limit.

Thanks.

@amotl
Contributor

amotl commented May 7, 2019

Dear Jan,

thanks for using grafana-wtf and also thank you very much for your suggestions. I recognize that from the perspective of a Grafana installation which is that large, grafana-wtf in its current form is really just a start.

While it already has an appropriate caching subsystem which saves you from hitting Grafana each and every time to do searches on the JSON artefacts, the time-to-live is currently hardcoded to five minutes or so. This essentially renders it useless for your scenario, so you might not even have noticed that there is such machinery under the hood at all.

That said, I definitely second your suggestion about improving raw performance by parallelizing requests and will take this into account for the next iteration on the code.

Also thanks for pointing out the API limit which caps the maximum number of returned results to 1000 as you say. I assume there are appropriate paging or offset/limit parameters which grafana-wtf should make use of?

Cheers,
Andreas.

@jangaraj
Author

jangaraj commented May 7, 2019

Yes, there is a limit parameter: "Limit the number of returned results (max 5000)" (doc)
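A minimal sketch of how a client might pass this parameter when searching for dashboards. The helper name build_search_url and the constant MAX_LIMIT are hypothetical, and the api/search path with the type=dash-db filter is an assumption about the Grafana HTTP API rather than something taken from grafana-wtf's code.

```python
from urllib.parse import urlencode

MAX_LIMIT = 5000  # upper bound quoted from the Grafana documentation above


def build_search_url(base_url: str, limit: int = 1000) -> str:
    """Return a dashboard search URL with `limit` clamped to 1..MAX_LIMIT.

    Hypothetical helper: endpoint path and `type` filter are assumptions.
    """
    limit = max(1, min(limit, MAX_LIMIT))
    query = urlencode({"type": "dash-db", "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{query}"


print(build_search_url("https://grafana.example.org/", limit=9999))
```

Clamping on the client side keeps a misconfigured value from producing a request that the server would reject or silently truncate.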

amotl changed the title from "Performance" to "Improve performance with large Grafana installations" on May 7, 2019
@amotl
Contributor

amotl commented May 7, 2019

Dear Jan,

we just released grafana-wtf-0.5.0 which might improve the situation for you slightly. When requesting dashboards, we are now using limit=5000 and there's a new --cache-ttl option you might want to play around with.

--cache-ttl accepts the cache expiration time in seconds as well as a special literal like --cache-ttl=infinite to turn off cache expiration altogether, essentially caching forever. On the other hand, using --cache-ttl=0 disables caching completely, so all resources are requested again on each run. The value still defaults to 300 seconds.
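The three cases described here (a number of seconds, a literal meaning "never expire", and zero meaning "do not cache") could be handled along the following lines. This is only a sketch: parse_cache_ttl is a hypothetical helper, not grafana-wtf's actual code, and accepting the short form "inf" next to "infinite" is an assumption.

```python
from typing import Optional


def parse_cache_ttl(value: str) -> Optional[int]:
    """Map a --cache-ttl value to seconds; None stands for "never expire"."""
    text = value.strip().lower()
    if text in ("infinite", "inf"):  # accepting "inf" as well is an assumption
        return None
    ttl = int(text)  # raises ValueError for anything that is not an integer
    if ttl < 0:
        raise ValueError("cache TTL must not be negative")
    return ttl  # 0 means: bypass the cache and always fetch again


for raw in ("300", "0", "infinite"):
    print(raw, "->", parse_cache_ttl(raw))
```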

Increasing the cache expiration time might give you a more reasonable balance between freshness and waiting time; at least you are now in control of it.

With kind regards,
Andreas.

@amotl
Contributor

amotl commented May 7, 2019

Dear Jan,

we just added the --concurrency option which significantly improves performance. It defaults to "5" concurrent requests. Also, --debug has been improved to be able to watch the list of dashboard names grafana-wtf is downloading.

So, you might want to invoke grafana-wtf like

time grafana-wtf find '#299c46' --concurrency=20 --cache-ttl=inf --debug

to warm up your local cache by running twenty requests in parallel and to keep cache content forever. For subsequent invocations, things should be faster than before¹, at the cost of some data freshness.
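For readers curious how a bounded number of parallel requests can be expressed in Python, here is a minimal sketch using a thread pool from the standard library. It is not necessarily how grafana-wtf implements --concurrency; fetch_all and fake_fetch are hypothetical names, and the fetch function is a stand-in so that the example runs without a Grafana server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fetch_all(
    uids: Iterable[str],
    fetch_one: Callable[[str], dict],
    concurrency: int = 5,
) -> List[dict]:
    """Fetch every dashboard, running at most `concurrency` calls at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() keeps the input order and re-raises a worker's exception
        # when its result is consumed.
        return list(pool.map(fetch_one, uids))


# Stand-in for a real HTTP request (hypothetical).
def fake_fetch(uid: str) -> dict:
    return {"uid": uid}


dashboards = fetch_all(["abc", "def", "ghi"], fake_fetch, concurrency=2)
print(len(dashboards))
```

Threads are a reasonable fit here because the work is I/O-bound, and max_workers acts as the cap that keeps the client from flooding the server.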

We will be happy to hear about the outcome.

With kind regards,
Andreas.

@amotl
Contributor

amotl commented May 7, 2019

¹ That said, the SQLite cache and the search method will probably soon become the bottlenecks when operating on the larger data set grafana-wtf will be able to ingest after improving the HTTP transport layer.

Regarding the cache backend, we have been able to improve cache performance for Luftdatenpumpe by using Redis. Regarding the searching itself, we might try PyPy first and will likely have to move on to Go or Rust if this doesn't help.

Let me know if this works out reasonably for you, and whether you see a chance that we can tune the current Python implementation to cope with installations that have as many dashboards as yours.

@jangaraj
Author

jangaraj commented May 8, 2019

👍 Cold cache test:

$ time grafana-wtf --cache-ttl=600 --concurrency=50 find this-string-doesnt-exist
...
2019-05-08 04:36:54,312 [grafana_wtf.core      ] INFO   : Found 1302 dashboards
  0%|      | 0/1302 [00:00<?, ?it/s]
2019-05-08 04:36:54,321 [grafana_wtf.core      ] INFO   : Fetching dashboards in parallel with 50 concurrent requests
...
real    0m35.215s
user    0m7.111s
sys     0m9.171s

Warm cache for the same search:

real    0m5.284s
user    0m2.970s
sys     0m2.382s

Good job.

BTW: according to the docs, the user needs to use export GRAFANA_URL=https://daq.example.org/grafana/. But that trailing slash is a problem in my case (I use https://domain.org/). Could you handle that in the code, so that users can specify the URL with or without a trailing slash, please?
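One possible way to make the base URL tolerant of a trailing slash is to normalize it before appending API paths. The helper join_url below is hypothetical and only illustrates the idea; it is not taken from grafana-wtf.

```python
def join_url(base_url: str, path: str) -> str:
    """Join a base URL and an API path with exactly one slash between them."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


# Both spellings of the base URL yield the same result.
print(join_url("https://daq.example.org/grafana/", "api/search"))
print(join_url("https://daq.example.org/grafana", "api/search"))
```

Plain string handling is used on purpose: urllib.parse.urljoin would drop the last path segment ("grafana") when the base URL has no trailing slash.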

@amotl
Contributor

amotl commented May 8, 2019

Happy to see this kind of speedup on the large installation you are operating there. Thanks for letting us know and enjoy your searches.
