Benchmark Dashboard code and discuss/document possible performance improvements #1340

philippeluickx · 2016-07-22T10:33:22Z

Current Dashboard performance is slow - loading is capped at 50K records and this query take 10s. There can be several places where Dashboard performance could be improved, so care should be taken before proceeding with any single solution.

Objective

Before going with further experiments, it is good to have a baseline measure of performance for comparison.

We should try to measure the overall time, on average, that the dashboard takes to render as well as time spent on each stage. This will give us better insight on strategies for optimization.

Rendering stages

There are a few stages involved in rendering the dashboard charts, including:

fetching the data
- server fetch from Elastic
- client fetch from Meteor server
parsing the data
- e.g. aggregation or preparation of chart dimensions
rendering the charts
- e.g. creating DOM elements for data points, animations, etc

Each stage in the rendering sequence adds some amount of delay to the overall user experience.

Proposed changes:

Aggregate data per day / response code / response time group (adds a count column)
Remove data table (can be handled in different ways later: seperate request, download CSV, in a different view,...)
Remove hourly - daily - weekly - monthly aggregation options

Next steps:

Proof of concept: only load first chart with aggregation on Elastic Search (365 records for 1 y)
Add response code aggregation (365 * 5 records for 1 y)
Add response time aggregation (365 * 5 * 10 records for 1 y) - we can group response time in 10 sections (0 - 100 - ... - +1000 ms)

Definition of done

Benchmark(s) exist(s) for Dasboard code
- Team understands, after discussion, sources of significant delay in Dashboard loading sequence
Document exists describing possible enhancements for Dashboard code
- Proposals have been objectively compared,
  - e.g. honest assessment of possible delays with each solution
- Proposals have been compared by estimated amount of implementation effort
- Document acknowledges any functional losses for each proposal (e.g. dis-integration of dashboard charts), as tradeoffs

philippeluickx · 2016-07-22T11:56:59Z

Docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations.html
per term: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-terms-aggregation.html
per date range: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-daterange-aggregation.html
per range: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-range-aggregation.html

Tutorial using ES aggregation with d3: https://www.elastic.co/blog/data-visualization-elasticsearch-aggregations

Another tutorial, e.g. stats (if not used already): https://blog.madewithlove.be/post/elasticsearch-aggregations/

frenchbread · 2016-08-02T09:54:20Z

As suggested in here - NREL/api-umbrella#140 (comment), there is a way to fetch analytics using API-Umbrella's REST API and it's more secure.

Analytics drilldown data should be accessible via this URL: https:://<domain>/api-umbrella/v1/analytics/drilldown. The problem is that currently I'm getting 500 error status code. Reported that issue in NREL/api-umbrella#275

brylie · 2016-08-11T07:18:40Z

Basic benchmark

As an example, here are some measurements of the data fetching step with 25,000, 50,000, and 60,000 records:

Time for data fetching, as might be expected, seems to increase proportional to the number of records requested.

The above chart lacks values for the overall page load, which would give us a better idea of the proportion of page load time spent fetching data. A basic estimate would be that 30-50% of page load time is spent waiting for data, with the remainder in other tasks such as parsing and painting (more benchmarking is needed).

Method

The above chart was generated by reloading the dashboard 10 times and with each of the request values:

25,000 records
50,000 records
60,000 records (the server currently has only 64,000 records)

This test can be reproduced by checking out the feature/dashboard-chart-benchmark branch. The requested records value can be changed by editing dashboard/client/dashboard.js:61

frenchbread · 2016-08-11T11:14:39Z

As discussed earlier if using aggredated data which is provided by API Umbrella via REST API this cuts the need in performance improvements since the chart with data loads instantly within any data ranges. NREL/api-umbrella#275. This is partly implemented in https://github.com/frenchbread/apinf-dashboard-example.

Also if fetching data directly from API-Umbrella REST interface, it solves the security issue for elasticsearch being publicly accessible (any data can be fetched from there now with no Auth). So elastic search instance should be hidden from public access and accessible only by api umbrella instance though its own private network on a server.

philippeluickx · 2016-08-11T11:33:30Z

@frenchbread is the ES interface open by default? Would be better to close it up in the default installation...
Also wondering if the REST interface should actually go through apinf? That way we can have all the advantages of security etc.

frenchbread · 2016-08-11T12:02:10Z

is the ES interface open by default?

@philippeluickx yes, it is.

Also wondering if the REST interface should actually go through apinf?

I'm not sure how elastic search works within API umbrella but closing both & routing them though apinf would work well. (Api umbrella requires API key & token so hiding ES may be enough but not sure if it's possible)

philippeluickx · 2016-08-11T14:14:19Z

We should make sure ES interface is closed by default...

brylie · 2016-08-17T08:45:36Z

Based on the above benchmark, it takes around five seconds to fetch 50,000 records to the server from Elastic. This means that any server aggregation function would have to wait for Elastic data to load, and that wait time can quickly exceed the five second target 'dashboard render time'.

We can probably pre-fetch the analytics data, as has been tried in a previous experiment, and store it, for example, in our MongoDB.

Another approach might be to ask Elastic to return aggregated data.

I am also aware of @frenchbread's experiment with the API Umbrella Drilldown Analytics endpoint. If we use the API Umbrella drilldown analytics, what chart(s) could our dashboard contain (e.g. could we include response codes or response time aggregation charts - #s 2 and 3 above)?

brylie · 2016-08-17T08:50:42Z

@philippeluickx and @frenchbread, I opened an issue related to the API Umbrella Elasticsearch server being publicly accessible.

brylie · 2016-08-18T08:28:31Z

@frenchbread , when passing the interval, do the results contain fields such as 'response time' and 'response code'?

For reference in this discussion, the API Umbrella analytics REST API allows us to pass in several arguments:

prefix (frontend prefix?)
timeframe
- start_at
- end_at
- interval (used to bin results by day, month, etc)
filter query
- using jQuery QueryBuilder (JSON) syntax
elasticsearch query string
- which can include aggregation

References

[elasticsearch] aggregations execute quickly and are near real-time, just like search. (source)

One of the exciting aspects of [elasticsearch] aggregations are how easily they are converted into charts and graphs. (source)

Because the [elasticsearch] aggregation operates in the context of the query scope, any filter applied to the query will also apply to the aggregation. (source)

frenchbread · 2016-08-18T14:58:31Z

Results only contain grouped timestamps and host/ipaddr values where request came from.
Probably we can request data using custom aggregation, but I am not sure (have not tried it). Here is reference for filtering by API Backend (via request path) - NREL/api-umbrella#275

Another approach would be filtering by status_code manually, e.g. aggregating by each status code scope (2**, 3**, 4**, 5**) in separate request(s). Elasticsearch query allows to provide regex-like queries.

brylie · 2016-08-19T08:21:50Z

The second approach, filtering by status code and then aggregating on the server, might still leave us with slow chart loading times. E.g. there could easily be a scenario where a server has 50,000+ requests that have 2xx status codes, which would take >5 seconds to transfer to Apinf for the aggregation step.

Lets try to rely directly on Elasticsearch to perform the aggregations by structuring the API Umbrella analytics query with multiple Elasticsearch filter/aggregation parameters.

philippeluickx added enhancement User Experience labels Jul 22, 2016

philippeluickx assigned jykae, frenchbread and 55 Jul 22, 2016

philippeluickx added the backlog label Jul 22, 2016

bajiat added planning and removed backlog labels Aug 1, 2016

bajiat added this to the Sprint 28 milestone Aug 1, 2016

bajiat added the Dashboard 1.0 label Aug 1, 2016

frenchbread added ready in progress and removed planning ready labels Aug 2, 2016

55 unassigned 55 and jykae Aug 3, 2016

bajiat mentioned this issue Aug 9, 2016

Integrate aggregated usage-over-time chart to dashboard #1373

Open

brylie changed the title ~~Dashboard performance improvements~~ Discuss/document possible Dashboard performance improvements Aug 11, 2016

brylie changed the title ~~Discuss/document possible Dashboard performance improvements~~ Benchmark/discuss/document possible Dashboard performance improvements Aug 11, 2016

brylie changed the title ~~Benchmark/discuss/document possible Dashboard performance improvements~~ Benchmark Dashboard code and discuss/document possible performance improvements Aug 11, 2016

bajiat closed this as completed Sep 12, 2016

bajiat removed the in progress label Sep 12, 2016

brylie mentioned this issue Apr 28, 2017

Dashboard stops working properly after 50 000 hits #2456

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark Dashboard code and discuss/document possible performance improvements #1340

Benchmark Dashboard code and discuss/document possible performance improvements #1340

philippeluickx commented Jul 22, 2016 •

edited by brylie

Loading

philippeluickx commented Jul 22, 2016

frenchbread commented Aug 2, 2016

brylie commented Aug 11, 2016 •

edited

Loading

frenchbread commented Aug 11, 2016 •

edited

Loading

philippeluickx commented Aug 11, 2016

frenchbread commented Aug 11, 2016

philippeluickx commented Aug 11, 2016

brylie commented Aug 17, 2016 •

edited

Loading

brylie commented Aug 17, 2016

brylie commented Aug 18, 2016 •

edited

Loading

frenchbread commented Aug 18, 2016

brylie commented Aug 19, 2016 •

edited

Loading

Benchmark Dashboard code and discuss/document possible performance improvements #1340

Benchmark Dashboard code and discuss/document possible performance improvements #1340

Comments

philippeluickx commented Jul 22, 2016 • edited by brylie Loading

Objective

Rendering stages

Proposed changes:

Definition of done

philippeluickx commented Jul 22, 2016

frenchbread commented Aug 2, 2016

brylie commented Aug 11, 2016 • edited Loading

Basic benchmark

Method

frenchbread commented Aug 11, 2016 • edited Loading

philippeluickx commented Aug 11, 2016

frenchbread commented Aug 11, 2016

philippeluickx commented Aug 11, 2016

brylie commented Aug 17, 2016 • edited Loading

brylie commented Aug 17, 2016

brylie commented Aug 18, 2016 • edited Loading

References

frenchbread commented Aug 18, 2016

brylie commented Aug 19, 2016 • edited Loading

philippeluickx commented Jul 22, 2016 •

edited by brylie

Loading

brylie commented Aug 11, 2016 •

edited

Loading

frenchbread commented Aug 11, 2016 •

edited

Loading

brylie commented Aug 17, 2016 •

edited

Loading

brylie commented Aug 18, 2016 •

edited

Loading

brylie commented Aug 19, 2016 •

edited

Loading