Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark Dashboard code and discuss/document possible performance improvements #1340

Closed
6 tasks
philippeluickx opened this issue Jul 22, 2016 · 12 comments
Closed
6 tasks

Comments

@philippeluickx
Copy link
Contributor

philippeluickx commented Jul 22, 2016

Current Dashboard performance is slow - loading is capped at 50K records and this query take 10s. There can be several places where Dashboard performance could be improved, so care should be taken before proceeding with any single solution.

Objective

Before going with further experiments, it is good to have a baseline measure of performance for comparison.

We should try to measure the overall time, on average, that the dashboard takes to render as well as time spent on each stage. This will give us better insight on strategies for optimization.

Rendering stages

There are a few stages involved in rendering the dashboard charts, including:

  • fetching the data
    • server fetch from Elastic
    • client fetch from Meteor server
  • parsing the data
    • e.g. aggregation or preparation of chart dimensions
  • rendering the charts
    • e.g. creating DOM elements for data points, animations, etc

Each stage in the rendering sequence adds some amount of delay to the overall user experience.

Proposed changes:

  • Aggregate data per day / response code / response time group (adds a count column)
  • Remove data table (can be handled in different ways later: seperate request, download CSV, in a different view,...)
  • Remove hourly - daily - weekly - monthly aggregation options

Next steps:

  1. Proof of concept: only load first chart with aggregation on Elastic Search (365 records for 1 y)
  2. Add response code aggregation (365 * 5 records for 1 y)
  3. Add response time aggregation (365 * 5 * 10 records for 1 y) - we can group response time in 10 sections (0 - 100 - ... - +1000 ms)

Definition of done

  • Benchmark(s) exist(s) for Dasboard code
    • Team understands, after discussion, sources of significant delay in Dashboard loading sequence
  • Document exists describing possible enhancements for Dashboard code
    • Proposals have been objectively compared,
      • e.g. honest assessment of possible delays with each solution
    • Proposals have been compared by estimated amount of implementation effort
    • Document acknowledges any functional losses for each proposal (e.g. dis-integration of dashboard charts), as tradeoffs
@bajiat bajiat added planning and removed backlog labels Aug 1, 2016
@bajiat bajiat added this to the Sprint 28 milestone Aug 1, 2016
@frenchbread
Copy link
Contributor

As suggested in here - NREL/api-umbrella#140 (comment), there is a way to fetch analytics using API-Umbrella's REST API and it's more secure.

Analytics drilldown data should be accessible via this URL: https:://<domain>/api-umbrella/v1/analytics/drilldown. The problem is that currently I'm getting 500 error status code. Reported that issue in NREL/api-umbrella#275

@brylie
Copy link
Contributor

brylie commented Aug 11, 2016

Basic benchmark

As an example, here are some measurements of the data fetching step with 25,000, 50,000, and 60,000 records:

dashboardchartbenchmark-latencyfetchingdashboarddata

Time for data fetching, as might be expected, seems to increase proportional to the number of records requested.

The above chart lacks values for the overall page load, which would give us a better idea of the proportion of page load time spent fetching data. A basic estimate would be that 30-50% of page load time is spent waiting for data, with the remainder in other tasks such as parsing and painting (more benchmarking is needed).

Method

The above chart was generated by reloading the dashboard 10 times and with each of the request values:

  • 25,000 records
  • 50,000 records
  • 60,000 records (the server currently has only 64,000 records)

This test can be reproduced by checking out the feature/dashboard-chart-benchmark branch. The requested records value can be changed by editing dashboard/client/dashboard.js:61

@brylie brylie changed the title Dashboard performance improvements Discuss/document possible Dashboard performance improvements Aug 11, 2016
@brylie brylie changed the title Discuss/document possible Dashboard performance improvements Benchmark/discuss/document possible Dashboard performance improvements Aug 11, 2016
@brylie brylie changed the title Benchmark/discuss/document possible Dashboard performance improvements Benchmark Dashboard code and discuss/document possible performance improvements Aug 11, 2016
@frenchbread
Copy link
Contributor

frenchbread commented Aug 11, 2016

As discussed earlier if using aggredated data which is provided by API Umbrella via REST API this cuts the need in performance improvements since the chart with data loads instantly within any data ranges. NREL/api-umbrella#275. This is partly implemented in https://github.com/frenchbread/apinf-dashboard-example.

Also if fetching data directly from API-Umbrella REST interface, it solves the security issue for elasticsearch being publicly accessible (any data can be fetched from there now with no Auth). So elastic search instance should be hidden from public access and accessible only by api umbrella instance though its own private network on a server.

@philippeluickx
Copy link
Contributor Author

@frenchbread is the ES interface open by default? Would be better to close it up in the default installation...
Also wondering if the REST interface should actually go through apinf? That way we can have all the advantages of security etc.

@frenchbread
Copy link
Contributor

is the ES interface open by default?

@philippeluickx yes, it is.

Also wondering if the REST interface should actually go through apinf?

I'm not sure how elastic search works within API umbrella but closing both & routing them though apinf would work well. (Api umbrella requires API key & token so hiding ES may be enough but not sure if it's possible)

@philippeluickx
Copy link
Contributor Author

We should make sure ES interface is closed by default...

@brylie
Copy link
Contributor

brylie commented Aug 17, 2016

Based on the above benchmark, it takes around five seconds to fetch 50,000 records to the server from Elastic. This means that any server aggregation function would have to wait for Elastic data to load, and that wait time can quickly exceed the five second target 'dashboard render time'.

We can probably pre-fetch the analytics data, as has been tried in a previous experiment, and store it, for example, in our MongoDB.

Another approach might be to ask Elastic to return aggregated data.

I am also aware of @frenchbread's experiment with the API Umbrella Drilldown Analytics endpoint. If we use the API Umbrella drilldown analytics, what chart(s) could our dashboard contain (e.g. could we include response codes or response time aggregation charts - #s 2 and 3 above)?

@brylie
Copy link
Contributor

brylie commented Aug 17, 2016

@philippeluickx and @frenchbread, I opened an issue related to the API Umbrella Elasticsearch server being publicly accessible.

@brylie
Copy link
Contributor

brylie commented Aug 18, 2016

@frenchbread , when passing the interval, do the results contain fields such as 'response time' and 'response code'?

For reference in this discussion, the API Umbrella analytics REST API allows us to pass in several arguments:

  • prefix (frontend prefix?)
  • timeframe
    • start_at
    • end_at
    • interval (used to bin results by day, month, etc)
  • filter query
  • elasticsearch query string
    • which can include aggregation

References

[elasticsearch] aggregations execute quickly and are near real-time, just like search. (source)

One of the exciting aspects of [elasticsearch] aggregations are how easily they are converted into charts and graphs. (source)

Because the [elasticsearch] aggregation operates in the context of the query scope, any filter applied to the query will also apply to the aggregation. (source)

@frenchbread
Copy link
Contributor

Results only contain grouped timestamps and host/ipaddr values where request came from.
Probably we can request data using custom aggregation, but I am not sure (have not tried it). Here is reference for filtering by API Backend (via request path) - NREL/api-umbrella#275

Another approach would be filtering by status_code manually, e.g. aggregating by each status code scope (2**, 3**, 4**, 5**) in separate request(s). Elasticsearch query allows to provide regex-like queries.

@brylie
Copy link
Contributor

brylie commented Aug 19, 2016

The second approach, filtering by status code and then aggregating on the server, might still leave us with slow chart loading times. E.g. there could easily be a scenario where a server has 50,000+ requests that have 2xx status codes, which would take >5 seconds to transfer to Apinf for the aggregation step.

Lets try to rely directly on Elasticsearch to perform the aggregations by structuring the API Umbrella analytics query with multiple Elasticsearch filter/aggregation parameters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants