
Load test plan #252

Open
smit1678 opened this issue Oct 5, 2017 · 1 comment
Labels
v2 Features and ideas to be considered for v2 implementation

Comments

@smit1678
Collaborator

smit1678 commented Oct 5, 2017

Since we're consolidating APIs and moving to AWS, we need to assemble a short plan to load test both users hitting the /meta endpoint and the workers for processing imagery.

We are currently running production on a t2.xlarge. Is this sufficient? Relatedly, what is the maximum number of workers we want running for processing imagery?

@tombh
Contributor

tombh commented Oct 5, 2017

The new deployment of OAM will move from using separate servers for the Catalog API and Uploader API to using a single server with a combined API. This means that the Catalog API is now responsible for both serving database-accessing API requests and CPU-intensive imagery processing.

Note that Seth's recent work on Marblecutter and Monq worker integration should once again allow the separation of the API load and processing load.

In order to test the new setup I uploaded a queue of ~200MB raw TIFFs. Note that current settings mean that only one image is processed at a time. I then ran the following load test:

ab -c 100 -n 5000 'http://api-staging.openaerialmap.org/meta?limit=99999'
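For repeatable comparisons between runs, ab can also export its percentile table to a CSV file via the -e flag. A minimal sketch of parsing that output, assuming the two-column "Percentage served,Time in ms" format (the sample below is built from the percentile table in results [1], not captured from a real -e run):

```python
import csv
import io

# ab's -e flag writes per-percentile latencies as CSV.
# Sample data taken from the results [1] percentile table below.
sample = """\
Percentage served,Time in ms
50,4891
90,5952
95,6303
99,7065
100,7796
"""

rows = list(csv.reader(io.StringIO(sample)))
# Skip the header row; map percentile -> latency in ms.
percentiles = {int(row[0]): float(row[1]) for row in rows[1:]}

print(f"p50={percentiles[50]:.0f}ms p90={percentiles[90]:.0f}ms p99={percentiles[99]:.0f}ms")
```

Keeping these CSVs per run would make it easy to diff latency distributions before and after separating the API and processing services.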

These are requests for all the currently available imagery, which create the highlighted grid squares on the frontend map. Note that this is in itself a significantly unoptimised DB query (approx 400k per request) and should be thought about carefully as more imagery is included in OAM. Also note that this request is only made when a user visits the home page.

From OAM's Google Analytics I can see that there are usually about 2000 visitors per month, so concurrent users are rarely, if ever, going to be above 5. However, I will assume a maximum plausible concurrent user surge of 100 after successful marketing; this is reflected in the -c 100 of the benchmark test. The results can be seen below in [1]. Mean response times are in the 5000ms range, which is not ideal, but perfectly acceptable as long as the server is not dropping requests and retains ample RAM, which it does. I also tested a concurrency of 300 (see [2] below), which generates mean responses of 15000ms. This is not acceptable, but again there are no dropped responses and RAM remains sufficient.
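The concurrency estimate above can be sanity-checked with Little's law (concurrent users ≈ arrival rate × time on site). The ~2000 visitors/month figure is from Google Analytics as mentioned; the average visit length is an assumed value for illustration:

```python
# Rough steady-state concurrency via Little's law: L = lambda * W.
# Only MONTHLY_VISITORS comes from the measured data; the rest are
# illustrative assumptions.
MONTHLY_VISITORS = 2000
SECONDS_PER_MONTH = 30 * 24 * 3600
AVG_VISIT_SECONDS = 180          # assumed ~3-minute session

arrival_rate = MONTHLY_VISITORS / SECONDS_PER_MONTH   # visitors per second
steady_concurrency = arrival_rate * AVG_VISIT_SECONDS

print(f"steady-state concurrent users: {steady_concurrency:.2f}")
```

Even generous assumptions put steady-state concurrency well below 1, so a burst to 5 concurrent users is already unusual and -c 100 represents a very large safety margin.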

My first suggestion for scaling is to separate the API service from the imagery processing service, as mentioned at the beginning. However, if you would like to support either more concurrent imagery uploads or more than 100 concurrent users, then I would simply recommend adding more cores; the current 8GB of RAM is more than enough.

[1]
Document Path:          /meta?limit=99999
Document Length:        412598 bytes

Concurrency Level:      100
Time taken for tests:   246.659 seconds
Complete requests:      5000
Failed requests:        1408
   (Connect: 0, Receive: 0, Length: 1408, Exceptions: 0)
Write errors:           0
Total transferred:      2067514098 bytes
HTML transferred:       2066288853 bytes
Requests per second:    20.27 [#/sec] (mean)
Time per request:       4933.183 [ms] (mean)
Time per request:       49.332 [ms] (mean, across all concurrent requests)
Transfer rate:          8185.61 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   11  97.2      1    1002
Processing:  1173 4890 819.7   4872    7761
Waiting:     1166 4812 811.3   4804    7633
Total:       1178 4901 825.4   4891    7796

Percentage of the requests served within a certain time (ms)
  50%   4891
  66%   5178
  75%   5343
  80%   5520
  90%   5952
  95%   6303
  98%   6775
  99%   7065
 100%   7796 (longest request)

[2]
Document Path:          /meta?limit=99999
Document Length:        407751 bytes

Concurrency Level:      300
Time taken for tests:   494.546 seconds
Complete requests:      10000
Failed requests:        4093
   (Connect: 0, Receive: 0, Length: 4093, Exceptions: 0)
Write errors:           0
Total transferred:      4091156521 bytes
HTML transferred:       4088705296 bytes
Requests per second:    20.22 [#/sec] (mean)
Time per request:       14836.383 [ms] (mean)
Time per request:       49.455 [ms] (mean, across all concurrent requests)
Transfer rate:          8078.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   16 116.7      1    1003
Processing:  1609 14639 6885.6  14981   57021
Waiting:     1359 14553 6886.6  14902   56997
Total:       1609 14655 6884.9  14998   57022

Percentage of the requests served within a certain time (ms)
  50%  14998
  66%  16406
  75%  17088
  80%  17566
  90%  21113
  95%  28954
  98%  31162
  99%  40153
 100%  57022 (longest request)
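A quick cross-check of the two runs shows the server is throughput-saturated: both sustain roughly 20 req/s, so by Little's law the mean response time grows linearly with concurrency (mean time ≈ concurrency / throughput). Note also that the "Failed requests" in both runs are all Length mismatches, which for a dynamic JSON endpoint typically just means the response body size varied between requests, consistent with the observation that nothing was actually dropped. The figures below are taken from the two ab reports above:

```python
# Verify that mean latency in each run is predicted by
# concurrency / throughput, i.e. the server is throughput-bound.
runs = {
    100: dict(requests=5000, seconds=246.659),   # results [1]
    300: dict(requests=10000, seconds=494.546),  # results [2]
}

for conc, r in runs.items():
    throughput = r["requests"] / r["seconds"]       # requests per second
    mean_latency_ms = conc / throughput * 1000      # predicted mean time per request
    print(f"c={conc}: {throughput:.2f} req/s, ~{mean_latency_ms:.0f} ms mean")
```

Tripling concurrency left throughput flat (~20 req/s) and simply tripled latency, so faster responses require more processing capacity (or a cheaper /meta query), not more connections.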

@cgiovando cgiovando added the v2 Features and ideas to be considered for v2 implementation label Apr 13, 2021