
io_tester: implement latency correction for jobs with RPS #2260

Open
pwrobelse wants to merge 1 commit into master from io_tester-implement-latency-correction-for-rps-mode

Conversation

pwrobelse
Contributor

Currently, for jobs with an RPS (requests-per-second) rate specified,
io_tester measures only the service time. This means that if servicing
one of the requests takes much longer than the delta time available between
consecutive requests (so issuing the next request is delayed), then
the measured latency does not reflect that fact. In such a case we
issue fewer requests than required and high latency is reported only
for the first request.

For instance, if one of the requests suffers a latency spike that
exceeds the time available for service according to the RPS schedule, then
the total number of scheduled requests does not match the expected
count calculated as 'TOTAL = duration_seconds * RPS'.

Furthermore, the latency percentiles printed at the end of the
simulation may show inaccurate data. Firstly, the count of samples
is lower than expected. Secondly, if the time needed to handle
requests returns to the ordinary value after the latency spike,
then our statistics show that handling of only one request
was long, but this is not true - io_tester stopped sending requests
at the given RPS and therefore the other requests could not be properly measured.
This indicates that io_tester suffers from the coordinated omission problem.

This change implements a latency correction flag. When it is enabled,
io_tester measures the total request latency, including the delay
between the expected schedule time and the actual schedule time.
Moreover, if any request takes more time than available, then
io_tester tries to schedule the 'delayed' requests as soon as possible
to return to the defined schedule.

Signed-off-by: Patryk Wrobel <[email protected]>
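
To illustrate the intended behavior, here is a minimal sketch (not the actual patch) of a pacing loop with latency correction. The names issue_request, record_latency and rps_period are hypothetical placeholders:

#include <chrono>

using clk = std::chrono::steady_clock;

void issue_request();                              // placeholder: performs one I/O request
void record_latency(std::chrono::nanoseconds lat); // placeholder: feeds the latency histogram

// Issues 'total_requests' requests at a fixed period and records corrected latencies.
void run_schedule(std::chrono::nanoseconds rps_period, unsigned total_requests) {
    const auto schedule_start = clk::now();
    for (unsigned i = 0; i < total_requests; ++i) {
        // The time at which request 'i' should be issued according to the RPS schedule.
        const auto intended = schedule_start + i * rps_period;
        while (clk::now() < intended) {
            // Ahead of schedule: wait (busy-wait only to keep the sketch short).
        }
        issue_request();                           // synchronous for simplicity
        const auto end = clk::now();
        // The uncorrected measurement would be the pure service time. Latency
        // correction charges the request from its intended time instead, so the
        // queueing delay of requests that were issued late is included as well.
        record_latency(end - intended);
        // No sleep after a long request: if we are behind schedule, the next
        // iteration issues its request immediately to get back on track.
    }
}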
@pwrobelse pwrobelse force-pushed the io_tester-implement-latency-correction-for-rps-mode branch from fb659bc to 7f0febd on May 22, 2024 12:37
@pwrobelse
Contributor Author

From ScyllaDB blog - "Coordinated omission" [1]:

Measuring latency

A straightforward way to measure an operation’s latency is the following:

auto start = now();
make_request(i);
auto end = now();

auto latency = end - start;

But this is not exactly what we want.

The system we are trying to simulate sends a request every [1 / throughput] unit of time regardless of how long it takes to process them. We simulate it by the means of N workers (threads) that send a request every [1 / worker_throughput] where worker_throughput ≤ throughput, and N * worker_throughput = throughput.

Thus, a worker has a [1/N] schedule that reflects a part of the schedule of the simulated system. The question we need to answer is how to map the latency measurements from the 1/N worker’s schedule to the full schedule of the simulated open-model system. It’s a straightforward process until requests start to take longer than expected.

Which brings us to the second part of the Coordinated Omission problem: what to do with latency outliers.

Basically, we need to account for workers that send their requests late. So far as the simulated system is concerned, the request was fired in time, making latency look longer than it really was. We need to adjust the latency measurement to account for the delay in the request.

Request latency = (now() – intended_time) + service_time

where

service_time = end – start
intended_time(request) = start + (request – 1) * time_per_request

This is called latency correction.

Latency consists of time spent waiting in the queue plus service time. A client’s service time is the time between sending a request and receiving the response, as in the example above. The waiting time for the client is the time between the point in time a request was supposed to be sent and the time it was actually sent.
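
In code, the corrected measurement from the quoted formulas could look roughly like this (a sketch only; the function name is hypothetical, and schedule_start disambiguates the 'start' used in intended_time from the per-request send time):

#include <chrono>

using clk = std::chrono::steady_clock;

// 'request' is the 1-based index of the request in the schedule.
std::chrono::nanoseconds corrected_latency(clk::time_point schedule_start,
                                           unsigned request,
                                           std::chrono::nanoseconds time_per_request,
                                           clk::time_point start,   // actual send time
                                           clk::time_point end) {   // completion time
    const auto service_time = end - start;
    const auto intended_time = schedule_start + (request - 1) * time_per_request;
    // (start - intended_time) is the waiting time: zero for requests sent on
    // schedule, positive for requests that were sent late.
    return (start - intended_time) + service_time;
}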

@pwrobelse
Contributor Author

Also, here is a short example of the behavior without the patch, given the following configuration file:

- name: latency_reads
  shards: all
  type: randread
  data_size: 20GB
  shard_info:
    parallelism: 10
    reqsize: 131072
    shares: 100
    rps: 100

io_tester is executed with the --duration=30 --smp=4 arguments. When the execution is preempted with Ctrl+Z (which injects an artificial latency spike), only the average and max values change significantly:

- shard: 1
  latency_reads:
    throughput: 102539.758  # kB/s
    IOPS: 801.091858
    latencies:  # usec
      average: 3283
      p0.5: 1136
      p0.95: 1881
      p0.99: 2152
      p0.999: 218387
      max: 5836826
    stats:
      total_requests: 24035

Even though 30k issued requests per shard were expected (duration 30 s * 100 RPS * parallelism 10), only 24035 were issued. We measured only the service time and the missed requests were not counted - thus, the final statistics may not depict the actual state.

When doing the same thing with the code from the patch, the delay between the expected submission time of a request and its actual submission time is added to the final latency. Also, the missed requests are scheduled as soon as possible to get back on track with the schedule.

- shard: 1
  latency_reads:
    throughput: 127961.07  # kB/s
    IOPS: 999.695862
    latencies:  # usec
      average: 937521
      p0.5: 240024
      p0.95: 3406039
      p0.99: 5197918
      p0.999: 5665090
      max: 5715579
    stats:
      total_requests: 30000

@pwrobelse pwrobelse marked this pull request as ready for review May 22, 2024 13:08
@pwrobelse
Contributor Author

Hi @avikivity, @xemul,

I tried to implement latency correction in io_tester - the fix is related to the coordinated omission problem. Currently, io_tester measures only the service_time when RPS is specified for a job.

@pwrobelse pwrobelse requested review from xemul and avikivity May 22, 2024 13:11
@xemul
Contributor

xemul commented May 23, 2024

I don't think io-tester should be overly smart. It's enough if we have a watchdog that detects whether each request finishes in a timely manner and screams into the logs otherwise. That can be as simple as the following (a rough sketch is shown below the list):

  • start a timer before sending a request
  • cancel the timer after the request completes
  • when the timer fires, print a warning into the logs or count the number of requests that were delayed for too long
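
A rough sketch of that watchdog idea, assuming Seastar's timer<> API (set_callback/arm/cancel) is used directly; do_io() and the late_requests counter are placeholders, not io_tester code:

#include <seastar/core/future.hh>
#include <seastar/core/timer.hh>
#include <chrono>
#include <cstdint>
#include <memory>

seastar::future<> do_io();   // placeholder: issues a single request

// Issues one request guarded by a watchdog timer; the counter must outlive the returned future.
seastar::future<> issue_with_watchdog(std::chrono::microseconds expected_latency,
                                      uint64_t& late_requests) {
    auto watchdog = std::make_unique<seastar::timer<>>();
    watchdog->set_callback([&late_requests] () noexcept {
        // The request did not complete within the expected time.
        ++late_requests;
    });
    watchdog->arm(expected_latency);                 // start the timer before sending
    return do_io().finally([watchdog = std::move(watchdog)] {
        watchdog->cancel();                          // completed: disarm the watchdog
    });
}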

@xemul
Contributor

xemul commented May 23, 2024

Or even simpler -- if the request latency exceeds the "expected" one -- print a warning
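
For completeness, a hypothetical version of that even simpler check, run right after a request completes; rps_period stands for the time budget implied by the RPS schedule, and warn_late() is a placeholder for the actual log call:

#include <chrono>
#include <cstdint>

void warn_late(std::chrono::nanoseconds measured);   // placeholder: emits the warning

// Flags a request whose measured latency exceeded the expected per-request budget.
void check_latency(std::chrono::nanoseconds measured,
                   std::chrono::nanoseconds rps_period,
                   uint64_t& late_requests) {
    if (measured > rps_period) {
        ++late_requests;
        warn_late(measured);
    }
}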
