
io_tester: implement latency correction for jobs with RPS #2260

Open
pwrobelse wants to merge 1 commit into master from io_tester-implement-latency-correction-for-rps-mode

Conversation

pwrobelse
Contributor

Currently, for jobs with an RPS (requests-per-second) rate specified,
io_tester measures only service time. This means that if servicing
one of the requests takes much longer than the delta time available
between consecutive requests (so issuing the next request is delayed),
the measured latency does not reflect that fact. In such a case we
issue fewer requests than required, and high latency is reported only
for the first request.

For instance, if one of the requests has a latency spike that exceeds
the time available for servicing it according to the RPS schedule, then
the total number of scheduled requests does not match the expected
count calculated as 'TOTAL = duration_seconds * RPS'.

Furthermore, the latency percentiles printed at the end of the
simulation may show inaccurate data. Firstly, the count of samples
is lower than expected. Secondly, if the time needed to handle
requests returns to its ordinary value after the latency spike,
our statistics show that only one request took long, which is not
true - io_tester stopped sending requests at the given RPS, so the
other requests could not be measured properly.
This indicates that io_tester suffers from the coordinated omission problem.

This change implements a latency correction flag. When it is enabled,
io_tester measures the total request latency, including the delay
between the expected schedule time and the actual schedule time.
Moreover, if any request takes more time than available, io_tester
tries to schedule the 'delayed' requests as soon as possible to
return to the defined schedule.

Signed-off-by: Patryk Wrobel <[email protected]>
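
The catch-up behaviour described above can be illustrated roughly as follows; pause_until(), issue_request(), time_per_request and total_requests are placeholder names for this sketch, not the actual io_tester code:

auto next_due = now();
for (unsigned i = 0; i < total_requests; ++i) {
    if (now() < next_due) {
        pause_until(next_due);     // on schedule: wait for the next slot
    }                              // otherwise we are late: send immediately
    issue_request(i);
    next_due += time_per_request;  // the schedule itself never shifts, so
                                   // delayed requests go out back-to-back
}
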
@pwrobelse force-pushed the io_tester-implement-latency-correction-for-rps-mode branch from fb659bc to 7f0febd on May 22, 2024 12:37
@pwrobelse
Contributor Author

From ScyllaDB blog - "Coordinated omission" [1]:

Measuring latency

A straightforward way to measure an operation’s latency is the following:

auto start = now();
make_request(i);
auto end = now();

auto latency = end - start;

But this is not exactly what we want.

The system we are trying to simulate sends a request every [1 / throughput] unit of time regardless of how long it takes to process them. We simulate it by the means of N workers (threads) that send a request every [1 / worker_throughput] where worker_throughput ≤ throughput, and N * worker_throughput = throughput.

Thus, a worker has a [1/N] schedule that reflects a part of the schedule of the simulated system. The question we need to answer is how to map the latency measurements from the 1/N worker’s schedule to the full schedule of the simulated open-model system. It’s a straightforward process until requests start to take longer than expected.

Which brings us to the second part of the Coordinated Omission problem: what to do with latency outliers.

Basically, we need to account for workers that send their requests late. So far as the simulated system is concerned, the request was fired in time, making latency look longer than it really was. We need to adjust the latency measurement to account for the delay in the request.

Request latency = (now() – intended_time) + service_time

where

service_time = end – start
intended_time(request) = start + (request – 1) * time_per_request

This is called latency correction.

Latency consists of time spent waiting in the queue plus service time. A client’s service time is the time between sending a request and receiving the response, as in the example above. The waiting time for the client is the time between the point in time a request was supposed to be sent and the time it was actually sent.
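
Applied to the snippet above, the corrected measurement could be sketched like this; make_request(), record(), time_per_request and total_requests are placeholders following the blog's formulas, not the actual io_tester code:

auto schedule_start = now();
for (unsigned request = 1; request <= total_requests; ++request) {
    // when this request should have been sent according to the schedule
    auto intended_time = schedule_start + (request - 1) * time_per_request;

    auto start = now();            // when it was actually sent
    make_request(request);
    auto end = now();

    auto service_time = end - start;
    // waiting time (start - intended_time) plus service time
    auto latency = (start - intended_time) + service_time;
    record(latency);
}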

@pwrobelse
Contributor Author

Also, here is a short example of the behavior without the patch. Given the following configuration file:

- name: latency_reads
  shards: all
  type: randread
  data_size: 20GB
  shard_info:
    parallelism: 10
    reqsize: 131072
    shares: 100
    rps: 100
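
For reference, the full invocation could look roughly like the line below; the configuration file name and the storage directory are assumptions made for illustration:

./io_tester --conf=latency_reads.yaml --storage=/mnt/io_tester_dir --duration=30 --smp=4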

io_tester was executed with the --duration=30 --smp=4 arguments. When the execution is temporarily suspended with Ctrl+Z (introducing an artificial stall), only the average and max values change significantly:

- shard: 1
  latency_reads:
    throughput: 102539.758  # kB/s
    IOPS: 801.091858
    latencies:  # usec
      average: 3283
      p0.5: 1136
      p0.95: 1881
      p0.99: 2152
      p0.999: 218387
      max: 5836826
    stats:
      total_requests: 24035

Although 30k issued requests were expected, only 24035 were issued. We measured only the service time and the missed requests were not counted - thus, the final statistics may not depict the actual state.

When doing the same thing with the code from the patch, the delay between the expected submission time of a request and its actual submission time is added to the final latency. Also, the missed requests are scheduled as soon as possible to get back on track with the schedule.

- shard: 1
  latency_reads:
    throughput: 127961.07  # kB/s
    IOPS: 999.695862
    latencies:  # usec
      average: 937521
      p0.5: 240024
      p0.95: 3406039
      p0.99: 5197918
      p0.999: 5665090
      max: 5715579
    stats:
      total_requests: 30000

@pwrobelse marked this pull request as ready for review May 22, 2024 13:08
@pwrobelse
Contributor Author

Hi @avikivity, @xemul,

I tried to implement latency correction in io_tester - the fix is related to the coordinated omission problem. Currently, io_tester measures only service_time when RPS is specified for a job.

@pwrobelse requested review from xemul and avikivity May 22, 2024 13:11
@xemul
Contributor

xemul commented May 23, 2024

I don't think io-tester should be overly smart. It's enough if we have a watchdog that detects whether each request finishes in a timely manner and screams into the logs otherwise. For that it can be as simple as:

  • start a timer before sending a request
  • cancel the timer after it's completed
  • when the timer fires, print a warning into the logs or count the number of requests that were delayed for too long
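
A rough sketch of such a watchdog using seastar::timer could look like the code below; issue_request(), expected_latency and the plain std::cerr logging are placeholders for illustration, not actual io_tester code:

#include <seastar/core/future.hh>
#include <seastar/core/timer.hh>
#include <chrono>
#include <iostream>
#include <memory>

// placeholder: sends a single I/O request and resolves when it completes
seastar::future<> issue_request();

seastar::future<> send_with_watchdog(std::chrono::microseconds expected_latency) {
    auto watchdog = std::make_unique<seastar::timer<>>();
    // fires only if the request has not completed within the expected time
    watchdog->set_callback([] {
        std::cerr << "request exceeded its expected latency\n";
    });
    watchdog->arm(expected_latency);
    return issue_request().finally([watchdog = std::move(watchdog)] {
        watchdog->cancel();   // completed in time - disarm (no-op if already fired)
    });
}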

@xemul
Contributor

xemul commented May 23, 2024

Or even simpler -- if the request latency exceeds the "expected" one -- print a warning
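
A minimal sketch of that simpler variant, again with make_request() and expected_latency as placeholder names:

auto start = std::chrono::steady_clock::now();
make_request(i);
auto latency = std::chrono::steady_clock::now() - start;
if (latency > expected_latency) {
    std::cerr << "request " << i << " exceeded the expected latency\n";  // warn only, no correction
}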
