
Throughput performance drop #367

Closed
zezic opened this issue Oct 7, 2021 · 10 comments · Fixed by #368
Labels
bug Something isn't working

Comments

@zezic

zezic commented Oct 7, 2021

We are experiencing a significant throughput regression after commit ef71107, which was introduced with #358.
Before this commit our tests were able to run at ~18,000 RPS and it was possible to detect the bottleneck of our service, but after ef71107 we are only getting around ~6,000 RPS with the same tests, and now Goose is the bottleneck.

@jeremyandrews
Member

Are you using .set_wait_time()? If so, what are some example values you're setting?

I'll do some analysis; perhaps my performance regression test needs some enhancement -- I didn't see a regression in my earlier testing.

@zezic
Author

zezic commented Oct 7, 2021

No, I'm not using .set_wait_time(). Also, I'm not using GooseDefault::ThrottleRequests or anything else that should limit the request rate; I'm trying to achieve the maximum request rate.

@jeremyandrews
Member

I was able to duplicate this ... I need to review how it slipped through my pre-release testing, and will work on a fix to restore performance. Thanks for reporting!

@jeremyandrews
Member

jeremyandrews commented Oct 9, 2021

I've not had time to work on a proper solution yet. I have run a few tests, though, and confirmed that reverting the set_wait_time() PR restores performance (for each test I restarted all the server-side processes and reloaded the database, then ran the drupal_memcache loadtest 3 times in a row for 10 minutes):

0.13.3

  • 10,747 rps
  • 11,201 rps
  • 11,094 rps

0.14.1-dev

  • 8,055 rps
  • 8,410 rps
  • 8,460 rps

0.14.1-dev reverted set_wait_time() changes

  • 10,265 rps
  • 10,702 rps
  • 10,966 rps

@jeremyandrews
Member

@zezic Can you test these changes? #368

@zezic
Author

zezic commented Oct 12, 2021

@jeremyandrews Sure! I will check it today and report back as soon as possible.

jeremyandrews added the bug label on Oct 12, 2021
@zezic
Author

zezic commented Oct 12, 2021

@jeremyandrews I tested your branch https://github.com/jeremyandrews/goose/tree/revert-test and I can confirm that it fixes the issue! Now I'm getting identical results for v0.13.3 and v0.14.1-dev (revert-test); the difference is within the margin of error.

@jeremyandrews
Member

@zezic Thank you for reporting the problem and testing the fix! I'll get this merged and a release rolled soon.

@jeremyandrews
Member

To confirm: this regression happened because there is a very small amount of overhead in converting a Duration to an integer, and that overhead adds up when making tens of thousands of requests per second. The fix is to skip the Duration -> integer conversions entirely when set_wait_time() is not being used.
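
For illustration, a minimal sketch of that pattern (the type and field names here are hypothetical, not Goose's actual internals): keep the configured wait time as an Option and only pay for the conversion when it is actually set.

use std::time::Duration;

// Hypothetical sketch of the fix's idea, not Goose's real code: store the
// configured wait time as an Option and skip all Duration -> integer work
// on the hot path when no wait time was configured.
struct UserState {
    // None when .set_wait_time() was never called.
    wait_time: Option<(Duration, Duration)>,
}

impl UserState {
    // The conversion only happens when a wait time exists.
    fn wait_millis(&self) -> Option<(u64, u64)> {
        self.wait_time
            .map(|(min, max)| (min.as_millis() as u64, max.as_millis() as u64))
    }
}

fn main() {
    let no_wait = UserState { wait_time: None };
    assert_eq!(no_wait.wait_millis(), None); // no conversion performed at all

    let with_wait = UserState {
        wait_time: Some((Duration::from_millis(500), Duration::from_secs(2))),
    };
    assert_eq!(with_wait.wait_millis(), Some((500, 2000)));
}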

@zezic
Author

zezic commented Oct 13, 2021

You mean that Duration::as_millis() is producing that noticeable difference? I doubt it adds significant overhead, because it's just simple arithmetic with a few type casts – https://doc.rust-lang.org/src/core/time.rs.html#408

pub const fn as_millis(&self) -> u128 {
    self.secs as u128 * MILLIS_PER_SEC as u128 + (self.nanos / NANOS_PER_MILLI) as u128
}

My guess is that the problem has something to do with the granularity and the nature of the Tokio timers. In the documentation for the tokio::time::sleep() function I read that:

Sleep operates at millisecond granularity and should not be used for tasks that require high-resolution timers.

Maybe that's not the exact reason for the regression, but I suspect it's related to some mechanic of the Tokio timers that makes awaiting a sleep with a 0 ms duration produce some unwanted latency.
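
A rough, self-contained way to measure that suspicion (an illustration only, not Goose code; it assumes a binary with the tokio crate and its rt, time, and macros features available):

use std::time::{Duration, Instant};

#[tokio::main]
async fn main() {
    let iterations = 10_000;

    // Baseline: a minimal await that never touches the timer.
    let start = Instant::now();
    for _ in 0..iterations {
        tokio::task::yield_now().await;
    }
    println!("yield_now only: {:?}", start.elapsed());

    // Awaiting a zero-duration sleep; if the timer's millisecond granularity
    // adds latency here, that would explain a per-request slowdown at high RPS.
    let start = Instant::now();
    for _ in 0..iterations {
        tokio::time::sleep(Duration::from_millis(0)).await;
    }
    println!("sleep(0 ms): {:?}", start.elapsed());
}

If the second loop is dramatically slower than the first, the zero-duration sleep is indeed paying for a trip through the timer machinery.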
