sync: mpsc performance optimization about cache line #5830

Closed
wathenjiang opened this issue Jun 28, 2023 · 3 comments
Labels
A-tokio Area: The main tokio crate C-feature-request Category: A feature request. M-sync Module: tokio/sync

Comments

@wathenjiang
Contributor

wathenjiang commented Jun 28, 2023

Motivation

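Senders and the receiver of an mpsc channel update shared state concurrently. When that state shares a cache line, the cores can contend for the line (false sharing), which slows the channel down under load.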

Solution

Pad the contended mpsc state with CachePadded so that it occupies its own cache line.
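As a rough illustration of the idea, the sketch below defines a stand-in `CachePadded` wrapper and places two counters on separate cache lines. The `ChannelState` struct, its `tail` and `head` fields, and the 128-byte alignment are assumptions made for this example; they are not tokio's actual layout, and the right alignment depends on the target architecture.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a `CachePadded` wrapper. Aligning the wrapper
/// to 128 bytes (an assumed value) gives each wrapped value its own
/// 128-byte slot, so two wrapped values never share a cache line.
#[repr(align(128))]
pub struct CachePadded<T>(pub T);

impl<T> Deref for CachePadded<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

/// Illustrative channel state, not tokio's real struct.
pub struct ChannelState {
    /// Updated by senders.
    pub tail: CachePadded<AtomicUsize>,
    /// Updated by the receiver.
    pub head: CachePadded<AtomicUsize>,
}

impl ChannelState {
    pub fn new() -> Self {
        ChannelState {
            tail: CachePadded(AtomicUsize::new(0)),
            head: CachePadded(AtomicUsize::new(0)),
        }
    }
}

/// Distance in bytes between the addresses of the two padded fields.
pub fn field_distance(state: &ChannelState) -> usize {
    let tail_addr = &state.tail as *const _ as usize;
    let head_addr = &state.head as *const _ as usize;
    tail_addr.abs_diff(head_addr)
}

fn main() {
    let state = ChannelState::new();
    // Each counter can now be updated without invalidating the other's line.
    state.tail.fetch_add(1, Ordering::Relaxed);
    state.head.fetch_add(1, Ordering::Relaxed);
    println!("fields are {} bytes apart", field_distance(&state));
}
```

Because both fields are aligned to 128 bytes, they are at least 128 bytes apart, so a write to one does not invalidate the cache line holding the other.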

Benchmark

The original benchmark result (before the change):

    $ cargo bench --bench sync_mpsc
    running 10 tests
    test contention_bounded      ... bench:   1,008,359 ns/iter (+/- 412,814)
    test contention_bounded_full ... bench:   1,427,243 ns/iter (+/- 500,287)
    test contention_unbounded    ... bench:     845,013 ns/iter (+/- 394,673)
    test create_100_000_medium   ... bench:         182 ns/iter (+/- 1)
    test create_100_medium       ... bench:         182 ns/iter (+/- 1)
    test create_1_medium         ... bench:         181 ns/iter (+/- 2)
    test send_large              ... bench:      16,525 ns/iter (+/- 329)
    test send_medium             ... bench:         628 ns/iter (+/- 5)
    test uncontented_bounded     ... bench:     478,514 ns/iter (+/- 1,923)
    test uncontented_unbounded   ... bench:     303,990 ns/iter (+/- 1,607)
    test result: ok. 0 passed; 0 failed; 0 ignored; 10 measured

The benchmark result with the change applied:

    $ cargo bench --bench sync_mpsc
    running 10 tests
    test contention_bounded      ... bench:     606,516 ns/iter (+/- 402,326)
    test contention_bounded_full ... bench:     727,239 ns/iter (+/- 340,756)
    test contention_unbounded    ... bench:     760,523 ns/iter (+/- 482,628)
    test create_100_000_medium   ... bench:         315 ns/iter (+/- 5)
    test create_100_medium       ... bench:         317 ns/iter (+/- 6)
    test create_1_medium         ... bench:         315 ns/iter (+/- 5)
    test send_large              ... bench:      16,166 ns/iter (+/- 516)
    test send_medium             ... bench:         695 ns/iter (+/- 6)
    test uncontented_bounded     ... bench:     456,975 ns/iter (+/- 18,969)
    test uncontented_unbounded   ... bench:     306,282 ns/iter (+/- 3,058)
    
    test result: ok. 0 passed; 0 failed; 0 ignored; 10 measured

It can also be seen in #5829.

@wathenjiang wathenjiang added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Jun 28, 2023
@Darksonn Darksonn added the M-sync Module: tokio/sync label Jun 28, 2023
@wathenjiang
Contributor Author

Here is the performance improvement summary for each test:

  • contention_bounded: Improved from 1,008,359 ns/iter to 606,516 ns/iter, a decrease of 401,843 ns/iter (39.9% improvement)
  • contention_bounded_full: Improved from 1,427,243 ns/iter to 727,239 ns/iter, a decrease of 700,004 ns/iter (49.0% improvement)
  • contention_unbounded: Improved from 845,013 ns/iter to 760,523 ns/iter, a decrease of 84,490 ns/iter (10.0% improvement)
  • create_100_000_medium: Regressed from 182 ns/iter to 315 ns/iter, an increase of 133 ns/iter (73.1% degradation)
  • create_100_medium: Regressed from 182 ns/iter to 317 ns/iter, an increase of 135 ns/iter (74.2% degradation)
  • create_1_medium: Regressed from 181 ns/iter to 315 ns/iter, an increase of 134 ns/iter (74.0% degradation)
  • send_large: Improved from 16,525 ns/iter to 16,166 ns/iter, a decrease of 359 ns/iter (2.2% improvement)
  • send_medium: Regressed from 628 ns/iter to 695 ns/iter, an increase of 67 ns/iter (10.7% degradation)
  • uncontented_bounded: Improved from 478,514 ns/iter to 456,975 ns/iter, a decrease of 21,539 ns/iter (4.5% improvement)
  • uncontented_unbounded: Regressed slightly from 303,990 ns/iter to 306,282 ns/iter, an increase of 2,292 ns/iter (0.8% degradation)
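The percentages in this summary are relative changes against the original run. A minimal sketch of that arithmetic, using a hypothetical helper named `percent_change` and a few of the ns/iter figures from the benchmark output:

```rust
/// Signed relative change in percent.
/// Negative means the new run takes less time per iteration (faster).
pub fn percent_change(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

fn main() {
    // (benchmark name, ns/iter before, ns/iter after), taken from the runs above.
    let results = [
        ("contention_bounded", 1_008_359.0, 606_516.0),
        ("contention_bounded_full", 1_427_243.0, 727_239.0),
        ("contention_unbounded", 845_013.0, 760_523.0),
        ("create_1_medium", 181.0, 315.0),
        ("send_medium", 628.0, 695.0),
    ];

    for &(name, before, after) in &results {
        println!("{:<24} {:+.1}%", name, percent_change(before, after));
    }
}
```

Note that the contention benchmarks report a spread (+/-) of several hundred thousand ns/iter, so their percentages carry more uncertainty than those of the create and send benchmarks.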

Tests with prefix create: measure the cost of creating a channel. The likely reason for the regression is that the channel takes more memory once its state is padded and aligned. Channel creation should not be the major concern, though.
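To make the memory cost of alignment concrete, the sketch below compares the size of two counters with and without padding. The `CachePadded` wrapper, the `Plain` and `Padded` structs, and the 128-byte alignment are assumptions for illustration only, not tokio's actual types.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicUsize;

/// Hypothetical padding wrapper; 128 bytes is an assumed alignment.
#[repr(align(128))]
pub struct CachePadded<T>(pub T);

/// Two counters packed next to each other.
pub struct Plain {
    pub a: AtomicUsize,
    pub b: AtomicUsize,
}

/// The same two counters, each rounded up to its own 128-byte slot.
pub struct Padded {
    pub a: CachePadded<AtomicUsize>,
    pub b: CachePadded<AtomicUsize>,
}

fn main() {
    println!("plain:  {} bytes", size_of::<Plain>());
    println!("padded: {} bytes", size_of::<Padded>());
}
```

On a typical 64-bit target the plain struct needs only two machine words, while the padded one occupies at least 256 bytes, so every allocation of the padded structure is larger.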

Tests with prefix send: create a channel and then send a single message. In the new version creation is slower and sending is faster, so the two effects largely offset each other: send_large is about 2% faster and send_medium is about 11% slower.

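Tests with prefix contention: measure sending messages through the channel under contention. Performance improves substantially here, by about 40% and 49% in the two bounded cases and by about 10% in the unbounded case.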

@wathenjiang wathenjiang changed the title mpsc performance optimization about cache line sync: mpsc performance optimization about cache line Jun 29, 2023
@Darksonn
Contributor

Darksonn commented Aug 8, 2023

Closing as resolved by #5829.
