
Fix two race conditions where ASIO wakeup notifications can be lost #2561

Conversation

slfritchie
Contributor

@dipinhora's commit (described immediately below) is the minimum change required to fix the two race conditions that I'd found in the Pony 0.21.3 runtime, which caused occasional failures of the Wallaroo integration test suite.

Prior to this commit, there were two race conditions where an ASIO
event could be received and not result in a wake up notification
to scheduler 0. One was because scheduler 0 had not actually
suspended yet so the ASIO thread didn't send a notification at all.
The other was because scheduler thread 0 had not released the mutex
so the ASIO thread wasn't able to send a notification because it
wasn't able to acquire the mutex and didn't retry. Both would result
in hangs due to the lost wake ups.

This commit fixes things by ensuring that even after scheduler 0
decides to suspend, it does one final check of the inject queue
to see if any new work has been queued by the ASIO thread. If yes,
it doesn't suspend to prevent lost wake ups and the associated
hangs. It also changes the logic so that the ASIO thread will retry
waking up scheduler thread 0 if it is unable to acquire the mutex
until it is successful.
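
To make the two fixes concrete, here is a minimal C sketch of the handshake as I read the commit message above. It is illustrative only, not the runtime's actual code: `inject_queue_nonempty()`, `sched0_mut`, and `sched0_cond` are hypothetical stand-ins for the real inject queue and scheduler 0's mutex/condition variable, and real code would also re-check its condition in a loop to guard against spurious wakeups.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime's real structures. */
static pthread_mutex_t sched0_mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sched0_cond = PTHREAD_COND_INITIALIZER;
extern bool inject_queue_nonempty(void);

/* Scheduler 0: after deciding to suspend, do one final check of the
 * inject queue while holding the mutex. If the ASIO thread queued work
 * in the window between the decision and the lock, don't suspend. */
void sched0_maybe_suspend(void)
{
  pthread_mutex_lock(&sched0_mut);

  if(inject_queue_nonempty())
  {
    pthread_mutex_unlock(&sched0_mut);
    return; /* go process the new work instead of sleeping */
  }

  /* Any later wakeup must acquire the mutex first, so it cannot be
   * lost between the check above and the wait below. */
  pthread_cond_wait(&sched0_cond, &sched0_mut);
  pthread_mutex_unlock(&sched0_mut);
}

/* ASIO thread: keep retrying the lock instead of giving up after one
 * failed trylock, which was the second way a wakeup could be lost. */
void asio_wake_sched0(void)
{
  while(pthread_mutex_trylock(&sched0_mut) != 0)
    ; /* spin until the lock is acquired */

  pthread_cond_signal(&sched0_cond);
  pthread_mutex_unlock(&sched0_mut);
}
```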

After Dipin's work above, I had suggested and implemented three changes:

  • make pop_global() more flexible to adapt to 3 callers in steal()
  • refactor two nearly identical pieces of steal() into a new function suspend_scheduler()
  • refactor two other nearly identical pieces of steal() into a new function maybe_suspend_scheduler()

Dipin gave an informal "there might be a performance regression" caveat to these three review suggestions (all of which are included in this branch), informal approval to number 2, and hasn't yet looked at number 3. Reviewers are welcome to critique any or all of them. Taken all together, they make steal() much easier to read, IMHO.
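
To give a rough sense of the shape of suggestions 2 and 3, here's an illustrative C sketch. None of these names match the runtime's actual signatures: suspend_scheduler()'s real body would contain the lock / final inject-queue check / wait sequence sketched earlier, and no_work_to_steal() is a hypothetical predicate standing in for steal()'s real conditions.

```c
#include <stdbool.h>

typedef struct scheduler_t scheduler_t;

/* Hypothetical predicate standing in for steal()'s real
 * "should we suspend?" conditions. */
extern bool no_work_to_steal(scheduler_t* sched);

/* Suggestion 2: the duplicated suspend sequence becomes one helper. */
extern void suspend_scheduler(scheduler_t* sched);

/* Suggestion 3: the duplicated "decide, then suspend" blocks in
 * steal() collapse into a single helper, so steal()'s loop reads as
 * plain control flow instead of two copies of the same block. */
static void maybe_suspend_scheduler(scheduler_t* sched)
{
  if(no_work_to_steal(sched))
    suspend_scheduler(sched);
}
```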

Regarding performance regression, here's what I've seen so far:

  • OS X: no significant degradation in examples/message-ubench across 14 different combinations of pinger actors, messages sent, and Pony runtime threads used; or else this branch is about 5% faster, which is nice but also not well understood.
  • Linux: I'm currently running more tests with examples/message-ubench on a 36-core AWS c4.8xlarge instance.
  • Linux + Wallaroo: I'll try my hand at this tomorrow: @JONBRWN has been overhauling the Wallaroo performance test automation to the point where I might be able to run it myself.

Contributor

@dipinhora dipinhora left a comment

Looks great to me. Thanks for cleaning things up and making the code sane again!

@jemc
Member

jemc commented Feb 21, 2018

@slfritchie says he's running some microbenchmarks to test the performance impact of this change before merging.

So, let's put the "don't merge" tag on this for now until his investigation is complete.

@jemc jemc added the "do not merge" (This PR should not be merged at this time) label on Feb 21, 2018
@slfritchie slfritchie added the "changelog - fixed" (Automatically add "Fixed" CHANGELOG entry on merge) label on Feb 21, 2018
@slfritchie
Contributor Author

slfritchie commented Feb 23, 2018

I'm pretty sure there's a race condition with a single active thread, something I've seen 3x now.

  • Times 1 & 2 were running ./message-ubench.slf-review-fix_dynamic_scaling_race_condition2 --initial-pings=1 --pingers=1 --ponythreads=32 --report-interval=10 --report-count=60 --ponynoblock: the 1st produced all 60 lines of output, but the process didn't exit; the 2nd produced no output after the 28th report. (OS X)
  • The 3rd time was Wallaroo's "sender" process, which stopped sending data & stopped consuming CPU time while running giles/sender/sender -h wallaroo-leader-1:7001 -m 10000000000 -s 450 -i 2_500_000 -f ~/wallaroo/testing/data/market_spread/nbbo/350-symbols_nbbo-fixish.msg -r --ponythreads=1 -y -g 46 --ponypinasio -w --ponynoblock (Ubuntu 16.04 LTS/Xenial Linux)

@slfritchie slfritchie force-pushed the slf-review-fix_dynamic_scaling_race_condition2 branch from fc036a0 to 243298d on February 28, 2018
@slfritchie slfritchie removed the "do not merge" (This PR should not be merged at this time) label on Mar 1, 2018
@slfritchie
Contributor Author

These are important runtime bugfixes. The performance observations that I've been able to make over the last week are:

  • Wallaroo's "market spread" application is @SeanTAllen's top choice for performance benchmarking of a "real" Pony app. (https://github.com/WallarooLabs/wallaroo/blob/master/testing/performance/apps/market-spread/PERFORMANCE_TESTING_MARKET_SPREAD.md) On AWS c4.8xlarge VM instances, I saw no performance decrease (throughput or latency) on 18, 8, and 4 core/scheduler thread runs at saturation (about 3.4M operations/second), 90% saturation, or 10% saturation.

  • The examples/message-ubench microbenchmark's workload creates a lot of message-passing between actors, with little computation by those actors. On AWS c4.8xlarge VM instances, I've seen an approximately 1-3% throughput decrease in most scenarios when using --ponythreads values of 18, 4, and 1. This decrease tends to shrink (or disappear) as the value of --initial-pings increases beyond 1: larger values for this flag give actors more messages to work on before draining their mailbox or hitting the runtime's 100-consecutive-message scheduling limit. My test runs varied the following flags:

    --initial-pings=1, 10, or 100
    --pingers=1, 10, 25, 250, or 2500
    --ponythreads=1, 4, or 18

The ministat reports below are typical of the throughput variations that I saw. If I witnessed variation larger than 3%, I repeated the runs several more times to see how repeatable the large variation was ... but big variations were not repeatable, as far as my patience & willingness to pay for the AWS time permitted.
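
(For readers unfamiliar with the report format below: it is the output of the ministat utility, which reads two files of one-sample-per-line measurements and prints the distribution plot plus the Student's t comparison at 95% confidence. A typical invocation, with hypothetical file names, looks like `ministat message-ubench.base.txt message-ubench.branch.txt`.)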

RUN: message-ubench --initial-pings=1 --pingers=25 --ponythreads=4 --report-interval=10 --report-count=60 --ponynoblock

x message-ubench.base,1,25,4
+ message-ubench.slf-review-fix_dynamic_scaling_race_condition2,1,25,4
+--------------------------------------------------------------------------+
|                                                        +                 |
|                                                        +        x        |
|                                                        +        x        |
|                                                       ++        x        |
|                                                       ++        xx x     |
|                                                       ++       xxxxx     |
|                                                    +  ++       xxxxx     |
|                                                    ++ ++       xxxxx     |
|                                                  + +++++  +    xxxxx x   |
|                                                  + ++++++ +    xxxxx x   |
|                                                  + ++++++ +  xxxxxxx x   |
|                                                 ++ ++++++ + xxxxxxxx x   |
|+   x                                            +++++++++++ xxxxxxxx x   |
|                                              |_______AM|____|___AM______||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  60      14043917      15785394      15666087      15644451     217933.65
+  60      13937605      15503371      15388010      15348762     197644.72
Difference at 95.0% confidence
        -295689 +/- 74445
        -1.89006% +/- 0.475855%
        (Student's t, pooled s = 208037)

RUN: message-ubench --initial-pings=1 --pingers=250 --ponythreads=4 --report-interval=10 --report-count=60 --ponynoblock

x message-ubench.base,1,250,4
+ message-ubench.slf-review-fix_dynamic_scaling_race_condition2,1,250,4
+--------------------------------------------------------------------------+
|           +          +                                     x             |
|          ++          +                                     x             |
|          ++         ++  +       x        +               x xx        x   |
|         +++ +       ++ +++    x xx       +    xx         x xx      x x   |
|       + +++ +      +++++++   xxxxx       * +  xxx       xx xx x  x xxx   |
|+    + + +++++   +++++++++++  xxxxx  x+ ++*+*x xxx     x xx xx xx x xxxx x|
|          |__________A__________|     |_____________A_____M_______|       |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  60      14691252      15027855      14910002      14865926     108330.72
+  60      14457171      14803675      14624660      14621815     89014.645
Difference at 95.0% confidence
        -244111 +/- 35478.3
        -1.64208% +/- 0.238655%
        (Student's t, pooled s = 99144.2)

RUN: message-ubench --initial-pings=1 --pingers=250 --ponythreads=18 --report-interval=10 --report-count=60 --ponynoblock

x message-ubench.base,1,250,18
+ message-ubench.slf-review-fix_dynamic_scaling_race_condition2,1,250,18
+--------------------------------------------------------------------------+
|                    +         ++         x  xxx           +           x   |
|          +     +  ++  ++ +   *+  +x+ xx x  xxx x         +           x   |
|          +     +  ++  +++++ +*+ ++*+ xxxx+ xxx x         +           x   |
|+  +      +   ++++++++ ++*++ +*+x+*** xxx**xxxxxxxx  x    +x          xx x|
|               |__________MA______|____|____MA_________|                  |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  60      19705073      21169976      20288239      20313202     318382.17
+  60      18944185      20726794      19734108      19772982     372183.33
Difference at 95.0% confidence
        -540220 +/- 123932
        -2.65945% +/- 0.610107%
        (Student's t, pooled s = 346329)

@SeanTAllen
Member

Those perf numbers look good to me and I wouldn't oppose merging on the basis of them.

@jemc
Member

jemc commented Mar 1, 2018

If y'all are confident, feel free to remove the "DO NOT MERGE" label and merge.

@dipinhora
Contributor

@slfritchie Thanks for putting in all the effort to get this working and the code cleaned up. I appreciate all your help in fixing the swiss cheese that dynamic scheduler scaling has been in relation to scheduler thread 0 suspending.

The performance difference is acceptable to me, given that the logic changes are necessary to improve the stability of scheduler thread 0's ability to suspend, which I believe is overall a useful feature to have.

Fix race conditions with scheduler 0 suspending and ASIO events (ponylang#2561)

Prior to this commit, there were two race conditions where an ASIO
event could be received and not result in a wake up notification
to scheduler 0. One was because scheduler 0 had not actually
suspended yet so the ASIO thread didn't send a notification at all.
The other was because scheduler thread 0 had not released the mutex
so the ASIO thread wasn't able to send a notification because it
wasn't able to acquire the mutex and didn't retry. Both would result
in hangs due to the lost wake ups.

This commit fixes things by ensuring that even after scheduler 0
decides to suspend, it does one final check of the inject queue
to see if any new work has been queued by the ASIO thread. If yes,
it doesn't suspend to prevent lost wake ups and the associated
hangs. It also changes the logic so that the ASIO thread will retry
waking up scheduler thread 0 if it is unable to acquire the mutex
until it is successful.

Additional refactorings include:

* make pop_global() more flexible to adapt to 3 callers in steal()
* refactor 2 pieces of identical code into new suspend_scheduler() function
* refactor 2 pieces of identical code into new maybe_suspend_scheduler() function

Notable runtime behavior change:

* Change default value of ponyminthreads from 0 -> 1

There remains at least one bug in thread wakeup when only
scheduler 0 is awake.  To circumvent this bug(s), Dipin suggested
changing the default value of ponyminthreads to 1.  The command
line option `--ponyminthreads=0` may be used, but it must be considered
experimental and may result in misbehavior by the scheduler.

Squashed from commit 6bd6e05 backward to dipinhora/fix_dynamix_scheduling_race_condition commit 01ab145
@slfritchie slfritchie force-pushed the slf-review-fix_dynamic_scaling_race_condition2 branch from 6bd6e05 to 5a11aff on March 2, 2018
@slfritchie
Contributor Author

@dipinhora Would you mind taking a look at the new squashed & rebased commit message? If it's alright and if Appveyor is happy, then please merge.

@slfritchie
Contributor Author

Dipin's last review found a path where the DTrace WORK_STEAL_SUCCESSFUL probe wouldn't fire. The small refactoring in commit 785cbbf takes care of that. All CI checks have passed. Merge time; thanks, everyone, for your comments & recommendations.

@slfritchie slfritchie merged commit 0ac93d7 into ponylang:master Mar 6, 2018
ponylang-main added a commit that referenced this pull request Mar 6, 2018
dipinhora pushed a commit to dipinhora/ponyc that referenced this pull request Jun 5, 2018
Fix race conditions with scheduler 0 suspending and ASIO events (ponylang#2561)

(Commit message identical to the squashed commit message quoted above.)
dipinhora pushed a commit to dipinhora/ponyc that referenced this pull request Jun 5, 2018
dipinhora added a commit to dipinhora/ponyc that referenced this pull request Feb 23, 2019
Thanks to @slfritchie's hard work in ponylang#2926 and ponylang#2561 to find and
fix the bugs in the dynamic scheduler scaling logic, scheduler
thread 0 now also suspends and resumes correctly without issues
(in my testing).

This commit changes the default for `--ponyminthreads` from `1` to
`0`. The last time @slfritchie ran into issues was in March and
left a comment with the commands he used at:
ponylang#2561 (comment)

I ran the commands from @slfritchie's comment (with minor changes
to ensure `--ponyminthreads=0`) using Pony 0.26.0 for a few hours
without any issues.
dipinhora added a commit to dipinhora/ponyc that referenced this pull request Feb 26, 2019
(Commit message identical to the Feb 23, 2019 commit above.)
SeanTAllen pushed a commit that referenced this pull request Feb 27, 2019
(Commit message identical to the Feb 23, 2019 commit above, with the ponylang#2926 and ponylang#2561 references rendered as #2926 and #2561.)