
Fix two race conditions where ASIO wakeup notifications can be lost #2561

Conversation

slfritchie
Contributor

@dipinhora's commit (described immediately below) is the minimum change required to fix the two race conditions that I'd found in the Pony 0.21.3 runtime, which caused occasional failures of the Wallaroo integration test suite.

Prior to this commit, there were two race conditions where an ASIO
event could be received and not result in a wake up notification
to scheduler 0. One was because scheduler 0 had not actually
suspended yet so the ASIO thread didn't send a notification at all.
The other was because scheduler thread 0 had not released the mutex
so the ASIO thread wasn't able to send a notification because it
wasn't able to acquire the mutex and didn't retry. Both would result
in hangs due to the lost wake ups.

This commit fixes things by ensuring that even after scheduler 0
decides to suspend, it does one final check of the inject queue
to see if any new work has been queued by the ASIO thread. If yes,
it doesn't suspend to prevent lost wake ups and the associated
hangs. It also changes the logic so that the ASIO thread will retry
waking up scheduler thread 0 if it is unable to acquire the mutex
until it is successful.
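
To make the two fixes concrete, here is a minimal C sketch of the handshake as I read the commit message above. It is illustrative only, not the runtime's actual code: `inject_queue_nonempty()`, `sched0_mut`, and `sched0_cond` are hypothetical stand-ins for the real inject queue and scheduler 0's mutex/condition variable, and real code would also re-check its condition in a loop to guard against spurious wakeups.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime's real structures. */
static pthread_mutex_t sched0_mut = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sched0_cond = PTHREAD_COND_INITIALIZER;
extern bool inject_queue_nonempty(void);

/* Scheduler 0: after deciding to suspend, do one final check of the
 * inject queue while holding the mutex. If the ASIO thread queued work
 * in the window between the decision and the lock, don't suspend. */
void sched0_maybe_suspend(void)
{
  pthread_mutex_lock(&sched0_mut);

  if(inject_queue_nonempty())
  {
    pthread_mutex_unlock(&sched0_mut);
    return; /* go process the new work instead of sleeping */
  }

  /* Any later wakeup must acquire the mutex first, so it cannot be
   * lost between the check above and the wait below. */
  pthread_cond_wait(&sched0_cond, &sched0_mut);
  pthread_mutex_unlock(&sched0_mut);
}

/* ASIO thread: keep retrying the lock instead of giving up after one
 * failed trylock, which was the second way a wakeup could be lost. */
void asio_wake_sched0(void)
{
  while(pthread_mutex_trylock(&sched0_mut) != 0)
    ; /* spin until the lock is acquired */

  pthread_cond_signal(&sched0_cond);
  pthread_mutex_unlock(&sched0_mut);
}
```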

After Dipin's work above, I had suggested and implemented three changes:

  • make pop_global() more flexible to adapt to 3 callers in steal()
  • refactor two nearly identical pieces of steal() into a new function suspend_scheduler()
  • refactor two other nearly identical pieces of steal() into a new function maybe_suspend_scheduler()

Dipin gave an informal "there might be a performance regression" caveat to these three review suggestions (all of which are included in this branch), informal approval to number 2, and hasn't yet looked at number 3. Reviewers are welcome to critique any or all of them. Taken all together, they make steal() much easier to read, IMHO.
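
To give a rough sense of the shape of suggestions 2 and 3, here's an illustrative C sketch. None of these names match the runtime's actual signatures: suspend_scheduler()'s real body would contain the lock / final inject-queue check / wait sequence sketched earlier, and no_work_to_steal() is a hypothetical predicate standing in for steal()'s real conditions.

```c
#include <stdbool.h>

typedef struct scheduler_t scheduler_t;

/* Hypothetical predicate standing in for steal()'s real
 * "should we suspend?" conditions. */
extern bool no_work_to_steal(scheduler_t* sched);

/* Suggestion 2: the duplicated suspend sequence becomes one helper. */
extern void suspend_scheduler(scheduler_t* sched);

/* Suggestion 3: the duplicated "decide, then suspend" blocks in
 * steal() collapse into a single helper, so steal()'s loop reads as
 * plain control flow instead of two copies of the same block. */
static void maybe_suspend_scheduler(scheduler_t* sched)
{
  if(no_work_to_steal(sched))
    suspend_scheduler(sched);
}
```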

Regarding performance regression, here's what I've seen so far:

  • OS X: no significant degradation in examples/message-ubench across 14 different combinations of pinger actors, messages sent, and Pony runtime threads used; or else this branch is about 5% faster, which is nice but also not well understood.
  • Linux: I'm currently running more tests with examples/message-ubench on a 36-core AWS c4.8xlarge instance.
  • Linux + Wallaroo: I'll try my hand at this tomorrow: @JONBRWN has been overhauling the Wallaroo performance test automation to the point where I might be able to run it myself.

Contributor

@dipinhora dipinhora left a comment

Looks great to me. Thanks for cleaning things up and making the code sane again!

@jemc
Member

jemc commented Feb 21, 2018

@slfritchie says he's running some microbenchmarks to test the performance impact of this change before merging.

So, let's put the "don't merge" tag on this for now until his investigation is complete.

@jemc jemc added the "do not merge" (This PR should not be merged at this time) label on Feb 21, 2018
@slfritchie slfritchie added the "changelog - fixed" (Automatically add "Fixed" CHANGELOG entry on merge) label on Feb 21, 2018
@slfritchie
Contributor Author

slfritchie commented Feb 23, 2018

I'm pretty sure there's a race condition with a single active thread, something I've seen 3x now.

  • Times 1 & 2 were running ./message-ubench.slf-review-fix_dynamic_scaling_race_condition2 --initial-pings=1 --pingers=1 --ponythreads=32 --report-interval=10 --report-count=60 --ponynoblock: the 1st produced all 60 lines of output, but the process didn't exit; the 2nd produced no output after the 28th report. (OS X)
  • The 3rd time was Wallaroo's "sender" process, which stopped sending data & stopped consuming CPU time while running giles/sender/sender -h wallaroo-leader-1:7001 -m 10000000000 -s 450 -i 2_500_000 -f ~/wallaroo/testing/data/market_spread/nbbo/350-symbols_nbbo-fixish.msg -r --ponythreads=1 -y -g 46 --ponypinasio -w --ponynoblock (Ubuntu 16.04 LTS/Xenial Linux)

@slfritchie slfritchie force-pushed the slf-review-fix_dynamic_scaling_race_condition2 branch from fc036a0 to 243298d on February 28, 2018
@slfritchie slfritchie removed the "do not merge" (This PR should not be merged at this time) label on Mar 1, 2018
@slfritchie
Contributor Author

These are important runtime bugfixes. The performance observations that I've been able to make over the last week are:

  • Wallaroo's "market spread" application is @SeanTAllen's top choice for performance benchmarking of a "real" Pony app. (https://github.com/WallarooLabs/wallaroo/blob/master/testing/performance/apps/market-spread/PERFORMANCE_TESTING_MARKET_SPREAD.md) On AWS c4.8xlarge VM instances, I saw no performance decrease (throughput or latency) on 18, 8, and 4 core/scheduler thread runs at saturation (about 3.4M operations/second), 90% saturation, or 10% saturation.

  • The examples/message-ubench microbenchmark's workload creates a lot of message-passing between actors, with little computation by those actors. On AWS c4.8xlarge VM instances, I've seen an approximately 1-3% throughput decrease in most scenarios when using --ponythreads values of 18, 4, and 1. This decrease tends to shrink (or disappear) as the value of --initial-pings increases beyond 1: larger values for this flag give actors more messages to work on before draining their mailbox or hitting the runtime's 100-consecutive-message scheduling limit. My test runs varied the following flags:

    --initial-pings=1, 10, or 100
    --pingers=1, 10, 25, 250, or 2500
    --ponythreads=1, 4, or 18

The ministat reports below are typical of the throughput variations that I saw. If I witnessed variation larger than 3%, I repeated the runs several more times to see how repeatable the large variation was ... but big variations were not repeatable, as far as my patience & willingness to pay for the AWS time permitted.
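
(For readers unfamiliar with the report format below: it is the output of the ministat utility, which reads two files of one-sample-per-line measurements and prints the distribution plot plus the Student's t comparison at 95% confidence. A typical invocation, with hypothetical file names, looks like `ministat message-ubench.base.txt message-ubench.branch.txt`.)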

RUN: message-ubench --initial-pings=1 --pingers=25 --ponythreads=4 --report-interval=10 --report-count=60 --ponynoblock

x message-ubench.base,1,25,4
+ message-ubench.slf-review-fix_dynamic_scaling_race_condition2,1,25,4
+--------------------------------------------------------------------------+
|                                                        +                 |
|                                                        +        x        |
|                                                        +        x        |
|                                                       ++        x        |
|                                                       ++        xx x     |
|                                                       ++       xxxxx     |
|                                                    +  ++       xxxxx     |
|                                                    ++ ++       xxxxx     |
|                                                  + +++++  +    xxxxx x   |
|                                                  + ++++++ +    xxxxx x   |
|                                                  + ++++++ +  xxxxxxx x   |
|                                                 ++ ++++++ + xxxxxxxx x   |
|+   x                                            +++++++++++ xxxxxxxx x   |
|                                              |_______AM|____|___AM______||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  60      14043917      15785394      15666087      15644451     217933.65
+  60      13937605      15503371      15388010      15348762     197644.72
Difference at 95.0% confidence
        -295689 +/- 74445
        -1.89006% +/- 0.475855%
        (Student's t, pooled s = 208037)

RUN: message-ubench --initial-pings=1 --pingers=250 --ponythreads=4 --report-interval=10 --report-count=60 --ponynoblock

x message-ubench.base,1,250,4
+ message-ubench.slf-review-fix_dynamic_scaling_race_condition2,1,250,4
+--------------------------------------------------------------------------+
|           +          +                                     x             |
|          ++          +                                     x             |
|          ++         ++  +       x        +               x xx        x   |
|         +++ +       ++ +++    x xx       +    xx         x xx      x x   |
|       + +++ +      +++++++   xxxxx       * +  xxx       xx xx x  x xxx   |
|+    + + +++++   +++++++++++  xxxxx  x+ ++*+*x xxx     x xx xx xx x xxxx x|
|          |__________A__________|     |_____________A_____M_______|       |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  60      14691252      15027855      14910002      14865926     108330.72
+  60      14457171      14803675      14624660      14621815     89014.645
Difference at 95.0% confidence
        -244111 +/- 35478.3
        -1.64208% +/- 0.238655%
        (Student's t, pooled s = 99144.2)

RUN: message-ubench --initial-pings=1 --pingers=250 --ponythreads=18 --report-interval=10 --report-count=60 --ponynoblock

x message-ubench.base,1,250,18
+ message-ubench.slf-review-fix_dynamic_scaling_race_condition2,1,250,18
+--------------------------------------------------------------------------+
|                    +         ++         x  xxx           +           x   |
|          +     +  ++  ++ +   *+  +x+ xx x  xxx x         +           x   |
|          +     +  ++  +++++ +*+ ++*+ xxxx+ xxx x         +           x   |
|+  +      +   ++++++++ ++*++ +*+x+*** xxx**xxxxxxxx  x    +x          xx x|
|               |__________MA______|____|____MA_________|                  |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  60      19705073      21169976      20288239      20313202     318382.17
+  60      18944185      20726794      19734108      19772982     372183.33
Difference at 95.0% confidence
        -540220 +/- 123932
        -2.65945% +/- 0.610107%
        (Student's t, pooled s = 346329)

@SeanTAllen
Member

Those perf numbers look good to me and I wouldn't oppose merging on the basis of them.

@jemc
Member

jemc commented Mar 1, 2018

If y'all are confident, feel free to remove the "DO NOT MERGE" label and merge.

@dipinhora
Contributor

@slfritchie Thanks for putting in all the effort to get this working and the code cleaned up. I appreciate all your help in fixing the swiss cheese that dynamic scheduler scaling has been in relation to scheduler thread 0 suspending.

The performance difference is acceptable to me, given that the logic changes are necessary to improve the stability of scheduler thread 0's ability to suspend, which I believe is overall a useful feature to have.

Fix race conditions with scheduler 0 suspending and ASIO events (ponylang#2561)

Prior to this commit, there were two race conditions where an ASIO
event could be received and not result in a wake up notification
to scheduler 0. One was because scheduler 0 had not actually
suspended yet so the ASIO thread didn't send a notification at all.
The other was because scheduler thread 0 had not released the mutex
so the ASIO thread wasn't able to send a notification because it
wasn't able to acquire the mutex and didn't retry. Both would result
in hangs due to the lost wake ups.

This commit fixes things by ensuring that even after scheduler 0
decides to suspend, it does one final check of the inject queue
to see if any new work has been queued by the ASIO thread. If yes,
it doesn't suspend to prevent lost wake ups and the associated
hangs. It also changes the logic so that the ASIO thread will retry
waking up scheduler thread 0 if it is unable to acquire the mutex
until it is successful.

Additional refactorings include:

* make pop_global() more flexible to adapt to 3 callers in steal()
* refactor 2 pieces of identical code into new suspend_scheduler() function
* refactor 2 pieces of identical code into new maybe_suspend_scheduler() function

Notable runtime behavior change:

* Change default value of ponyminthreads from 0 -> 1

There remains at least one bug in thread wakeup when only
scheduler 0 is awake.  To circumvent this bug(s), Dipin suggested
changing the default value of ponyminthreads to 1.  The command
line option `--ponyminthreads=0` may be used, but it must be considered
experimental and may result in misbehavior by the scheduler.

Squashed from commit 6bd6e05 backward to dipinhora/fix_dynamix_scheduling_race_condition commit 01ab145
@slfritchie slfritchie force-pushed the slf-review-fix_dynamic_scaling_race_condition2 branch from 6bd6e05 to 5a11aff on March 2, 2018
@slfritchie
Contributor Author

@dipinhora Would you mind taking a look at the new squashed & rebased commit message? If it's alright and if Appveyor is happy, then please merge.

@slfritchie
Contributor Author

Dipin's last review found a path where the DTrace WORK_STEAL_SUCCESSFUL probe wouldn't fire. The small refactoring in commit 785cbbf takes care of that. All CI checks have passed. Merge time; thanks, everyone, for your comments & recommendations.

@slfritchie slfritchie merged commit 0ac93d7 into ponylang:master Mar 6, 2018
ponylang-main added a commit that referenced this pull request Mar 6, 2018
dipinhora pushed a commit to dipinhora/ponyc that referenced this pull request Jun 5, 2018
Fix race conditions with scheduler 0 suspending and ASIO events (ponylang#2561)

(Commit message identical to the squashed commit message quoted above.)
dipinhora pushed a commit to dipinhora/ponyc that referenced this pull request Jun 5, 2018
dipinhora added a commit to dipinhora/ponyc that referenced this pull request Feb 23, 2019
Thanks to @slfritchie's hard work in ponylang#2926 and ponylang#2561 to find and
fix the bugs in the dynamic scheduler scaling logic, scheduler
thread 0 now also suspends and resumes correctly without issues
(in my testing).

This commit changes the default for `--ponyminthreads` from `1` to
`0`. The last time @slfritchie ran into issues was in March and
left a comment with the commands he used at:
ponylang#2561 (comment)

I ran the commands from @slfritchie's comment (with minor changes
to ensure `--ponyminthreads=0`) using Pony 0.26.0 for a few hours
without any issues.
dipinhora added a commit to dipinhora/ponyc that referenced this pull request Feb 26, 2019
(Commit message identical to the Feb 23, 2019 commit above.)
SeanTAllen pushed a commit that referenced this pull request Feb 27, 2019
(Commit message identical to the Feb 23, 2019 commit above, with the ponylang#2926 and ponylang#2561 references rendered as #2926 and #2561.)