Akka.Net actors freezing #4376
Comments
Thanks for the really detailed write-up - we'll look into it
The bug happened when you fail multiple children of a pool node with On a failure, the child will suspend itself before bubbling a The child was suspended here: https://github.com/akkadotnet/akka.net/blob/dev/src/core/Akka/Actor/ActorCell.FaultHandling.cs#L231
@Arkatufus so it sounds like the issue is that the state for the supervisor, when using
This is, somehow, a pool-router-only problem
@Arkatufus may I ask you to check things again? I was able to reproduce the issue for a non-router actor (the
@Nk185 have you followed @Arkatufus's work on this issue on #4393? Might want to review the code and changes there.
@Nk185 I might need to make a more comprehensive test on it, I've only managed to fail the pool router so far.
As for progress, I've found that there is a code discrepancy between Akka.NET and Scala Akka on this line: The check should live inside the switch statement, not outside of it. Fixing that fixes the spurious exception handling by the actor supervisor, but something else is still blocking the Mailbox from getting resumed.
This seems to fix the bug, need further testing
@Arkatufus nice work - I'll review it
Environment
Symptoms
After several restarts, actors behind a pool router stop working.
How to reproduce
Either use the code from the gist or follow these steps:
Side note: you may need to experiment with the number of thrown exceptions
Gist overview
In the Gist I try several ways to make the SimpleActor throw an exception and then print a text.
Six main scenarios tried:
For each of the scenarios with a router (so all except the last one), I force the SimpleActor to throw an exception. Then I get all routees and "ask" them to print a text with their address to stdout, both with a direct Tell and indirectly via the router.
Main observations based on Gist code
[...] Tells.
[...] Tells.
[...] MailboxStatus equal to 4 (SuspendUnit) and SuspendMask set.
[...] ReceiveActor-based router, which means that the issue is not in (or is not only in) the implementation of RoutedActorCell and RouterActor.
Assumptions
Given that the number and frequency of exceptions affect the outcome, that the number of frozen routees differs on each run, and that I wasn't able to reproduce the issue with a non-Escalate supervision strategy, I presume there is a race condition during escalation, in Dispatcher or in Mailbox, that prevents removal of SuspendUnit from the actor's mailbox.
Workaround
Option 1
The easiest workaround is to specify the required supervision strategy at the router level, for example:
Props.Create<YourAwesomeActor>().WithRouter(new RoundRobinPool(routeesNr, null, new OneForOneStrategy(Decider.From(Directive.Restart)), Dispatchers.DefaultDispatcherId));
This leads to the same default behaviour: it restarts your routee (unless you overrode it in your config).
Option 2
If you need more complicated logic that is based on the state of the router's parent, you have to write your own router actor, but be aware that this will have a performance impact.
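A minimal sketch of what such a hand-written router could look like, assuming plain round-robin routing is enough. WorkerActor and ManualRoundRobinRouter are hypothetical names for illustration and do not come from the gist. The Restart decider mirrors Option 1 and is only a placeholder for whatever state-based logic you actually need.

```csharp
using System;
using System.Collections.Generic;
using Akka.Actor;

// Hypothetical routee; stands in for the actor that may throw.
public class WorkerActor : ReceiveActor
{
    public WorkerActor()
    {
        Receive<string>(msg => Console.WriteLine($"{Self.Path}: {msg}"));
    }
}

// Hypothetical hand-written router: an ordinary actor that owns its
// routees as children and supervises them itself instead of escalating.
public class ManualRoundRobinRouter : ReceiveActor
{
    private readonly List<IActorRef> _routees = new List<IActorRef>();
    private int _next;

    public ManualRoundRobinRouter(int routeeCount)
    {
        for (var i = 0; i < routeeCount; i++)
        {
            _routees.Add(Context.ActorOf(Props.Create<WorkerActor>(), $"worker-{i}"));
        }

        ReceiveAny(msg =>
        {
            // Pick the next routee in round-robin order.
            var target = _routees[_next];
            _next = (_next + 1) % _routees.Count;

            // Forward keeps the original sender, so replies bypass the router.
            target.Forward(msg);
        });
    }

    // Failures of the children are decided here. This sketch always
    // restarts; replace the decider with your own logic if needed.
    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new OneForOneStrategy(Decider.From(Directive.Restart));
    }
}
```

It could be created with something like system.ActorOf(Props.Create(() => new ManualRoundRobinRouter(5)), "router"). Unlike a pool router, this actor receives every message through its own mailbox before forwarding it, which is presumably the source of the performance impact mentioned above. A restarted child keeps its IActorRef, so the routee list stays valid as long as the decider only restarts or resumes.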