add comparison with Orleans #8

rkuhn · 2015-04-30T18:40:46Z

No description provided.

gabikliot · 2015-04-30T19:55:54Z

There are a lot of inaccuracies in the description of Orleans (not sure why people choose to assume the worst when they lack information; it is probably in our human nature), but the main one is about messaging guarantees. By default, Orleans guarantees at-most-once delivery, not at-least-once:
http://dotnet.github.io/orleans/Runtime-Implementation-Details/Messaging-Delivery-Guarantees.html
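
For readers unfamiliar with the two guarantees, the difference can be sketched in a few lines of Python. This is only an illustration, not Orleans or Akka code: `LossyChannel`, `send_at_most_once`, and `send_at_least_once` are hypothetical names, and the channel loses messages according to a scripted list so that the outcome is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class LossyChannel:
    """Hypothetical channel that loses the request or the ack on scripted attempts."""
    outcomes: list                      # per attempt: "ok", "drop_request", or "drop_ack"
    delivered: list = field(default_factory=list)
    attempts: int = 0

    def transmit(self, message) -> bool:
        """Return True only if the sender sees an acknowledgement."""
        outcome = self.outcomes[self.attempts]
        self.attempts += 1
        if outcome == "drop_request":
            return False                # receiver never saw the message
        self.delivered.append(message)  # receiver processed the message
        return outcome == "ok"          # on "drop_ack" the sender cannot tell


def send_at_most_once(channel, message) -> bool:
    """Make a single attempt: the message may be lost but is never duplicated."""
    return channel.transmit(message)


def send_at_least_once(channel, message, max_attempts=5) -> bool:
    """Resend until acknowledged: the message may be processed more than once."""
    for _ in range(max_attempts):
        if channel.transmit(message):
            return True
    return False


lost = LossyChannel(outcomes=["drop_request"])
send_at_most_once(lost, "m1")           # the receiver never processes "m1"

duplicated = LossyChannel(outcomes=["drop_ack", "ok"])
send_at_least_once(duplicated, "m1")    # the receiver processes "m1" twice
```

With at-most-once delivery the caller has to tolerate a lost message; with at-least-once delivery the receiver has to tolerate duplicates, for example by making its handlers idempotent.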

gabikliot · 2015-04-30T20:02:36Z

Another inaccuracy:
For example: "Grains can only work in either the fully blocked or fully reentrant modes, limiting the user’s choices to a safe one and a fast one".
Of course, this is not true at all. One can choose to queue a continuation (await) on a different scheduler (the thread pool, for example). This works because we fully integrate with the .NET Task Parallel Library (TPL), which supports it natively: Task.Factory.StartNew and Task.ContinueWith provide overloads that take a TaskScheduler, and Task.Run always queues work to the thread pool.
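
The two modes named in the quoted sentence can be illustrated with a small asyncio sketch. It is an analogy only and uses no Orleans or TPL API: `TurnBasedActor` is a hypothetical class, and `asyncio.sleep(0)` stands in for an awaited call to another actor. In the non-reentrant mode a request runs to completion before the next one starts; in the reentrant mode requests interleave at await points.

```python
import asyncio


class TurnBasedActor:
    """Hypothetical actor-like object; `reentrant` selects the turn discipline."""

    def __init__(self, reentrant: bool):
        self.reentrant = reentrant
        self.lock = asyncio.Lock()
        self.log = []

    async def handle(self, name: str):
        if self.reentrant:
            await self._work(name)      # other requests may interleave at awaits
        else:
            async with self.lock:       # one request runs to completion at a time
                await self._work(name)

    async def _work(self, name: str):
        self.log.append(f"{name}:start")
        await asyncio.sleep(0)          # stands in for a call to another actor
        self.log.append(f"{name}:end")


async def run(reentrant: bool):
    actor = TurnBasedActor(reentrant)
    await asyncio.gather(actor.handle("a"), actor.handle("b"))
    return actor.log


blocked_log = asyncio.run(run(reentrant=False))
reentrant_log = asyncio.run(run(reentrant=True))
```

In the non-reentrant run, request `a` finishes before `b` starts; in the reentrant run, both start before either ends. A mixed mode, as described in the comment above, would correspond to choosing the discipline per call rather than per actor.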

gabikliot · 2015-04-30T20:05:52Z

If you are truly interested I can write a detailed bullet by bullet response to the inaccuracies in this summary.

rkuhn · 2015-04-30T20:17:31Z

Of course I'm interested! But please consider that I did not make anything up, all information has been extracted from the source I cited above. If the TR is not the authoritative source of information then please send me the real one so I can redo the comparison myself—although it is a shame that I should have wasted many hours on this by trusting that a paper from the team at Microsoft Research is correct in describing their own product.

rkuhn · 2015-04-30T20:23:23Z

Sorry about the tone of my previous comment; time is my most precious resource, and the idea of wasting it got to me. I am truly interested in understanding Orleans and thereby extending my understanding of Akka, which hopefully explains my passionate reply.

gabikliot · 2015-04-30T20:23:52Z

There was one mistake in the paper, about Messaging Delivery Guarantees. All the rest is correct.
The other inaccuracies, I think, stem from making assumptions about things that were not described or were only partly described. Naturally, one cannot describe all details in a short paper.
The paper, as its name says, is about the Virtual Actor abstraction and its benefits. It's not a full, 100% detailed explanation of everything we did in project Orleans over 6 years.

Many more details are on our documentation website: http://dotnet.github.io/orleans/.
I will provide more detailed comments.

EDIT: Also, the paper was written quite a while ago. The system of course kept evolving, and we have added more capabilities and fixed, changed, or improved certain things since the paper. The paper is not to be blamed for that, right?

EDIT: One thing that I can fully agree with: Akka has much nicer and more detailed online documentation than Orleans. Orleans is still young on GitHub (we only got to GitHub in January), so we have not yet fully caught up with documentation. We will.

rkuhn · 2015-04-30T20:27:40Z

Yes, please comment on the individual lines so that discussion happens within the right context. Thanks.

gabikliot · 2015-04-30T20:51:09Z

ComparisonWithOrleans.md

+
+  * Akka as a toolkit for building distributed systems, offering the full power but also exposing the inherent essential complexity.
+
+  * Orleans restricts applicability in order to allow seamless use without understanding distributed computing.


Orleans allows people who are not distributed computing experts to write distributed services, but that does not mean an expert cannot find it useful, or that it is necessarily limited in applicability or performance. An expert can configure and extend it for a wide range of applications. I would phrase it: "Orleans simplifies distributed computing, allowing non-experts to write distributed services. At the same time, there are enough extension points and flexibility in Orleans to allow even an expert to customize his services in a flexible way". Arguably, Akka has more extensibility than Orleans at this point (I don't know enough Akka to say if this is the case for sure, but if someone who knows both systems says that, I would easily believe her). One also does not want to provide too many extensibility points; otherwise the system becomes too clunky and too complicated to understand and use.
But it would not be true to say that Orleans does not provide a lot of extensibility as well, or that it is only good for "dumb non-experts".

rkuhn · 2015-05-01

The intent of the point-by-point comparison is to highlight the differences between both approaches, not to sell one or the other. Since both Orleans and Akka provide extensibility, the interesting question in this particular section is what each solution targets. Reading the TR, I came to the conclusion that, bluntly speaking, Akka requires developers to understand distributed computing while Orleans aims at avoiding the necessity for that. This is supported by quotes such as

"To build a correct solution to such problems in the application, the developer must be a distributed systems expert. To avoid these complexities, we built the Orleans programming model and runtime, which raises the level of the actor abstraction."

and

"This level of indirection provides the runtime with the opportunity to solve many hard distributed systems problems that must otherwise be addressed by the developer."

Since I assume that you are one of the co-authors of the paper, you are in a good position to answer the question: what is the primary focus of Orleans? It is completely fine if Orleans also does other things as secondary concerns, but I get the feeling that the primary focus of Orleans and Akka is fundamentally different, so that is what I would like to highlight here.

I’ll flesh out the intro of “different focus” to clarify the intent of this section.

gabikliot · 2015-05-01

Yes, I am one of the authors. Undoubtedly, the primary focus of Orleans is to simplify distributed computing and allow non-experts to write efficient, scalable, and reliable distributed services. The "Guide the developers down a path of best practices" principle is exactly about that.
As a secondary focus we also provide a flexible platform for more expert developers.

gabikliot · 2015-04-30T22:31:53Z

@rkuhn, I am replying inline.
At this stage I also wanted to separate clarifying Orleans issues from asking questions about Akka. We do have a lot of questions about Akka, but I will hold all of them and first would try to get clarity about Orleans, and then as a second step get deeper into Akka.
Would that be OK?

gabikliot · 2015-04-30T22:35:14Z

ComparisonWithOrleans.md

+
+#### Virtual Actor Space
+
+  * In Orleans each type of Grain corresponds to a practically infinite space of Grain instances that conceptually all exist from the beginning of the universe to its end. The relation to the physical Actors that implement the Grains is explained similar to virtual and physical memory, but this comparison is misleading since the virtual address space of a process is explicitly populated with the desired contents instead of containing the whole system’s information by default.


The analogy was used in the context of on-demand paging and the separation of virtual from physical addresses. In that context we do think it is a very suitable, even a great, analogy. Virtual actors have multiple aspects, and not all of them are similar to virtual memory (the eternal existence is not), but on-demand instantiation and automatic reclamation (similar to virtual memory swap-out) are.

rkuhn · 2015-05-01

Okay, will clarify that the analogy applies to that aspect.
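
The aspect of the analogy discussed here, on-demand instantiation and automatic reclamation, can be sketched as a small directory of activations. This is an assumption-laden illustration rather than the Orleans runtime: `VirtualActorDirectory`, `Greeter`, and the tick-based idle limit are hypothetical, and state persistence is left out, so a reclaimed activation loses its in-memory state.

```python
class VirtualActorDirectory:
    """Hypothetical runtime table mapping actor identities to in-memory activations."""

    def __init__(self, factory, idle_limit: int):
        self.factory = factory          # creates an activation for an identity
        self.idle_limit = idle_limit    # ticks without use before reclamation
        self.active = {}                # identity -> (activation, last_used_tick)
        self.tick = 0

    def call(self, identity, method, *args):
        """Every identity is always addressable; activate it on first use."""
        self.tick += 1
        if identity in self.active:
            activation, _ = self.active[identity]
        else:
            activation = self.factory(identity)   # comparable to a page fault
        self.active[identity] = (activation, self.tick)
        return getattr(activation, method)(*args)

    def collect_idle(self):
        """Reclaim activations that have been idle too long (comparable to swap-out)."""
        cutoff = self.tick - self.idle_limit
        for identity, (_, last_used) in list(self.active.items()):
            if last_used <= cutoff:
                del self.active[identity]


class Greeter:
    def __init__(self, identity):
        self.identity = identity

    def greet(self):
        return f"hello from {self.identity}"


directory = VirtualActorDirectory(Greeter, idle_limit=2)
first = directory.call("user-1", "greet")   # activates user-1 on demand
directory.call("user-2", "greet")
directory.call("user-2", "greet")
directory.collect_idle()                    # user-1 has been idle for 2 ticks
after_collect = sorted(directory.active)    # only user-2 is still in memory
again = directory.call("user-1", "greet")   # transparently re-activated
```

The caller never creates or destroys `user-1` explicitly; it only addresses it, which is the part of the virtual memory comparison that both commenters agree on.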

gabikliot · 2015-05-04T17:40:06Z

Thank you @rkuhn. Overall looks much more accurate. There are still a couple of places where you did not incorporate my feedback.

* The optimized local messages: I explained how our two approaches are similar and that "Akka’s lack of indirection DOES NOT allow substantial performance optimizations in this regard" compared to Orleans. We do the same.
* I think you did not stress that Akka breaks actor isolation: "the default executors do not respect this constraint and will typically run Future continuations concurrently with the Actor that scheduled them."
* "Grains can only work in either the fully blocked or fully reentrant modes": I explained how this is not correct. A Grain can have a mixed mode as well (via TPL primitives), and Grains can also support "becoming".

rkuhn · 2015-05-04T19:33:01Z

Yes, I have not yet incorporated everything, I ran into the wall of my time box (this week is rather busy, I’ll come back to this).

rkuhn · 2015-05-28T14:01:15Z

Sorry that it took a little longer. I have added a commit that should address the outstanding comments; please review. If everything is good then I’d like to merge.

gabikliot · 2015-05-28T17:00:26Z

Looks good. Thanks.

gabikliot · 2015-05-28T17:30:22Z

Ronald, I have a question:
@sergeybykov wanted to comment on one of your previous comments:

"The important refinement that is missing here is that neither horizontal nor vertical is the solution, instead we must acknowledge that both are orthogonal concerns and contribute to the solution. For the definitions I’m using please refer to the glossary of the Reactive Manifesto: Errors are made by clients and need to be signaled back to them while Failures render the service unable to perform its function and need vertical help for recovery. We call that supervision, it might be called differently, but the crucial notion is that clients shall not receive Failures. The other notable fact is that stack traces are meaningless in distributed systems, which makes throwing exceptions even less appropriate for Error handling than it was in classical (local) OO programming: the service should reply with a normal value instead that denominates the error.
In any case, can we conclude that I fix the description of shared-state concurrency problems to clarify that low-level data races are removed while high-level message races remain? And I shall also create a new paragraph describing the difference in failure/error handling philosophy."

but we seem not to be able to find that original comment.

The same goes for my comment about "Distributed Systems Bibles" and the Reactive Manifesto not being one of them (for me personally). I actually think that comment is important and expresses something that people need to hear.

What happened to those comments?

rkuhn · 2015-05-28T17:37:28Z

Comments are shown as “outdated diff” in the history above. You can expand them and still read them, but since I corrected the text line that they referred to, they no longer apply to the current version (in GitHub’s opinion).

One way to avoid this “comment folding” is to make overall comments (as we are doing right now) or comment on individual commits instead of on the PR—this works better if the history is kept, which I intend to do here.

In any case, thanks for lifting these important ones up to this level, they are easily accessible now.

Concerning failures/errors I just added another commit that should clarify the wording and more precisely capture the difference in philosophy between Orleans and Akka. If the current text is okay from your side then I’d like to merge it—we can always discuss and make changes via new issues or PRs on this repository.

gabikliot · 2015-05-28T17:47:38Z

Ohh, I see now! Thanks! That was tricky.

Yes, go ahead please. I say let's merge it all; you can then "publish" the whole doc, and we can submit new pull requests to extend some points further. I think it is in a wip branch; maybe it can make it to the main branch. The current version is definitely good enough as version 1.

rkuhn · 2015-05-28T18:05:32Z

Thanks for your help, @gabikliot !

add comparison with Orleans

sergeybykov · 2015-05-29T15:19:32Z

"The other notable fact is that stack traces are meaningless in distributed systems, which makes throwing exceptions even less appropriate for Error handling than it was in classical (local) OO programming: the service should reply with a normal value instead that denominates the error."

@rkuhn It's interesting that you mentioned it. In Orleans, because of the async RPC as the primary mode of interacting with actors and the automatic propagation of errors, stack traces are actually meaningful. Let me illustrate this with an example.

Say you have a web frontend (FE) that upon receiving a REST request makes a call to method X of grain A. As part of executing A.X(), grain A makes a call to method Y of grain B. B.Y() in its turn makes a call to storage (ST) that throws an exception, e.g. storage is temporarily unavailable. So you have a call chain of FE -> A.X() -> B.Y() -> ST.

It is enough to put a try/catch at the FE level to handle such errors AND to have a meaningful call stack of the error. So if you write something like

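try
{
    // a is a reference to grain A; the call is an asynchronous RPC
    var x = await a.X(args);
}
catch (Exception exc)
{
    // exc carries the chain of inner exceptions from A, B, and ST
    log(exc);
}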

then exception exc will contain a chain of inner exceptions and their call stacks: the original exception thrown by ST and the exceptions re-thrown by B and A with their respective call stacks:

ExceptionA (A.X()/A.XImpl()/A.CallB())
ExceptionB (B.Y()/B.YInternal()/B.SaveToStorage())
ExceptionST (ST.Foo()/ST.Bar())

This is the default behavior, with no error handling logic in A or B. Of course, one can put a try/catch within A.X() and/or B.Y() to analyze the error and retry, report it, or take an alternative execution path in the method, e.g. write to a queue in case the primary storage is unavailable.

We believe this is a very important feature of Orleans: distributed and asynchronous propagation of exceptions. It makes reasoning about errors in a distributed app almost comparable to reasoning about them in a single-process app, and it allows developers to write the minimum amount of code for the most common cases, as in the example above, where error handling logic sits only at the FE level. With pure one-way message passing such simplicity is pretty much impossible to achieve.
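
The propagation pattern described above can be mimicked in Python with exception chaining. This sketch is not Orleans code and makes no claim about how the Orleans runtime wraps exceptions: `grain_method`, `GrainCallError`, and `StorageError` are hypothetical names, and the calls are local coroutines standing in for the FE -> A.X() -> B.Y() -> ST chain.

```python
import asyncio


class StorageError(Exception):
    """Stand-in for the exception thrown by storage (ST)."""


class GrainCallError(Exception):
    """Hypothetical wrapper added when a failure leaves a grain method."""


def grain_method(name):
    """Wrap a coroutine so that failures are re-raised with the callee's name."""
    def decorate(fn):
        async def wrapper(*args):
            try:
                return await fn(*args)
            except Exception as exc:
                raise GrainCallError(f"{name} failed") from exc
        return wrapper
    return decorate


async def storage_write():
    raise StorageError("storage temporarily unavailable")


@grain_method("B.Y")
async def b_y():
    await storage_write()       # no error handling in B


@grain_method("A.X")
async def a_x():
    await b_y()                 # no error handling in A


async def frontend():
    """The only try/except sits at the front end, as in the example above."""
    try:
        await a_x()
        return []
    except Exception as exc:
        chain, current = [], exc
        while current is not None:
            chain.append(str(current))
            current = current.__cause__   # walk the chain of inner exceptions
        return chain


chain = asyncio.run(frontend())
```

Here `chain` lists the failure of `A.X`, then `B.Y`, then the original storage error, mirroring the three-level listing in the comment; each exception object also keeps its own traceback.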

rkuhn · 2015-06-02T08:34:43Z

@sergeybykov Yes, you’re right: if the service’s model fits request–response well then that gives you a convenient handle on error propagation, as noted in the comparison document. My point is that assuming an exception to be an error (as opposed to a failure) unless stated otherwise can misguide system design, and I certainly do not agree that all error handling logic should be moved up to the FE level. What should be done at that level is user input validation, and if valid inputs lead to errors further down then someone made a mistake internally, which means service failure and not user error.

sergeybykov · 2015-06-02T15:20:38Z

@rkuhn I didn't mean that error handling should be done at the FE level. It can definitely be done at any level in the call chain that makes sense for the app. What we see in practice in interactive services, though, is that useful error handling can rarely be done at the lower levels.

For example, a retry can be attempted by any actor in the call chain. However, in most cases the lower levels cannot know whether a retry is desirable in the given scenario; only the top layers in the call chain, FE or not, may know that.

I'm not sure I agree with your black and white picture of errors vs. failures. In a distributed system sometimes you simply don't know for sure. A socket error may indicate a failure of the node on the other end or just a temporary network glitch. From the application perspective they are indistinguishable until we learn that this is in fact a failure of the remote node. In the meantime, the app usually cannot wait for such a fact to be established, and has to treat this ambiguous situation as an error.

rkuhn · 2015-06-03T10:41:05Z

We should probably meet sometime and discuss this over a beverage of your choice ;-)

One more thought here is that the black/white separation’s feasibility depends on the definitions: it gets pretty clear and simple if you apply HTTP thinking, as in 4xx status codes mean “you did something wrong” (i.e. an Error) while 5xx status codes mean “I did something wrong” (i.e. a Failure). Then it does not matter whether I know that another remote node is down or not—if I cannot provide my service then it is a Failure on my part. This keeps all observations and their reactions nicely local, which is a big plus in distributed systems.
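
The HTTP-style separation described above can be sketched as follows. This is a hedged illustration of the idea, not code from Akka or Orleans: `ValidationError`, `Reply`, `lookup_service`, and the two store functions are hypothetical. A client mistake is answered with a normal reply value (a 4xx-like Error), while an internal problem is classified as a Failure (5xx-like) without leaking internal details to the client.

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """The client sent something wrong (an Error in the sense used above)."""


@dataclass
class Reply:
    status: int
    body: str


def lookup_service(request: dict, store) -> str:
    """Hypothetical service logic: validate the input, then consult storage."""
    if "key" not in request:
        raise ValidationError("field 'key' is required")
    return store(request["key"])


def handle(request: dict, store) -> Reply:
    try:
        return Reply(200, lookup_service(request, store))
    except ValidationError as exc:
        # Error: signaled back to the client as a normal value.
        return Reply(400, str(exc))
    except Exception:
        # Failure: the service could not do its job; recovery is a matter
        # for whatever supervises the service, not for the client.
        return Reply(500, "service unavailable")


def healthy_store(key):
    return f"value-of-{key}"


def broken_store(key):
    raise ConnectionError("storage node unreachable")


ok = handle({"key": "k"}, healthy_store)
client_error = handle({}, healthy_store)
failure = handle({"key": "k"}, broken_store)
```

The classification is made locally: `handle` does not need to know whether the remote storage node is down for good or only briefly unreachable, which is the point made in the comment above.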

sergeybykov · 2015-06-03T14:15:05Z

Definitely. So many questions can be sorted out over a drink. :-) Let me know when you travel to Seattle next time.

add comparison with Orleans

245ea5e

rkuhn mentioned this pull request Apr 30, 2015

Virtual actors/endpoints #3

Closed

gabikliot reviewed Apr 30, 2015
View reviewed changes

update Orleans comparison with G.Kliot’s input

5d3fbc5

incorporate remaining feedback from @gabikliot

18f252d

clarify error/failure distinction

f1a3e11

rkuhn added a commit that referenced this pull request May 28, 2015

Merge pull request #8 from akka/wip-Orleans

f193b15

add comparison with Orleans

rkuhn merged commit f193b15 into master May 28, 2015

rkuhn deleted the wip-Orleans branch May 28, 2015 18:05

mertant mentioned this pull request Jul 19, 2020

Documentation on the principles of virtual actors & comparison to Akka orbit/orbit#435

Open


