There are a lot of inaccuracies in the description of Orleans (not sure why people choose to assume the worst when they lack information; it is probably in our human nature), but the main one is about messaging guarantees. By default, Orleans guarantees at-most-once delivery, not at-least-once: |
Another inaccuracy: |
If you are truly interested I can write a detailed bullet by bullet response to the inaccuracies in this summary. |
Of course I'm interested! But please consider that I did not make anything up, all information has been extracted from the source I cited above. If the TR is not the authoritative source of information then please send me the real one so I can redo the comparison myself—although it is a shame that I should have wasted many hours on this by trusting that a paper from the team at Microsoft Research is correct in describing their own product.
|
Sorry about the tone of my previous comment; time is my most precious resource and the idea of wasting it got to me. I am truly interested in understanding Orleans and thereby extending my understanding of Akka, which hopefully explains my passionate reply.
|
There was one mistake in the paper, about Messaging Delivery Guarantees. All the rest is correct. Many more details are on our documentation website: http://dotnet.github.io/orleans/. EDIT: Also, the paper was written quite a while ago; the system of course kept evolving, and we have added more capabilities and fixed, changed, or improved certain things since the paper. The paper is not to be blamed for that, right? EDIT: One thing that I can fully agree with: Akka has much nicer and more detailed online documentation than Orleans. Orleans is still young on GitHub (we only got to GH in January), so we have not yet fully caught up with documentation. We will. |
Yes, please comment on the individual lines so that discussion happens within the right context. Thanks.
|
* Akka as a toolkit for building distributed systems, offering the full power but also exposing the inherent essential complexity.
* Orleans restricts applicability in order to allow seamless use without understanding distributed computing.
Orleans allows people who are not distributed computing experts to write distributed services, but that does not mean an expert cannot find it useful, or that it is necessarily limited in applicability or performance. An expert can configure and extend it for a wide range of applications. I would phrase it: "Orleans simplifies distributed computing, allowing non-experts to write distributed services. At the same time there are enough extension points and flexibility in Orleans to allow even an expert to customize his services in a flexible way". Arguably, Akka has more extensibility than Orleans at this point (I don't know enough Akka to say for sure whether this is the case, but if someone who knows both systems says so, I would easily believe her). One also does not want to provide too many extensibility points, otherwise the system becomes too clunky and too complicated to understand and use.
But it would not be true to say that Orleans does not provide a lot of extensibility as well, or that it is only good for "dumb non-experts".
The intent of the point-by-point comparison is to highlight the differences between both approaches, not to sell one or the other. Therefore since both Orleans and Akka provide extensibility, the interesting question in this particular section is what each solution targets, and reading the TR I came to the conclusion that—bluntly speaking—Akka requires developers to understand distributed computing while Orleans aims at avoiding the necessity for that. This is supported by quotes such as
"To build a correct solution to such problems in the application, the developer must be a distributed systems expert. To avoid these complexities, we built the Orleans programming model and runtime, which raises the level of the actor abstraction."
and
"This level of indirection provides the runtime with the opportunity to solve many hard distributed systems problems that must otherwise be addressed by the developer."
Since I assume that you are one of the co-authors of the paper, you are in a good position to answer the question: what is the primary focus of Orleans? It is completely fine if Orleans also does other things as secondary concerns, but I get the feeling that the primary focus of Orleans and Akka is fundamentally different, so that is what I would like to highlight here.
I’ll flesh out the intro of “different focus” to clarify the intent of this section.
Yes, I am one of the authors. Undoubtedly, the primary focus of Orleans is to simplify distributed computing and allow non-experts to write efficient, scalable and reliable distributed services. The "Guide the developers down a path of best practices" principle is exactly about that.
As a secondary focus we also provide a flexible platform for more expert developers.
@rkuhn, I am replying inline. |
#### Virtual Actor Space

* In Orleans each type of Grain corresponds to a practically infinite space of Grain instances that conceptually all exist from the beginning of the universe to its end. The relation to the physical Actors that implement the Grains is explained by analogy to virtual and physical memory, but this comparison is misleading since the virtual address space of a process is explicitly populated with the desired contents instead of containing the whole system’s information by default.
The analogy was used in the sense of on-demand paging and the separation of virtual addresses from physical addresses. In that context we do think this is a very suitable, and even a great, analogy. Virtual actors have multiple aspects, and not all of them are similar to virtual memory (the eternal existence is not), but on-demand instantiation and automatic reclamation (similar to virtual memory swap-out) are.
Okay, will clarify that the analogy applies to that aspect.
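To make the aspect of the analogy discussed here concrete, the following is a minimal sketch of on-demand instantiation and reclamation. It is not Orleans code: `CounterGrain`, `VirtualActorDirectory`, `Get` and `Deactivate` are hypothetical names invented for this illustration, and the in-memory `_saved` dictionary merely stands in for persistent storage.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical grain: a counter whose in-memory state exists only while activated.
public sealed class CounterGrain
{
    public CounterGrain(int initial) { Count = initial; }
    public int Count { get; private set; }
    public int Increment() => ++Count;
}

// Hypothetical directory: every id is always addressable ("virtual"),
// but an in-memory activation ("physical") is created only on first use.
public sealed class VirtualActorDirectory
{
    private readonly Dictionary<string, CounterGrain> _active =
        new Dictionary<string, CounterGrain>();
    // Stand-in for persistent storage of deactivated grains (an assumption).
    private readonly Dictionary<string, int> _saved =
        new Dictionary<string, int>();

    public int ActiveCount => _active.Count;

    // On-demand instantiation: activate the grain if it is not in memory.
    public CounterGrain Get(string id)
    {
        if (!_active.TryGetValue(id, out var grain))
        {
            _saved.TryGetValue(id, out var count); // stays 0 if never saved
            grain = new CounterGrain(count);
            _active[id] = grain;
        }
        return grain;
    }

    // Reclamation, comparable to swapping a page out: save state, free memory.
    public void Deactivate(string id)
    {
        if (_active.TryGetValue(id, out var grain))
        {
            _saved[id] = grain.Count;
            _active.Remove(id);
        }
    }
}
```

A caller never creates or destroys a grain explicitly: `Get("a")` always succeeds, and after `Deactivate("a")` a later `Get("a")` transparently reactivates the grain with its saved count.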
Thank you @rkuhn. Overall looks much more accurate. There are still a couple of places where you did not incorporate my feedback.
|
Yes, I have not yet incorporated everything, I ran into the wall of my time box (this week is rather busy, I’ll come back to this). |
Sorry that it took a little longer. I have added a commit that should address the outstanding comments; please review. If everything is good then I’d like to merge. |
Looks good. Thanks. |
Ronald, I have a question: "The important refinement that is missing here is that neither horizontal nor vertical is the solution, instead we must acknowledge that both are orthogonal concerns and contribute to the solution. For the definitions I’m using please refer to the glossary of the Reactive Manifesto: Errors are made by clients and need to be signaled back to them while Failures render the service unable to perform its function and need vertical help for recovery. We call that supervision, it might be called differently, but the crucial notion is that clients shall not receive Failures. The other notable fact is that stack traces are meaningless in distributed systems, which makes throwing exceptions even less appropriate for Error handling than it was in classical (local) OO programming: the service should reply with a normal value instead that denominates the error." But we seem not to be able to find that original comment. The same goes for my comment about "Distributed Systems Bibles" and the Reactive Manifesto not being one of them (for me personally). I actually think my comment is important and expresses something that people need to hear. What happened to those comments? |
Comments are shown as “outdated diff” in the history above, you can expand them and still read them, but since I corrected the text line that they referred to they don’t apply to the current version any longer (in github’s opinion). One way to avoid this “comment folding” is to make overall comments (as we are doing right now) or comment on individual commits instead of on the PR—this works better if the history is kept, which I intend to do here. In any case, thanks for lifting these important ones up to this level, they are easily accessible now. Concerning failures/errors I just added another commit that should clarify the wording and more precisely capture the difference in philosophy between Orleans and Akka. If the current text is okay from your side then I’d like to merge it—we can always discuss and make changes via new issues or PRs on this repository. |
Ohh, I see now! Thanks! That was tricky. Yes, go ahead please. I say let's merge it all and you can just "publish" this whole doc, and then we can potentially submit new pull requests to further extend some points. I think it is in a wip branch; maybe it can make it to the main branch. The current version is definitely good enough as version 1. |
Thanks for your help, @gabikliot ! |
@rkuhn It's interesting that you mentioned it. In Orleans, because async RPC is the primary mode of interacting with actors and errors are propagated automatically, stack traces are actually meaningful. Let me illustrate this with an example. Say you have a web frontend (FE) that upon receiving a REST request makes a call to method X of grain A. As part of executing X, grain A calls method Y of grain B, which in turn calls ST, and ST throws an exception. It is enough to put a try/catch at the FE level to handle such errors AND to have a meaningful call stack of the error. So if you write something like

```csharp
try
{
    // a is a reference to grain A; the call is awaited asynchronously.
    var x = await a.X(args);
}
catch (Exception exc)
{
    // exc carries the chain of inner exceptions from ST, B and A.
    log(exc);
}
```

then the exception exc will contain a chain of inner exceptions and their call stacks: the original exception thrown by ST and the exceptions re-thrown by B and A, with their respective call stacks.
This is the default behavior, with no error handling logic in A or B. Of course, one can put a try/catch within A.X() and/or B.Y() to analyze the error and retry, report it, or alter the execution of the method, e.g. write to a queue in case the primary storage is unavailable. We believe this is a very important feature of Orleans: distributed and asynchronous propagation of exceptions. It makes reasoning about errors in a distributed app almost comparable to reasoning about them in a single-process app, and allows developers to write the minimum amount of code for the most common cases, as in the example above, where error handling logic is put only at the FE level. With pure one-way message passing such simplicity is pretty much impossible to achieve. |
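The propagation described above can be imitated in plain C# to show what the FE-level catch ends up seeing. This is only a sketch under stated assumptions: the `Hop` helper is a hypothetical stand-in for the runtime boundary that re-throws a failed call with the original exception attached, and `StorageRead`, `BY`, `AX` and `FrontEnd` are invented names, not Orleans APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PropagationSketch
{
    // Hypothetical runtime boundary: a failed call is re-thrown to the caller
    // wrapped in a new exception that keeps the original as InnerException.
    static async Task<T> Hop<T>(string callee, Func<Task<T>> call)
    {
        try
        {
            return await call();
        }
        catch (Exception inner)
        {
            throw new InvalidOperationException($"call to {callee} failed", inner);
        }
    }

    // ST: a storage call that fails.
    static Task<int> StorageRead() =>
        Task.FromException<int>(new TimeoutException("ST unavailable"));

    // B.Y and A.X contain no error handling of their own.
    static Task<int> BY() => Hop("ST", StorageRead);
    static Task<int> AX() => Hop("B.Y", BY);

    // FE: the only try/catch in the whole call chain.
    public static async Task<List<string>> FrontEnd()
    {
        var chain = new List<string>();
        try
        {
            await Hop("A.X", AX);
        }
        catch (Exception exc)
        {
            // Walk the chain from the outermost exception down to the root cause.
            for (var e = exc; e != null; e = e.InnerException)
                chain.Add(e.Message);
        }
        return chain;
    }

    public static async Task Main()
    {
        foreach (var message in await FrontEnd())
            Console.WriteLine(message);
    }
}
```

The list returned by `FrontEnd` has one entry per hop plus the root cause, so a single handler at the top can log the whole path even though neither intermediate method catches anything.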
@sergeybykov Yes, you’re right: if the service’s model fits request–response well then that gives you a convenient handle on error propagation, as noted in the comparison document. My point is that assuming an exception to be an error (as opposed to a failure) unless stated otherwise can misguide system design, and I certainly do not agree that all error handling logic should be moved up to the FE level. What should be done at that level is user input validation, and if valid inputs lead to errors further down then someone made a mistake internally, which means service failure and not user error. |
@rkuhn I didn't mean that error handling should be done at the FE level. It can definitely be done at any level in the call chain that makes sense for the app. What we see in practice in interactive services, though, is that useful error handling can only rarely be done at lower levels. For example, a retry can be attempted by any actor in the call chain. However, in most cases the lower levels cannot know whether a retry is desirable in the scenario, which only the top layers in the call chain, FE or not, may know. I'm not sure I agree with your black and white picture of errors vs. failures. In a distributed system sometimes you simply don't know for sure. A socket error may indicate a failure of the node on the other end or just a temporary network glitch. From the application perspective they are indistinguishable until we learn that this is in fact a failure of the remote node. In the meantime, the app usually cannot wait for such a fact to be established, and has to treat this ambiguous situation as an error. |
We should probably meet sometime and discuss this over a beverage of your choice ;-) One more thought here is that the black/white separation’s feasibility depends on the definitions: it gets pretty clear and simple if you apply HTTP thinking, as in 4xx status codes mean “you did something wrong” (i.e. an Error) while 5xx status codes mean “I did something wrong” (i.e. a Failure). Then it does not matter whether I know that another remote node is down or not—if I cannot provide my service then it is a Failure on my part. This keeps all observations and their reactions nicely local, which is a big plus in distributed systems. |
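The HTTP-style separation described above can be written down as a small classifier. This is only an illustrative sketch: the `Outcome` enum and `StatusClassifier` are hypothetical names, and treating every status outside the 4xx and 5xx ranges as `Ok` is a simplification made for this example.

```csharp
using System;

// Error: "you did something wrong"; Failure: "I did something wrong".
public enum Outcome { Ok, Error, Failure }

public static class StatusClassifier
{
    // Classifies purely on the locally observed status code, without needing
    // to know why the service could not answer (e.g. a remote node is down).
    public static Outcome Classify(int status)
    {
        if (status >= 400 && status <= 499) return Outcome.Error;   // signal back to the client
        if (status >= 500 && status <= 599) return Outcome.Failure; // needs recovery on the service side
        return Outcome.Ok; // simplification: everything else counts as not-an-error here
    }

    public static void Main()
    {
        foreach (var status in new[] { 200, 404, 503 })
            Console.WriteLine($"{status}: {Classify(status)}");
    }
}
```

The decision depends only on what the replying service itself observed, which is the locality argument made in the comment above.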
Definitely. So many questions can be sorted out over a drink. :-) Let me know when you travel to Seattle next time. |