Error and cancellation handling #729

kyouko-taiga · 2022-09-22T00:23:11Z

kyouko-taiga
Sep 22, 2022
Maintainer

I'm opening this thread to continue the discussion about error and cancellation handling, started by Eric Niebler here: https://github.com/val-lang/val-lang.github.io/discussions/32#discussioncomment-3694357.

The goal is to discuss the optimal design for Val's error and "serendipitous-success" handling in terms of expressiveness and ergonomics.

lucteo · 2022-09-22T03:33:52Z

lucteo
Sep 22, 2022
Collaborator

IMHO, I agree with Dimitri in saying that cancellation can be implemented using exceptions.

Disclaimer: I was not close to the P1677 discussions, and I don't have in-depth knowledge of how exceptions work in Val. I will try to lay out some of my intuitions and arguments on that basis; I haven't done the exercise of making this formal.

I think at the core of the dispute, there are a few types of arguments:

semantic argument -- what cancellation means
composability -- how do we compose code with cancellation
usability and significance for senders/receivers
assumptions related to exceptions and performance implications

For all of these, it's important to notice that there are fundamental differences between C++ and Val, so arguments cannot simply be transferred from one language to the other.

1. What cancellation means

In P1677 and the S/R framework, we accept the following as "real" scenarios:

completion with success
completion with error / exceptional case
cancelled

We insist that "cancelled" is as real as "completion with success" because we encounter it in practice so often.

But, by the same argument, “partial success” should also be a “real scenario”. For example, while trying to read 10 MB from the network we only get 1 MB of data; this can happen because the network was disconnected, or because the user cancelled the download. For this example, it is probably useful to keep track of the downloaded data, so this case cannot be simply mapped to error or cancellation completions. We can frequently find examples like this, so should we create new types of completions?

We can most likely imagine other real scenarios that would make us believe that we need to add other completion types. Maybe we want to distinguish between upstream and downstream cancellation, or whatever…

I do believe that a good way to break completion into categories is to look at pre- and post-conditions. A success completion signal would guarantee that post-conditions of the operations are met. All the other scenarios would not guarantee this.

For example, for a stack-push operation, the post-condition is that the stack is not empty. If we have an error during the push operation, or if we cancel this operation, the post-condition cannot always be met.

Thus, from a semantic perspective, it makes sense to divide the types of completions into two parts:

successful completion
non-successful completions

2. Composability

Looking at P1677, we see examples of how exceptions may not compose. But I would argue that these cases are due to limitations of the C++ ecosystem and of how we are trying to build S/R on top of it.

I would argue that good error hygiene avoids the composability problems.

Let’s look at an example of how one might write concurrent code in Val. We want to start a new series of computations (f, g and h) on a new scheduler. This would be expressed as a regular function in Val:

fun computation() -> Void {
  switch_to_my_thread_pool()  // ex::schedule(my_thread_pool)
  let a = f()                 // ex::let_value(f)
  let b = g(a)                // ex::let_value(g)
  let c = h(b)                // ex::let_value(h)
  // error and cancellation is handled outside
  return c
}

I am assuming that all these computations are magically using a shared stop token (not very Val-friendly, but possible)

Now, without knowing how errors and cancellation are handled for this piece of computation, we can see that this composes really well. Just like it would compose in P2300.

Please note that composing computations is a monadic operation. That is true when we write S/R code with respect to errors and cancellation. But that is also true when we write code that may throw exceptions (that is: 90% or more of C++ code uses monads for composition).

The thing that is important in the above example is that errors and cancellation compose in exactly the same way: in a monadic fashion. Thus, from a composability point of view, it makes sense to use the same mechanism for handling cancellation as we use for handling errors.

3. Usability and significance for S/R

The S/R framework can complete a computation in 3 ways:

calls set_value(vals...)
calls set_error(err)
calls set_stopped()

This can be transformed, without losing generality, into something like: set_completed(optional<expected<vals..., err>>). Or, into something like: set_completed(expected<vals..., optional<err>>). Or, maybe into something like: set_completed(expected<vals..., variant<err, stopped>>).

From this perspective, one can argue that it is not relevant how we pack the states together, as long as we can convey the same information. We can put the cancellation and error cases together without losing anything from the computation.
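A minimal sketch of the packing argument above, written in C++17 with std::variant standing in for expected (which is only available from C++23). The names Stopped, Error, ThreeWay, Packed, pack and unpack are hypothetical and chosen for illustration; the value type is fixed to int to keep the sketch short. It shows that the three-channel form and the two-channel form convert to each other without loss.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the non-value completion channels.
struct Stopped {};
struct Error { std::string what; };

// Three-channel form: value, error and stopped are separate alternatives.
using ThreeWay = std::variant<int, Error, Stopped>;

// Packed form: one success channel and one non-success channel,
// where the non-success channel is itself "error or stopped".
using Failure = std::variant<Error, Stopped>;
using Packed = std::variant<int, Failure>;

// Three channels -> two channels.
Packed pack(const ThreeWay& r) {
  if (auto* v = std::get_if<int>(&r)) return Packed{*v};
  if (auto* e = std::get_if<Error>(&r)) return Packed{Failure{*e}};
  return Packed{Failure{Stopped{}}};
}

// Two channels -> three channels; the inverse of pack.
ThreeWay unpack(const Packed& p) {
  if (auto* v = std::get_if<int>(&p)) return ThreeWay{*v};
  const auto& f = std::get<Failure>(p);
  if (auto* e = std::get_if<Error>(&f)) return ThreeWay{*e};
  return ThreeWay{Stopped{}};
}
```

Since unpack(pack(x)) yields the alternative x started with, the choice between the two shapes is about ergonomics at the use site rather than about information content.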

The only difference is how we actually use this result. In C++, the usability may be a bit cumbersome, but that is because of C++ design choices. In Val, it turns out that expressing cancellation as yet another type of error is not a usability problem (as far as I understood it; Dimitri and Dave can correct me).

Moreover, if we simplify a computation to only produce a value type, then all the computations would have a shape of Async<T>. Users can think of concurrent computations just as simple functions with a return value (which may be a variant or tuple or whatever). This greatly simplifies the concurrency design for Val.

This is actually the main selling point for me to make Val combine cancellation with exceptions.

4. Exceptions assumptions

Coming from a C++ world, we typically make the following assumptions for exceptions:
a) they are slow
b) they should be used extremely rarely

And here, the second point mostly derives from the first point. Other than that, it’s just an arbitrary convention.

My understanding is that Val fixes the performance issue. Thus, it may be acceptable for Val to say that exceptions can be used more often in cases of cancellation.

—
Hope all these make sense


dabrahams Sep 25, 2022
Maintainer

My understanding is that Val fixes the performance issue.

Like C++, Val's spec does not describe the details of the unwinding mechanism. A valid C++ implementation could propagate all exceptions by checking a thread-local variable after every non-noexcept call under the covers, and simply returning if it is set. That would make throwing faster than a “zero overhead” table-based approach, at the expense of performance in the non-failure case. Likewise, Val can do it any way that achieves the right semantics. There's nothing about Val's design that makes efficient unwinding more possible. If we choose to say memory allocation failure is non-recoverable, we may reduce the number of APIs that need to unwind, but if what @ericniebler told me—that approximately all async code needs to be cancellable—is true, well, it's not clear how much of a reduction that will amount to.
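A rough sketch of the flag-checking strategy described above, written by hand in C++ as if a compiler had lowered throw, call and catch into plain control flow. The names pending_failure, parse_digit, sum_digits and sum_digits_or are hypothetical; this is not how any particular C++ or Val implementation works, only an illustration of where the costs land.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-thread "failure in flight" slot maintained by the lowering.
thread_local const char* pending_failure = nullptr;

// What a `throw` might lower to: record the failure and return normally.
int parse_digit(char c) {
  if (c < '0' || c > '9') {
    pending_failure = "not a digit";
    return 0;  // dummy value; callers must check the flag before using it
  }
  return c - '0';
}

// What a call to a potentially-throwing function might lower to:
// test the flag after the call and return early if it is set.
int sum_digits(const std::string& s) {
  int total = 0;
  for (char c : s) {
    int d = parse_digit(c);
    if (pending_failure) return 0;  // propagate by "unwinding" one frame
    total += d;
  }
  return total;
}

// What a `catch` might lower to: inspect and clear the flag.
int sum_digits_or(const std::string& s, int fallback) {
  int r = sum_digits(s);
  if (pending_failure) {
    pending_failure = nullptr;  // failure handled here
    return fallback;
  }
  return r;
}
```

Every call site pays for one flag test on the success path, while a failure costs only a store and a few returns, which is the tradeoff the paragraph above contrasts with table-based unwinding.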

kirkshoop · 2022-09-24T23:51:08Z

kirkshoop
Sep 24, 2022

@dabrahams:

I've found "expectedness" to be a fuzzy and useless distinction in discussions of error handling, and I strongly suspect the same applies to cancellation. IMO the well-defined bucket that both of these things fall into is “failure to satisfy postconditions.”
...
Can somebody lay out the argument [for cancellation is not an error] in a single paragraph?

Up to now I have been on my phone stealing time from other work. This reduces expressiveness and leads to even more misunderstanding.

I am not sure where to start:

Explore error values that we agree are not errors
Explore results that exist due to races (eg. on shared resources)
Establish agreement that 'cancellation' has three parts, and that this conversation is about the third part
Establish that, without cancellation, there are multiple success results that each have different postconditions
Explore the motivations for both jthread and fiber papers to create mechanisms that allow clean shutdown using the exception path and why each was removed
I could write a single paragraph

I have problems getting what is in my head into a form that fits into someone else's head. I have had a few coworkers that had the patience to iterate with me until they were able to say back to me in their own words something that matched what was in my head. Often the process improves what is in my head and develops better ways to express the ideas. I revere these people.

I think that the most important thing to add is that others don't always agree with what is in my head - and once I know that they disagree with what is actually in my head - I am content. I have no interest in persuasion or manipulation. I just want to share my perspective accurately.

Any preference?

@dabrahams:

I find the paper immensely frustrating. It says early on, “we cannot avoid the hard task of explaining why something like a cancelled_error exception or error_code is a poor representation of a cancelled result” but then it appears to do just that. It doesn't give me anything I can understand as an explanation.
...
I can't imagine that this takes 36 pages to explain. Can somebody lay out the argument in a single paragraph?
...
maybe that would be a good place to start the new thread

Yes, others are much better at writing papers than I am. 🤷 I am not happy about this. I put a lot of effort into communication in all forms.


dabrahams Sep 25, 2022
Maintainer

Since you ask my preference, yeah, it's what I asked for, the last bullet: a single paragraph. Thanks!

dabrahams Sep 25, 2022
Maintainer

In case it helps with your writing: the question I think we're asking is, “do there need to be two distinct language features that unwind the stack?” My default position is no, because the bar is very high for adding a language feature. To convince me otherwise, it would have to be demonstrated that performance, correctness, or ease-of-use would be significantly adversely impacted by using just one feature for all unwinding patterns. I'm not at all interested in a discussion of whether cancellation is classified as an error or success or something else unless it supports such a demonstration.

kirkshoop · 2022-09-25T18:54:00Z

kirkshoop
Sep 25, 2022

Attempt 1

Discover concepts by studying Data Structures and Algorithms

"In some sense the only way for someone to fully understand why they have to be the way they are is by trying hundreds of different algorithms and finding the abstraction that allows the most beautiful and efficient representation of them."
...
"Good abstractions come from efficient algorithms and data structures and not from “architectural” considerations."

Stepanov, A., 2006, Notes On Programming, Lecture 13. Iterators (pdf)

@dabrahams:

To convince me otherwise, it would have to be demonstrated that performance, correctness, or ease-of-use would be significantly adversely impacted by using just one feature for all unwinding patterns.

auto populateMovies(auto maxTime, auto serverCA, auto serverUS, auto filter, auto tileContainer) {
  return then(
    timeout(when_all(
      when_any(retry(requestMovies(filter, serverCA)), retry(requestMovies(filter, serverUS))),
      when_any(retry(requestThumbnails(filter, serverCA)), retry(requestThumbnails(filter, serverUS)))), maxTime),
    [tileContainer](auto movies, auto thumbnails){
      for (auto movie : movies) {
        tileContainer.add(make_movie_tile(movie, thumbnails));
      }
    });
}

when_any() will cancel the losing requestMovies(server..) and requestThumbnails(server..)
on the failure of one when_any(), when_all() will cancel the other and then complete with the failure
retry() will restart requestMovies() and requestThumbnails() whenever they fail (so it either suppresses cancellation errors or selectively fails only with cancellation error)
timeout() will cancel the internally scheduled timer on success

Some issues with cancellation as an error

[correctness] In C++, noexcept functions cannot unwind a cancellation error
[correctness] In C++, all catch(...) must be called when a cancellation error is unwinding, since many catch(...) are used to clean up and restore invariants (This has to be said, because it was proposed, multiple times, that the unwind of the cancellation error would not call catch(...))
[correctness] In C++, all catch() must distinguish between the cancellation error and other errors
- Cancellation error is, very often, not a failure for the application
- Cancellation error must not be implicitly translated to a library-specific or function-specific error (this would hide cancellation error from the application)
- catch(...), in generic code, must rethrow cancellation error (retry() would become a very tight loop if it retried on cancellation error after a server.. was placed in a cancel-requested state)
[correctness] A thread and a fiber would exit cleanly on a cancellation error, but terminate on any other error (In C++, this was the motivation for cancellation error in the jthread and fiber papers)
[ease-of-use] When debugging code like populateMovies(), with breakpoint-on-error enabled, the debugger will stop for every cancellation error
- Unless the debugger adds a filter for cancellation error or the user adds error filters to the breakpoint
[ease-of-use] Error report generators should be aware of cancellation error (it would be wasteful to generate and send a report for every cancellation to the developers)
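To make the catch(...) concern concrete, here is a small synchronous sketch of a retry helper in a world where cancellation is reported as an exception. The type cancelled_error and the function retry below are hypothetical and are not the S/R retry() algorithm; the sketch only shows the extra clause that generic handlers would need.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical exception type representing cancellation as an error.
struct cancelled_error : std::exception {
  const char* what() const noexcept override { return "cancelled"; }
};

// Calls `op` until it succeeds, at most `max_attempts` times.
// Ordinary failures are retried; cancellation is rethrown immediately so the
// loop does not spin after a stop request.
int retry(const std::function<int()>& op, int max_attempts) {
  assert(max_attempts >= 1);
  for (int attempt = 1;; ++attempt) {
    try {
      return op();
    } catch (const cancelled_error&) {
      throw;  // never retry a cancelled operation
    } catch (...) {
      if (attempt == max_attempts) throw;  // out of attempts: surface the failure
    }
  }
}
```

Without the first catch clause, the generic catch(...) would treat cancellation like any other failure and call op() again, which is the tight-loop scenario mentioned in the list above.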


dabrahams · 2022-09-25T20:31:23Z

dabrahams
Sep 25, 2022
Maintainer

This is a good start.

Quoting Stepanov is usually a good way to soften me up, but honestly though I try to live by those words, I don't see the relevance here, other than that we're looking at a specific example 🤷
I believe I understand your example (even though I have no real knowledge of these S/R components), so the EDSL is working :-)
“In C++, noexcept functions cannot unwind a cancellation error” — I don't see why that's an issue. The key reason to know which functions throw is that it tells you where broken invariants may need to be restored because the straight-line code that was going to restore those invariants will be skipped. That's an issue regardless of the reasons for unwinding, so noexcept had better mean “no unwinding.”
I don't know what “failure for the application” means. This seems perilously close to an argument about whether cancellation is an error or a failure or some kind of success, which I think is pretty irrelevant without some rigorous definition of those categories.
“A thread and a fiber would exit cleanly…” do you mean “should exit cleanly”? The response of threads or fibers to otherwise-unhandled exceptions seems easy enough to control, since the library owns the launch point.
Debuggers and error report generators can be taught about special exception types

To me, none of the issues seem decisive (admittedly I don't understand the “failure for the application” issue). It seems like you might have discovered some kind of truth within the constraints of C++ as practiced today, but I'm not even sure of that. I certainly don't see anything determinative for a new language, yet. Am I missing something?


kirkshoop Sep 26, 2022

jthread and fiber papers in C++

"In C++, noexcept functions cannot unwind a cancellation error” — I don't see why that's an issue. The key reason to know which functions throw is that it tells you where broken invariants may need to be restored because the straight-line code that was going to restore those invariants will be skipped. That's an issue regardless of the reasons for unwinding, so noexcept had better mean “no unwinding.”

“A thread and a fiber would exit cleanly…” do you mean “should exit cleanly”? The response of threads or fibers to otherwise-unhandled exceptions seems easy enough to control, since the library owns the launch point.

As explained, clean exit was the primary motivation in both the jthread and fiber papers (written by different authors).

Specifically, the papers proposed that each jthread and fiber had a cancellation state. Each paper defined a set of functions (condition_variable::wait.. or fiber_yield) that could exit with a cancellation error that must cleanly unwind any stack of functions back to the 'launch point' which would then exit cleanly without error. I was able to prove that every function used on the thread and fiber had to cooperate to properly restore invariants and not suppress the cancellation result (when the jthread and fiber were in the cancellation requested state).

Owning the launch-point is insufficient for their scenario. They also own the leaf functions and that is still insufficient for their scenario. Their scenarios (and async cancellation scenarios) require cooperation from all the functions in the stack.

I do not dismiss scenarios, I try to incorporate them into solutions that are general across scenarios.

The nice thing about cancellation being cooperative is that it must restore invariants and that it can restore invariants. Cancellation requests can be used to restore invariants; sometimes that means completing the work rather than exiting early. The key is cooperation in responding to cancellation requests. The application never enters an invalid state. If the operation or invariant restoration fails, then the result is an error describing that failure - not a cancellation error.

Search Adobe application code.

Find random samples of error handling that does not mention cancellation error, but would include cancellation error. How many of those sites would unintentionally suppress cancellation error? Some might capture all errors and produce a new error. Others might capture a 'base' error that includes cancellation error and produce a new error. Some might log and suppress a set of errors, some might be noexcept, etc.
These block the clean shutdown scenarios.

special handling

Debuggers and error report generators can be taught about special exception types

I don't know what “failure for the application” means.

Search Adobe application code.

How many times is cancellation error mentioned? What percentage of those mentions convert the cancellation error into a benign outcome?

How many tools must filter cancellation out of errors? How many libraries? Why is this the error that is such a cross-cutting concern that so many error handlers and processors need special code to treat it differently?

The algorithms have to specially handle cancellation error.
retry() should not retry after a cancellation error.
when_any() should return a success result from the first task and not the cancellation error from the other tasks.
when_all() should return an error result from the failing task and not the cancellation error from the other tasks.
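A synchronous sketch of the selection rules just listed, assuming each child task has already finished with one of three outcomes. Outcome, select_any and select_all are hypothetical names, list order stands in for completion order, and real when_any/when_all implementations are asynchronous and also issue the stop requests themselves.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

struct Stopped {};
struct Error { std::string what; };
using Outcome = std::variant<int, Error, Stopped>;

// when_any-style rule: the first success wins; a Stopped from a losing task
// is never reported while a real result exists.
Outcome select_any(const std::vector<Outcome>& outcomes) {
  std::optional<Outcome> first_error;
  for (const auto& o : outcomes) {
    if (std::holds_alternative<int>(o)) return o;
    if (std::holds_alternative<Error>(o) && !first_error) first_error = o;
  }
  if (first_error) return *first_error;
  return Stopped{};
}

// when_all-style rule: succeed only if every task succeeded; otherwise report
// the real error in preference to the Stopped results it caused.
std::variant<std::vector<int>, Error, Stopped>
select_all(const std::vector<Outcome>& outcomes) {
  std::vector<int> values;
  bool stopped = false;
  for (const auto& o : outcomes) {
    if (auto* e = std::get_if<Error>(&o)) return *e;
    if (std::holds_alternative<Stopped>(o)) {
      stopped = true;
    } else {
      values.push_back(std::get<int>(o));
    }
  }
  if (stopped) return Stopped{};
  return values;
}
```

In both functions the Stopped alternative is inspected separately from Error, which is the special handling the combinators need whichever way cancellation is represented.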

status quo

I don't see the relevance here, other than that we're looking at a specific example

The relevance is that there is a long history behind the development of these algorithms and they all require that a cancellation result must not leave any violated invariants. Cancellation result is just an event that lets them complete with the actual result.

I certainly don't see anything determinative for a new language, yet.

I have not seen any language that does not need this. I consider it fundamental to computation. I brought this to C++ first and now to Carbon and Val. I have considered approaching Circle and Nim. The only benefit of proposing this to a new language is that they might be a little more open to breaking changes.

This is also a small part of a larger change to function definition fundamentals that I am also proposing for Carbon. Eventually the larger changes will be proposed to C++ and Val.

dabrahams · 2022-09-26T20:53:55Z

dabrahams
Sep 26, 2022
Maintainer

I was able to prove that every function used on the thread and fiber had to cooperate to properly restore invariants and not suppress the cancellation result (when the jthread and fiber were in the cancellation requested state).

First, I don't believe that every function needs to cooperate, unless you count being unconcerned with exceptions as “cooperating.” Any function that doesn't break invariants needs no catch blocks (or other cleanup mechanisms for that matter) and can just let exceptions pass through. That's the vast majority of code, so in fact most functions don't need to cooperate in any meaningful way.

Second, the kind of cooperation that is required is—unless I'm missing something—exactly the same kind of cooperation required for correct error handling. So AFAICT, this is not some new, unique problem.
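A small C++ sketch of the point about pass-through functions, assuming cancellation is signalled with a hypothetical cancelled_error exception. ScratchGuard, leaf, middle and outer are illustrative names. Only the function that temporarily breaks an invariant contains any cleanup logic, and that logic is the same for every reason to unwind.

```cpp
#include <cassert>
#include <exception>
#include <vector>

// Hypothetical exception type used to signal cancellation.
struct cancelled_error : std::exception {};

// Restores an invariant ("the scratch buffer is empty between calls")
// in its destructor, whatever the reason for leaving the scope.
struct ScratchGuard {
  std::vector<int>& scratch;
  ~ScratchGuard() { scratch.clear(); }
};

// Detects the stop request and starts unwinding.
void leaf(bool cancel_requested) {
  if (cancel_requested) throw cancelled_error{};
}

// Breaks no invariants of its own, so it needs no catch block.
void middle(bool cancel_requested) { leaf(cancel_requested); }

// Temporarily breaks the invariant; the guard repairs it on every exit path.
void outer(std::vector<int>& scratch, bool cancel_requested) {
  ScratchGuard guard{scratch};
  scratch.push_back(1);
  middle(cancel_requested);
}
```

Replacing cancelled_error with any other exception type would leave middle and outer unchanged.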

Search Adobe application code.

I'm not gonna do that. Photoshop alone is > 30M lines. @sean-parent should feel free to correct me, but IIUC, Adobe application code is full of (user-initiated) cooperative cancellation that is propagated via an exception. That's certainly the way I did user cancellation when I was writing my own desktop application. Worked a treat.

The algorithms have to specially handle cancellation error.

You mean specifically these async compositional primitives. Sure, there are a few fundamental ones, and if you invent a new async combinator, I'd expect you to have special cancellation handling there, too. Seems fine to make those do a little work to check the reasons any sub-task "failed."

they all require that a cancellation result must not leave any violated invariants.

As with any other unwinding process. In fact, everything in the whole program must be written to not leave any violated invariants. That's sort of the meaning of “invariant.”

Aside from needing special attention in async combinators and not (usually) requiring any direct reporting to the application's user, I still don't see anything fundamental that distinguishes cancellation from other kinds of unwinding. 🤷‍♂️


kirkshoop Sep 27, 2022

I see.

Thank you for the replies, but this is not the level of engagement that I hoped for.

I do not think that this is the right place to invest my energy and time.

dabrahams Sep 27, 2022
Maintainer

Sorry to hear that, truly. I have been trying to engage very actively with everything you've said. I don't know how I could have been more engaged, but if you can explain it, I can try. The fact that I'm not convinced may be discouraging, but it doesn't mean I'm not listening.

kirkshoop Sep 28, 2022

This would not be related to tech. I spent some time trying to find a DM option on Twitter and stlabs slack. I am open on both if you would like to reach out.

ericniebler · 2022-09-27T22:37:31Z

ericniebler
Sep 27, 2022

To me, the issue is pretty simple. Kirk already said it all, but I'll reiterate/summarize: The fundamental difference is:

ERROR: I wanted this result, but apparently I can't have it for Reasons. Tell me why.
STOPPED: I didn't want this result.

Everything flows from there. "ERROR" feels failure-ish to me and "STOPPED" doesn't. The algorithms we've written very frequently want to handle these differently. If I use the rule of thumb in C++ to only use exceptions for unexpected and infrequently-occurring conditions, then "ERROR" is an exception and "STOPPED" isn't. Why? Because if I'm waiting for a result, I expect to get it ... unless I tell it to stop, in which case I expect it to stop.

Stop requests are normal operating procedure for some very common generic algorithms like when_any. If you treat them as exceptional and optimize accordingly (for the "common", non-exceptional case), then you're pessimizing any generic algorithm that launches work speculatively.

I know there are some APIs that can't "fail" ever (i.e., won't do what I ask it to do), but I still want them to be stoppable. And I want that distinction surfaced in the type system and the programming model. "Will-always-do-what-you-ask-me-to-do" is a nice thing to know about a component.


dabrahams Sep 28, 2022
Maintainer

"ERROR" feels failure-ish to me and "STOPPED" doesn't.
I don't know what to do with statements like that. It's not a technical argument, is it?

…the rule of thumb in C++ to only use exceptions for unexpected… conditions…

The idea that the conditions are unexpected has never made any sense to me, and I've always taken it to be part of the fallacious thinking around exceptions. Obviously you expect them to occur or you wouldn't be checking for them and reporting them to the caller, just like you expect the same semantic error if it was reported a different way.

"ERROR" is an exception and "STOPPED" isn't. Why? Because if I'm waiting for a result, I expect to get it ... unless I tell it to stop, in which case I expect it to stop.

I don't follow your logic here. First of all, you expect to get the result, sure, but you're leaving out part of that statement: “unless the result isn't produced, in which case you expect an exception”. Secondly, the caller of a function that cooperatively detects cancelation and initiates unwinding is almost never the code that “tells it to stop,” and is more likely just a larger operation of which the detecting function is just a part. In fact the cancellation request normally comes from a different thread of execution, so for everything on the call stack through which cancellation needs to unwind, the point of view is precisely “I'm waiting for a result,” and if cancellation occurs it means the result wasn't produced. Just like for errors.

Stop requests are normal operating procedure for some very common generic algorithms like when_any. If you treat them as exceptional and optimize accordingly (for the "common", non-exceptional case), then you're pessimizing any generic algorithm that launches work speculatively.

Even if I have five things doing compute-intensive work and (let's say I'm being wildly speculative) four of those five will be canceled, don't I still want whichever one is going to succeed to do so as fast as possible, with as little cost as possible paid for the possible cancelation that never occurred? Don't assume that Val will use an unwinding implementation like what most C++ implementations use for most things or even for anything. We very well might choose different tradeoffs. Swift does unwinding by essentially returning a variant<T, Error> under the covers. The question here is what should be surfaced to the language user, not how those semantics are implemented.
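As an illustration of the split between surface semantics and implementation, the sketch below writes the same operation twice in C++: once with a thrown error, and once in a hand-lowered form that returns a variant and checks it after each call. All names (Error, Result, half_throwing, half_lowered, quarter_lowered) are hypothetical, and the lowered form is only an approximation of the variant-returning strategy mentioned above, not a description of any compiler's actual ABI.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct Error { std::string what; };

// Surface form: the language user writes and sees an ordinary throwing function.
int half_throwing(int n) {
  if (n % 2 != 0) throw Error{"odd input"};
  return n / 2;
}

// One possible lowering of the same semantics: the failure travels in the
// return value instead of through a separate unwinding mechanism.
using Result = std::variant<int, Error>;

Result half_lowered(int n) {
  if (n % 2 != 0) return Error{"odd input"};
  return n / 2;
}

// A caller in the lowered form: check after each call, propagate on failure.
Result quarter_lowered(int n) {
  Result h = half_lowered(n);
  if (std::holds_alternative<Error>(h)) return h;  // implicit propagation
  return half_lowered(std::get<int>(h));
}
```

The success path of quarter_lowered pays for one branch per call, and a failure costs the same as a normal return.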

Will-always-do-what-you-ask-me-to-do is not something you can enforce in the type system, since cancellation is cooperative, and because unwinding needs to run destructors. You can always create functions that never check for cancellation or write a destructor that does arbitrary (or even endless) work. Just because you want to say something about a function does not mean that there should be a language feature to say that thing. That something is nice to know is not a reason to represent it in the type system or build a language feature around it. The point of this argument is that you're asking for additional language complexity and, normally, complexity needs to be justified by more than how things feel or what one thinks is nice. So far, you're not giving me anything I can understand as a reason to design/document/maintain/teach two features instead of one. I'm listening for something, but I'm not hearing it.

ericniebler Sep 28, 2022

I'm not hearing it.

That's clear. It's your thang, do whatcha wanna do.

dabrahams Sep 28, 2022
Maintainer

I feel really bad about how this is playing out, like a design choice will be made without both sides understanding one another. I normally like to reach a common understanding of what the concerns are and either find a compromise or make a conscious choice that some concerns have priority over others. That's what I'm committed to, and I don't feel like I achieved that.

Clearly there's something you've been saying that I don't understand yet, and when I argue it's an attempt to get y'all to help me understand. I never want people to give up because of my inability to comprehend a problem, and if that's what's happening here, I've failed badly.

If you think it would help bridge the understanding gap, I'd be happy to do a video call with one or both of you.

kyouko-taiga · 2022-10-01T14:19:43Z

kyouko-taiga
Oct 1, 2022
Maintainer Author

It is unfortunate that the discussion seems to have stalled. I think it is a great topic and I would hate to miss an opportunity to contribute important insights to language design as a discipline. Perhaps stepping back from technical details might help us get back on track.

After some time trying to let the different arguments simmer in my mind, I think the main contention is a philosophical question. When @ericniebler says that some things feel "failure-ish" and some don't, I get the sense that "failure-ish" is a personal appreciation.

To give an example, in this comment, @kirkshoop said:

When [a matrix multiply that cannot fail] is cancelled it means the post-conditions might have been met and that the state is not corrupted and that the whole operation that called this function will be discarded.

Cancellation is very different from an error.

I answered:

I would not be shocked to say that cancelling your multiplication will result in a thrown error.

I think both points of view are valid, depending on one's definition of "failure-ish".

My penchant for rigorous formal definitions causes me to typically dislike "-ish" qualifiers. Revisiting the entire discussion so far, I think we still haven't found clear formal criteria. That does not necessarily mean there can't be a useful distinction between specific instances in specific applications, it only means we're lacking a general-purpose and unambiguous definition.

When confronted with this kind of situation, a reasonable approach might be to identify the characteristics of the different concepts and unify them under one abstraction. In our case, I've been using the term "error" to denote this abstraction, but for the sake of the discussion, let's use a different one: "unconventional result". Let's also put aside any assumption on handling mechanisms, type system representations, or any other kind of technical detail for the moment.

Using that new term, let's see if we can agree on the following statements:

errors are unconventional results;
cancellations are unconventional results;
there must be a way to clean up unfinished work after an unconventional result;
some operations may produce unconventional results, some can't; and
generic algorithms must be able to handle all kinds of operations.

Crucially, notice that I am not saying that cancellations are errors and that I'm not arguing against the fact that cancellations should be expected.

Checkpoint: do we agree so far?

Now, I posit that there are likely more than two categories of unconventional results. If that assumption is correct, then it is likely simpler to handle all of them with a single general-purpose mechanism.

To give an example, I wrote a parser combinator library where each composable parser may produce a "hard failure" or a "soft failure". A hard failure occurs when a parser failed to consume a suffix after having consumed a prefix. It typically propagates far, and requires cleanup to restore the character stream. A soft failure occurs when a parser failed to recognize a prefix and is typically handled close to its origin.
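The library itself is not shown in this thread, so the following is only a guess at the shape of the idea, in C++. Status, ParseResult, Parser, literal and either are hypothetical names. A literal parser fails softly when nothing matched and hard when it stopped partway through, and the choice combinator falls back only on soft failures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

enum class Status { Ok, SoftFailure, HardFailure };

struct ParseResult {
  Status status;
  std::size_t next;  // position after the consumed input; meaningful only for Ok
};

using Parser = std::function<ParseResult(const std::string&, std::size_t)>;

// Matches `word` at `pos`. Nothing matched: soft failure.
// Some prefix matched but not the whole word: hard failure.
Parser literal(std::string word) {
  return [word](const std::string& in, std::size_t pos) -> ParseResult {
    std::size_t i = 0;
    while (i < word.size() && pos + i < in.size() && in[pos + i] == word[i]) ++i;
    if (i == word.size()) return {Status::Ok, pos + i};
    return {i == 0 ? Status::SoftFailure : Status::HardFailure, pos};
  };
}

// Tries `a` and falls back to `b` only on a soft failure; hard failures propagate.
Parser either(Parser a, Parser b) {
  return [a, b](const std::string& in, std::size_t pos) -> ParseResult {
    ParseResult r = a(in, pos);
    if (r.status == Status::SoftFailure) return b(in, pos);
    return r;
  };
}
```

With either(literal("let"), literal("var")), the input "var x" succeeds through the fallback, "lex" is a hard failure that skips the fallback, and "fun" is a soft failure of the whole choice. Both kinds of failure travel through the same return channel and differ only in how combinators react to them.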

I also posit that it is likely some unconventional results fall under multiple categories. If that assumption is correct, a strict tree-shaped hierarchy would be difficult to define. It would be best to assign an unconventional result to one or more categories using an attribute system (e.g., traits).
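One way to picture such an attribute system is a set of independent flags attached to each unconventional result, sketched here in C++. The categories (kCancellation, kRetryable, kReportToUser) and the two policy functions are invented for illustration and are not proposed Val features.

```cpp
#include <cassert>

// Hypothetical categories; an unconventional result may carry several at once.
enum Category : unsigned {
  kNone         = 0,
  kCancellation = 1u << 0,
  kRetryable    = 1u << 1,
  kReportToUser = 1u << 2,
};

struct UnconventionalResult {
  const char* description;
  unsigned categories;  // bitwise OR of Category values

  bool has(Category c) const { return (categories & c) != 0; }
};

// Example policies that consult attributes rather than a place in a hierarchy.
bool should_retry(const UnconventionalResult& r) {
  return r.has(kRetryable) && !r.has(kCancellation);
}

bool should_report(const UnconventionalResult& r) {
  return r.has(kReportToUser);
}
```

A result tagged kRetryable | kReportToUser and one tagged kCancellation | kRetryable both flow through the same mechanism, and each consumer looks only at the attributes it cares about.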

Checkpoint: do we agree so far?

Until we can find a set of criteria that let us take any unconventional result and determine whether it falls under a single rigorously defined general-purpose category, I posit that a general-purpose language should not get different features to interact with them.

To give an example, I think that neither cancellations nor soft failures fit the requirement. The former do not have a rigorous definition (AFAICT) and the latter are not general-purpose.

Finally, I note that if I were concerned about the compiler pessimizing the handling of unconventional results that are likely to occur, one reasonable solution might be to provide the compiler with a hint in the form of an annotation.

34 replies

lucteo Oct 11, 2022
Collaborator

Exception handling

do {} inspect {} is nothing like try {} catch() {}

Please note that we are talking about a new language that may have different implementation strategies than C++. We can implement try {} catch() {} without any memory allocation and RTTI. AFAIK, Val would not be the first language to do this. I may be wrong, but I think I remember Dave saying that he plans to implement exceptions in an efficient manner.

If we can implement exceptions w/o memory allocation and RTTI, then the two structures are alike.

Handling unwinding

The example above shows that unwinding for cancellation has the same unwinding impact on the code that exceptions have. The body for f1() looks exactly the same if we are talking about exceptions, or about propagating cancellation.

The only real difference is in the function declaration.

The main difference: function declaration

Ignoring syntax and ignoring possible implementation choices, it seems that cancellation behaves the same as exceptions in the body of the functions.

The main difference that I can see right now between what Kirk is suggesting and what I can understand from Dave/Dimitri is how function declaration is expressed.

From what I can understand from the examples, Kirk argues for explicit declaration of all possible return values, all possible exceptions and all possible stop conditions.

Claim 21: Functions should be explicit in declaring all possible result types, all possible exception types and all possible cancellation conditions

Claim 22: Functions can have multiple result types.
(I made a mistake in my example by allowing g() to also return void, but I'm glad I made that mistake, as it opened a new perspective.)

Claim 23: Although exceptions and cancellation can behave similarly in code, function declarations would call out exceptions and cancellation separately.

@kirkshoop : is this correct?

While I can see debates on Claims 21 and 22, I believe the main disagreement on this subject is actually on Claim 23.

kirkshoop Oct 12, 2022

it seems that cancellation behaves the same as exceptions in the body of the functions.

Due to the vagaries of human communication I am unable to agree with that statement even when it is true from my point of view.

Humans will interpret the word 'behaves' to mean more than I agree with.

Rephrased "it seems that cancellation and exception results exit the calling function by default."

This is the only thing that they have in common.

For instance,

"It seems that success and error both have a value result."

And with my proposal

"It seems that success, error, and cancellation all continue in the caller using a tail call"

Listing the things that errors have in common with success (which is what all of these statements do) has nothing to do with the case I am making that a cancellation result is a success case that returns from the calling function by default.

Even if the quality of implementation for a cancellation result is terrible, it does not change that a cancellation result is a representation of success. Blending a success case into the error channel (a <some word> value that is matched by type) is the same as throwing a C++ exception to exit a loop instead of using break. Given that this turns out to be accepted practice for exiting nested loops, imagine if that was expressed as return break rather than throw: is that a better pattern?

kirkshoop Oct 12, 2022

Blending a success case into the error channel also describes error values like ERROR_SUCCESS, which "works", but which I have grown to loathe.

Blending an error case into the success channel, like INVALID_HANDLE and npos, may "work", but I loathe those as well.

kirkshoop Oct 12, 2022

Having looped back to the beginning, this seems like the right place for me to bow out.

Thank you for the attempts to understand my proposals.

lucteo Oct 14, 2022
Collaborator

Thank you very much @kirkshoop for your patience in going over these points with us.

This clarifies your point of view a lot, at least for me. If I may rephrase it, you believe that treating cancellation as an exception would be like introducing a SuccessError exception class.

This was a long journey, but I think it was worth it. Once again, thank you very much.

sean-parent · 2022-10-01T21:29:32Z

sean-parent
Oct 1, 2022
Maintainer

This is a very long thread with some good observations that I don't have time to fully respond to at the moment, but I did want to make an observation. I believe (and may be wrong) that part of Kirk's argument is that the function result is not just propagating "was canceled" but also "cancel was requested". That is how I read that x can satisfy the post conditions for U. If the post conditions for U are satisfied, the operation is complete and cannot be canceled or return an error, but it may (in some systems) return that it completed successfully and also received a request to cancel.

If the above is correct then we need to discuss how requests to cancel are signaled and if that can be separated from handling the cancelation request. I believe it can be. Part of what makes cancelation different is that it is an out of band signal. Once a request is received, I can't see any difference in how it should be handled compared to an error, which is different than how success is handled. Is there a "basic canceled guarantee" that is different from the basic exception guarantee? I don't believe so.

For when_any, the when_any may request cancelation of the dependent tasks once it receives a non-error value. The operation performing the when_any may receive a cancelation request which is forwarded to all dependent operations of the when_any. Any operation being performed by the when_any may receive a cancelation request from any external agent or may return an error. I don't see a reason that when_any would handle a canceled operation any differently than a failed operation. I think there is a very large conversation to be had about how requests for cancelation are signaled.

0 replies

dabrahams · 2022-10-01T22:52:40Z

dabrahams
Oct 1, 2022
Maintainer

part of Kirk's argument is that the function result is not just propagating "was canceled" but also "cancel was requested"

Why is that an interesting distinction? Doesn't the former imply the latter?

0 replies

lucteo · 2022-10-02T07:43:15Z

lucteo
Oct 2, 2022
Collaborator

In the P2300 model, there is no distinction between the two cases. The receiver just gets a notification that the sender was cancelled, discarding information on how it was cancelled.

There are several main ways in which a receiver gets a signal about cancellation:

the sender just sends the cancellation signal, as part of its core logic. Example: just_stopped()
the sender sends the cancellation signal because one of the input senders sends a cancellation signal. Example: then() forwards the signal
the sender receives a stop token, it checks it at different times, and decides to send the cancellation signal. This stop token can be received from the receiver itself, or by some other means.

In all cases, the receiver just knows that the sender has been cancelled, but it doesn't know how.

1 reply

kyouko-taiga Oct 2, 2022
Maintainer Author

I guess we could encode that information in a cancellation type.
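
A rough Rust sketch of what encoding that information in a cancellation type could look like. Everything here is an assumption made for illustration: Completion, CancellationOrigin and this then function are hypothetical and are neither the P2300 API nor a Val design. Forwarding, the second case listed above, is modelled by passing the origin through unchanged.

```rust
/// Why a sender completed with "stopped" (hypothetical categories).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CancellationOrigin {
    /// The sender stops as part of its core logic, like just_stopped().
    SenderLogic,
    /// The sender observed a stop request through a stop token.
    StopToken,
}

/// The three completion channels, with the origin attached to the stopped one.
#[derive(Debug, PartialEq, Eq)]
enum Completion<T, E> {
    Value(T),
    Error(E),
    Stopped(CancellationOrigin),
}

/// A then-like adapter: transforms a value and forwards everything else,
/// so the receiver can still tell how the cancellation came about.
fn then<T, U, E, F>(input: Completion<T, E>, f: F) -> Completion<U, E>
where
    F: FnOnce(T) -> U,
{
    match input {
        Completion::Value(v) => Completion::Value(f(v)),
        Completion::Error(e) => Completion::Error(e),
        Completion::Stopped(origin) => Completion::Stopped(origin),
    }
}
```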

sean-parent · 2022-10-11T08:44:03Z

sean-parent
Oct 11, 2022
Maintainer

Because the latter does not imply the former.

0 replies

Cpp-Lisa · 2022-10-15T23:42:39Z

Cpp-Lisa
Oct 15, 2022

(This is (more or less) the email I sent to Dave. I'm posting it here with a warning: I'm not going to be very present for this conversation.)

Eric Niebler sent me a note saying that you’re considering cancellation in Val, hoping I could warn you away from having Val make the same mistake C++ currently does. Here are my thoughts; feel free to share them.

My position is that errors and cancellation are served by two different design patterns, which I call “failed strategy” and “serendipitous success,” respectively. While both design patterns involve what we may call “exceptional exit” from a function, C++ exceptions are suited to the failed strategy pattern, and not well suited to the serendipitous success pattern.

In the failed strategy pattern, a function determines that it is unable to meet its local goal, and exits (exceptionally) without meeting the goal. This causes a cascade of enclosing functions to also not meet their goals, and exit exceptionally. The cascade stops when it reaches the innermost function that has an alternate strategy for meeting its own goal.

(Sometimes we must phrase the goals carefully to see this pattern, using goals with an “or” in them. One function might have “Do X” as its goal, while an enclosing function might have a goal of “Do XYZ or announce a failure to do XYZ.”)

In the serendipitous success pattern, a function exits (exceptionally) when its local goal becomes moot due to the serendipitous success of some enclosing function’s goal. The local goal may still be achievable, but is no longer necessary to the already-successful enclosing function. This causes a cascade of enclosing functions to also exit, generally without meeting their goals. The cascade stops at the outermost function that has already met its goal.

(Again, we must phrase the goals carefully to see this pattern, sometimes using goals with an “or” in them. One function might have “Do X” as its goal, while an enclosing function might have a goal of “Do XYZ or receive permission to not do XYZ after all.”)

The principal differences between the two patterns are:

• Reasons for failure relate to the lowest-level goals, while reasons for cancellation stem from high level goals. This means that communication in these two patterns runs in different directions. Potential reasons for cancellation need to be communicated from high to low before a potential cancellation, and actual reasons for failure need to be communicated from low to high after failure.

• If two enclosing functions provide alternate strategies, the innermost one stops a failure cascade and enacts its alternate strategy. But if two enclosing functions have serendipitously satisfied goals, the cancellation cascade continues to the outermost satisfied function.

The two patterns also have an interaction: when a low-level function has failed and, despite that, a high-level function has serendipitously succeeded, we wish to exit to the successful high-level function, even if alternate strategies for responding to the failure are provided at intermediate levels.
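
The difference in where each cascade stops can be made concrete with a small model. This Rust sketch is only one reading of the description above: the call stack is a slice of hypothetical Frame values ordered from outermost to innermost, and exit_target encodes the interaction rule by letting an already-satisfied outer goal take precedence over any alternate strategy.

```rust
/// One enclosing function on the call stack (hypothetical model).
struct Frame {
    has_alternate_strategy: bool,
    goal_already_met: bool,
}

/// Failed strategy: the cascade stops at the innermost frame that has an
/// alternate strategy. `stack[0]` is the outermost function.
fn failure_stops_at(stack: &[Frame]) -> Option<usize> {
    stack.iter().rposition(|f| f.has_alternate_strategy)
}

/// Serendipitous success: the cascade continues to the outermost frame
/// whose goal has already been met.
fn cancellation_stops_at(stack: &[Frame]) -> Option<usize> {
    stack.iter().position(|f| f.goal_already_met)
}

/// Interaction of the two patterns: a satisfied goal wins over intermediate
/// alternate strategies; otherwise the failure rule applies.
fn exit_target(stack: &[Frame]) -> Option<usize> {
    cancellation_stops_at(stack).or_else(|| failure_stops_at(stack))
}
```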

The C++ exception model is designed around the failed strategy pattern: the decision to throw is made at a low level, reasons are communicated upward, and the exception is caught at the innermost handler. But catch(…), which is essential to the strategy pattern, gets in the way of cancellation. That is, “this intermediate function has another strategy to try” shouldn’t be taken as “the goals of this intermediate function outweigh those of its callers.”

In the “Cancellation is Serendipitous Success” paper, I laid out a scheme for high-level functions to express goals that may be tested by low-level functions at designated cancellation points. Briefly: a new kind of catch clause would specify an expression; during the execution of the try block, the expression would be evaluated at cancellation points; if the expression evaluates to true, the stack is unwound and execution proceeds with the corresponding handler. (There is no exception object in this sort of exceptional exit.)

I later came to think that I might be repeating a design mistake that went into the existing C++ exceptions: providing a complex mechanism without first providing the simple parts that make up that mechanism. Going back to the drawing board, I considered the possibility of building a less elegant library-based cancellation mechanism. One piece was missing: a destructor-respecting longjmp. I sketched out, but didn't finish, a paper on that idea; I'm including it here. Don't pay too much attention to the syntax; I'm sure a younger language could do better.
Throw Goto.pdf

(Looking over that draft, I am reminded that in C++ we have a third form of exceptional exit: the destruction of a suspended coroutine. This also must not be blocked by catch(…). I also note that most of the complexity of the paper came from preventing a jump into a suspended coroutine; that could be simplified by saying coroutines can’t yield from inside a try-block. Or a wacky alternative idea: one could design a mechanism for throwing into a suspended coroutine, reactivating it so that it can catch the exception.)

1 reply

dabrahams Oct 22, 2022
Maintainer

@Cpp-Lisa I want to thank you for sending along something very cogent, and for going through the hassle with GitHub to put it up here. As you know I didn't want to act as an intermediary for what should be a public conversation. These are the thoughts I had when you first wrote to me.

The biggest problems we’ve had with this ongoing argument are:

1. We acknowledge Kirk and Eric’s point of view as a logically valid view of the world.
2. Taking that view seems to result in a more complicated programming language/model than other equally valid (or let’s say “workable”) views.
3. Nobody’s illustrating advantages of this more complicated language/model that seem to outweigh the complexity cost. That is of course a judgement call that may never be fully resolved by technical analysis.

FWIW, my tolerance for complexity in a language design went way down in a few very short years when I stepped away from C++, and what I see of C++ today only validates that point of view: once you let complexity in, it has a tendency to grow without bound. Much of the complexity of today’s C++ is because of design decisions that I participated in, and I don't want to repeat that. One of the main reasons we believe in Val is that we think it is solving the same problems as Rust without introducing a complex mental model, and that’s a property we intend to guard very carefully. There may simply be a cultural gap here between what C++ programmers regularly tolerate and what I want for Val that I’m unable/unwilling to cross.

My position is that errors and cancellation are served by two different design patterns, which I call “failed strategy” and “serendipitous success,” respectively. While both design patterns involve what we may call “exceptional exit” from a function, C++ exceptions are suited to the failed strategy pattern, and not well suited to the serendipitous success pattern.

We’re certainly not going to have “C++ exceptions” in Val (there are design differences), so one question about your position is which elements of the design contribute to this differing (in your view) suitability.

Aside from that, I have no argument with anything in your post until you start to list “principal differences:”

Reasons for failure relate to the lowest-level goals, while reasons for cancellation stem from high level goals. This means that communication in these two patterns runs in different directions. Potential reasons for cancellation need to be communicated from high to low before cancellation, and actual reasons for failure need to be communicated from low to high after failure.

I can’t imagine a case where a low level operation needs to know the reason for cancellation. Do they exist?

I view this differently: a) the fact of cancellation needs to be communicated (though not necessarily always from high to low—I can see it traveling across concurrent threads of execution), and b) in both cases the reasons for what you call “exceptional exit” need to be communicated from low to high. This is an example of my #2 above, where recognizing commonality can lead to a simpler model.

Since you’re really focused on the semantic difference between these two scenarios, I should point out that “serendipitous success” poorly captures what’s going on except exactly at the level where cascading stops. If I have this parallel computation

So the result is whichever of w(x) or z(y(x)) finishes first, and if the winner is w(x) because y is still executing, y will be cancelled. That doesn’t represent the success of y(x) or of z(y(x)). It only represents a success of (w(x) | z(y(x))), where cascading stops. (And I'm not going to assume that w(x) and z(y(x)) are equivalent).

If “cancellation is serendipitous success” is your watchword and you aren’t constantly adding the qualification “where cascading stops,” you may be overlooking a broad area of commonality between cancellation and what you call “failure,” and it’s not surprising that you’d come to very different conclusions than I do.

The C++ exception model is designed around the failed strategy pattern: the decision to throw is made at a low level, reasons are communicated upward, and the exception is caught at the innermost handler. But catch(…), which is essential to the strategy pattern, gets in the way of cancellation. That is, “this intermediate function has another strategy to try” shouldn’t be taken as “the goals of this intermediate function outweigh those of its callers.”

That the decision to throw is made at a low level seems irrelevant, because the only viable model for cancellation in a system with mutation and side-effects is a cooperative one. One needs to be able to reason that certain calls do not “exit exceptionally” in order to maintain invariants. In a cooperative cancellation system using exceptions one “makes a decision to throw” by calling any throwing function.
“this intermediate function has another strategy to try” is almost never the meaning of catch(…) in practice in a well-written system, even if you include reporting errors to the user as a strategy for success (I don't). Good uses of catch(...) almost always do some cleanup and rethrow. I will grant that there's a lot of bad code out there trying to “get control” over exceptions by putting try { ... }catch(...) blocks around everything that throws, and that code is as likely to undesirably stop a cancellation exception as it is to drop an error report. Neither one of those is objectively more serious than the other, AFAICT.

mr-mobster · 2022-12-28T09:42:09Z

mr-mobster
Dec 28, 2022

I am wondering whether there has been some more internal discussion or initial modelling on this topic? I have been closely following this conversation and it was extremely educational for me. Would love to know if there is an update.

1 reply

kyouko-taiga Dec 28, 2022
Maintainer Author

There hasn't been much more internal discussion.

For now, we have opted to keep the status quo (i.e., not introducing a cancellation mechanism separate from exception handling) and see where it leads, given that our design for concurrency has not been fully worked out yet.

matklad · 2023-05-04T17:09:28Z

matklad
May 4, 2023

rust-analyzer (LSP for Rust) has both errors and cancellation, and might be a somewhat interesting case study. Cancellation is needed to abort in-progress type checking and such when the user modifies source files.

To do cancellation, we rely on dynamic semantics. Nothing in the function signature tells that it is cancelable. Similarly, nothing signals at the call site that cancellation is possible.

For error handling, we use "errors are values" semantics. Fallible functions return a Result<SuccessType, ErrorType>, and calling a fallible function requires the try operator (?).
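
For readers less familiar with Rust, here is a small sketch of that style. The function names are hypothetical and unrelated to rust-analyzer's code; the point is only that fallibility is visible both in the signature and at the call site.

```rust
use std::num::ParseIntError;

/// A fallible function: the possibility of failure is part of the return type.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

/// A caller must acknowledge the failure path; `?` returns early on `Err`.
fn parse_endpoint(host: &str, port: &str) -> Result<(String, u16), ParseIntError> {
    let port = parse_port(port)?;
    Ok((host.to_string(), port))
}
```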

This distinction is useful because we want to minimize the extent of code which can fail, and maximize the extent of code which can be cancelled. More or less everything can be cancelled; very few things can raise or need to handle errors. If a function can't be cancelled, you want to make it cancelable because otherwise you increase latency. If a function can fail, you want to find a way to make it infallible (e.g., by pushing IO to the caller), to simplify the code.

Practically, "errors are values" requires extra ceremony in higher-order code (e.g., you need try_map rather than map), whereas dynamic cancellation works transparently with existing code. That is, both try_map(can_fail) and map(can_get_canceled) stop on the first "unusual" value, but the latter allows using the "usual" map, rather than the "map designed with fallibility in mind".
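
The contrast can be sketched as follows. This is an illustration under stated assumptions, not rust-analyzer's actual code: try_map is a hand-written helper (the standard library has no stable iterator method of that name), Cancelled is a hypothetical marker payload, and cancellation is modelled by unwinding with std::panic::resume_unwind and catching it at a boundary with std::panic::catch_unwind.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Marker payload used to unwind on cancellation (hypothetical name).
struct Cancelled;

/// The "map designed with fallibility in mind": stops at the first `Err`.
fn try_map<T, U, E>(items: &[T], mut f: impl FnMut(&T) -> Result<U, E>) -> Result<Vec<U>, E> {
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        out.push(f(item)?);
    }
    Ok(out)
}

/// Errors as values: fallibility shows in the signature and in the combinator.
fn halve_all(items: &[i32]) -> Result<Vec<i32>, String> {
    try_map(items, |&x| {
        if x % 2 == 0 {
            Ok(x / 2)
        } else {
            Err(format!("{x} is odd"))
        }
    })
}

/// A cancellation point: unwinds without touching any return type.
fn check_cancelled(requested: bool) {
    if requested {
        // `resume_unwind` starts unwinding without running the panic hook.
        panic::resume_unwind(Box::new(Cancelled));
    }
}

/// Dynamic cancellation: the usual `map` is used unchanged. Returns `None`
/// if the work was cancelled when it reached the element `cancel_on`.
fn double_until(items: &[i32], cancel_on: i32) -> Option<Vec<i32>> {
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
        items
            .iter()
            .map(|&x| {
                check_cancelled(x == cancel_on);
                x * 2
            })
            .collect::<Vec<i32>>()
    }));
    match outcome {
        Ok(doubled) => Some(doubled),
        Err(payload) if payload.is::<Cancelled>() => None,
        // Anything that is not a cancellation keeps unwinding.
        Err(payload) => panic::resume_unwind(payload),
    }
}
```

Both functions stop at the first "unusual" element, but only halve_all had to be written against a fallibility-aware combinator.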

In terms of how these two are implemented, I don't think we care much. What happens is that cancellation is handled by stack unwinding via unwind tables, and errors are using normal returns. It seems plausible that using unwinding as an implementation strategy for value-based error semantics could actually be faster.

What helps though is that these use two distinct mechanisms, so code handling errors does not need to worry about accidentally catching cancellation and vice versa.

Which brings me to a third thing! Another case of roughly the same shape we have is panicking. As rust-analyzer is a long-running "desktop" application, we don't really want to outright crash the process if some stupid small feature somewhere has an index out of bounds. So, for bugs like assertion failures, we want to abort the coarse-grained feature, show a "send bug report" dialog to the user, and then move on.

Internally, panicking is implemented using the same unwinding mechanism as cancellation (with a difference between the two that panicking captures and resolves a backtrace, while cancellation doesn't). This does create a small problem that the code catching cancellations should explicitly pass through unwinds originating from panics.
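
A minimal sketch of that pass-through, assuming two nested boundaries. The names (CancelSignal, Outcome, catch_cancellation, run_feature) are hypothetical and this is not rust-analyzer's implementation; it only shows the shape of the check using std::panic::catch_unwind and std::panic::resume_unwind.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Marker payload for cancellation unwinds (hypothetical name).
struct CancelSignal;

/// What the outer, coarse-grained boundary reports.
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    Finished(T),
    Cancelled,
    /// A panic (a bug) escaped the feature; the real application would show
    /// a "send bug report" dialog here.
    Bug,
}

/// Cancellation point: unwinds without running the panic hook.
fn cancel() -> ! {
    panic::resume_unwind(Box::new(CancelSignal))
}

/// Inner boundary: handles cancellation only. Unwinds that originate from
/// panics are explicitly passed through.
fn catch_cancellation<T>(work: impl FnOnce() -> T) -> Option<T> {
    match panic::catch_unwind(AssertUnwindSafe(work)) {
        Ok(value) => Some(value),
        Err(payload) if payload.is::<CancelSignal>() => None,
        Err(payload) => panic::resume_unwind(payload),
    }
}

/// Outer boundary: aborts the feature on a panic and lets the process go on.
fn run_feature<T>(work: impl FnOnce() -> T) -> Outcome<T> {
    match panic::catch_unwind(AssertUnwindSafe(|| catch_cancellation(work))) {
        Ok(Some(value)) => Outcome::Finished(value),
        Ok(None) => Outcome::Cancelled,
        Err(_) => Outcome::Bug,
    }
}
```

If catch_cancellation treated every unwind as a cancellation, the bug in the last case would be silently reported as Cancelled instead of reaching the outer boundary.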

All three mechanisms (cancellation, error handling, abandonment) lean heavily on "transactional" semantics of the underlying data store which doesn't allow data to get into a bad state. There is a thin slice of code which can mess up state, that bit isn't cancelable, can't fail, and an assertion failure there would crash the process.

0 replies
