julep: "chain of custody" error handling #7026

Open
StefanKarpinski opened this issue May 29, 2014 · 94 comments
Labels: error handling (Handling of exceptions by Julia or the user), julep (Julia Enhancement Proposal)

@StefanKarpinski
Member

One of the major issues with try/catch is that it's possible to catch the wrong error and think that you are handling a mundane, expected problem, when in fact something far worse is wrong and the problem is an unexpected one that should terminate your program. Partly for this reason, Julia discourages the use of try/catch and tries to provide APIs where you can check for situations that may cause problems without actually raising an exception. That way try/catch is left primarily as a way to trap all errors and make sure your program continues as best it can.
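Julia's Base has several such check-first APIs. A small sketch contrasting them with a catch-everything try/catch (the helper names `parse_or_zero` and `parse_or_zero_trycatch` are illustrative, not part of the proposal):

```julia
# Check-first style: tryparse returns `nothing` instead of throwing,
# and `something` substitutes a default for `nothing`.
parse_or_zero(s::AbstractString) = something(tryparse(Int, s), 0)

# The try/catch equivalent. The bare `catch` traps *every* error raised
# inside the block, not only parse failures.
function parse_or_zero_trycatch(s::AbstractString)
    try
        parse(Int, s)
    catch
        0
    end
end

# Dictionary lookup without risking a KeyError: get(dict, key, default).
counts = Dict("a" => 1)
b_count = get(counts, "b", 0)
```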

This is also why we have not introduced any sort of typed catch clause into the language. This proposal introduces typed catch clauses, but they only catch errors of that type under certain circumstances in a way that's designed to avoid the problem of accidentally catching errors from an unexpected location that really should terminate the program. Here's what a Java-inspired typed catch clause might look like in Julia:

try
    # nonsense
    foo(1,2,3) # throws BazError
    # shenanigans
catch err::BazError
    # deal with BazError only
end

Of course, we don't have this in Julia, but it can be simulated like this:

try
    # nonsense
    foo(1,2,3) # throws BazError
    # shenanigans
catch err
    isa(err, BazError) || rethrow(err)
    # deal with BazError only
end

Although this does limit the kinds of errors one might catch, it doesn't fundamentally solve the problem: while there are some places where you may be expecting a BazError to be thrown, other code that you're calling inside the try block might also throw a BazError where you didn't expect it. In that case you shouldn't try to handle it, because you'll just be covering up a real, unexpected problem. This problem is especially bad in generic programming contexts where you don't really know what kind of code a given function call might end up running.
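A runnable sketch of that failure mode using the isa/rethrow idiom above. `helper`, `call_foo`, and the error messages are hypothetical, and `foo` is only a stand-in for the one in the text:

```julia
struct BazError <: Exception
    msg::String
end

# A helper deep inside foo that can *also* throw BazError, e.g. because of a bug.
helper(x) = x < 0 ? throw(BazError("bug in helper")) : x

function foo(a, b, c)
    a == 0 && throw(BazError("expected: a must be nonzero"))  # the "expected" error
    return helper(b - c)                                     # may throw an unexpected one
end

function call_foo(a, b, c)
    try
        foo(a, b, c)
    catch err
        isa(err, BazError) || rethrow()
        "handled: " * err.msg   # both kinds of BazError end up here
    end
end
```

The second BazError comes from a place the caller never anticipated, yet the type check alone cannot tell the two apart.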

To address this, I'm proposing adding a hypothetical throws keyword that allows you to annotate function calls with a type that they're expected to throw and which you can catch with a typed catch clause (it's a little more subtle than this, but bear with me). The above example would be written like this:

try
    # nonsense
    foo(1,2,3) throws BazError
    # shenanigans
catch err::BazError
    # deal with *expected* BazError only
end

The only difference is that the throws BazError comment after the call to foo became syntax. So the question is: what does this annotation do? The answer is that without an annotation indicating that you expect the expression to throw an error of type BazError, you can't catch it with a typed catch. In the original version, where throws BazError was just a comment, the typed catch block would not catch a BazError thrown by the call to foo, because the lack of annotation implies that such an error is unexpected.

There is a bit more, however: the foo function also has to annotate the point where the BazError might come from. So, let's say you had these definitions:

function foo1(x,y,z)
    bar(x,2y) throws BazError
end

function foo2(x,y,z)
    bar(x,2y)
end

function bar(a,b)
    throw BazError()
end

This also introduces a hypothetical keyword form of throw (more on that below). These definitions result in the following behavior:

try
    # nonsense
    foo1(1,2,3) throws BazError
    # shenanigans
catch err::BazError
    # error from bar is caught
end

try
    # nonsense
    foo2(1,2,3) throws BazError
    # shenanigans
catch err::BazError
    # error from bar is NOT caught
end

The rule is that a typed catch only catches an error if every single call site from the current function down to the one that actually throws the error is annotated with throws ErrorType, where the actual thrown error object is an instance of ErrorType. An untyped catch still catches all errors, leaving the existing behavior unchanged:

try foo1(1,2,3)
catch
    # error is caught
end

try foo2(1,2,3)
catch
    # error is caught
end
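The rule can be stated as a small predicate over the chain of call sites. This is a toy model only: it does not implement the `throws` syntax, and `CallPath`, `typed_catch_applies`, and the two path values are hypothetical encodings of the foo1/foo2 example:

```julia
# A call path lists every call site from the catch site down to the throw site.
# Each entry is the type in that site's `throws` annotation, or `nothing`
# if the site is unannotated.
const CallPath = Vector{Union{Type,Nothing}}

struct BazError <: Exception
    msg::String
end

# `catch err::T` applies only if err isa T (guards against the wrong kind of
# error) AND every site on the path is annotated with a type err belongs to
# (guards against an error arriving from an unexpected place).
typed_catch_applies(err, ::Type{T}, path::CallPath) where {T} =
    err isa T && all(a -> a !== nothing && err isa a, path)

# Paths from the example: catch-site call -> call to bar -> throw in bar.
# The keyword form `throw BazError()` counts as an annotated site.
foo1_path = Union{Type,Nothing}[BazError, BazError, BazError]
foo2_path = Union{Type,Nothing}[BazError, nothing, BazError]
```

An untyped `catch` would ignore the path entirely and catch in both cases, while `catch ::Exception` corresponds to `typed_catch_applies(err, Exception, path)`.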

So what is the rationale behind this proposal, and why is it any better than just having typed catch clauses? The key point is that in order to do a typed catch, there has to be a "chain of custody" from the throw site all the way up to the catch site, and each step along the way has to declare that it expects the type of error being thrown. There are two ways to catch the wrong error:

  1. catch an error that was not the type of error you were expecting
  2. catch an error that came from a place you weren't expecting it to come from

The first kind of mistake is prevented by giving an error type, while the second kind of mistake is prevented by the "chain of custody" between the throw site and the catch site.

Another way of thinking about this is that the throws ErrorType annotations are a way of making the exceptions a function throws part of its official type behavior. This is akin to how Java puts throws ErrorType after the signature of a function. But in Java it's a usability disaster because you are required to have the throws annotation as soon as you do something that could throw some kind of non-runtime (checked) exception. In practice, this is so annoying that it's pretty common to just throw RuntimeExceptions instead of changing all the type signatures in your whole system. Instead of making runtime exception vs. non-runtime exception a property of the error types, as Java does, this proposal makes it a property of how the error occurs. If an error occurs in an expected way, then it's a non-runtime exception and you can catch it with a typed catch clause. If an error occurs in an unexpected way, then it's a runtime exception and you can only catch it with an untyped catch clause. You can only catch errors by type if they are part of the "official behavior" of the function you're calling, where official behavior is indicated with throws ErrorType annotations.

One detail of this proposal that I haven't mentioned yet is that throw would become a keyword instead of just a function. The reason for this is that writing

throw(BazError()) throws BazError

seems awfully verbose and redundant. For compatibility, we could allow throw() to still invoke the function throw while throw ErrorType(args...) would be translated to

throw(ErrorType(args...)) throws ErrorType

This brings up another issue: what scope should the right-hand side of throws be evaluated in? In one sense, it really only makes sense to evaluate it in the same scope that the function signature is evaluated in. However, this is likely to be confusing since all other expressions inside of function bodies are evaluated in the local scope of the function. It would be feasible to evaluate it in local scope too, and just rely on the fact that most of the time it will be possible to statically determine what type the expression produces. Of course, when that isn't possible, it will still be necessary to emit code that does the right thing depending on the runtime value of the expression. If the right-hand side is evaluated in local scope, then we can say that throw expr is equivalent to this:

throw(expr) throws typeof(expr)

Usually it will be possible to statically determine what typeof(expr) is, and this approach is simpler to explain than restricting the syntax of the expression after the throw keyword.

Note that under this proposal, these are not the same:

try
    # stuff
catch
    # catches any error
end

try
    # stuff
catch ::Exception
    # only catches "official" errors
end

Also note that this proposal is, other than syntax additions that are not likely to cause big problems, completely backwards compatible: existing untyped catch clauses continue to work the way they used to – they catch everything. It would only be new typed catch clauses that only catch exceptions that have an unbroken "chain of custody".

@tknopp
Contributor

tknopp commented May 29, 2014

+1 for the syntax addition.

I have to admit that I am not sold on the idea of expected and unexpected exceptions. I would find it rather "unexpected" if I indicate that something should be caught and it then isn't caught ;-)

For me the real issue in such situations is that the exception hierarchy is not well designed and/or that the wrong exception from the hierarchy is used.

@StefanKarpinski
Member Author

The problem with typed exceptions is that they give you a sense of precision, but they're not really precise at all. Making the exception part of the type of a function is a good idea – that part Java got right. What's not so good is forcing the caller to handle every kind of exception a function it uses can throw.

Here's a motivating scenario: let's say you implement a new kind of AbstractVector (say a kind of sparse vector) and call it SpArray. Internally, it stores values in normal Arrays. Someone using it finds that they sometimes make indexing errors, so they use a typed catch block to handle them. But you've made a programming error in implementing the SpArray type, and it's actually SpArray that's encountering an indexing error internally because of a mistake. That error is caught and treated as if it were a user-level indexing error. With this proposal, that situation can't occur: at the point where the SpArray implementation is incorrect and causes an indexing error, there is no throws BoundsError annotation, so the resulting exception can't be caught with a typed catch clause.
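The scenario can be reproduced with today's isa/rethrow idiom. In this sketch `SpArray`, its fields, and `get_or_missing` are hypothetical, and the bug in `getindex` is deliberate:

```julia
# Hypothetical sparse vector: nonzero values are stored in ordinary Vectors.
struct SpArray <: AbstractVector{Float64}
    n::Int
    idx::Vector{Int}       # sorted positions of the nonzeros
    vals::Vector{Float64}  # the nonzero values
end

Base.size(A::SpArray) = (A.n,)

function Base.getindex(A::SpArray, i::Int)
    checkbounds(A, i)                  # legitimate, user-facing BoundsError
    k = searchsortedfirst(A.idx, i)
    # BUG (deliberate): no `k <= length(A.idx)` check, so asking for a position
    # after the last nonzero triggers an internal BoundsError on A.idx.
    return A.idx[k] == i ? A.vals[k] : 0.0
end

# User code that treats any BoundsError as "index out of range".
function get_or_missing(A, i)
    try
        A[i]
    catch err
        err isa BoundsError || rethrow()
        missing
    end
end

A = SpArray(5, [2], [1.5]);
```

Here `get_or_missing(A, 9)` is a genuine out-of-range access, but `get_or_missing(A, 4)` also returns `missing` even though 4 is a valid index whose value should be `0.0`: the internal bug is silently masked. (The trailing semicolon matters at the REPL, since displaying `A` indexes every element and would trip the bug.)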

@tknopp
Contributor

tknopp commented May 29, 2014

I absolutely see your point that catching overly generic exceptions can lead to hard-to-find bugs, and I have run into this myself in C++ a lot. ("Oh, this throws exceptions all the time. Put try { ... } catch(...) {} around it and problem solved.")

To play around with your example maybe this should be handled by throwing SpArrayBoundsError in SpArray and ArrayBoundsError in Array. Of course both deriving from BoundsError. In this way one can catch the SpArray specific bounds errors and let all other BoundsError pass through.
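A sketch of that suggestion in Julia. One wrinkle: Julia's `BoundsError` is a concrete type and cannot be subtyped, so this assumes a new abstract parent. All type names and `catch_sp_only` are hypothetical:

```julia
# Abstract parent standing in for "BoundsError" in the suggested hierarchy.
abstract type AbstractBoundsError <: Exception end

struct SpArrayBoundsError <: AbstractBoundsError
    i::Int
end

struct ArrayBoundsError <: AbstractBoundsError
    i::Int
end

# Catch only the SpArray-specific error; every other error propagates.
function catch_sp_only(f)
    try
        f()
    catch err
        err isa SpArrayBoundsError || rethrow()
        :sp_handled
    end
end
```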

@StefanKarpinski
Member Author

To play around with your example maybe this should be handled by throwing SpArrayBoundsError in SpArray and ArrayBoundsError in Array. Of course both deriving from BoundsError. In this way one can catch the SpArray specific bounds errors and let all other BoundsError pass through.

That's a common solution offered in languages with exception hierarchies and typed catch, but I don't buy it. The logical extreme of that approach is to have an error type for almost every single location where an exception can be thrown. At that point, what you're really doing is labeling individual exception locations and using labels to catch things – albeit labels that are carefully arranged into a hierarchy. That need for a carefully arranged hierarchy to make the approach tenable is also fishy – and easily abused. I came up with this approach by thinking about how to compromise between a broad classification scheme with a set of standard exception types and individual labels for exception locations.

@tknopp
Contributor

tknopp commented May 29, 2014

Yes, you are absolutely right. Exception hierarchies have the potential to get very specific, up to the point that specific lines get specific error types. But as long as one subtypes from broader exception types, I don't see the issue with that. But maybe I just have bad taste for not finding this fishy :-)
So let's see what others think about this.

@JeffBezanson
Member

Talking about this as "labeling locations" gets me thinking: would it be possible just to label the locations? Every exception could consist of an exception object and a label (symbol). The SpArray library could throw BoundsErrors, labeling them as :SpBoundsError. You could catch by type and/or label.
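A minimal sketch of what labeled exceptions might look like with existing Julia features. `Labeled`, `sp_getindex`, and `catch_labeled` are hypothetical names, and a plain `Symbol` stands in for whatever scoped label mechanism would actually be needed:

```julia
# An exception paired with a label: "what" (err) and "where-from" (label).
struct Labeled{E<:Exception} <: Exception
    err::E
    label::Symbol
end

# A library routine that labels its own BoundsErrors.
function sp_getindex(v::AbstractVector, i::Int)
    checkbounds(Bool, v, i) || throw(Labeled(BoundsError(v, i), :SpBoundsError))
    return v[i]
end

# Catch by type *and* label; anything else is rethrown unchanged.
function catch_labeled(f, ::Type{T}, label::Symbol) where {T<:Exception}
    try
        f()
    catch e
        (e isa Labeled && e.err isa T && e.label === label) || rethrow()
        :handled
    end
end
```

An unlabeled `BoundsError`, for example one raised by a buggy internal array access, passes straight through `catch_labeled`.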

@aviks
Member

aviks commented May 29, 2014

You're right that using typed exceptions correctly in Java is annoying. Everyone throws RuntimeException subclasses, and I've trained myself to see the exception type as primarily informational, useful only for logging.

But my fear is, annotating exceptional (?) call sites all the way down is going to turn out to be equally tedious. Is this something that is easy enough to reason about and implement, so that it will be widely practiced?

@StefanKarpinski
Member Author

@JeffBezanson – I thought about just labeling locations, but it seems too granular. Sometimes different locations throw the same error, no? Also, how is that different from having an exception type for every location? I guess you wouldn't have to declare the types, but you also wouldn't get any type hierarchy. How would labels be scoped? By module?

@JeffBezanson
Member

Yes, you'd probably want some kind of scoping, which remains to be worked out. But different locations could in fact throw the same error; in my overly-simple formulation they could just use the same symbol.

The idea is that exception and origin are orthogonal: what and where-from. "What" is the exception type hierarchy, which describes problems, and "where-from" is the locations, which might follow the module hierarchy, or have no hierarchy.

@StefanKarpinski
Member Author

@aviks: I think this should actually be pretty rare. Having a "throws" annotation is essentially saying "throwing this error is part of the official API of this function". Currently we basically consider all exceptions to be runtime errors, so any exception signals a real problem rather than part of the function's official behavior. That would continue to be the default. Only when you decide that something like BoundsError or KeyError is part of the official interface of AbstractArray or Associative would you put throws BoundsError and throws KeyError annotations on function calls. And hopefully you wouldn't need very many of them. I should probably try adding these annotations for something like that and see how it goes.

@tknopp
Contributor

tknopp commented May 29, 2014

The labeling idea is interesting. Still it increases the dimensionality of the dispatch mechanism from a one-dimensional to a two-dimensional thing.

I am all for putting more context into exceptions. In C# for instance one usually also gets the line numbers where the exception was thrown. Further when debugging exceptions I found it very useful to have exception chains, i.e. when an exception is rethrown (as a different exception type) the original exception is put as an "inner exception" to the outer exception.

My own experience is that in small projects and research code, exceptions are mainly an error mechanism for debugging "not yet correct" code. In these situations one will hardly use try/catch and reason about what exception to catch. The intensive use of the error function in Julia Base is an indicator that this is what most people use exceptions for. The importance of exceptions increased a lot for me when working on large projects and projects where others are the users (i.e. production code). In these cases one has to prevent by all means that exceptions remain uncaught, and thus has to maintain a sensible exception hierarchy and use try/catch at various locations. But this usually doesn't involve the system exception hierarchy. Instead I want to throw TheUSBCableIsUnplugged exceptions that are carefully caught and translated to ThePrinterIsNotAvailable and so on. And for this purpose the Julia exception system already works quite well.
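A sketch of the translation pattern described here, keeping the original exception as an "inner exception". The two type names come from the comment; their fields, `read_from_usb`, and `print_page` are hypothetical:

```julia
struct TheUSBCableIsUnplugged <: Exception end

struct ThePrinterIsNotAvailable <: Exception
    inner::Exception   # the lower-level cause, kept for debugging
end

read_from_usb() = throw(TheUSBCableIsUnplugged())   # stand-in for driver code

function print_page()
    try
        read_from_usb()
    catch e
        e isa TheUSBCableIsUnplugged || rethrow()
        # Translate to the higher-level error, chaining the original one.
        throw(ThePrinterIsNotAvailable(e))
    end
end
```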

@samoconnor
Contributor

I am in the process of porting my Tcl AWS library to Julia as a way to learn Julia, (and so that I can then experiment with implementing projects reliant on AWS in Julia).

In a first-pass of the Julia docs, I saw try/catch/finally and made a mental check "that is supported".
When it comes to writing actual code I am surprised that try/catch doesn't do what I expect. Most Julia features have either made me think "nice, that's how it should be done", or "that's weird" followed by some reading followed by "that's nice". The design of Julia's try/catch seems to be that it is deliberately somewhat crippled to try to wean people off using it. This smells wrong to me.

In distributed, networked systems, individual nodes and connections can fail, there are propagation delays, and data models are eventually consistent. Clear, robust exception handling is a must. I want to write code that assumes everything is reliable, immediately globally updated, etc., and deal with the exceptional glitches in one place in high-level retry or conflict-resolution loops.

I am not a fan of type-based exception handling, having had the misfortune of working with Java etc. Tcl has no types, so exceptions are trapped by matching an error-code prefix. An error code in Tcl is just a list.

e.g.

proc get {url} {
    ...
    throw {HTTP 404}  "Error: $url Not Found"
    ...
    throw [list HTTP 300 $location] "Error: $url has moved to $location"
    ...
}

try {

   get $url

} trap {HTTP 404} {} {
    puts "get can't find $url"
} trap {HTTP 300} {message info} {
    set location [lindex $info 2]
    get $location
}

# Ignore missing dict key...
proc lazy_get {dict key} {
    try {

        dict get $dict $key

    } trap {TCL LOOKUP DICT} {} {}
}

# Catch exit status of shell command...
retry count 2 {

    exec ./create_queue.tcl "foobar"

} trap EX_TEMPFAIL {} {
    after 60000
}

Every error (exception) in a Tcl program is supposed to have a human-readable error message and a machine-readable errorcode.

The source of an error can include as much or as little machine-readable info as it likes.

Safe exception handling is achieved by trapping only sufficiently specific errorcodes. Of course you can shoot yourself in the foot by writing "trap {}" if you want to...

http://www.tcl.tk/man/tcl8.6/TclCmd/try.htm
http://www.tcl.tk/man/tcl8.6/TclCmd/throw.htm
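For comparison, a sketch of Tcl-style prefix trapping expressed in Julia. `CodedError`, `code_matches`, `get_page`, and `get_or_report` are hypothetical names, and using a tuple for the error code is an assumption standing in for Tcl's list:

```julia
# An error with a machine-readable code and a human-readable message.
struct CodedError <: Exception
    code::Tuple   # e.g. ("HTTP", 404)
    msg::String
end

# True if `prefix` is a leading part of `code`. The empty prefix `()`
# matches everything, just like Tcl's `trap {}`.
code_matches(code::Tuple, prefix::Tuple) =
    length(code) >= length(prefix) && code[1:length(prefix)] == prefix

get_page(url) = throw(CodedError(("HTTP", 404), "Error: $url Not Found"))

function get_or_report(url)
    try
        get_page(url)
    catch e
        (e isa CodedError && code_matches(e.code, ("HTTP", 404))) || rethrow()
        "get can't find $url"
    end
end
```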

I think using types to match exceptions isn't great. But since Julia matches methods by type as a core feature, perhaps it makes sense for Julia. I would prefer something like passing a Dict() to throw and having a convenient syntax for catching only exceptions that match a dictionary pattern.

The key thing is readability of the try/catch code.

Below are some fragments of Tcl code that I am faced with porting to Julia.
I am open to the idea that there is just a better way to do these things in Julia. But on the other hand, I think these are reasonable uses of an exception mechanism...

try {

    set web_user [create_aws_iam_user $aws $region-web-user]

} trap EntityAlreadyExists {} {
    delete_aws_iam_user_credentials $aws $region-web-user
    set web_user [create_aws_iam_user_credentials $aws $region-web-user]
}

    retry count 4 {

        set res [aws_sqs $aws CreateQueue QueueName $name {*}$attributes]

    } trap QueueAlreadyExists {} {

        delete_aws_sqs_queue [aws_sqs_queue $aws $name]

    } trap AWS.SimpleQueueService.QueueDeletedRecently {} {

        puts "Waiting 1 minute to re-create SQS Queue \"$name\"..."
        after 60000
    }

proc fetch {bucket key {byte_count {}}} {

    if {$byte_count ne {}} {
        set byte_count [list Range bytes=0-$byte_count]
    }
    try {

        s3 get $::aws $bucket $key {*}$byte_count

    } trap NoSuchKey {} {
    } trap AccessDenied {} {}
}

proc aws_ec2_associate_address {ec2 elastic_ip} {

    puts "Assigning Elastic IP: $elastic_ip"
    retry count 3 {

        aws_ec2 $ec2 AssociateAddress InstanceId [get $ec2 id] \
                                      PublicIp $elastic_ip

    } trap InvalidInstanceID {} {
        after [expr {$count * $count * 1000}]
    }
}

proc create_aws_s3_bucket {aws bucket} {

    try {

        aws_rest $aws PUT $bucket Content-Type text/plain Content [subst {
                <CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
                  <LocationConstraint>[aws_bucket_region $bucket]</LocationConstraint>
                </CreateBucketConfiguration>
        }]

    } trap BucketAlreadyOwnedByYou {} {}
}

proc delete_aws_sqs_queue {queue} {

    try {

        aws_sqs $queue DeleteQueue

    } trap AWS.SimpleQueueService.NonExistentQueue {} {}
}

proc aws_s3_exists {aws bucket path} {

    try {

        aws_s3_get $aws $bucket $path Range bytes=0-0
        return 1

    } trap NoSuchKey {} {
    } trap AccessDenied {} {}
    return 0
}

    retry count 3 {

        return [$command $aws {*}$args]

    } trap {TCL LOOKUP DICT AWSAccessKeyId} {} {

        set aws [get_aws_ec2_instance_credentials $aws]

    } trap ExpiredToken {message info} {

        if {![exists ::oc_aws_ec2_instance_credentials]} {
            return -options $info $message
        }

        puts "Refreshing EC2 Instance Credentials..."
        set aws [get_aws_ec2_instance_credentials $aws -force-refresh]
    }

    try {

        aws_iam $aws DeleteInstanceProfile InstanceProfileName $name

    } trap NoSuchEntity {} {}

    set response [aws_iam $aws CreateInstanceProfile \
                               InstanceProfileName $name \
                               Path $path]

@hayd
Member

hayd commented Jul 30, 2014

other code that you're calling inside the try block might also throw a BazError where you didn't expect it

This seems like an edge case, if you keep the try block small (e.g. just the line you're expecting a failure) it shouldn't be the case... if it's critically different why not subclass BazError?

+1, a syntax for typed exceptions would be great. IMO isa(err, BazError) ... is quite messy. I like:

catch err::BazError
    # deal with *expected* BazError only
catch err::FooError
    # deal with *expected* FooError only
end
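For reference, a chain of typed catch clauses like this can be written in current Julia with explicit type tests and a final `rethrow()`. A sketch, with `BazError`/`FooError` as hypothetical types and `run_and_classify` as an illustrative name:

```julia
struct BazError <: Exception
    msg::String
end

struct FooError <: Exception
    msg::String
end

function run_and_classify(f)
    try
        f()
    catch err
        if err isa BazError
            "BazError: " * err.msg   # deal with BazError only
        elseif err isa FooError
            "FooError: " * err.msg   # deal with FooError only
        else
            rethrow()                # anything else propagates
        end
    end
end
```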

@elextr

elextr commented Jul 30, 2014

if you keep the try block small (e.g. just the line you're expecting a failure) it shouldn't be the case...

The syntax proposed by @StefanKarpinski is really just a more compact syntactic sugar for a try block around the important line (or is it intended to apply to an expression?) which passes the exception to the outer catch.

if it's critically different why not subclass BazError?

You can't when it's thrown by a library you called rather than by your own code.

The concept of identifying several small regions where "it is understood that this may throw xxxerror, and I'm prepared to handle it in a common handler" seems a good idea, but as currently proposed this is likely to surprise those coming from other languages (who don't RTFM because they expect all languages to be the same :)

This also doesn't provide for one of the traditional use-cases where you want to catch some types of errors no matter where they occur, but let the others go:

 try
    a big fast but flakey algorithm
 catch err::matherror # oh no underflowed, overflowed or divided by zero
    alternative slow algorithm
 end
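Today that use case is expressed by testing against a union of error types and falling back. A sketch using checked integer multiplication as the "fast but flaky" path (`fast_square`, `slow_square`, and `square` are illustrative names):

```julia
using Base.Checked: checked_mul

fast_square(x::Int) = checked_mul(x, x)   # throws OverflowError on overflow
slow_square(x::Int) = big(x) * x          # BigInt arithmetic, cannot overflow

function square(x::Int)
    try
        fast_square(x)
    catch err
        # Catch arithmetic failures wherever they occur in the fast path.
        err isa Union{OverflowError, DivideError, DomainError} || rethrow()
        slow_square(x)
    end
end
```

Here catching by type regardless of where inside the fast path the error originates is exactly the intent.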

@StefanKarpinski
Member Author

@hayd: This seems like an edge case, if you keep the try block small (e.g. just the line you're expecting a failure) it shouldn't be the case... if it's critically different why not subclass BazError?

Even if the try block only calls a single function, that function can do absolutely anything and throw any kind of exception for many different reasons. In generic code this is particularly bad since you don't really know which implementation of the function you're calling is going to be invoked or how it's implemented. Of course, you shouldn't have to know this to use it correctly – and that's precisely the problem with type-based exception handling: there's massive abstraction leakage between the implementation of the function being trapped and how to trap it. Let's say you're doing try f() catch err::Foo ... end and you happen to know that f only throws the Foo error under certain circumstances and you want to handle that situation. Great. But now, let's say someone changes the implementation of f – or of any function that f calls. The change seems innocuous to them, but they happen to call a function – probably unwittingly – that can throw a Foo error under some obscure circumstances. You will now be trapping this situation even though that's not what you meant to do at all. Their seemingly harmless change has now made your code incorrect.

You could introduce new error types, but that's just completely annoying and impractical. Let's say I add a new subtype of AbstractArray. Since it's array-like, you want to throw IndexError when someone indexes into it wrong. Under the hood, however, it uses normal arrays and indexes into them, which might itself cause an IndexError. In order to distinguish the former condition from the latter, your type needs to introduce a new subtype of AbstractIndexError. This means that every standard error in Base needs to be split into an abstract type and a concrete type per implementation of that type: AbstractIndexError, ArrayIndexError, SparseIndexError, etc. And if you implement a new subtype of AbstractArray, you need your own subtype of AbstractIndexError in order to really correctly implement it. This is almost absurdly unwieldy.

@StefanKarpinski
Member Author

@samoconnor – I really don't think try/catch is a good mechanism for handling that kind of networking timeout situation. I'm not sure what a better mechanism is, but this doesn't really feel right. In particular, you probably want to be able to retry things, which try/catch does not handle.

@jakebolewski
Member

A networking timeout situation is usually handled by a state machine. We have gotos with labels now which makes writing state machines easier.
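As an illustration of that style, a retry loop written as a tiny state machine with Julia's `@label`/`@goto` macros. `run_states` and the `attempt` callback are hypothetical, and no actual networking is involved:

```julia
# `attempt(n)` is a hypothetical callback returning true on success.
function run_states(attempt; max_attempts::Int = 3)
    n = 0
    @label connect           # state: try to connect
    n += 1
    if attempt(n)
        @goto done
    end
    if n < max_attempts
        @goto connect        # transition back to the connect state
    end
    return (:gave_up, n)
    @label done              # state: finished successfully
    return (:ok, n)
end
```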

@samoconnor
Contributor

@StefanKarpinski - Re: "retry", note that several of my examples above use a "retry" variant of "try". Viral Shah said in a separate conversation "The short answer to your question is yes, it is easy to do retry with Julia’s macros". So I'm assuming that "retry" can be made to work as a seamless "try" variant in Julia.

Re: "network timeouts", that is not really what my examples above address. I'm dealing with timeouts like this:

    for wait_time in {0.05, 0.5, 5, 0}

        try

            response = process_response(open_stream(uri, request))

            if (response.finished
            &&  response.status >= 200 && response.status < 500)
                return response
            end

        catch e
            if wait_time == 0 || !isa(e, UVError)
                rethrow(e)
            end
        end

        sleep(wait_time)
    end

I have no choice here but to use try/catch because the underlying sockets library throws UVError.
The retry loop is safe against catching some unexpected variant of UVError, because after a few attempts it throws the error up anyway.

It might be nice to simplify the catch code a little

catch e::UVError
    wait_time > 0 || rethrow(e)

or maybe

catch e if isa(e, UVError)
    wait_time > 0 || rethrow(e)

... the "if" syntax would then allow:

catch e if isa(e, UVError) && uverrorname(e) == "EAI_NONAME"

Re: typing of exceptions, note that in my examples above there is no mention of trapping timeout exceptions. The things I'm trapping are all well-defined and precise, e.g. EntityAlreadyExists, AWS.SimpleQueueService.QueueDeletedRecently, AccessDenied, ExpiredToken, {TCL LOOKUP DICT AWSAccessKeyId}, BucketAlreadyOwnedByYou.

While it is true that there is code out there that makes a mess of exception meta-data (I've written some android code) there are plenty of APIs, e.g. AWS, that have robust exception identification.

It is essential to have an out-of-band, stack-frame-jumping, exception mechanism in complex code that sits on top of imperfect web services. The alternative is that all of the code is dominated by deciding what error information needs to be passed back up the stack and how far.

As an example of converting HTTP exceptions, in my Julia AWS library I currently do this:

        # Handle other responses as error...
        err = TaggedException({
            "verb"      => r.verb,
            "url"       => r.url,
            "code"      => string("HTTP ", response.status),
            "message"   => response.data
        })

        # Look for JSON error message...
        if response.headers["Content-Type"] == "application/x-amz-json-1.0"
            json = JSON.parse(response.data)
            if haskey(json, "__type")
                err.tags["code"] = split(json["__type"], "#")[2]
            end
        end

        # Look for XML error message...
        if response.headers["Content-Type"] in {"application/xml", "text/xml"}
            xml = parse_xml(response.data)
            if (v = get(xml, "Code")) != nothing
                err.tags["code"] = v
            end
            if (v = get(xml, "Message")) != nothing
                err.tags["message"] = v
            end
        end

and

catch e
    if isa(e, TaggedException) && e["code"] == "BucketAlreadyOwnedByYou"
        ...
    else
        rethrow(e)
    end
end
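`TaggedException` itself is not shown in the thread. Here is a minimal guess at a definition that makes `e["code"]` work, plus the bucket check above wrapped in a function. The field layout and `create_bucket_idempotent` are assumptions:

```julia
# Assumed layout: a Dict of string tags such as "code" and "message".
struct TaggedException <: Exception
    tags::Dict{String,Any}
end

# Lets handlers write e["code"] as in the snippet above.
Base.getindex(e::TaggedException, key::AbstractString) = e.tags[key]

# `create` is a hypothetical zero-argument function that creates the bucket.
function create_bucket_idempotent(create)
    try
        create()
        :created
    catch e
        if e isa TaggedException && get(e.tags, "code", nothing) == "BucketAlreadyOwnedByYou"
            :already_owned
        else
            rethrow()
        end
    end
end
```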

I think I'm too new to Julia to contribute great solutions, but I hope that my real-world examples help in some way...

@samoconnor
Contributor

Following up on: "In particular, you probably want to be able to retry things, which try/catch does not handle."

I've managed to figure out enough macro-fu to implement an exception handling retry loop.

On the first n-1 attempts, the catch block has the opportunity to call @retry.

If the catch block does not call @retry, the error is rethrown automatically.

On the nth attempt the try block is executed naked (without a catch block) so errors are passed up.

Example follows...

(please let me know if my posts on this issue are too off-topic, and/or where a better place to post would be.)

# Try up to 4 times to get a token.
# Retry when we get a "Busy" or "Throttle" exception.
# No retry or catch if we get a "Crash" exception (or no exception).

@with_retry_limit 4 try

    println("Trying to get token...")
    println("Token: " * get_token())

catch e

    if e.msg == "Busy"

        println("Got " * string(e) * ", try again...\n")
        @retry

    elseif e.msg == "Throttle"

        println("Backing off...\n")
        sleep(1 + rand())
        @retry
    end
end



# Unreliable operation simulator.
# Often busy, sometimes asks us to back off, sometimes crashes.

function get_token()

    if rand() > 0.2
        error("Busy")
    end

    if rand() > 0.8
        error("Throttle")
    end

    if rand() > 0.9
        error("Crash")
    end

    return "12345"
end


# Implementation of retry

macro with_retry_limit(max::Integer, try_expr::Expr)

    @assert string(try_expr.head) == "try"

    # Split try_expr into component parts...
    (try_block, exception, catch_block) = try_expr.args

    # Insert a rethrow() at the end of the catch_block...
    push!(catch_block.args, :(rethrow($exception)))

    # Build retry expression...
    retry_expr = quote

        # Loop one less than "max" times...
        for i in 1:($max - 1)

            # Execute the "try_expr".
            # It can do "continue" if it wants to retry...
            $(esc(try_expr))

            # Only get to here if "try_expr" executed cleanly...
            return
        end

        # On the last of "max" attempts, execute the "try_block" naked 
        # so that exceptions get thrown up the stack...
        $(esc(try_block))
    end
end



# Convenience "@retry" keyword...

macro retry() :(continue) end
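For comparison, here is a sketch of the same policy using `retry` and `ExponentialBackOff` from Base in Julia 1.x (both postdate this comment). As far as I can tell, `Base.retry` has the same shape as the macro above: the first attempts run inside a try/catch that consults `check`, and the final attempt runs without one, so its errors propagate. `flaky_token` and its call counter are hypothetical, deterministic stand-ins for `get_token`, chosen so the behaviour is easy to check.

```julia
# Deterministic stand-in for get_token (hypothetical): busy twice, then succeeds.
const attempts = Ref(0)
function flaky_token()
    attempts[] += 1
    attempts[] < 3 && error("Busy")
    return "12345"
end

# Retry only on the "expected" failures; anything else (e.g. "Crash") propagates at once.
is_transient(_state, e) = e isa ErrorException && e.msg in ("Busy", "Throttle")

# n = 3 delays means up to 3 retries, i.e. at most 4 attempts in total.
get_token_with_retry = retry(flaky_token;
                             delays = ExponentialBackOff(n = 3, first_delay = 0.01),
                             check = is_transient)

token = get_token_with_retry()
println("Token: ", token, " after ", attempts[], " attempts")
```

Returning `false` from `check` makes `retry` rethrow immediately, which plays the role of the automatic `rethrow()` that the macro appends to the catch block.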

@samoconnor
Contributor

Here is an attempt to avoid having to put rethrow(e) at the end of a catch block. This is intended both as a way to remove a bit of noise from the code, and as a safety net to avoid accidentally catching unintended exceptions.

I think the thing that scared me most about Julia's "catch" at first was that everything is caught by default, and then you have to be really careful to rethrow() the right exceptions.

#!/Applications/Julia-0.3.0-rc1-a327b47bbf.app/Contents/Resources/julia/bin/julia


# Re-write "try_expr" to provide an automatic rethrow() safety net.
# The catch block can suppress the rethrow() by doing "e = nothing".

macro safetynet(try_expr::Expr)

    @assert try_expr.head == :try

    (try_block, exception, catch_block) = try_expr.args

    push!(catch_block.args, :(isa($exception, Exception) && rethrow()))

    # Escape so that the user's variables (e.g. "e") keep their own names...
    return esc(try_expr)
end


@safetynet try

    d = Dict("Foo" => "Bar")

    println("Foo" * d["Foo"])
    println("Foo" * d["Bar"])

    @assert false # Not reached.

catch e

    if isa(e, KeyError) && e.key == "Bar"
        println("ignoring: " * string(e))
        e = nothing
    end
end

println("Done!")

A question about macros: is there any way I can omit the "try" keyword and have the macro insert it for me? e.g.

@safetry
    ...
catch
    ...
end

I can't figure out if this is possible with Julia's macro system.
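As far as I know, a macro can't take that form, because a bare `catch` without a preceding `try` does not parse. One way around it is to drop the macro and use a higher-order function with `do`-block syntax. The sketch below uses a hypothetical helper, `try_catching` (not part of Base), which takes the safety-net idea a step further: the caller supplies a predicate, and anything it does not match is rethrown automatically.

```julia
"""
    try_catching(body, matches, handler)

Hypothetical helper: call `body()`. If it throws an exception `e` for which
`matches(e)` is true, return `handler(e)`; otherwise rethrow `e` unchanged.
"""
function try_catching(body, matches, handler)
    try
        return body()
    catch e
        matches(e) || rethrow()
        return handler(e)
    end
end

d = Dict("Foo" => "Bar")

# do-block syntax passes the block as the first argument (`body`).
result = try_catching(e -> e isa KeyError && e.key == "Bar",
                      e -> (println("ignoring: ", e); "default")) do
    "Foo" * d["Bar"]
end
println(result)
```

Because the rethrow is the default path rather than something the catch block must remember, forgetting it is no longer possible.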

@MikeInnes
Member

@StefanKarpinski Just for clarification, I take it that the "chain of custody" works on a per-method basis? So to modify your original example:

function test(x)
  try
    # tomfoolery
    foo(x) throws BazError
    # hijinks
  catch e::BazError
    # deal with BazError
  end
end

function foo(x::Number)
  bar(x) throws BazError
end

function foo(x::String)
  bar(x)
end

function bar(a)
  throw(BazError())
end

test(1) catches the error, but test("foo") doesn't?

@MikeInnes
Member

Also, there's at least one place this would be useful in Base, in the display code. Checking the function that threw the exception improves the situation but is still pretty brittle, and it sounds like this proposal would solve that problem.

@StefanKarpinski
Member Author

@one-more-minute, yes, that's how I think it should work. In other words even though foo has BazError as part of its overall interface, a BazError raised by foo(::String) via bar would be considered accidental since it's not annotated. I agree with the sentiment that this is "an edge case", and indeed many languages have gotten by without anything like this. However, the difference between handling edge cases correctly and not quite correctly is a fairly major one.

Also, +1 for "tomfoolery" and "hijinks" :-)

@StefanKarpinski
Member Author

@samoconnor – I think that handling this kind of complex I/O error is a really important problem but I also think it's beyond the scope of this issue, which is pretty focused: it solves just the problem of trapping the right errors and not accidentally trapping other ones.

@StefanKarpinski
Member Author

@JeffBezanson, so here's what I think is wrong with the type and/or label idea.

Let's say you have code that expects an AbstractArray. Part of the AbstractArray interface is that indexing can throw a bounds error like so:

julia> a = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> a[4]
ERROR: BoundsError()
 in anonymous at ./inference.jl:365 (repeats 2 times)

You write some code that expects this and handles it somehow:

function frizz(a::AbstractArray)
  try a[4]
  catch err::BoundsError
    # handle a not being long enough
  end
end

If someone calls frizz(::Array), assuming no other frizz methods, and a BoundsError is thrown by getindex(::Array, i::Int), it should clearly be caught here.

Now, let's say someone else comes along and implements a new kind of AbstractArray, XArray, and as is quite common, they use an existing kind of AbstractArray to do it – let's say they use an Array instance. Now further suppose that there's some kind of situation where the inner Array type can cause a BoundsError which doesn't actually correspond to a BoundsError for XArray – it's a programming error, or just indicates some other kind of problem with the usage of XArray. This is an entirely plausible situation that actually happens quite a bit in the sparse array implementation.

The key observation is that you can have two situations with the exact same throw and catch sites: one where the error should be caught and the other where the error shouldn't be caught. The type and/or label idea can't solve this problem since it still only depends on the throw and catch sites – one error is caught if and only if the other one is. To actually solve the problem, whether an error is caught must depend, not only on the throw and catch sites, but also on the stack between them. This is precisely what the chain of custody idea does: it makes catching an error depend on the stack between the thrower and the catcher. The exact details of how to express that are up for debate, but I think this argument makes it clear that we either decide we don't care about this problem, or we need a solution where catching an error depends on the stack that comes between the thrower and the catcher.
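The failure mode can be reproduced in current Julia with the usual simulated typed catch. In this sketch, `XArray` is a hypothetical wrapper whose `getindex` has an off-by-one bug: it passes its own bounds check but then indexes the inner `Array` out of range. Both calls below go through the same throw site (`Array` indexing) and the same catch site (`frizz`), so `frizz` cannot tell the expected `BoundsError` from the bug.

```julia
# Catch BoundsError only; anything else is rethrown (the usual simulation).
function frizz(a::AbstractVector)
    try
        return a[4]
    catch err
        err isa BoundsError || rethrow()
        return :too_short  # assumes "a is not long enough"
    end
end

# Hypothetical AbstractVector wrapper around an Array.
struct XArray{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(x::XArray) = size(x.data)
function Base.getindex(x::XArray, i::Int)
    checkbounds(x, i)     # XArray's own bounds check passes for i == length(x)...
    return x.data[i + 1]  # ...but this off-by-one bug indexes `data` out of range
end

frizz([1, 2, 3])             # genuinely too short: BoundsError from the Array
frizz(XArray([1, 2, 3, 4]))  # long enough, yet the inner bug is caught as well
```

Both calls return `:too_short`; only something that looks at the frames between the throw and the catch could distinguish them.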

@JeffBezanson
Member

Ok, well I'm not running right out to implement the label idea :)

The chain of custody is a really clever idea, but I feel it has two fatal issues: (1) It's not clear that it can be implemented efficiently. A naive implementation would probably slow down the whole language. (2) I believe it will strike most people as un-julian. Even though it is better than java's mechanism and not quite as verbose, I suspect most people will still find it fussy.

@StefanKarpinski
Member Author

That's why I left "decide we don't care about this problem" as one of our possible courses of action.

@StefanKarpinski
Member Author

Keep in mind that this is completely backwards compatible: no existing code would behave any differently with this proposal. So it's hard to argue that it's somehow more annoying than what we have currently. It is possible to argue that it's more annoying than having typed catch clauses would be, but we've been getting by without those, so that's also hard to see as really a huge problem. It's possible that more of the chain could be implicit, but I rather suspect that these chains will typically be very short – usually just one or two calls deep. Deeper errors are certainly possible, but they are usually of the "catch anything and try to recover" variety, not the "catch a very specific, expected error in this specific method call" variety – which by its very nature would tend not to be a very deeply nested error.

@c42f
Member

c42f commented Sep 19, 2019

With that idea, you'd probably want some nice syntactic sugar to cover a few cases similar to what Swift does:

  • Catch a given error type and return nothing / substitute with the result of some code (Swift's try?)
  • Catch a given error type and convert to a fatal exception, if we want that to be a thing (Swift's try!)
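Without language support, the first of these can already be approximated with an ordinary function. A sketch, where `try_or_nothing` is a hypothetical name (not a Base function) and, unlike the proposal, the match depends only on the exception type and not on where it was thrown:

```julia
# Hypothetical analogue of Swift's `try?`: return `nothing` if `f()` throws an
# exception of type `T`, and rethrow any other exception.
function try_or_nothing(f, ::Type{T}) where {T<:Exception}
    try
        return f()
    catch e
        e isa T || rethrow()
        return nothing
    end
end

try_or_nothing(() -> parse(Int, "42"), ArgumentError)   # 42
try_or_nothing(() -> parse(Int, "4x2"), ArgumentError)  # nothing
```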

@tkf
Member

tkf commented Sep 19, 2019

What if a key goal of "chain of custody" was that it sidesteps the usual unwinding mechanism completely, effectively turning exceptions into error objects?

Sounds like a great feature to have. I guess it fits well with how other things work in Julia: start with a somewhat slow but correct and simple implementation and then add annotations to make things robust and fast (e.g., @inbounds/@propagate_inbounds).

@c42f
Member

c42f commented Sep 19, 2019

Yes I think it would fit nicely. But it's not exactly about making the slow->fast / safe->unsafe tradeoff. With this approach we may get both correctness and speed:

  • Correctness because only a named set of errors would be caught this way (the others propagate as normal)
  • Speed, because exception matching happens at the throw site and can thus avoid some of the costly and non-optional runtime work like collecting backtraces.
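To make throw-by-return concrete, here is a sketch that models Mike's earlier `test`/`foo`/`bar` example in current Julia. It is only a model under stated assumptions: a `throws BazError` annotation is written out by hand as "if the callee returned a `BazError` value, pass it on", while an unannotated call turns a returned error into an ordinary thrown exception. None of these names are a proposed API.

```julia
struct BazError <: Exception end

# Throw site: hand the error back as a value instead of unwinding.
bar(a) = BazError()

# Annotated call (models `bar(x) throws BazError`): a BazError value from `bar`
# is passed straight back to the caller.
foo(x::Number) = bar(x)

# Unannotated call: an error value arriving here breaks the chain of custody,
# so it becomes a real exception that only an ordinary catch could see.
function foo(x::AbstractString)
    r = bar(x)
    r isa BazError && throw(r)
    return r
end

# Models `try foo(x) throws BazError catch err::BazError ... end`:
# only error values with an intact chain arrive here as values.
function test(x)
    r = foo(x)
    r isa BazError && return :handled
    return r
end

test(1)          # :handled
# test("foo")    # throws BazError as an ordinary exception
```

No backtrace is collected on the `test(1)` path, which is the speed argument; the flip side, raised below, is that the error carries no record of where it came from.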

@tkf
Member

tkf commented Sep 19, 2019

Right, it's more like a speed-usability tradeoff rather than a speed-safety tradeoff. I still think this is a tradeoff, since with this throw-by-return approach we lose the information about exactly where the exception was thrown.

@c42f
Member

c42f commented Sep 19, 2019

Fair enough. But I think it's a great tradeoff to make because in the case where you have the chain of custody in place you already know exactly where to expect the error from and you've got some cleanup code ready to go — there's very little reason to want a backtrace in that case. If you do want the backtrace you can use a normal try-catch without the chain of custody.

In this scheme it becomes clearer that certain errors really shouldn't be catchable. For example StackOverflowError is triggered internally in a signal handler and cannot take part in this mechanism. But that's ok! Nobody should be expecting a stack overflow; that's a sure sign of a bug and not something that should be papered-over.

@tkf
Member

tkf commented Sep 19, 2019

in the case where you have the chain of custody in place you already know exactly where to expect the error from

I thought the chain of custody is a tree, rather than a linear chain. For example, consider:

foo1() = rand() > 0.5 ? throw(BazError()) : 1
foo2() = rand() > 0.5 ? throw(BazError()) : 2

function bar()
    x = foo1() throws BazError
    y = foo2() throws BazError
    return x + y
end

bar()

Then I suppose it is impossible to know if it is foo1 or foo2 that threw the error?

@c42f
Member

c42f commented Sep 19, 2019

Agreed, it's a tree, and you won't get perfect information about how the error was thrown internally within bar(). But this is very similar to return codes and I wouldn't say it's considered a disadvantage in that case. Arguably it's an advantage which makes the scheme more composable because the callers of bar() don't get to peer inside the implementation.

Another thought about that: if callers do want to distinguish between a failure in foo1 vs foo2, there must be something semantically different about those failures, in which case a different exception type may be warranted. Alternatively we could possibly consider matching on exception values rather than types. I do feel like pattern matching of values (e.g. Haskell- or Elixir-like) lends itself quite naturally to handling exceptions; in many ways more naturally than a simple type check.

@tkf
Member

tkf commented Sep 19, 2019

I probably should have said debuggability rather than usability. I don't think the expressiveness is the problem. I was trying to point out that the stack trace would contain less information, which impedes fast debugging. Something like a post-mortem debugger would also be impossible with this approach.

@c42f
Member

c42f commented Sep 19, 2019

Oh sorry, I think I see what you're getting at. The point is that the chain of custody is broken ("missing a link") because at the top level bar() is called without being annotated?

Interesting. In that case I guess there's a design decision to make about where the rethrow happens, but the most obvious would be that the error is rethrown and appears to come from line 1 or 2 of bar(). That seems somewhat sufficient, as the rest of the chain of custody is intact and manifest in the source code?

@tkf
Member

tkf commented Sep 19, 2019

Oh no, I don't think it's broken. I just think throw-by-return is harder to debug. I still think it's a nice feature to have. But I don't think it's entirely free (i.e., there is a speed-debuggability tradeoff).

That seems somewhat sufficient, as the rest of the chain of custody is intact and manifest in the source code?

My first example was the simplest case where the chains were short. But I don't think it is possible to recover the full trace in general. Consider

foo1() = rand() > 0.5 ? throw(BazError()) : 1
foo2() = rand() > 0.5 ? throw(BazError()) : 2
foo3() = rand() > 0.5 ? (foo1() throws BazError) : (foo2() throws BazError)
foo4() = rand() > 0.5 ? (foo1() throws BazError) : (foo2() throws BazError)

function bar2()
    x = foo3() throws BazError
    y = foo4() throws BazError
    return x + y
end

I don't think the line number of bar2 would tell you the origin of the error accurately.

@tkf
Member

tkf commented Sep 19, 2019

@c42f I think another relation of the chain of custody to structured concurrency is multi-exception handling. Should throws support CompositeException?

function baz()
    (@sync begin
        @spawn foo() throws FooError
        @spawn bar() throws BarError
    end) throws FooError BarError  # ???
end

qux() = baz() throws FooError BarError  # ???

My guess is that the answer is "no" and the function using @sync should manually re-throw well-typed exceptions (and it's better to discourage using throws CompositeException).

@c42f
Member

c42f commented Sep 19, 2019

the function using @sync should manually re-throw well-typed exceptions

Agreed, I think CompositeException should be viewed more as a transport mechanism than something you ever want to match on. That goes for most of the exceptions which wrap exceptions; they don't have much meaning in their own right and you typically want to unwrap to figure out what really went wrong. An ideal system would have a way to automatically unwrap some of these things.
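Until something like that exists, the unwrapping can be done by hand. A sketch of a hypothetical `root_causes` helper that flattens the `CompositeException` and `TaskFailedException` layers that `@sync` produces in Julia 1.x, so a caller can test the underlying errors. The helper's name and its recursion policy are assumptions, not Base API.

```julia
# Hypothetical helper: recursively unwrap transport-only exception wrappers.
root_causes(e::CompositeException) = mapreduce(root_causes, vcat, e.exceptions; init = Any[])
root_causes(e::TaskFailedException) = root_causes(e.task.exception)
root_causes(e) = Any[e]

err = try
    @sync begin
        @async error("disk busy")
        @async throw(KeyError(:config))
    end
    nothing
catch e
    e
end

causes = root_causes(err)
# Match on the real errors, and throw `err` again if none of them was expected.
any(c -> c isa KeyError, causes)
```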

But this observation about "unwrapping to get at the real error" is not even limited to these cases. Consider SystemError; it comes with an error code and it might be this which you want to match on to determine the difference between an "expected error" and something which should continue propagation.

I feel like the way to deal with this is some kind of pattern matching which constructs a predicate from an expression like SystemError(_, FILE_NOT_FOUND, _) (similar to Match.jl or other such systems). If we could pass such a predicate up the call stack, this can communicate precisely which errors are "expected" to the throw site and intercept them before they've been thrown. With that approach the compiler even has a chance of optimizing away parts of the construction of the exception itself if only the matched parts of the structure are returned. For example, in the pattern above we discard the first argument to SystemError which is an error message string which might be expensive to construct.

To expand this suggestion in rough, made-up syntax:

try
    foo(filename) throws IOError
catch @match SystemError(_, FILE_NOT_FOUND, @var(extrainfo))
    # Only get here if a `SystemError` with FILE_NOT_FOUND error code
    # *would have* been thrown inside `foo`.
    # The variable `extrainfo` now contains the value it would have had if
    # the exception had actually been thrown.
    println("File not found (extra info $extrainfo)")
end

I'll admit there's a tension here in matching internal structure of types which is often considered an implementation detail in Julia. A rather speculative solution to that could be to make the suggested throw keyword take key value pairs a bit like @info, with pattern matching somehow based on those rather than the internal structure of the exception object.

@tkf
Member

tkf commented Sep 20, 2019

In terms of compilation, I guess it is not beneficial to have value-based matching in catch? If so, maybe just allow an arbitrary Julia expression as the predicate? For example, how about

try
    ...
catch err::ExceptionType if PREDICATE
    ...
end

The reason to prefer this over if-else blocks inside catch is that the if-else approach requires users to not forget to put else rethrow() at the end. This is a boilerplate and can easily introduce bugs.

This also lets you do the match based on information other than the exception:

try
    foo(filename) throws SystemError
catch err::SystemError if endswith(filename, ".jl") && err.errnum == FILE_NOT_FOUND
    println("Julia file not found (extra info $(err.extrainfo))")
end

Going to this direction, it would be nice to have some kind of uniform public accessor/query API. One simple API may be dueto(err, reason) :: Bool which can be used as

dueto(err::CompositeException, FooError)
dueto(err::SystemError, FILE_NOT_FOUND)

But defining a nice full API for CompositeException sounds very hard (just reading python-trio/trio#611)....
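A rough sketch of what `dueto` might look like with ordinary dispatch: it accepts either an exception type or a `SystemError` code as the reason, and for a `CompositeException` it checks the wrapped exceptions. `dueto`, its helpers `matches` and `wrapped`, and `FILE_NOT_FOUND` are all hypothetical; `FILE_NOT_FOUND = 2` assumes the POSIX `ENOENT` value used on Linux and macOS.

```julia
const FILE_NOT_FOUND = 2  # assumption: ENOENT on Linux/macOS

# Does `err` (or anything it wraps) match `reason`?
dueto(err, reason) = matches(err, reason) || any(c -> dueto(c, reason), wrapped(err))

# What counts as a match: an exception type, or a specific errno for SystemError.
matches(err, T::Type) = err isa T
matches(err::SystemError, errnum::Integer) = err.errnum == errnum
matches(err, reason) = false

# Which exceptions are "transport" wrappers around others.
wrapped(err::CompositeException) = err.exceptions
wrapped(err) = ()
```

Keeping `matches` and `wrapped` as separate generic functions lets new wrapper types or new kinds of reason be added by defining one method, without ambiguities against the composite case.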

@c42f
Member

c42f commented Sep 20, 2019

In terms of compilation, I guess it is not beneficial to have value-based matching in catch?

From the point of view of the compiler the matching function is just a closure and this doesn't seem very different from the many other places where we currently pass closures around and specialize the called function based on the type of the closure. reduce for example :-)

So I think the general idea is harmonious with the other features we already use in the language and may not be super hard on the compiler, provided that the chain of custody for errors isn't deep on average (cf. #7026 (comment)). (The amount of specialization can also be tuned using compiler heuristics, as usual.)

The reason I like this idea is that it unlocks some optimization opportunities which are more or less impossible in normal exception systems. Usually it's a choice between:

  • Be slow but explanatory: Create a nice formatted error message with lots of metadata so users can figure out what broke.
  • Be fast but opaque: Return some integer error code or equivalent.

The decision about which to use should belong to the caller, but typically it's the callee which makes this decision. (In rare circumstances the callee provides an option to select one or the other. For example the raise option to Meta.parse.)

What I'm suggesting here is a general mechanism which allows every throwing function to benefit from the efficiencies of having a flag like raise. But without any extra effort from the programmer.
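Base already has a few hand-written instances of this split, e.g. `parse` (throws an `ArgumentError` with a message) versus `tryparse` (returns `nothing`), alongside the `raise` keyword of `Meta.parse`. A sketch of the pattern a callee currently has to write by hand, using a hypothetical `parse_port` function whose `raise` keyword is modelled on `Meta.parse`:

```julia
# Hypothetical callee offering both modes by hand.
function parse_port(s::AbstractString; raise::Bool = true)
    n = tryparse(Int, s)  # fast, opaque: `nothing` on failure
    if n === nothing || !(0 <= n <= 65535)
        # Only build the descriptive error when the caller asked for it.
        raise && throw(ArgumentError("invalid port number: $(repr(s))"))
        return nothing
    end
    return n
end

parse_port("8080")                 # 8080
parse_port("http"; raise = false)  # nothing
```

The proposal would, in effect, let the caller's catch clause choose between these two branches without the callee having to expose a `raise` flag.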

@tkf
Member

tkf commented Sep 20, 2019

From the point of view of the compiler the matching function is just a closure

What I meant to ask was if it makes sense to have "structured" match expression (e.g., @match ...) rather than allowing any expression in order to help the compiler. But your comment answered my question.

a general mechanism which allows every throwing function to benefit from the efficiencies of having a flag like raise

This would be great 👍

@c42f
Member

c42f commented Sep 20, 2019

Right, from the perspective of the compiler the use of @match (or whatever) is mostly irrelevant; that's just one possible front end for generating a closure to destructure the exception object and figure out whether to throw-by-return vs throw by the usual mechanism (currently longjmp).
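A minimal sketch of the matcher-as-closure idea in current Julia, passing the matcher explicitly as a keyword argument (the proposal would pass it implicitly and keep it out of dispatch). `NotFound`, `lookup`, and `throw_or_return` are hypothetical names. When the caller's matcher accepts the error, it comes back as a value and no unwinding happens; otherwise it is thrown as usual.

```julia
struct NotFound <: Exception
    key::Symbol
end

# At the throw site: return the error as a value if the caller expects it,
# otherwise throw it through the normal mechanism.
throw_or_return(expect, err) = expect(err) ? err : throw(err)

function lookup(d::AbstractDict{Symbol}, k::Symbol; expect = _ -> false)
    haskey(d, k) && return d[k]
    return throw_or_return(expect, NotFound(k))
end

settings = Dict(:a => 1)

# The caller declares which error it expects; NotFound(:b) comes back as a value.
r = lookup(settings, :b; expect = e -> e isa NotFound && e.key === :b)
r isa NotFound ? println("no setting :b, using a default") : println(r)
```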

@c42f
Member

c42f commented Sep 27, 2019

I tried implementing a prototype of the exception-matcher-as-closure idea and it seems very promising. However a full prototype seems to require language support because we need the closure to not participate in dispatch (though we very much do want it to participate in specialization). The issues are almost the same as for keyword argument dispatch and Jeff's proposed solution at #9498 (comment) applies in a very similar way.

That makes me think it would be useful to have generic runtime support for arguments which are ignored during dispatch, which have some rule for "filling them in" when they're not specified, and some way to pass them implicitly.

Another case where something like this might be useful is macro calls and the magic __module__ argument. Currently this is implemented with special lowering, but that produces confusing method errors like no method matching @foo(::LineNumberNode, ::Module) where the implicit arguments are exposed to the user. Of course this could be patched up, but the implicit arguments don't need to participate in dispatch and could go via a custom calling convention instead.

@non-Jedi
Contributor

As a potential source of inspiration, one piece of prior art I don't see mentioned here is Pony's error handling. On the surface it smells very similar to this proposal.

Basically, any time a function can throw an error it becomes a "partial function" which must be marked with a ? both at the function declaration and at the call-site. Partial functions can only be called within other partial functions or within a try block. From my brief experience with Pony I found this a pleasant enough way of handling exceptions as it forces each function that can throw to explicitly document all conditions under which it can throw.

Of course, Pony without multi-methods that can be extended by users occupies a very different space in language-design than Julia, but I still thought the similarities with this proposal were striking.

@bcasselsND

I haven't read through all of the discussion here. But I would encourage people to look at the Common Lisp condition system for inspiration. Dan Weinreb designed a system that covers a lot of useful cases.

Here's a good discussion:
http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html

@bcasselsND

Of course, the Common Lisp ideas were carried over to Dylan's exception handling. But I would look at Lunar. It looks like Dave Moon (of Common Lisp and Dylan worlds, among many other accomplishments) has distilled this down and moved it firmly to the generic function world, in his proposed Lunar language:
http://users.rcn.com/david-moon/Lunar/exception_handling.html

@c42f
Member

c42f commented Mar 24, 2020

Beautiful, thanks for these links. I agree the common lisp condition system is quite interesting. And it's particularly interesting to have the modern take on it in that Lunar writeup.

There was some very related discussion of algebraic effect systems over at #33248 (comment) (see also the following discussion). Though for a dynamic language like Julia the common lisp condition system seems a better analogy than the algebraic effects systems in strongly typed functional languages.

I had a quick look at the Lunar writeup (will need an in-depth read later). I think it's exactly the kind of thing which would fit in well in Julia; in fact it's almost exactly the design I had in mind after reading about algebraic effects, including the behavior of the chain of catch handlers. In particular, the following paragraph about Lunar was very close to what I had in mind:

The throw function (when called with one actual parameter) executes a method that was made dynamically available by catch. The method is not selected by the usual rules; instead throw invokes the first applicable method in the sequence of available catcher methods, with no consideration of method specificity or ambiguity.

(For people following along, note that Lunar's throw is very different from Julia's throw. In the language of algebraic effects, Lunar's throw would be the perform keyword mentioned at https://overreacted.io/algebraic-effects-for-the-rest-of-us/)
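To make the Lunar-style semantics concrete, here is a toy sketch that keeps the dynamically available handlers in task-local storage. `with_handler`, `signal_condition`, and `MissingSetting` are hypothetical names; this is not the API of Effects.jl. `signal_condition` calls the innermost handler that is `applicable` to the condition and returns its result at the signal site without unwinding. When no handler applies, it falls back to an ordinary `throw`.

```julia
const HANDLERS_KEY = :toy_condition_handlers  # hypothetical task-local key

# Run `body()` with `handler` installed in front of any outer handlers.
function with_handler(body, handler)
    outer = get(task_local_storage(), HANDLERS_KEY, Any[])
    return task_local_storage(body, HANDLERS_KEY, vcat(Any[handler], outer))
end

# Invoke the first applicable handler, ignoring method specificity (as in Lunar).
function signal_condition(condition)
    for h in get(task_local_storage(), HANDLERS_KEY, Any[])
        applicable(h, condition) && return h(condition)
    end
    throw(condition)  # no handler: ordinary unwinding
end

struct MissingSetting <: Exception
    key::String
end

# The callee asks "what now?" instead of deciding itself; the handler's
# return value is used right here, at the signal site.
get_setting(d, k) = haskey(d, k) ? d[k] : signal_condition(MissingSetting(k))

config = Dict("a" => "1")
value = with_handler((e::MissingSetting) -> "default-" * e.key) do
    get_setting(config, "b")
end
println(value)
```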

I'd kind of like to bundle all these things which provide restarts under the label "effects systems" and I have a half-written summary about this and how it could be applied in Julia. Actually I feel all this is possible to prototype already — indeed @MikeInnes has prototyped https://github.com/MikeInnes/Effects.jl which is super cool and very much relevant.

The really difficult thing is figuring out how to fit this stuff into the runtime so that it becomes a reliable language feature which doesn't stress the compiler too much and which also generates really fast code when it matters. I feel like we'll only truly succeed with an effects system if the runtime can generate code which

  1. Would be as fast as returning an integer status code by hand and testing that for shallow call stacks where "failure is expected" and dealt with immediately.
  2. Can support unwinding from deeply nested function calls without specializing the intermediate call stack on the effects handlers.

If we don't succeed at (1), people will continue to need return codes; if we don't succeed at (2) the compilation time will (presumably) be unacceptable. But if we can do both things, we may get one consistent error handling strategy everywhere.
