RFC: Error Reform and Failure Box #492

mitsuhiko · 2014-11-29T23:00:09Z

This RFC proposes four things:

a reform to the Error trait to support better messages.
a new Failure<T> wrapper for errors to support optional tracebacks.
a ConstructFailure trait to support interoperability between errors and failures.
New try! and fail! macros that use ConstructFailure instead of FromError.

sfackler · 2014-11-29T23:03:21Z

text/0000-failure-box.md

+solution to this problem is to box the error up and move it to the heap.  While
+heap allocation can sound controversial, the truth is that an `IoResult<u8>` is
+currently 72 bytes large.  Given that the vast majority of IO calls will actually
+succeed this is a huge overhead.


I'd still like to see some concrete examples showing that there is in fact "huge overhead" here.

Yep, i'm going to provide a proper benchmark for this. The main problem is making it representative :)

Not yet in the RFC because I need to validate those numbers, but I get this currently:

test bench_large_result_3 ... bench: 134 ns/iter (+/- 37) test bench_large_result_7 ... bench: 205 ns/iter (+/- 128) test bench_large_result_ok_3 ... bench: 14 ns/iter (+/- 8) test bench_large_result_ok_7 ... bench: 26 ns/iter (+/- 27) test bench_small_result_3 ... bench: 128 ns/iter (+/- 38) test bench_small_result_7 ... bench: 134 ns/iter (+/- 41) test bench_small_result_ok_3 ... bench: 8 ns/iter (+/- 4) test bench_small_result_ok_7 ... bench: 13 ns/iter (+/- 12)

_N means that it went through N stack frames. large_result means that it's thrown without boxing, small_result is the same error in a Box. _ok means Ok branch taken, otherwise it's the Err branch. The error thrown is a regular IoError.

Code is in the rust-incidents repo.

Came here to ask for exactly these numbers. What is still needed to validate your benchmarks?

Well, this is for IoError sized structs. The odd thing is that I even see performance improvements on the OK branch for boxed up plain enums which seems suspicious.

frewsxcv · 2014-11-30T01:31:27Z

text/0000-failure-box.md

+
+# Questions
+
+Q: Why is `name()` and `description()` necessary?  What about translations?


s/is/are/

Thank you, fixed.

bstrie · 2014-11-30T02:22:10Z

cc @aturon

bstrie · 2014-11-30T03:30:32Z

cc @glaebhoerl

pythonesque · 2014-11-30T03:54:12Z

+1, with this RFC I would feel comfortable using Rust's normal error handling in many more scenarios than I currently do (because of size concerns and lack traceback).

arthurprs · 2014-11-30T04:15:07Z

Thanks for bringing up the fat results issue for discussion.
Although useful the hole traceback thing is very complex. Mixed feelings about it.

nielsle · 2014-11-30T07:30:57Z

text/0000-failure-box.md

+
+```rust
+trait ConstructFailure<A> {
+    fn construct_failure(args: A, loc: Option<LocationInfo>) -> Self;


Will this lead to unneccesary allocation? Perhaps loc could be a closure that returned an Option<LocationInfo>

The ideal case is loc bring a uint that holds the instruction pointer and getting debug info from dwarf. There is nothing that prevents this other than that it's more work :)

arielb1 · 2014-11-30T10:43:14Z

@mitsuhiko

Could you open an issue for the missing null pointer optimisation?

mitsuhiko · 2014-11-30T11:22:16Z

@arielb1 filed as rust-lang/rust#19419.

alexcrichton · 2014-12-08T19:32:02Z

cc rust-lang/rust#18629, a PR about conventions with the values returned by these error methods.

aturon · 2014-12-08T23:54:33Z

@mitsuhiko Sorry for the delay in getting back to you on this -- just back from the Mozilla work week in Portland.

Thanks for writing up such a detailed and clear RFC!

At a high level, I really like this proposal. It's highly practical, and increases the ergonomics of debugging without undue burden elsewhere. So mostly, I just have quibbles/questions about the details.

Despite the entry in the FAQ, I'm still not clear on the distinction between name and description. The vast majority of exception types that I'm familiar with just use the description and detail fields as I originally did with the Error trait. Can you give some concrete examples that motivate having all three fields, and explain why you couldn't handle them with two?
Is the use of Box in the ndebug version of Failure strictly to make it more implicit/easier to worked with boxed error types? I had some questions there:
- You say that the use of Box "forces developers to implement the error trait for the box as well." But why not give a blanket implementation of Error for Box<T> where T: Error? I tried this and it seems to work fine.
- You say "it also makes pattern matching over data contained on the error a lot more awkward as it requires deref-ing the error first." But I don't see how the Failure set up changes this -- maybe I'm missing something?
To be clear, do you intend that "leaf" APIs (the ones that originate an error) would just return a Result over that error type directly, whereas applications and abstract libraries would opt in to Failure for tracebacks? You talk a bit about this, but I wanted to clarify this point.
I believe that the ConstructFailure can basically be totally hidden? (If only we had full visibility hygiene for macros!).
The Clone bound is problematic for the new object safety rules. I think right now there is a hole in the rules that allows unsafe super traits, but once this is plugged I believe you won't be able to use Clone here any more. I suspect in the cases where you need it, you can always demand it separately.
The change from Send to 'static is probably ok, though the motivation isn't entirely clear to me (and the result is that you're not guaranteed to be able to communicate an error from one thread to another).
The downcasting infrastructure I'd prefer to leave out for now, as we decided when we originally landed Error. The problem is that it somewhat subverts the goals of the FromError infrastructure and the conventional enum-based representation for errors. We could certainly consider adding it later.
Overall, I found the ConstructFailure implementation a bit daunting. I'd like to iterate with you to see if we can simplify (though it's possible we cannot). Still, seems like it will be mostly hidden from view.
I think it's OK to enforce Error when using the try! macro; as you say, this is probably a good practice anyhow.
Finally, I'm just wondering about the 1.0 implications here. Clearly, the change to the Error trait is breaking, as is the Error requirement for try!. But I don't think anything else is. And if I understand the design correctly, it doesn't seem like the Failure box will actually be used anywhere in libstd itself (which is only an originator of errors). So this infrastructure could plausibly live out of tree for a while. I'm sympathetic to the concern that we want to standardize on this infrastructure, but I also don't want to standardize too quickly. In general, we're hoping that crates.io can serve as a means of experimentation and "market testing" before landing in std. To be clear, I'm not opposed to landing this now -- I just want to lay out the options. After all, on the flip side, having beautiful tracebacks could help make a stronger impression at 1.0.

Is anyone using rust-incidents in their own projects yet? What's your experience been like?

cc @wycats

mitsuhiko · 2014-12-09T00:20:55Z

@aturon

Despite the entry in the FAQ, I'm still not clear on the distinction between name and description. The vast majority of exception types that I'm familiar with just use the description and detail fields as I originally did with the Error trait. Can you give some concrete examples that motivate having all three fields, and explain why you couldn't handle them with two?

I started out with trying to understand the errors as they are reported currently. At present I miss the name the most because it's tricky to find out which exact error was actually raised. As the description is more free form text content and can differ greatly between errors.

The idea was that the error could be rendered like this: "{name}: {description) ({detail})".

The difference between name and description is that the name is not a sentence or phrase. The difference between description and detail is that the detail is computed on the fly whereas description is just a basic textual description of what the problem might be. It's also important to point out that currently very often description is the same as detail which makes generic error rendering read really weird.

You say that the use of Box "forces developers to implement the error trait for the box as well." But why not give a blanket implementation of Error for Box where T: Error? I tried this and it seems to work fine.

There are two problems with this. The first is that because at that point you cannot have tracebacks any more or each error needs to be able to store traceback and location information. The second problem is that I could not make generic interoperability between boxed and unboxed errors work while using generic implementations for Box<T>.

You say "it also makes pattern matching over data contained on the error a lot more awkward as it requires deref-ing the error first." But I don't see how the Failure set up changes this -- maybe I'm missing something?

You are right that the same issue applies to Failure<T> however the big difference is that Failure<T> does not implement Error. It becomes very confusing to have an error implementation on a box and on the inner type. The failure wrapper allows APIs to continue emitting the internal errors directly if they do not want to force memory allocations. Interoperability between errors and failures is still guaranteed through explicit semantics.

With failure it feels like there are two concepts: errors and a failure type that wraps an error. With Box<Error> and Error directly there are two things that both implement error but the way to pattern match on the data is different, one involves the box keyword.

However core problem is still that location information cannot be achieved with Box.

To be clear, do you intend that "leaf" APIs (the ones that originate an error) would just return a Result over that error type directly, whereas applications and abstract libraries would opt in to Failure for tracebacks? You talk a bit about this, but I wanted to clarify this point.

I would say it depends on the API. I would argue that the IoError currently is just too large and it really should be a simple enum. IO operations would just generally return a simple enum that implements Error and be done with it. However for some other APIs it might make sense to use Failures directly. I would not want to draw a clear line where which one is preferable. Voices have been raised that requiring heap allocations for error handling is not a good idea.

I believe that the ConstructFailure can basically be totally hidden? (If only we had full visibility hygiene for macros!).

Yes, that in a way would have been the plan. However because of the few different conversions it can provide I figured exposing it for documentation purposes is not the worst idea.

The Clone bound is problematic for the new object safety rules. I think right now there is a hole in the rules that allows unsafe super traits, but once this is plugged I believe you won't be able to use Clone here any more. I suspect in the cases where you need it, you can always demand it separately.

I agree that Clone is probably not the best idea but it would support tracebacks tremendously because you do not lose the original error information upon conversion. The issue with "in the cases where you need it" is that we do not have a way to say: but if the error is also Clone, tracebacks preserve it, otherwise not.

The change from Send to 'static is probably ok, though the motivation isn't entirely clear to me (and the result is that you're not guaranteed to be able to communicate an error from one thread to another).

The problem with Send is that you cannot have reference stored on them which makes some otherwise neat things impossible. It also currently (due to bugs?) requires lots of methods to be declared as X + Send even though X is already a subtype of send.

The downcasting infrastructure I'd prefer to leave out for now, as we decided when we originally landed Error. The problem is that it somewhat subverts the goals of the FromError infrastructure and the conventional enum-based representation for errors. We could certainly consider adding it later.

The traceback pretty much depends on it. Note that the API can be entirely internally for a start.

Overall, I found the ConstructFailure implementation a bit daunting. I'd like to iterate with you to see if we can simplify (though it's possible we cannot). Still, seems like it will be mostly hidden from view.

I agree with this. I would love to get some feedback on if it would be possible to a similar user experience differently.

Finally, I'm just wondering about the 1.0 implications here. Clearly, the change to the Error trait is breaking, as is the Error requirement for try!. But I don't think anything else is. And if I understand the design correctly, it doesn't seem like the Failure box will actually be used anywhere in libstd itself (which is only an originator of errors). So this infrastructure could plausibly live out of tree for a while. I'm sympathetic to the concern that we want to standardize on this infrastructure, but I also don't want to standardize too quickly. In general, we're hoping that crates.io can serve as a means of experimentation and "market testing" before landing in std. To be clear, I'm not opposed to landing this now -- I just want to lay out the options. After all, on the flip side, having beautiful tracebacks could help make a stronger impression at 1.0.

That's why I would like to split the rfc into two parts; the necessary changes to Error and then everything else. Once the Error trait has been adjusted to support name() and a 'static trait (and an undocumented TypeId :)) everything else can live externally. I can do without Clone but I'm pretty sure TypeId will be necessary but it can be entirely internal and only available for error handling experiments.

Is anyone using rust-incidents in their own projects yet? What's your experience been like?

Because the traits are different using incidents currently is basically too much work :(

alexcrichton · 2014-12-09T18:56:50Z

It also currently (due to bugs?) requires lots of methods to be declared as X + Send even though X is already a subtype of send.

I just tried this out in Cargo (we have a done of Foo + Send where trait Foo: Send), and it looks like this bug has been fixed! (see rust-lang/cargo#1024)

The change from Send to 'static is probably ok

I'll note that in Cargo we rely on being able to send errors across threads (just as a data point). Our use case is that the worker threads (aka parallel builds) are the ones generating errors which need to be communicated back to the main thread.

The problem with Send is that you cannot have reference stored on them which makes some otherwise neat things impossible

I may have missed something, but doesn't 'static also prevent storing references? The main type that Send blocks out today is Rc I think.

The downcasting infrastructure I'd prefer to leave out for now

As another data point, Cargo is using its own Error-like infrastructure (predated the std::error module) and one of the methods we have on our CargoError trait is an is_human method which is a flag indicating whether an error can be displayed to the user or not. I think that a form of downcasting would help a lot in detecting classes of errors like this (e.g. err.is::<HumanError>()). Right now boxed errors are pretty opaque and tough to get any sort of extra info out of (again, just a data point!)

alexcrichton · 2014-12-09T22:24:14Z

@mitsuhiko actually I'm a little confused about one aspect here. In the RFC you've defined the Error trait as:

trait Error: 'static + Send + Clone { ... }

But one your comments seems to contradict this:

The change from Send to 'static is probably ok ...

The problem with Send is that you cannot have reference stored on them which makes some otherwise neat things impossible

I think @aturon misconstrued the addition of 'static to think it was replacing Send, and I think I also did the same thing!

I just wanted to clarify what the intended outcome of the bounds on Error are. Note that Send implies 'static (so the bounds as written are redundant), and 'static implies no references can be contained (which I think contradicts what you were mentioning about storing references). Just making sure we're all on the same page!

mitsuhiko · 2014-12-09T23:50:48Z

I may have missed something, but doesn't 'static also prevent storing references? The main type that Send blocks out today is Rc I think.

You are absolutely correct. I'm not sure why I missed that :-/

actually I'm a little confused about one aspect here. In the RFC you've defined the Error trait as:

As mentioned I'm not sure about the intended trait bounds. The RFC is the conservative approach I think. I will do some experiments with just 'static and Send to see if it works out.

I just wanted to clarify what the intended outcome of the bounds on Error are. Note that Send implies 'static (so the bounds as written are redundant), and 'static implies no references can be contained (which I think contradicts what you were mentioning about storing references). Just making sure we're all on the same page!

Yeah. Just to clarify the intentions of the bounds:

Send was originally added to the error trait to allow sending them around (sensible)
'static was added (in addition) so that TypeId becomes available. I believe this is required to make this work? Maybe it's implied now due to Send?
Clone is from my experiments helpful to allow both error conversion as keeping the original error around. This might be not necessary and I will see if I can due the same without clone.

alexcrichton · 2014-12-10T03:56:54Z

Cool, thanks for the clarification @mitsuhiko!

aturon · 2014-12-17T22:09:31Z

@mitsuhiko Just an update: we do plan to take some action here but are currently swamped with other issues getting ready for the alpha. I will be in touch ASAP.

aturon · 2014-12-27T22:04:41Z

@mitsuhiko I just had a thought: I wonder whether the new fmt::String/fmt::Show changes should have an impact on the Error design?

One issue in the original design was whether detail should return Option<&str> or Option<String>. The latter makes it possible to lazily allocate/compute the string from other fields, but means that each such computation results in a new allocation. And in general we prefer being able to write to a writer, since producing a string is a special case.

So perhaps Error should inherit from fmt::String? But if so, would we want to retain something like detail as a convenience?

mzabaluev · 2015-01-10T20:02:50Z

@aturon If detail is made non-allocating, what does it provide on top of description?

As I see it, the expectation is that detail will be called once or a very few times, likely in some detailed error display or a catch-all logging handler. It may not be called at all, especially if the error is acted upon as non-fatal, chained, or ignored. When dynamic data are used to format the detail, allocating a string internally just in case it may be required would often be wasteful.

Maybe the detail method should take a writer to write to?

aturon · 2015-02-04T00:04:45Z

@mzabaluev Apologies, I didn't see your comment until just now.

To be clear, what I meant was that we could use what is now fmt::Display to replace detail -- and that API in fact takes a writer. This change has, in fact, already been made as part of a separate RFC.

aturon · 2015-02-04T00:08:06Z

@mitsuhiko I'm wondering your thoughts on the change to use fmt::Display in place of detail?

I would like to tease out any remaining backwards-incompatible aspects of this RFC; we are getting ready to stabilize std::error. I don't think there's been a lot of buy-in for further splitting up detail/description; I believe that was the main incompatible change? Please let me know.

Once we've extracted the backcompat hazards, I think this should probably be closed in favor of a crates.io implementation for the time being.

You mentioned splitting up the RFC into those two parts previously; are you still open to doing that? I'd like to make some progress here.

alexcrichton · 2015-06-02T23:27:16Z

Since this RFC was opened, the Error trait in the standard library has changed somewhat significantly (and almost has stable downcasting support) allowing for many sorts of more flavorful error abstractions to be built. There are likely still many extensions, and perhaps concrete errors, which can be added to the standard library, but we think it may be good to take a fresh look at the approach with a new spin on modern Rust (especially with relation to @aturon's last set of questions).

For now I'm going to close this RFC, but thanks regardless for it @mitsuhiko!

mitsuhiko added 7 commits November 28, 2014 23:22

Started work on failure-box rfc

e0d4983

Fixed up formatting

49c3fc0

Added missing rust markers

b1110f9

Updated RFC

5ef2a3c

Added some more comments about the motivation.

df69f40

Fixed various typos

0f4e7a9

Added section of incompleteness

d2d0afa

sfackler reviewed Nov 29, 2014
View reviewed changes

idea -> ideal

38a1ff8

frewsxcv reviewed Nov 30, 2014
View reviewed changes

nielsle reviewed Nov 30, 2014
View reviewed changes

is -> are

62a90e9

nrc assigned aturon Dec 4, 2014

alexcrichton mentioned this pull request Dec 8, 2014

Clarify Error description and detail should be single line rust-lang/rust#18629

Closed

aturon mentioned this pull request Dec 27, 2014

Implement error interoperation rust-lang/rust#17747

Closed

aturon mentioned this pull request Mar 24, 2015

First-class error handling with ? and catch #243

Merged

aturon added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 22, 2015

alexcrichton closed this Jun 2, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Error Reform and Failure Box #492

RFC: Error Reform and Failure Box #492

mitsuhiko commented Nov 29, 2014

sfackler Nov 29, 2014

mitsuhiko Nov 29, 2014

mitsuhiko Nov 30, 2014

bstrie Nov 30, 2014

mitsuhiko Nov 30, 2014

frewsxcv Nov 30, 2014

mitsuhiko Nov 30, 2014

bstrie commented Nov 30, 2014

bstrie commented Nov 30, 2014

pythonesque commented Nov 30, 2014

arthurprs commented Nov 30, 2014

nielsle Nov 30, 2014

mitsuhiko Nov 30, 2014

arielb1 commented Nov 30, 2014

mitsuhiko commented Nov 30, 2014

alexcrichton commented Dec 8, 2014

aturon commented Dec 8, 2014

mitsuhiko commented Dec 9, 2014

alexcrichton commented Dec 9, 2014

alexcrichton commented Dec 9, 2014

mitsuhiko commented Dec 9, 2014

alexcrichton commented Dec 10, 2014

aturon commented Dec 17, 2014

aturon commented Dec 27, 2014

mzabaluev commented Jan 10, 2015

aturon commented Feb 4, 2015

aturon commented Feb 4, 2015

alexcrichton commented Jun 2, 2015


		# Questions

		Q: Why is `name()` and `description()` necessary? What about translations?

RFC: Error Reform and Failure Box #492

RFC: Error Reform and Failure Box #492

Conversation

mitsuhiko commented Nov 29, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bstrie commented Nov 30, 2014

bstrie commented Nov 30, 2014

pythonesque commented Nov 30, 2014

arthurprs commented Nov 30, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arielb1 commented Nov 30, 2014

mitsuhiko commented Nov 30, 2014

alexcrichton commented Dec 8, 2014

aturon commented Dec 8, 2014

mitsuhiko commented Dec 9, 2014

alexcrichton commented Dec 9, 2014

alexcrichton commented Dec 9, 2014

mitsuhiko commented Dec 9, 2014

alexcrichton commented Dec 10, 2014

aturon commented Dec 17, 2014

aturon commented Dec 27, 2014

mzabaluev commented Jan 10, 2015

aturon commented Feb 4, 2015

aturon commented Feb 4, 2015

alexcrichton commented Jun 2, 2015