Proposal: Panics returned as errors if the function has an error result type #32871

alvaroloes · 2019-07-01T14:33:53Z

TL/DR

Make panic stop "bubbling up" in the call stack when it finds a function with an error as last result type and return the panic content as that error.

This will reduce a lot of crashes due to "nil-pointer dereferences" (among others), making the application more stable and suitable for long running tasks (like servers).

Disclaimer

The entire proposal is based on my experience and point of view. The main goal is to validate it against the Go community to see if it makes sense and to check if other people share the concerns this proposal try to address.

Motivation of this proposal

To avoid (or reduce to the minimum) unexpected application crashes.
In Go, it is pretty easy to crash the entire application by an implicit panic (like a nil pointer dereference). It is even easier than in other languages because of the implicit dereferencing of pointers-to-structs variables.

This adds a feeling of insecurity when writing code, forcing the developer to be extra cautious and add checks in the code that increase verbosity and reduce readability.

Hopefully, this proposal will resolve that.

Proposal

When a panic occurs, it starts "bubbling up" until it finds a function that contains a defer with a recover. If it is not found, the application crashes.

This crash is (almost?) always not the desired result (from my experience), as you can't recover from it. The only way is to restart the application.
To avoid crashing, if you know your function can panic, you can add the "defer-recover" construct.

However, it is not always evident whether the function you are calling can panic (even if you look at the source code), so it is really easy to forget adding the "defer-recover" construct.

This is especially true with "nil pointer dereference" errors, as they are hidden in apparantly good code and can be dormant for a long time until they appear. This can happen with any Go "pointer type": pointers, functions, maps, channel, and interfaces (not with slices).

For example:

func RefreshProduct(currentProduct *Product, stockIncrementer func(*Product) int) (*Product, error) {
	if currentProduct.ID == "" {
		return nil, errors.New("Product does not exists. ProductID: " + currentProduct.ID)
	}

	return &Product{
		ID:    currentProduct.ID,
		Stock: stockIncrementer(currentProduct),
	}, nil
}

This code could panic and make the application crash. You can follow two approaches to avoid the crash:

The safest: use named return values and the "defer-recover" construct:

func RefreshProduct(currentProduct *Product, stockIncrementer func(*Product) int) (newProduct *Product, err error) {
	defer func() {
		if recoveredPanic := recover(); recoveredPanic != nil {
			err = fmt.Errorf("something bad happened: %v", recoveredPanic)
		}
	}()
	
	if currentProduct.ID == "" {
		return nil, errors.New("Product does not exists. ProductID: " + currentProduct.ID)
	}
	return &Product{
		ID:    currentProduct.ID,
		Stock: stockIncrementer(currentProduct),
	}, nil
}

The "I-know-my-panics" way: check everything that can panic:

func RefreshProduct(currentProduct *Product, stockIncrementer func(*Product) int) (newProduct *Product, err error) {
	if currentProduct == nil {
		return nil, errors.New("currentProduct is nil")
	}
	if stockIncrement == nil {
		return nil, errors.New("stockIncrement is nil")
	}

	if currentProduct.ID == "" {
		return nil, errors.New("Product does not exists. ProductID: " + currentProduct.ID)
	}
	return &Product{
		ID:    currentProduct.ID,
		Stock: stockIncrementer(currentProduct),
	}, nil
}

This is just a very simple example. Things get worse when you are working with interfaces holding pointers.

This proposal's goal is to be able to get the safe version of the code but without any change in the logic (as long as the function already has an error return type)

How? With the following new rules

Rule 1: Whenever there is a panic (implicit or explicit), follow the current "bubbling-up" rules plus the addition of rule number 2.

Rule 2: When a function with an error as last return type is found, stop panicking and return the error with the contents that would have returned recover(). The rest of the return values are filled with the zero value as expected.

With this addition, when we see a function like this (for example):

func storeTo(store io.Writer, contents io.Reader) error {
	// ...
}

We are completely sure that the only way that function can finish is by returning nil or an error. No unexpected panics are leaked.

Implementation details

I don't have enough knowledge about the Go source code to provide implementation details.
My guesses are that implementing this proposal should not be too hard. It could be summarized as the following:

Whenever a function with an error type as the last result type is found, add the following implicit defer:

defer func() {
	if recoveredPanic := recover(); recoveredPanic != nil {
		err = fmt.Errorf("something bad happened: %v", recoveredPanic)
	}
}()

I guess there could be complications around the err variable, as it should be the named error return value.

Surely there are better ways to implement this, but the goal now is to know the opinion of the community to first verify if the proposal makes sense.

Backward compatibility

I would say that, for 99% of the cases, this change will be backward compatible.
That 1% of the cases are those programs that are using the "panic-defer-recover" construct for doing actual logic control flow.

Of course, those numbers are totally made up. Maybe there are more cases, but I prefer first to focus on validating the idea with the community first,

FAQ

1.- Would this only work for "nil-pointer dereference" panics?
No, it would work for all kind of panics, like index out of bounds panics.

2.- What to do when the user explicitly panics?
This will work for both implicit and explicit panics. It is true that when your function can return an error, doing:

panic("bad thing")

would behave in the same way as:

return errors.New("bad thing")

But the question here is, if your function can already return an error, Why would you panic instead of returning the error?
I'm really asking. I don't know if there is a case where it is preferable to panic than to return an error

3.- What happens if there is a defer that calls recover and the function has an error return type too?
It would behave in the same way as today: the defer will be executed, the panic will be recovered, and the function will return as if no panic happened.
The "conversion" from panic to an error only happens if the panics "leaks" the function.

4.- What if no function with an error result type is found?
The application crashes 😞 . But this is what should happen, as no "error recovery code" was found in the whole call stack, so the only thing the application can do is to kill itself.

The text was updated successfully, but these errors were encountered:

bcmills · 2019-07-01T16:11:03Z

A panic generally indicates a programmer error that should be detected during testing and code review.

Turning that panic into a run-time error makes the panic, reducing the likelihood that tests will reveal it: tests that panic fail, whereas tests that provoke an error-return within some (possibly deeply nested) internal function call may or may not surface that error in any way that is visible to the testing package.

Moreover, unlike other run-time errors, a captured nil-panic won't include additional details that may be helpful in diagnosing the bug. And there are some cases where a function with an error in the return signature is documented to always be nil (#20803 (comment)), so call sites for those cases would in many cases silently swallow panics — and, especially in such cases, can result in unexpectedly corrupted data (see also #28150).

I would certainly love to see improvements that address the root causes of nil-panics in Go, but not at the expense of detecting actual bugs. Instead, I would rather see improvements to fuzzing (#19109), more nil-safety in the type system (#28133 and others), and/or static analysis tools.

ianlancetaylor · 2019-07-01T21:26:06Z

Apart from whether this change is a good idea or not, I don't think we can make this kind of change in the Go language today. It would cause too many existing, correct, programs to silently break in cases where they occasionally, but rarely, panic.

alvaroloes · 2019-07-02T11:10:41Z

@bcmills

I would certainly love to see improvements that address the root causes of nil-panics in Go, but not at the expense of detecting actual bugs. Instead, I would rather see improvements to fuzzing (#19109), more nil-safety in the type system (#28133 and others), and/or static analysis tools.

Fair enough. To be honest, I agree with you about addressing the root cause.
This proposal was the end result of a mental process to try to come up with an idea to do some changes in the type system to make it impossible to access the content of a type that can be nil (like "Optional"s in other languages).
You can see some tries in the following comments (and replies):

After the discussion, my feeling was that there is a lot of resistance to doing changes (like the one discussed in the first comments about the zero value), so I tried to come with a "compromise solution" to "nil-pointer dereferences" errors to reduce crashes.
But, yeah, I'd rather prefer to see a solution where the compiler ensures there won't be any nil-pointer dereference.

In any case, I would say that nil-safety should be addressed sooner rather than later. I understand that testing and code-reviews can catch nil errors, but they are not foolproof (unlike a compiler-based solution). If they were, there wouldn't be nil errors in production code, and there are.

I'll keep giving this a thought to see if we can get to a good solution 🙂

@ianlancetaylor

It would cause too many existing, correct, programs to silently break in cases where they occasionally, but rarely, panic.

Thanks for your comment. Could you please elaborate a bit? (I'm genuinely asking)
I understood some cases that Bryan wrote, but I'm having a hard time to understand why a program that could crash with a panic due to a nil-pointer dereference is more correct than a program that handles that error through the normal error-handling flow.
Could you please put an example?

ianlancetaylor · 2019-07-02T13:15:19Z

@alvaroloes Interesting, as I have the exact opposite intuition. I don't think I can show a useful small example, as I am explicitly talking about cases that happen rarely. In case it wasn't clear, I'm talking about large existing programs that were written before this change to the language. For some of those programs, a nil pointer dereference will indicate that some data structure was built incorrectly. Those programs naturally have many functions that return an error value. If a nil pointer panic converts into an error return, then some caller will see a nil pointer dereference turn into a normal error. But this may be a case where errors are expected, say because it is contacting some other server over a network. The program will log the error and continue. But now it is working with an invalid data structure, and some operation on the data structure was interrupted midstream and so that overall data structure may now be completely corrupt. Before this change the program would have crashed, permitting it to be analyzed and restarted. After this change, the program's behavior will silently change with no warning. It will now continue running with a corrupt data structure, leading to unpredictable future chaos.

alvaroloes · 2019-07-03T10:23:02Z

Thank you, Ian, for the explanation. I still don't see it clear why an application would continue normally after an error has occurred (and just log it).

In any case, I prefer to close this issue and dedicate the time and effort to another one that fixes the root cause of nil panic.

Thank you all for your time and energy for reading and commenting on this issue!

lootch · 2019-07-03T10:52:50Z

On 7/3/19, Álvaro López Espinosa ***@***.***> wrote: Thank you, Ian, for the explanation. I still don't see it clear why an application would continue normally after an error has occurred (and just log it).

Very briefly, you are correct in a general sense, but you cannot compel every other developer to accept your preference in this case. Code has been written, good and bad, no doubt, and will continue to be written, that conflicts with that.

alvaroloes · 2019-07-03T12:17:16Z

I agree with you @lootch, and I understand that better or worse code can be written. For example, one could use panic-defer-recover as logic control flow and create a correct program that does its job.

The real deal of all changes to the language that are not fully backward compatible is that the ratio "value/(downsides * affected applications)" must be pretty high. It is a bit difficult to demonstrate that for this proposal, so the justification only comes from subjective opinions (which is not enough).

As I said, I'll keep thinking of another better proposal. Thanks for your comment.

[thinking out-loud]

I really like Go as it is right now, the simplicity is key feature any language should aim for.
But, gosh!, when I program in other languages where it is impossible to have a nil-pointer dereference error (or race conditions thanks to read-only values) I feel much more "relaxed" and confident about my code.

I wish Go would have had that into account since the beginning, as it is really hard now to add those changes in an orthogonal and backward-compatible way.

[/thinking out-loud]

lootch · 2019-07-03T14:38:22Z

On 7/3/19, Álvaro López Espinosa ***@***.***> wrote: [ ... ] >I wish Go would have had that into account since the beginning, as it is > **really** hard now to add those changes in an orthogonal and > backward-compatible way.

That mantra is perishable goods. I think the "try" efforts show that: keeping things the way they are is a "single truth", otherwise called "the devil you know". The moment you need to move away from that and there are multiple possibilities that cannot be accurately measured against each other, only good politicians know how to keep the community from splitting or the split from becoming a disaster. Technologists are almost by profession the worst negotiators and should be as conservative as the unreliability of their (social) measurements compel them to be. Pioneers and adventurers can take risks, engineers can't.

ianlancetaylor · 2019-07-03T23:05:07Z

@alvaroloes We can aim for a better paradigm, if we believe it is better, but my point is that we can't silently change the behavior of existing code. We can introduce new mechanisms, we can remove old mechanisms, but we can't silently change old mechanisms to new ones. There is too much existing code for that.

alvaroloes · 2019-07-04T11:45:52Z

Thank you both for the comments.

We can introduce new mechanisms, we can remove old mechanisms, but we can't silently change old mechanisms to new ones. There is too much existing code for that.

Gotcha. Let's keep trying. I'll keep spending a bit of my spare time to find some "mechanism" (with a positive value/downside ratio) to avoid those implicit panics.

alvaroloes · 2019-07-12T14:54:47Z

@ianlancetaylor @bcmills @lootch Here is a new proposal to try to have nil-safety checked at compile time: #33078

gopherbot added this to the Proposal milestone Jul 1, 2019

gopherbot added the Proposal label Jul 1, 2019

ianlancetaylor added v2 An incompatible library change LanguageChange Suggested changes to the Go language labels Jul 1, 2019

alvaroloes closed this as completed Jul 3, 2019

bradfitz added the error-handling Language & library change proposals that are about error handling. label Oct 29, 2019

golang locked and limited conversation to collaborators Oct 28, 2020

gopherbot added the FrozenDueToAge label Oct 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Panics returned as errors if the function has an error result type #32871

Proposal: Panics returned as errors if the function has an error result type #32871

alvaroloes commented Jul 1, 2019

bcmills commented Jul 1, 2019

ianlancetaylor commented Jul 1, 2019

alvaroloes commented Jul 2, 2019

ianlancetaylor commented Jul 2, 2019

alvaroloes commented Jul 3, 2019

lootch commented Jul 3, 2019 via email

alvaroloes commented Jul 3, 2019 •

edited

Loading

lootch commented Jul 3, 2019 via email

ianlancetaylor commented Jul 3, 2019

alvaroloes commented Jul 4, 2019 •

edited

Loading

alvaroloes commented Jul 12, 2019

Proposal: Panics returned as errors if the function has an error result type #32871

Proposal: Panics returned as errors if the function has an error result type #32871

Comments

alvaroloes commented Jul 1, 2019

TL/DR

Disclaimer

Motivation of this proposal

Proposal

Implementation details

Backward compatibility

FAQ

bcmills commented Jul 1, 2019

ianlancetaylor commented Jul 1, 2019

alvaroloes commented Jul 2, 2019

ianlancetaylor commented Jul 2, 2019

alvaroloes commented Jul 3, 2019

lootch commented Jul 3, 2019 via email

alvaroloes commented Jul 3, 2019 • edited Loading

lootch commented Jul 3, 2019 via email

ianlancetaylor commented Jul 3, 2019

alvaroloes commented Jul 4, 2019 • edited Loading

alvaroloes commented Jul 12, 2019

alvaroloes commented Jul 3, 2019 •

edited

Loading

alvaroloes commented Jul 4, 2019 •

edited

Loading