Skip to content
This repository has been archived by the owner on Mar 31, 2018. It is now read-only.

Differentiating Programmer Errors from Operational Errors #10

Open
chrisdickinson opened this issue Feb 15, 2016 · 77 comments
Open

Differentiating Programmer Errors from Operational Errors #10

chrisdickinson opened this issue Feb 15, 2016 · 77 comments

Comments

@chrisdickinson
Copy link
Contributor

This primarily concerns the error symposium participants, but may also concern the post mortem WG.

The Problem

  • Promise users expect all Promise-returning API usage to return a Promise object, success or fail.
  • This means that invalid API use is returned as a rejection, alongside other errors.
  • This necessitates the use of try / catch in async/await code.
  • This also necessitates error type checking in promise-based code.
    • Were a "abort on unhandled rejection" flag to land, this would reduce such a flags efficacy: given that such a flag would operate by attempting to immediately abort the promise at top of stack assuming no user-installed handlers were present, and given that in type-checking users would usually add user-installed handlers.

Proposed Solutions

Recovery Object

Add an optional object parameter to all promise-returning APIs. This will be known as the recovery object. It does not change the behavior of Promises/A+; it is purely a Node API-level pattern. It allows users to intercept errors before settling the returned promise, and change the outcome of the lower-level API call:

// recovery object
await fs.readFilePromise('some/file', {
  ENOENT() {
    return null
  }
})

// normal use:
fs.readFilePromise('some/file')

Pros:

  • Promise API users get the API they expect. Error symposium users get the API they expect.
  • Simple to implement.

Cons:

  • It's a new pattern.
  • It doesn't solve problems in the ecosystem.

--abort-on-sync-rejection

Add a flag to abort the process on synchronous new Promise rejection. This does not extend to synchronous throw within handlers. Potentially patch Promise.reject so that it returns a pending promise that rejects on next tick.

For example, the following usage would abort under this flag:

new Promise(() => { throw new Error() })
new Promise((_, reject) => { reject(new Error()) })

The following usage would not abort under this flag:

new Promise((_, reject) => { setTimeout(() => reject(new Error())) })
attainPromise().then(() => { throw new Error() })

The net effect being that the promise would immediately exit on invalid Node API use under this flag. Without the flag, the process keeps running "as expected" for the majority of promise users.

Pros:

  • Solves problem in ecosystem for native promise users.
  • Does not introduce a new pattern.

Cons:

  • Unclear on how common synchronous rejection is in Promise constructors. Believed to be uncommon, could be proven wrong?
  • Flagged and unflagged behavior diverge.

This Discussion

Please pose problems with or benefits of the listed solutions, or propose alternate avenues of inquiry which will be added to the issue text. Clarifying comments, like "how are you using this?" are in-scope. Comments that suggest that the desire to separate operational errors from programmer errors is invalid will be moderated in the interest of keeping this discussion productive. Thank you for participating!

@phpnode
Copy link

phpnode commented Feb 15, 2016

--abort-on-sync-rejection sounds like it would be slightly more compatible with the ecosystem if it was only triggered on TypeError or ReferenceError.

I don't think introducing the new pattern from your first example is a good idea, and since it doesn't work with the ecosystem I don't think it's a valid option.

@chrisdickinson
Copy link
Contributor Author

I don't think introducing the new pattern from your first example is a good idea, and since it doesn't work with the ecosystem I don't think it's a valid option.

To clarify, it doesn't solve the problem in-ecosystem for existing promise-based APIs. The abort-on-sync-rejection would only solve for native Promises. If possible, I'd like @DonutEspresso to weigh in on whether that's an acceptable trade off before we say the recovery object isn't a workable solution.

@benjamingr
Copy link
Member

I'd appreciate it if we also consider the .recover extension I describe here:

let file = await fs.getFileAsync('dne').recover({
   EISDIR(err) {}
   ENOENT(err) { ... }
});

and copying your objections with replies here:

It puts core in the "Promise subclass" business, which I don't think we want to be in. Returning anything other than instances of the Promise class is liable to be confusing for users. Additionally, subclassing means that it becomes harder for application-level users to globally swap out Promise to their preferred implementation.

Why don't we want to be in it? Promises were built with subclassing ability exactly for us to be able to do it. It's a lot cleaner than patching promises, it allows reuse and so on. It lets recover chain cleanly and it allows using it in user code.

I'm not sure why it would be confusing to users, you can still globally swap out Promise and like I said if you support it in core we'll probably support it in bluebird too.

Since it uses .catch internally, it seems like it would be unsuitable for the purposes of post-mortem users (who look to be hitting on approach to abort on unhandled rejection while preserving stack when no user-installed catch handlers are installed.)

Why? It would use catch synchronously which is fine for post-mortem debugging unless I'm missing something. This is exactly the sort of context you'd want (handling the errors there). It would rethrow if no handler matched so it is still a synchronous throw in the asynchronous context. There is no magic or blessed core parameters - whatever solution we add to shim for no pattern-matching in catch I'd like to see it be built in a way that can be used in userland easily.

It moves the recovery object's handler method off the top of stack. This means that, should an author wish to abort in certain known operational error cases, the stack will be lost.

The stack would be exactly the same wouldn't it? Instead of a callback inside we take a callback outside.

For what it's worth, native promises are a little silly in that they always use the microtask queue. Bluebird for instance avoids it if it can prove the promise doesn't resolve synchronously anyway and avoids the extra queuing.

It looks like .recover() (vs. recovery-as-parameter) would get in the way of at least a few use cases, and put core in the business of augmenting the promise object, which I believe we do not wish to do.

Again, subclassing is not augmenting - it is extending. I'm not suggesting we alter the window.Promise object at all. I'm suggesting that instead of implementing this in about a hundred places for the entire core API we implement it once as a subclass.


Here is a fully concrete example you can run on your own computer, the actual example is at the bottom:

"use strict";
var fs = require("fs");

class CorePromise extends Promise {
    recover(o) {
        return this.catch(e => {
            if(o.hasOwnProperty(e.code) && typeof o[e.code] === "function") {
                return o[e.code].call(undefined, e);    
            } 
            return CorePromise.reject(e); // or `throw e`.
        });
    }
}

function readFilePromise(filename) { 
    // a terrible way to promisify, just for the example
    return new CorePromise((res, rej) => fs.readFile(filename, (err, data) => {
        if(err) return rej(err);
        else return res(data);
    }));
}

readFilePromise("nonExisting").recover({
    ENOENT(e) { console.error("Not Found :(", e.path) }
});
readFilePromise("actuallyADirectory").recover({
    // not caught, propagates to an unhandledRejection
    ENOENT(e) { console.error("Not Found :(", e.path) }
});
// this handler gets called for the directory since ENOENT does not match it.
process.on("unhandledRejection", (e) => console.error("Unhandled Rejection", e));

@petkaantonov
Copy link
Contributor

Both (recover object and recover method) are non-standard promise API extensions that will be obsoleted when the standard catch statement is extended to support custom predicates

@chrisdickinson
Copy link
Contributor Author

@benjamingr: To address subclassing:

  1. One of the goals of adding a Promise API to core is to reduce friction by settling on a common-baseline Promise implementation. That is to say, instead of having to pick a shim and an implementation, a package author that desires to use promises need only pick an implementation if the baseline doesn't suit their needs. Returning a subclassed variant of Promise distances us from this goal by introducing a new type of promises alongside native promises — users have to determine whether or not they have a native promise or a subclassed CorePromise (this is what I mean by "augmenting" — not all promises have the same methods, and that can cause confusion.)
  2. .recover changes the Promise API semantics, while a recovery object does not. There have been strong objections voiced about changing the semantics of promises in other threads.
  3. .recover precludes aborting at top-of-stack for error symposium users.
  4. Because CorePromise would extend global.Promise at startup time, and the promisify wrapper would hold a reference to CorePromise introduced by a variable binding, it would preclude a user from swapping out the promise implementation globally, because core APIs would continue to return CorePromise instead of their desired implementation. If we work to make it so that CorePromise can be swapped, then it limits valid implementations to those that include .recover.
  5. As a platform, I would prefer to limit us to returning well-specified Promise objects by default — that is, Promise and CancellablePromise (in whatever form that lands in.) This means that creating our own CorePromise wouldn't be possible without a clear specification noting the behavior of CorePromise. This would also preclude us from returning ad-hoc .then-ables.

@spion
Copy link

spion commented Feb 15, 2016

@chrisdickinson here is, I think, where most promise users have the biggest disagreement.

Promise users do not find it necessary to distinguish between programmer vs operational errors. When using promises, there are only expected and unexpected errors and only the consumer should decide which errors are expected and which aren't.

I'm not going to discuss that in details here, the problem is that both promises and async await are designed with this assumption being fundamental.

As a result I think that to solve this issue we must discuss precisely why this distinction exists in node, what are the problems with try {} catch {} and whether promises perhaps solve those problems already, in a different way.

So far, the only real issue I've seen has been post-mortem debugging, but we may be able to solve that in a different way too (#8).

@chrisdickinson
Copy link
Contributor Author

@spion I think @groundwater proves the exception to the assertion that promise users do not find differentiating these errors necessary. He is a user of promises via async/await, and finds it necessary to differentiate "I called the API incorrectly" from "this is a recoverable state of the program."

While promises (and async/await) are designed with the fundamental assumption that exceptional error flow is expected, I don't think that precludes us from accommodating the use case that @groundwater and the error symposium represents at the Node API (or platform) level.

@chrisdickinson
Copy link
Contributor Author

@spion:

As a result I think that to solve this issue we must discuss precisely why this distinction exists in node, what are the problems with try {} catch {} and whether promises perhaps solve those problems already, in a different way.

I agree that we should have a discussion on this, but this thread is probably not the best place for it. As part of addressing #12 I will be PRing a document in summarizing their needs and desires. Would you be opposed to holding off on the discussion of the programmer/operational error distinction in this thread and deferring it to that PR? I will endeavor to have it up by end of day today.

@spion
Copy link

spion commented Feb 15, 2016

@chrisdickinson I think this is a good place, because if we determine that there are no reasons other than post-mortem debugging, and another solution for PMD is found, this issue would likely close.

Anyway, I wrote my argument here:

https://gist.github.com/spion/3a1cad2debc14c56f353

@phpnode
Copy link

phpnode commented Feb 15, 2016

@chrisdickinson the point is that we (promise users) do differentiate, with Bluebird we'd do something like this:

return request(url)
.then(storePageInADatabase)
.catch(Http400Exception, handleNotFound)
.catch(Http500Exception, handleServerError);

But of course this is not part of the promise spec. With async await,

try {
  const page = await request(url);
  const pageId = await storePageInADatabase(page);
  return pageId;
}
catch (e) {
  if (e instanceof Http400Exception)
    return handleNotFound(e);
  else if (e instanceof Http500Exception)
    return handleServerError(e);
  else 
    throw e; // Something we weren't expecting.
}

and in hypothetical pattern matching dream-land:

try {
  const page = await request(url);
  const pageId = await storePageInADatabase(page);
  return pageId;
}
catch (e: Http400Exception) {
  return handleNotFound(e);
}
catch (e: Http500Exception) {
  return handleServerError(e);
}

This kind of error-subclassing pattern is really common in promise driven code but very rare in callback code because throwing is so discouraged.

The fact that operational vs programmer error differentiation is even possible in node is an incidental side effect of this requirement that async callbacks never throw and therefore normal error handling must be done differently. This kind of difference cannot be detected in a synchronous try..catch block for example.

Promises treat errors in a way more similar to a try..catch, and therefore their error propagation mechanism and error handling conventions are different from callbacks, but the capabilities still exist.

@Qard
Copy link
Member

Qard commented Feb 15, 2016

-1 for recover object as an argument. It conflicts with optional object arguments, which the exact fs.readFile(...) call that has been used in examples supports. Having two optional object-type arguments in a row would just be asking for trouble.

+0 on the promise.recover({ ... }) method though. As suggested already, I just don't think the try/catch approach will work adequately for post-mortem until checked/pattern-matched exceptions are a thing.

Personally, I don't actually care much about the post-mortem abort expectation. If you care so much about runtime safety, why would you be writing JS? I'm not going to argue that point though--I recognize I have a strong opinion that some don't agree with there.

@chrisdickinson
Copy link
Contributor Author

@Qard: At present, the recovery object is mixed in with the options object — it's always the last object argument.

@phpnode: Yep — I do this as well in my bluebird-based code. However, this isn't acceptable for error symposium users, since they want to separate "known failure state" from "unknown failure states", and they'd prefer unknown failure states be easy to locate when they emerge.

@DonutEspresso
Copy link

I've been peripherally involved in some of these discussions, so I'll continue to provide some input so long as it's useful. To start, I agree with @spion that expected/unexpected is a good way to frame the discussion. I'll try and summarize in a few short bullet points what my typical game plan is, which is likely similar for some of the other systems oriented folks:

  • Use errbacks as the primary channel for communicating expected errors. If you're using an API that throws underneath the hood (JSON.parse), localize your try/catch and return the error in an errback. If you're ever handed an Error object, that's an expected error, so it must be handled.
  • When it comes to unexpected errors, most of them are usually automatically thrown, so fail fast and fail hard (abort)
  • The action of aborting should generate core dumps useful for post mortem debugging
  • Complete the loop by going back and fixing the source of the error.
  • Because all expected errors are communicated via Error objects, anything that throws is assumed to be catastrophic (abort). Whether that's a typo, a null ref, or an explicit throw in code you own.

By adhering to this strategy, you effectively have two "channels" of communicating errors. You have an errback channel for expected errors, and an throw channel reserved for fail fast unexpected errors. IMHO the crux of the disagreement is almost certainly this: when possible, I actively avoid the use of throw/try/catch as a channel for communicating expected errors. I do so primarily because:

  1. catch as it exists doesn't allow me to safely differentiate expected from unexpected errors. instanceof isn't foolproof, and fails under various scenarios. Same with duck typing. And even if pattern matching lands, checking on e if TypeError isn't always semantically correct because APIs could be returning TypeErrors as expected errors. Unfortunately, JS's lack of real types hurts here.
  2. unwinding the stack only to have to determine if we need to rethrow is a non-starter for post-mortem

To be super clear about this - my hesitation around using catch apply even outside the context of promises. I also want to note that the construct we've created of having two channels to communicate these errors is also a necessary evil (this is where the checked/unchecked errors came into being in the symposium threads). It seems unlikely that we'd go through all this trouble were it easier to discern errors with more confidence. JS's catch is spiritually identical to Java's catch (Throwable e).

TL;DR, I think we all agree on expected/unexpected errors existing, and needing to differentiate them. Promise users do that using .catch(), non-Promise users do it by never putting expected errors into the throw channel. Camps may disagree on how to handle unexpected errors (abort or not), but I think that's an orthogonal concern to this discussion.

I'm starting to think that the discussion really is around how we can make Promises compatible with post mortem debugging (if possible). The context of most of these threads is around trying to figure out how/when to abort process on unhandled rejections without breaking the expected behavior of Promises, and using flags to enable/disable those breaking behaviors. I might posit that if only for supporting the use case of universal JS, using flags has some implications in terms of surprising behavior depending on what side of the wire that code is run on.

@petkaantonov
Copy link
Contributor

@DonutEspresso

You are using unexpected/expected errors to mean operational/programmer errors. The distinction, and what @spion means, is supposed to be that whether an error is unexpected/expected is decided by the consumer.

Having two error handling channels doesn't add to this because the producer is the one choosing the channel. You are just adding extra work for the consumer which now has to handle both channels to see what they expect and not. You cannot have one channel for unexpected and one for expected because the producer is choosing the channel while the consumer is the one who must decide what is expected and what is unexpected.

Here's a silly example but you really should read spion's example about image cropping and transformations, they are more interesting

// If this throws, it is clearly unexpected
port.open(99999); 



var thePort = prompt("which port do you want to open?");
try {
    // If this throws it is clearly expected
    port.open(thePort);
} catch (e) {
    alert("Sorry, the port " + thePort + " couldn't be opened because: " + e.message)
}

This should make it clear that the port.open implementation cannot in any way decide what is expected or unexpected.

  1. unwinding the stack only to have to determine if we need to rethrow is a non-starter for post-mortem

Not with proper tooling

@DonutEspresso
Copy link

@petkaantonov I don't think we're saying different things.

If I were being dogmatic about this, I'd say that the problem is partially because port.open() (and likewise JSON.parse and its ilk) decided to throw an expected error instead of returning me an Error object. As you said, it's now forced me to look for errors in two different places based on whether the operation was asynchronous. It's why stuff like safe-json-parse exists.

In an errback based world, given that functions are the lowest common dominator, having independent channels is the primary (only?) path forward to normalizing those errors. Force everyone, regardless of asynchrony, to use errbacks. I'm not saying it's the right solution, or that they even apply to promises, just stating that this is the current state of the world. We've settled on using return codes to normalize error handling.

Promises take a vastly different approach by streamlining everything into a single channel and handling them via catch. IMHO, which approach you take is completely driven by how comfortable you are with the trade offs re: post mortem and fidelity in differentiating errors via catch.

Regarding @spion's crop transform, sure, it is entirely possible a second transform can become invalid after the first is applied. That is most definitely an expected error in my view, because it's something you should have been thinking about when you designed the crop transform class. By not handling an possible expected error you turned it into an unexpected error. Differentiating expected from unexpected is not just about being defensive against bad input.

It would be lovely if we could fix the post mortem issues for promises. Unfortunately, whatever solution we can come up with there doesn't help with regular try/catch.

@spion
Copy link

spion commented Feb 16, 2016

@DonutEspresso as demonstrated even if you are tediously defensive (as in the first example), the function applyTransforms there breaks the ability to be defensive. You can no longer be defensive without solving the halting problem - i.e. you can't know in advance if the list of specified transforms is a valid argument without applying them. You must return Result<T> instead, which is pretty much the sync version of promises in a language that has no exceptions.

regarding sync try-catch, one of the possible alternative solution in #8 may be able to offer enough flexibility to do PM debugging even in the face of try-catch. To be honest I don't know if its even remotely doable, but it seems worth giving it a try.

@DonutEspresso
Copy link

@spion Yes, I would have done something similar to what you proposed. I also don't think it's unreasonable in this situation to use a localized try/catch (YMMV depending on who you ask) within your internal implementation. Presumably, in such scenarios it's also much more clear whether or not the code in your try block is a recoverable scenario. The whole thing about avoiding throw/try/catch is really around using it to communicate error scenarios across API boundaries.

@chrisdickinson Sorry for derailing this thread! Some of this is probably good info for the post mortem doc.

Re: the recovery object being proposed here - after thinking about it some more I think I'd echo some of the other concerns in terms of interoperability. If I wanted to write an abstraction top of fs, I for sure need to mimic the recovery pattern in my wrapper. I'm not a heavy promise user, but I would imagine that it might be terribly confusing for new comers to have to think about how/why to use two completely different error handling semantics within the same abstraction.

@chrisdickinson
Copy link
Contributor Author

We're verging into "invalidating the use case" territory a bit here. It's a bit hard to disentangle the needs of the error symposium from the needs of the post mortem group, but I'll give it a shot: My impression is that error symposium users wish to avoid APIs that require the following:

try {
  var val = JSON.parse(input)
} catch (err) {
  // bad json!
}

As @DonutEspresso noted, this is where we see things like safe-json-parse pop up. The worry I hear expressed by the error symposium is that, as async/await lands, absent action on our part to accommodate their needs, all Node Promise APIs will be written as:

try {
  var val = await fs.readFilePromise(input)
} catch (err) {
  // could be either bad input or bad state on disk.
}

How do we accommodate users who wish to differentiate between bad input and bad state? Especially, how do we accommodate users who wish to differentiate and are using post mortem tools?

It seems like there's pretty unanimous agreement that the recovery object route doesn't satisfy everyone's needs. Does --abort-on-sync-rejection? Under that proposed flag, the example above would abort at top of stack if input was bad, proceed to catch (err) on operational errors, or continue normally if no errors were encountered. It's analogous to the Checked/Unchecked error split proposed earlier in that it causes certain kinds of errors to abort immediately. Would a program running under such a flag encounter many difficulties in the ecosystem today? How common is synchronous rejection?

@DonutEspresso Not at all — thank you for your continued input!

@chrisdickinson
Copy link
Contributor Author

@rvagg:

Thank you for contributing this perspective.

I'd like to caution on treating this as a concern that can be isolated to "the error symposium participants" and "the post mortem WG" as it risks suggesting this addressing the needs of a very small interest group, which it's not.

An excellent admonition. I've been referring to the concerns brought by the groups by the names of groups themselves mostly for the sake of parsimony — I agree that it's important to keep in mind that the concerns that they are voicing are reflective of a larger group of users. Do you think documentation at README.md level would help address this concern? Would you suggest splitting the concerns-<group> labels out?

However, if we can address the concerns of people who see Promises as detrimental to discovering and addressing programmer errors then we might end up in a place where even they can be OK with their teams and companies they consult to using Promises because there's an escape hatch of some kind to bring them in line with their particular philosophy about how to deploy, debug and improve Node.js applications.

How does --abort-on-synchronous-rejection strike you in terms of providing this escape hatch? In particular, it would mean that the following program, run under the flag, would immediately abort with a useful core in the case of a bad filename input:

module.exports = function (filename) {
  return fs.readFilePromise(filename)
}

@phpnode
Copy link

phpnode commented Feb 16, 2016

@rvagg I feel like you made a lot of sweeping statements in that post that are pretty easy to disagree with, and your own dislike of promises is showing through. You are effectively saying that promises make node.js programs harder to debug, when the vast majority of promise users will say the complete opposite.

Promises are not something you should automatically jump to if your coders are writing error-prone code

Again, I would totally disagree with this. In my experience, Promises make it much easier to write robust code with proper error handling mechanisms. Programmer errors reveal themselves in just the same way as they do in callback driven code.

The key point though, is that callbacks and promises are different and so they have different semantics and their users have different expectations.

@spion
Copy link

spion commented Feb 16, 2016

Startup is cheap with Node.js and every Enterprise level deployment uses a cluster approach of some kind so the impact is relatively minimal on users.

This has not been true in our experience. Startup with Node is not nearly cheap enough, and there is always the tendency of the main (especially web serving) process to quickly blow up above the 1 second startup time, most of it spent synchronously loading thousands of modules from disk.

The impact is also not minimal because a single process serves thousands of users concurrently and all of them will lose their connection and may lose or corrupt their work when it crashes (depending on the guarantees of the storage layer)

But I digress.

Arguing about whether fail-fast or resilient-in-the-face-of-errors is the correct approach for Node.js will get us exactly nowhere.

This issue as stated at the top is not clear. It reads to me as "implement current node philosophy about error handling in promises". Which is a paradox and will definitely lead to the argument we want to avoid.

Post-mortem debugging on the other hand is a clearly stated problem.

@benjamingr
Copy link
Member

First of all thanks for dropping by @rvagg and discussing this with us. We'd really like to have you on board for the discussions so we can reach consensus.

Operations wants to have good insight into what went wrong so they can understand the scope of the problems and feed it to the right person to get it addressed, this is partly where postmortem debugging comes in (although it's actually pretty rare to see the kinds of advanced practices that Netflix and Uber use).

Right, promises change the thought-model. Bugs come with a lot more context since they are from composed chains (since a promise chain doesn't live in isolation). You typically get a ton of context on programmer errors and can find where things happened very quickly.

But at the programmer level you get a defensiveness that can lead to overcompensation, often resulting in error-minimisation or error-hiding practices like heavy use of try/catch or adopting Promises (so the errors go away completely).

Errors don't go away with promises. In fact - they propagate. If with callbacks you don't have to check if(err) at every step of the way - promises mandate that. Errors are no longer suppressed by default unless checked. If anything I think promises mitigate this problem rather than enhance it.

Of course neither of these is good practice for dealing with bad code and experts usually come in asking to wind back a lot of the safety mechanisms to expose the poor programming.

This is precisely why we added unhandledRejection - so promises never silence errors - even synchronous ones.

This is where fail-fast comes in as a mantra that's generally promoted with Node.js. Fail quickly when something unexpected happens, log or dump what you can in order to figure out what happened and restart.

The mental shift with promises is that you can get most of the useful parts of the debugging info and recover so you don't have to restart until you have a fix to apply. The server stays in consistent state at every step of the way since cleanup is performed as the stack is unwound.

Startup is cheap with Node.js and every Enterprise level deployment uses a cluster approach of some kind so the impact is relatively minimal on users.

This is something I don't understand - let's say you have a programmer error in your deployment. You have a problem where at /users/foo?bar=15 a TypeError occurs (for a very rare path). You "fail fast" and crash the server and dump core. Now let's say your startup is lightening fast and takes just 4 seconds for this big project - you also have a very strong machine for your service running 8 workers.

If I send you 2 requests every second (which is trivial) I can effectively cause a DoS attack on your server. What post mortem recommend (at nodejs/post-mortem#18) is to actually leave your server in inconsistent state and just hope for the best.

This is not contrived, @Raynos describes Uber having that problem here: nodejs/post-mortem#18 and I've personally seen it when consulting in several other big companies.

simply that Promises are not something you should automatically jump to if your coders are writing error-prone code

Obviously. Promises are not a magic bullet that somehow "fixes async" or causes you to "not have errors". They do however in my experience simplify my workflow greatly. What pattern you'd use for asynchronous coding is opinionated and while we can have a lot of fun debating it is not our place to make that choice for users and should offer users what they need to make an educated decision - making the lives of both people who use and avoid promises easier.

In fact, I suspect that the clash of philosophies over error handling is one of the biggest issues preventing a more nuanced discussion about the merits of Promises vs pure callbacks.

We'd really just want to see both coding styles supported in Node.

However, if we can address the concerns of people who see Promises as detrimental to discovering and addressing programmer errors then we might end up in a place where even they can be OK with their teams and companies they consult to using Promises because there's an escape hatch of some kind to bring them in line with their particular philosophy about how to deploy, debug and improve Node.js applications.

This is precisely why we introduced unhandledRejection in the first place. You can just add a e => { throw e; } and have thrown errors fail fast with promises. This is the escape hatch and you can in fact use that philosophy.

@rvagg
Copy link
Member

rvagg commented Feb 16, 2016

@phpnode and others, this defensiveness has to stop if you want to make progress here, you're not incentivising people with concerns to interact with this group and will simply entrench the divide. Please take my feedback as an actual perspective that exists and that you're going to have to engage with in order to break out of a WG and into a concrete implementation. I certainly don't have the energy to get bogged down in argument and don't appreciate you viewing me as someone with a faulty view of Promises that needs to be corrected, and I'm betting a lot of other collaborators are in the same boat.

@benjamingr
Copy link
Member

@rvagg

Please take my feedback as an actual perspective that exists and that you're going to have to engage with in order to break out of a WG and into a concrete implementation.

If you do not think that I have done that let me know. I believe I have. I definitely do value and appreciate your feedback here. I do however want to discuss it - there are gaps in my understanding of how you deploy code and what your requirements are and debating these things is important.

This working group is not only about "getting promises into core". It's also about making sure your workflows are not broken or ignored. Currently we're at a suboptimal solution where you're uneasy consuming code using promises (and not just writing that) - I don't see changing your perspective as a goal, I do want to understand it and make sure that the current things that are problematic for you are addressed like AsyncWrap with the microtask queue, post mortem debugging concerns and so on.

@chrisdickinson
Copy link
Contributor Author

Just a quick reminder that the topic of this discussion is scoped to differentiating operational errors from programmer errors. I believe @rvagg has reinforced the position that there are folks that really want some way of escaping from promise error propagation for truly erroneous, unexpected errors; we should take his (and @DonutEspresso's) use cases as valid and avoid credential-checking.

At present the only proposed option that is thus far unopposed is --abort-on-sync-rejection. Can I ask for a quick read on reactions to --abort-on-sync-rejection? Are there immediate problems folks see with it? Does it go too far, or not far enough?

@boneskull
Copy link

I'm in agreement with @spion 's assessment:

This issue as stated at the top is not clear. It reads to me as "implement current node philosophy about error handling in promises". Which is a paradox and will definitely lead to the argument we want to avoid.

I've stared at this issue for 45 minutes and cannot come to any other conclusion.

--abort-on-sync-rejection doesn't go far enough. Synchronous rejection this way seems rare, as stated. If you want to fail fast, you still have to do extra work. Imagine a terrible, dystopian world in which this code was commonplace:

func()
  .catch(panic)
  .then(anotherFunc)
  .catch(panic)
  .then(yetAnotherFunc)
  .catch(panic);

function panic(err) {
  // check err type here via some means
  new Promise((_, reject) => {
    reject(new Error(err));
  });
}

If you want to fail fast, Promises aren't your tool. Thus, the paradox as @spion writes. Maybe this should be closed?

@markandrus
Copy link

@chrisdickinson as an author of some Promise-based APIs, the proposed --abort-on-sync-rejection flag concerns me. I've written code that synchronously rejects, and I expect many others have, too. There are valid reasons to do so.

Given this, I do not think the time at which an error is raised (synchronously or next tick) is a good mechanism for distinguishing programmer errors from operational errors. This convention is a mismatch for Promises, and I am not sure how widely it is followed in callback-based code outside of Core.

I tend to agree with @balupton's suggestion of introducing a type for programmer errors. We could introduce this type to both callback- and Promise-based APIs in Core, without imposing any restrictions on synchronous rejections in Promise code. We could also build debugging features around such a type.

@benjamingr
Copy link
Member

@markandrus the approach suggested in groundwater/nodejs-symposiums#6 by @groundwater and @chrisdickinson is to check for a .code property for operational errors but it does not make the distinction based on the type alone.

This sounds like a reasonable path to explore - would you mind writing a concrete proposal based on it in a pull request?

@chrisdickinson can you document this in the proposed solutions in the post title?

@DonutEspresso
Copy link

@petkaantonov Sure, software is all about trade offs. Ultimately it's up to consumers to determine what's right for their use case. I just hope that I've clearly articulated the trade offs that we make at Netflix, and to ensure that the concerns that drove us to make these trade offs are captured as part of the discussion. We're certainly not the only ones.

@chrisdickinson I've been thinking about the flags that have been proposed so far, including --abort-on-sync-rejection. Since they trigger different behavior depending on the runtime, I think this could potentially break some assumptions about the write once run everywhere approach. I haven't yet fully thought through an end to end scenario, but it certainly seems like it could cause some "surprise!" moments. Is that ok, given the opt-in nature of the flags? Is using a flag synonymous with "you're off the reservation, you're on your own"? That seems to be the spirit of what we're proposing.

@markandrus
Copy link

@benjamingr thanks, I will review the PR you linked. My only concern with using .code as described is any overlap with userland error subclasses (I can think of one library I maintain that uses this property). Otherwise, if I can find time to help I will look into creating the proposal you mentioned. Do you mean opening it against groundwater/nodejs-symposiums?

@benjamingr
Copy link
Member

@markandrus that actually sounds like a good thing - or do they use .code for programmer errors too?

You can open it on this repo - I'd rather we keep all the stuff under the nodejs org since it enforces organization wide things (like the code of conduct) but I don't have strong feelings about it.

@rvagg
Copy link
Member

rvagg commented Feb 17, 2016

fwiw I've removed my original comment from here, it's clear I've not done a good job at expressing what I was trying to say and have only caused more argument in the process

@benjamingr
Copy link
Member

@rvagg argument in the process is fine and to be expected - it's definitely much better than not having the argument and not addressing the problems.

You have made it abundantly clear that you have a problem with promises - that's fantastic. I mean it. We could use someone like you here. If you will not voice your concerns and we will not debate them we won't be able to make progress. Not having someone with opinions like yours participate would significantly under-represent callback users meaning that:

a) We won't be able to make nearly as much progress since things would more likely get stuck when trying to get CTC approval.
b) Features V8 develops that use promises either internally or directly might break your coding flow, tools you use and programs. We need to know what those things are before devising a strategy for solving these issues before they make it to a release.

Especially since you're a seasoned node developer.

So I ask you to reply to my comments from #10 (comment) . I can't make you participate but I sure would appreciate it.

@rvagg
Copy link
Member

rvagg commented Feb 17, 2016

@benjamingr:

You have made it abundantly clear that you have a problem with promises

I'm trying very hard to not make this about my personal opinion on promises which is where I've obviously failed in this thread. My opinions are not a secret, they are just not particularly relevant. To get it out of the way: I choose not to use Promises for the majority of the JavaScript I write and have strong preferences about language and pattern simplicity in general. I don't like the Promises approach to handling errors (generally, not just about operational vs programmer errors), I find their state, error and continuation management far too complex to reason about in non-trivial applications, I find that they make a codebase harder to read and understand and not easier. I'm sure you've heard this all before and find it tiresome. It's all about personal preference and I have no problem with the growing popularity of Promises as a pattern that people find helpful for whatever reason and make no judgements of people that choose to adopt them as their primary asynchronous programming tool. I've been trying to dive deeper with Promises (grokking and using) in an attempt to understand the perspective of those who embrace them so completely. I continue to struggle to see the light but arguing about preferences on this matter is just as absurd as arguing about someone's preferred programming language. So let's put that aside because there's nothing much to be gained here.

Jumping in to this thread was simply an attempt to broaden the discussion and suggest that there is a larger group of stakeholders, beyond the narrow error-symposium and postmortem groups, that have a deep interest in this particular topic. And that attempts to build bridges in order to gain acceptance of having a Promises API in core would likely be assisted by recognising this and pandering to this perspective in some way if possible. Being purist about Promises and asking that everyone accept the same view on how errors should be managed will just get us bogged down further (or alternatively lead to rage quitting as has already been happening). You can't dismiss these concerns with "if you don't like the approach to error handling then don't use Promises in your code" because that's not how Node.js development works. We don't just have to deal with errors in our own code, we build applications on top of the largest package ecosystem in the world. Take it or leave it, it's just a suggestion.

Regarding my own opinion on this as a CTC member, I'm mainly interested on the question of whether or not a Promises API should be exposed by core at all, roughly what's being discussed in #21. I'm almost entirely focused on how we keep core lean and enable a healthy ecosystem of diverse needs and opinions to exist on top of it. I know I'm not alone in my obsession with "small-core" too, both inside our GitHub bubble and beyond. So it's the reasoning for why we would even go down this path that's relevant to me. Does this enhance or damage what we are trying to achieve by keeping core small? This is still an open question. Debugability, error differentiation, postmortem, AsyncWrap, etc. are all secondary issues for me and my interest in them is mostly about seeing the various stakeholders be involved and represented. I don't have nearly enough bandwidth to engage in all of the discussions that happen and am unlikely to be particularly helpful either, as I've already demonstrated.

I don't want to divert discussion away from the topic at hand here so I'm not going to engage in further discussion of my stated position. I'm going to try and engage properly on that elsewhere. Although the sheer volume of discussion at the moment is making it difficult to get anything productive done so maybe it'll have to settle down a bit before I can dive in (I've heard similar sentiments from others who are feeling stretched, so patience would be appreciated across the board).

@benjamingr
Copy link
Member

@rvagg you still haven't addressed any of the comments from my previous post at #10 (comment) just pointing that out - that's fine.

I don't like the Promises approach to handling errors (generally, not just about operational vs programmer errors), I find their state, error and continuation management far too complex to reason about in non-trivial applications, I find that they make a codebase harder to read and understand and not easier. I'm sure you've heard this all before and find it tiresome.

This is the first time I've had a callback user actually talk back to me like that. I mean that in a good way. People typically just say they don't like promises because they don't understand them or won't discuss it further - I would love to discuss it with you since you're a seasoned node developer.

I have not heard it all before and I do not find it tiresome. If you're interested I would definitely like to make a discussion out of this. Discuss what you hate about their state management, error handlers and what you find hard to reason about. That discussion could further be referenced when such claims come again and it would possibly make people who share your opinion to talk about it.

I continue to struggle to see the light but arguing about preferences on this matter is just as absurd as arguing about someone's preferred programming language.

Except this is the place where platform choices are made. If no one will argue about it we will never really gain the perspective of the other side.

Jumping in to this thread was simply an attempt to broaden the discussion and suggest that there is a larger group of stakeholders,

We appreciate it. That's why I keep asking for more.

So it's the reasoning for why we would even go down this path that's relevant to me. Does this enhance or damage what we are trying to achieve by keeping core small?

Agreed. I'm inconclusive about this myself. There are major gains and losses and I'd appreciate your participation in that discussion too.

@erights
Copy link

erights commented Feb 17, 2016

Sorry about that. I misunderstood how to use the Github labels ui. I did not intend to change the labels.

@DonutEspresso
Copy link

People have different needs when it comes to handling errors, and by association, their approach to failing fast. What's acceptable to some is not to others, but in theory that choice can be made independent of the async abstraction you choose. I think that's where the problem lies.

If we agree that there is a difference between programmer and operational errors, and that really the only difference is an approach in how you handle them (as outlined in our discussions above), then the two things I absolutely want are:

  • a clear way to distinguish between these two
  • the ability to have programmer errors (of the null ref and typo type) bring down the process
  • the ability bring down process at top of stack for post mortem (some may see this as a bonus, but it is absolutely critical for many deployments)

However, due to a more aggressive approach to error handling when using promises, it can be hazardous to make a run time determination of programmer vs operational, which leads directly to an inability to bring down the process via a fail fast strategy.

As far as I can tell, there hasn't been a great story, or even consensus within the promise community for distinguishing between the two. The most common feedback I've heard is "there isn't a difference, so don't worry about it." Unfortunately, I don't think that really helps address the concerns from those on the other side of the fence whom already do this today using the strategies discussed in this thread.

If we all agree on at least the premise, which, unless I'm reading the thread wrong, it appears that we do, then I think we've made a big step forward. Next steps would be to figure out how to tackle the three concerns above, which some of the other threads are already investigating.

@spion
Copy link

spion commented Feb 18, 2016

@DonutEspresso I would say that it is node that has a problem distinguishing programmer from operational errors and hand-waves it by saying "well, if its thrown its a programmer error". Here by node I mean the behaviour of the core libraries, but also the resulting shaky community consensus

There are many good reasons that try-catch should be avoided in node:

  1. It provides an easy post-mortem story which would be much harder (perhaps impossible?) otherwise because of the lack of filtered catch.
  2. It makes it easier to deal with exceptions, because otherwise you really have to think hard and write proper try-finally blocks.
  3. Finally, callbacks lack any exception propagating mechanism which makes thrown errors truly impossible to deal with. Infact they propagate errors the wrong way! These factors together led to the current best practice.

But "programmer errors" isn't a very good one, and the error handling document IMO doesn't do a good job to justify that claim. The resulting debate about what constitutes a valid reason to throw an error I think stems from the conflation of programmer errors and thrown errors

I wrote a simple comparison of the callback VS promise story, taking the second example from https://www.joyent.com/developers/node/design/errors as a gist here: https://gist.github.com/spion/60e852c4f0fff929b894 . It outlines some of the problems with the claims in the error handling document, and how promises approach things in a very different way.

@winterland1989
Copy link

This is the first time I've had a callback user actually talk back to me like that. I mean that in a good way. People typically just say they don't like promises because they don't understand them or won't discuss it further - I would love to discuss it with you since you're a seasoned node developer.

Just like in #21 (comment) i proposed, Promise is not the only way to describe the dependence graph of asynchronous operations , Promise get widely accepted because its monodic interface, but that's not hiding its internal complexities.

People don't understand Promise internal often misuse them in variety ways. For example, some implementation do not encourage use new to create instance because it's expensive, the second callback parameter to then is not recommended because it won't save you in case the first callback fail, all this gotchas are abstraction leaking IMO. Promise is too complex to became a basic async primitive, it solve callback problem in a opinionated way.

@groundwater
Copy link

I think @DonutEspresso has echoed my concerns and priorities quite well. Promises push expected and unexpected errors through the same channel, which for many people is undesired behavior. A sufficient example of "unexpected error" is that a function being called has a ReferenceError due to a mistyped variable name. I doubt anyone defensively codes around that.

Unless the Promise API is going to change, the solution for those wishing to avoid catching unexpected errors is to never catch, and abort on unhandled rejections. I think @chrisdickinson proposed a great solution using the recovery object approach. It let's people decide whether to use try/catch control flow, or to use a multi/special return values to indicate error.

This is the only solution so far that accommodates all sides. I and likely many of us are not interested in having a battle over what is the right way to use (or not use) Promises. It is easy to accommodate both sides. I would prefer to talk about problems you might encounter with a "hybrid" solution, like what are the pathological cases of mixing libraries that use both techniques.

@kriskowal
Copy link

@groundwater I suspect that the real solution to the problem of distinguishing programmer errors has very little to do with promises and rejections, as it can be solved instead by aborting, not on unhandled rejection, but on the reference error or type error itself. For these cases, even sync try/catch can interfere with post-mortem analysis.

@petkaantonov
Copy link
Contributor

@groundwater this thread is full of examples where node is using the throw channel for operational errors and errback for programmer errors. The problem with this approach is that the channel is chosen by the error producer and not the consumer who is the only one who can actually know. There are exceptions to this like VM thrown referenceerrors, but these (#NotAllErrors) don't change the problems with producer decision.

Errors are objects that can carry arbitrary context and data, they allow consumer to decide after the error is thrown. This is currently not feasible for post mortem users (who I understand are keeping promises blacklisted and would keep them blacklisted until possible) but it will be when the try catch is augmented with ability to define predicate functions/expressions as guards for the catch clause. Predicate can be defined not to accept e.g. ReferenceError and the behavior would be in that case same as if you didn't have a try catch statement in the first place.

You are not accommodating promises by changing their API, that's unacceptable.

@benjamingr
Copy link
Member

@groundwater namely - the following are all potential operational errors currently thrown by node:

  • Out of memory.
  • Parsing invalid JSON.
  • Attempting to bind to a closed UDP socket.
  • Listening to an "invalid" port.
  • If no error event handler is synchronously attached:
    • HttpAgent addRequest if createSocket fails.
    • HttpAgent removeSocket if createSocket fails.
    • new ClientRequest if createConnection fails.
    • Pushing to a Readable stream or unshifting from it if the chunk is invalid
    • A writing after a stream is closed in _stream_writable
    • Reading an invalid/non-string buffer chunk (validChunk)
    • A _tlsError in tls.
    • dgram in lookup
    • FSWatcher in a bunch of places
    • I'm going to stop now, but there's a very long list at https://github.com/nodejs/node/pull/5251/files
  • QueryString.decode in an edge case (it try/catchs twice first - that file is weird)
  • There are tens more of these - just look for throw in /lib.

Moreover, there are a lot of operational errors that manifest as synchronous errors:

Example 1:

  • I read a JSON object (or a csv) - everything is fine and it doesn't throw
  • I pass the result to ReadStream, it requires a string or an object but my JSON object contained a boolean.

Example 2:

  • I read a value from the database and expect it to be a postive number - I got null back
  • I pass the result to ReadStream as options.start and it throws synchronously

Example 3:

  • I read all the DNS values to look up from a CSV the user gave me - all are strings except one where the external JSON REST API I'm relying on that I wrote for getting the host names returned a number by mistake
  • I perform a dns.lookup and it throws synchronously that hostname must be string or falsey.

Example 4:

  • I get a url a user entered in a form on my website and sent me through a POST request.
  • The user entered is empty so the client doesn't send it in the JSON code - I get an empty field and it gets deserialized to undefined.
  • I parse it with Url.prototype.parse and it throws a TypeError synchronously.

I came up with all these examples on the spot here - if you'd like I'll gladly come with 10 more. All of these are what I believe (do correct me if I'm wrong) are operational errors (the last is debateable since I think that servers should sanitize for that). All are recoverable.

What @petkaantonov is trying to say is the distinction is not clear cut at the producer side (code raising the error). The consumer (the user consuming the error) always knows what they want or don't want to recover from. The reason exceptions are a part of languages in the first place is exactly in order to move the decision process to the consumers of the errors who understand their responsibilities better than the producers.

@benjamingr
Copy link
Member

I have moderated the off-topic posts away. Please feel free to open a separate issue if you have concerns about how promises would look like in core - this thread is for debating operational errors vs. programmer errors and how/if we can make this differentiation as well as address concerns by post-mortem wg people related to them.

@winterland1989 I'd appreciate if you stop linking to your library, we've all seen it already - if you want to suggest its inclusion in the language core please do so in ESDiscuss, if you'd like to suggest its inclusion in Node core please open an issue at https://github.com/nodejs/node. In all honesty the language TC has already mostly reviewed and rejected the idea - and Node would not even consider it until it has significant userland adoption and a very good reasoning on why it cannot be done as well in userland.

You're very welcome to participate in this discussion on-topic. Namely - me @groundwater @kriskowal and @petkaantonov are having an interesting discussion on whether or not synchronicity can be reliably used as an indication of programmer errors in Node and we'd love more perspectives.

@taylorhakes
Copy link

@DonutEspresso expressed exactly my concerns. The distinction between programmer errors and operational errors is incredibly important. My program should not continue if I hit a programmer error. In my code I take the approach of never throwing or rejecting a Promise. That makes operational errors much easier to handle and I leave programmer errors to .catch. Yes there are APIs that unfortunately throw as @benjamingr listed. I tend to try/catch a single line in those circumstances.

Is anybody suggesting that more places should throw errors? That would make all errors go through the same mechanism. But you would be forced to write synchronous code with a bunch of try/catch statements

try {
  doSomething();
} catch (e) {
  handleError1(e);
} 
try {
  doAnotherThing();
} catch (e) {
  handleError2(e);
}

Promises have pushed us farther in that direction. We have one mechanism to handle async errors, but I would say that it's a bad mechanism. try/catch/.catch currently can't support disguising between different Error types. It just catches all errors and forces the developer to be smart enough to distinguish, which gets harder the farther away from the error. I have seen many developers make the mistake of not checking the error type in .catch. You can say this is an education thing, but it happens a lot even with experienced developers.

Hopefully we will eventually get type guards on .catch or catch (e) {, but how does that fix the current problems? Those types of language changes could be years away. I am not comfortable suggesting more use of try/catch when I feel it will only lead to more confusion and bugs for developers.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests