Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: using only catch_all to catch all exceptions #31

Closed
aheejin opened this issue Oct 16, 2017 · 8 comments
Closed

Proposal: using only catch_all to catch all exceptions #31

aheejin opened this issue Oct 16, 2017 · 8 comments

Comments

@aheejin
Copy link
Member

aheejin commented Oct 16, 2017

TL;DR

The current form of separate catch instructions (catch i, catch j, ...,
catch_all) is very hard to generate from the compiler's side and possibly
detrimental for code size and performance. So I propose to merge all catch
blocks into one catch_all block that handles all tags (meaning both C++
exceptions and foreign exceptions).

Problem

Suppose we have this original C++ code.

try {
  ...
} catch (int e) {
  action 1;
} catch (...) {
  action 2;
}

action 1 and action 2 can be arbitrary user code.

Itanium C++ ABI
specifies that catch (...) clause should be able to catch not only C++
exceptions but also foreign exceptions that are not generated from one of C++
catch clauses. It may not be necessary that we should strictly follow the
Itanium ABI spec, it makes most sense for catch (...) to handle also foreign
exceptions anyway because that's the only way for a C++ programmer to specify
some action when it catches a foreign exception. That means, when there is
catch (...), we should generate a catch_all instruction. Then the generated
Wasm code will look like, in pseudocode,

block $label0
  try
    ...
  catch i
    if (int e)
      action 1
    else
      br $label0
  catch_all
    br $label0
  end
end
action 2

Here action 2 part is factored out so that it can be shared between catch i
and catch_all in order to prevent code duplications. But whether we duplicate
action 2 part of the code or factor it out, the requirement is that we should
be able to know which part of the code corresponds to the catch (...) so we
can generate correct code by factoring out or duplication.

Let's see another case. When the original C++ code is like

MyClass obj;
try {
  ...
} catch (int e) {
  action 1;
}

There is no catch (...) in this code, but that means we should generate
cleanup code to call the destructor for obj. And that cleanup code should
run regardless of whether we catch a C++ exception or a foreign exception. So,
it should be either duplicated or factored out as well:

block $label0
  catch <c++>
    if (int e)
      action 1
    else
      br $label0
  catch_all
    br $label0
  end
end
delete obj
rethrow

Now we face the same problem: we should know which part of the code corresponds
to corresponds to the cleanup code.

Separating code within catch (...) by examining and pattern matching LLVM IR
(or any other compiler's IR) is not always possible because code can be
transformed or optimized in many different ways. Windows EH developers once
tried it and failed. Windows EH
requires identifying not only catch (...) but also all the catch clauses, but
the problem here is inherently the same. Pattern matching cleanup code is also
not simple, because from the IR's point of view, they are just function
(desturctor) calls.

Can this be done if we demarcate these parts in clang (or more generally,
frontend code generation phase)? The answer looks, maybe yes, but it will be
much hard, and I'm not sure if it's worth it. Basically what we need is the way
to demarcate some code parts, and prevent any code entering or escaping from
that regions in all of the IR-level passes and backend level passes. Code
hoisting or sinking across the boundaries should not occur in any pass, and
instruction scheduling in backend should treat these boundaries as fences for
not only memory instructions but also all instructions. Windows EH developers
faced similar problems and came up with new specialized
instructions
,
but their objectives were different - they did this because they didn't use
Itanium ABI and they had to satisfy MSVC's spec -, it does not look possible to
reuse their approaches. Also, there will be more work that has to be done: such
as, matching each landing pad to its parent scope's cleanup code.

Even if it is possible by creating new instructions and doing more work on clang
side, it will also prevent code optimization opportunities, because it basically
separates certain parts of code and does not allow any optimization across their
border. For example, shared expressions may not be able to be merged.

Proposal

Considering the amount of work that needs to be done to satisfy the current spec
and the expected downside of code size and performance degradation, I think
having one catch_all instruction that handles all exceptions is the best way
to go. Actually we can do this even with the current spec by only using
catch_all and not using other catch tag instructions, but that brings
another point: is catch tag instruction ever useful?

To use only catch_all, there should be a way to tell if the current exception
is a C++ exception or not within a catch_all clause. While I think it can be
done by setting some variable within some libcxxabi functions (such as
__cxa_throw and __cxa_begin_catch), it would still be better if there is an
easy way to access the currently caught exception's tag within a catch_all
block. Maybe catch_all block can return the caught exception's tag.

Even if catch_all instruction does not put an exception object on top of Wasm
stack, there are ways we can relay an exception object from throw to
catch_all: one possible way is to use Wasm global. throw instruction sets a
Wasm global with the pointer to an exception object so within a catch_all
block we can retrieve it.

The only possible downside of this scheme is, when a foreign exception is thrown
and there is no cleanup code to run for a certain stack frame, anyway it should
stop at that frame because it is caught by catch_all instructions. But I
hardly imagine this case will be common enough to affect performance.

@dschuff
Copy link
Member

dschuff commented Oct 24, 2017

I think this problem is even bigger than this explanation makes it appear.
In particular, we want to have the property that a JS exception can unwind through the C++ call frames without causing memory leaks or otherwise messing up the program state. Today, (because exception handling is disabled by default due to its cost), if JS code called by C++ throws, the stack will unwind and no destructors will be called, leaking memory or worse. So it's not just a matter of properly handling catch(...), it's also allowing C++ code to continue to work properly in the face of JS exceptions, even if it does not want to use EH itself. As @aheejin says, in order to implement this we will probably just end up not using the tag mechanism at all for C++. So we should consider where it would be useful, and if we want to support it (It might still be useful for GC'd langauges that do not need to run destructors in every frame).
This proposal essentially wants a way to dynamically determine the exception type, whereas you could say that #30 wants a dynamic exception object that escapes the catch block (or perhaps a dynamic implicit "current exception" or stack of exceptions). Technically they are orthogonal but it seems like there could be a solution that solves both. Having these would make working around the issue in #29 much more straightforward, although I still think there would be a binary size win (whether that win would justify the added complexity remains to be seen). Anyway addressing this issue and #30 would go a long way to making things much better for C++.

@rossberg
Copy link
Member

rossberg commented Oct 24, 2017 via email

@eholk
Copy link
Contributor

eholk commented Oct 25, 2017

Have we considered adding a finally block? Would that address this problem as well?

@aheejin
Copy link
Member Author

aheejin commented Oct 26, 2017

@eholk I don't understand how finally will help the situation. We put some actions that have to be done regardless whether a exception is thrown or not, and C++ doesn't have that kind of actions anyway. This is about infeasibility of separating the catch (...) part, which we should run only when 1. we catch C++ exception whose type is not specified 2. we catch a foreign exception), and cleanup part, which we should run whenever we catch some exception. finally is a way to specify actions that has to be run even if we don't catch any exception.

@aheejin
Copy link
Member Author

aheejin commented Oct 26, 2017

@rossberg I could be wrong, but I guess what @dschuff meant was,

To make C++ destructors be correctly called when a JS exception is thrown,

  1. We should enable exceptions.
  2. If we were to use both catch <C++ tag> and catch_all, there should be a way to separate cleanup code from other parts of code, so we can put only that part in catch_all.
  3. There is no feasible way to do 2.

@eholk
Copy link
Contributor

eholk commented Oct 27, 2017

C++ does have code that has to run whether an exception is thrown or now: destructors. With finally, one possibility would be to wrap an entire function with a try ... finally block which is where all the destructor code goes. Then this would run at function exit both in normal returns or exceptional cases.

@aheejin
Copy link
Member Author

aheejin commented Oct 27, 2017

@eholk That wouldn't work because

  1. As I said yesterday (in person), it may not be impossible but very infeasible to separate some region of code as catch (...) or cleanup (= destructor calls). Basically we need a mechanism to demarcate the start and the end of a code region and we should prevent any kinds of code hoisting/sinking/optimization across that boundaries. This is not only infeasible but also detrimental for the code size and performance.

  2. Not all desturctors run at the end of a function. A C++ function has many scopes and each scope has different sets of destructors to run. So that means we need a big enclosing try ... finally not per a function but per a scope. But we don't have the scope information at the backend, where try ... finally ... end_trys are generated. So we need a way to mark some region of code for each scope, which is not only complex but also would boil down to the same code separation thing as in 1.

@aheejin
Copy link
Member Author

aheejin commented Dec 2, 2018

Closing this, since we decided to use the second version of the proposal that uses a single catch clause.

@aheejin aheejin closed this as completed Dec 2, 2018
ioannad pushed a commit to ioannad/exception-handling that referenced this issue Jun 6, 2020
…mory.init`, `memory.drop`, `table.init`, and `table.drop` identify passive segments; only that they index a valid segment. (WebAssembly#31)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants