Proposal on the spec changes #29
Comments
Thanks for explaining this so well, I think I can understand the problem much better now. I believe you (or perhaps @dschuff) mentioned that another solution would be to provide an |
@binji Oh, I think I was confused about the code size thing when I was talking about the new scheme in the meeting today. (The code size was an advantage of this new scheme compared to some other alternative, which is the same but in this case Anyway, the arguments against that solution are:
|
I actually support this proposal (with minor tweaks). In response to @binji, the current implementation allows neither catchless tries nor tryless catches. However, I think it would not be that hard to add both to V8. The runtime support for these concepts is basically needed even in the current implementation. To handle rethrows, which have a label defining the enclosing try block to rethrow, a stack of possible thrown exceptions needs to be kept. This stack should be easy to extend to handle the generalization of labeled try blocks (where the label defines the catch block to apply if an instruction (like a call) throws). The minor nit is the numbering of try blocks. I would prefer the use of block nesting to be consistent with other uses of labels (including rethrow). |
@KarlSchimpf The reasons I chose to only count
If we count all the block nests, the code will look like this:
Do you think it makes sense? |
I understand the motivation for this proposal from the point of view of a CFG-based producer. However, from a language perspective its implications are a bit more profound than may be apparent at first.
One of the defining characteristics of structured exception handling is the following. If you have a program, and pick any instruction sequence `instr*` in it, then you can wrap that into a handler, `try instr* catch … end`, and you are guaranteed that any exception produced by `instr*` will be caught by the handler. That sort of composability is what makes it structured.
It is important to observe that this proposal destroys that basic property: you can now throw exceptions such that they bypass surrounding handlers: plainly,
In particular, it is not true that this form of try is just equivalent to a try with a rethrow, as suggested in the motivation. A rethrow by itself cannot bypass intermediate handlers (those would need to "consent" by ways of being transformed themselves).
Another view on the proposal is that it introduces a new form of control transfer that is neither a branch nor a throw. An instruction like `try N instr* end` essentially means `try instr* catch_all (br_to_try_with_current_exception N) end`. There are several things that are special about this new form of "branch": it can only occur in a handler, it can only target a try block, and it magically takes an "unhandled" exception with it.
Introducing a new form of control transfer is not a small thing, especially when it is rather special-cased and has no precedent in existing languages. It is not immediately clear how it will interfere with other possible extensions, how it affects transformability of Wasm code itself, and to what extent the loss of structure makes reasoning more difficult. I think it will take some time to properly investigate and understand the semantics, implications and trade-offs of such a proposal, and be confident that it doesn't paint us into some corner.
To be honest, right now I wouldn't feel quite comfortable with it just yet (you probably have guessed so by now :) ). Out of interest, were there other possible solutions that have been discussed?
On a higher level it seems like the underlying motivation here is to work around limitations on the way the "current exception" can be used in the exception proposal: it is always scoped implicitly to a catch block and cannot escape it. If it could, you might be able to use conventional branches and nested blocks. Perhaps there are ways to relax that without resorting to control-flow extensions?
FWIW, I agree with @KarlSchimpf that the immediate should be an ordinary label -- consistency is important (I didn't understand the code size argument -- how does the indexing scheme affect it?). I also think that this new form should be a separate instruction from an ordinary try, so that it would not have catch blocks by construction, and you avoid the off-by-one discrepancy with its immediate compared to other labels (where 0 refers to the innermost enclosing one). |
Thank you very much for comments. Currently I'll try to work around the problem Original problematic code: (which is not valid)
Proposed spec in this issue:
Workaround 1: (in pseudocode)
Workaround 2: (in pseudocode)
As can be seen, both workaround 1 and 2 are expected to incur code size |
@rossberg - I want to make sure I fully understand your argument structured The difference seems to be that with wrapping instructions in a block, it is straightforward to modify the body to preserve the existing behavior or create the intended behavior. In wrapping instructions with a Is that the key distinction? |
@eholk, yes, that's one perspective. Inserting a block may require shifting some indices, but that is merely a (alpha-)renaming that keeps all structure as is. Inserting a try that works "as expected", however, generally requires structural changes to the wrapped instruction sequence under this proposal. This isn't necessarily a show stopper, but it is a significant change that requires careful consideration and modelling IMO.
I kind of see why this is desirable, though. In a way, it fills a hole in the "control flow matrix": currently, an instruction can have (1) a regular or (2) an exceptional result, and an instruction can pass on results either (A) to its implicit continuation, or (B) to an explicit continuation. But in fact, only regular results can be passed to an explicit continuation, via branches. That is the 1B combination. This proposal adds the missing 2B -- it is a bit funny, and I'm not aware of any precedent, but it might still be reasonable.
Here is how I would reframe this feature. Leave the existing
This executes |
An even simpler and more flexible option would be to merely add a label to
The current Such an instruction can be compiled exactly as the original proposal in the case where the label denotes a try block. |
@rossberg - One idea I was thinking of yesterday seems in line with your It sounds like a key requirement for @aheejin is to be able to rethrow an exception from outside the try block where it was caught. Having exception values is one way to do this, but it does seem to come with some cost in the VM (although I think in V8 we are already paying this). It seems like the |
@eholk, yes that's the idea. Plus, the compiler can trivially recognise the common case where the target is a try block, such that it can generate code to jump to the outer handler directly instead of actually rethrowing. |
@rossberg Thank you for the suggestion. But isn't it basically the same as In s2wasm-style wast file:
In the real encoding, this becomes
|
@rossberg Oh but I like the idea of a separate |
@aheejin, well, yes, just using a symbolic label for exposition. Point is, it works like any regular label index, especially with 0 referring to the enclosing block. Also, it can target any block, not just try. But I would actually like to draw attention away from try_br and to the second proposal: just adding a label to rethrow. That is much simpler and more focused an extension. |
This is an old thread and many of the problems here have been discussed or decided elsewhere. |
Proposal on the Spec Changes
I would like to propose some changes to the current proposal.
Proposed Changes
Try with Relative Depth Argument
`try` can now take a relative depth argument, as branches do. The 'normal' `try` -
a `try` in which calls unwind to a `catch` next to the `try` - has a depth of 0.
Here are examples. For brevity, only one `catch` instruction is shown for each
`try` instruction.
Catchless Try Block
When the argument (relative depth) of a `try` instruction is greater than 0, its
matching `catch` block does not have any uses. For example,
In this case, when an exception occurs within the `try 1` block, the program
control is transferred to the outer `catch` block. So in this case the inner
`catch` block is not used, and if we do not generate this kind of `catch` block,
it will help reduce the code size. Effectively, a catchless `try` block is the
same as a `catch` with an immediate `rethrow`. So this code
has the same effect as
Actually, `try 1` would not have a real use, because code inside `try 1` would
go to the one-level outer catch, in which case we can just omit `try 1` and
place the call inside `try 0` outside.
The relative depth argument of the `try` instruction only counts the number of
`try` nests: it does not count `block` or `loop` nests. For example,
In this case, when the `throw` instruction throws, the control is still
transferred to the outer `catch i` block, even though now there are two `block`
nests in the code.
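The depth rule above can be modelled in a few lines of Python. This is only a sketch of the counting rule as described in this proposal: the function name `resolve_catch` and the list-of-strings representation of the nesting are hypothetical, and it assumes the depth is resolved in a single step (it does not model what happens if the target `try` is itself catchless).

```python
from typing import List


def resolve_catch(nesting: List[str], depth: int) -> int:
    """Return the index in `nesting` of the `try` whose `catch` takes control.

    `nesting` lists the enclosing constructs from outermost to innermost
    ("try", "block" or "loop"); the last entry is the `try` that carries the
    relative depth argument. (Hypothetical model, not part of any spec.)
    """
    if not nesting or nesting[-1] != "try":
        raise ValueError("innermost construct must be a try")
    remaining = depth
    for index in range(len(nesting) - 1, -1, -1):
        if nesting[index] != "try":
            continue  # block and loop nests are not counted
        if remaining == 0:
            return index
        remaining -= 1
    raise ValueError("relative depth exceeds the number of enclosing try nests")


# An outer try, two block nests, then the try carrying the depth argument.
nesting = ["try", "block", "block", "try"]
print(resolve_catch(nesting, 0))  # the try's own catch (index 3)
print(resolve_catch(nesting, 1))  # the outer try's catch (index 0)
```

With depth 1 the two `block` nests are skipped, so the exception reaches the outer `try`, which matches the behaviour described for the last example.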
Motivation
Background
In LLVM IR, when a function call can throw, it is represented as an `invoke`
instruction which has two successors in the CFG: its 'normal' destination BB and
its 'unwind' destination BB. When an exception does not occur, the control goes
to the 'normal' BB, and when it does, the control goes to the 'unwind' BB. Here
are a couple of LLVM-IR level CFG examples:
C++ code:
LLVM IR-like pseudocode:
C++ code:
LLVM IR-like pseudocode:
`invoke` instructions are lowered to `call`s in the backend, but they still have
a landing pad BB as their successor. `landingpad` instructions disappear in the
lowering phase, and the compiler inserts a `catch` instruction in the beginning
of each landing pad BB.
In terms of control flow, an `invoke`, or a `call` lowered from it, is similar
to a conditional branch `br_if`. When a branch is taken, `br_if` jumps out of
the current enclosing block(s) by the relative depth specified as an argument.
When an exception is thrown within a function call, the control flow jumps out
of the current enclosing `try` block. But the difference, in the current EH
proposal, is that it can only break out of a single depth, because `call` does
not take a relative depth as an argument and the VM transfers the control flow
to the nearest matching `catch` instruction.
Structured Control Flow
To make a control flow structured, there should not be an incoming edge from
outside of a block-like context (`block`, `loop`, or `try`), to the middle of
it. So it is required that the first BB of a block-like context should dominate
the rest of the BBs within it (otherwise there can be an incoming edge to the
middle of the context).
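As a rough illustration of this requirement, the sketch below lists the CFG edges that enter a region anywhere other than its first BB. The function name `edges_into_middle` and the edge-list representation are assumptions made for this example; it checks the "no incoming edge to the middle" property directly rather than computing dominators.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (source BB, destination BB)


def edges_into_middle(edges: List[Edge], region: List[str]) -> List[Edge]:
    """Return edges that jump from outside `region` into a non-first BB of it.

    `region` is the list of BBs in a block-like context; its first entry is
    the BB that is expected to dominate the rest. (Hypothetical helper.)
    """
    header = region[0]
    inside = set(region)
    return [
        (src, dst)
        for src, dst in edges
        if dst in inside and dst != header and src not in inside
    ]


edges = [("A", "B"), ("B", "C"), ("X", "C")]
# The edge X -> C enters the middle of the region [B, C], so the region
# cannot be wrapped in a single block-like context as is.
print(edges_into_middle(edges, ["B", "C"]))
```

An empty result means every entry into the region goes through its first BB.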
In the `CFGStackify` pass, here is roughly how `block` markers are placed:
1. For each BB that is a branch target (this BB is where the `end` marker will
   be), compute the nearest common dominator of its predecessors.
2. If that nearest common dominator is inside a more deeply nested context,
   walk out to the nearest scope which isn't more deeply nested. For example,
   Here the `block` marker cannot be placed in `B`, so we walk out of the scope
   to reach `A`.
3. Place a `block` marker in the discovered block (the nearest common dominator
   of branches or some block found by the process in 2) and place an `end`
   marker in BB.
For loops, a loop header by definition dominates all the BBs within the loop,
so we just place a `loop` marker there and an `end` marker in the latch.
Problems with the Current Proposal
A try/catch block is divided into two parts: a `try` part and a `catch` part.
What we should do for grouping a `try` part is similar to grouping a `block`,
because we also want `try` to be structured.
1. For each landing pad, where the `catch` instruction is, compute the nearest
   common dominator of all call instructions that have this landing pad as
   their successor.
2. If that nearest common dominator is inside a more deeply nested context,
   walk out to the nearest scope that isn't more deeply nested.
3. Place a `try` marker in the discovered block.
(Grouping the `catch` part is not covered here because it is not relevant.)
The problem is, unlike branches, `call` instructions do not have a relative
depth argument so cannot break out of multiple contexts. But between the nearest
common dominator and the landing pad, it is possible that some call instructions
that might throw unwind to outer landing pads (landing pads outside of the
nearest common dominator of throwing calls ~ current landingpad scope) or do not
unwind to any landing pad, which means when they throw, the exception should be
propagated out to the caller. For example,
Because a call instruction that might throw cannot specify a relative depth, or
in other words, cannot specify which landing pad to go to, this does not work
in the current EH proposal.
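The two ingredients of this problem can be sketched in Python: finding the nearest common dominator of a set of BBs, and listing the calls in a `try` region that do not unwind to its landing pad. All names here (`dominator_chain`, `nearest_common_dominator`, `calls_escaping_try`, the BB and landing pad labels) are hypothetical, and the immediate-dominator map is assumed to be given rather than computed from a CFG.

```python
from typing import Dict, List, Optional, Tuple


def dominator_chain(idom: Dict[str, Optional[str]], bb: str) -> List[str]:
    """Return `bb` followed by its dominators, nearest first, up to the entry."""
    chain = [bb]
    while idom[bb] is not None:
        bb = idom[bb]
        chain.append(bb)
    return chain


def nearest_common_dominator(idom: Dict[str, Optional[str]], bbs: List[str]) -> str:
    """Return the closest BB that dominates every BB in `bbs`."""
    common = dominator_chain(idom, bbs[0])
    for bb in bbs[1:]:
        other = set(dominator_chain(idom, bb))
        common = [d for d in common if d in other]
    return common[0]


def calls_escaping_try(calls: List[Tuple[str, Optional[str]]], landing_pad: str) -> List[str]:
    """Return calls whose unwind destination is not `landing_pad`.

    Each call is (name, unwind destination); None means the exception
    propagates out to the caller.
    """
    return [name for name, unwind in calls if unwind != landing_pad]


# Immediate dominators of a small, made-up CFG; the entry BB maps to None.
idom = {"entry": None, "A": "entry", "B": "A", "C": "A", "D": "B"}
print(nearest_common_dominator(idom, ["D", "C"]))

# Calls found between the chosen try marker and the landing pad.
calls = [("call_1", "lpad_inner"), ("call_2", "lpad_outer"), ("call_3", None)]
print(calls_escaping_try(calls, "lpad_inner"))
```

Any call reported by `calls_escaping_try` is one that the current proposal cannot express inside that `try` region, since the call has no way to name a different landing pad.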
Why the New Scheme is Better
The only way that can make the current scheme work is to split landing pads
until all the possibly-throwing calls within a
try
block unwind to the asingle landing pad or landing pads that's in the nested context of the
try
block. Minimizing the number of split landing pads will require nontrivial CFG
analysis, but still, it is expected to increase code size compared to when we
use the new proposed scheme above.
Code Size
For a simple example, suppose we have a call that unwinds to an outer landing
pad in case it throws.
If we split this landing pad, the code will look like the below. Here we assumed
that we factored out the `some code` part in the original `catch` part to reduce
code size.
So roughly, when we split a landing pad into n landing pads, there will be n
`try`s + n `catch`s + n `br`s + n `end`s that have to be added.
If we use our new scheme:
In the same case, where we would have to split a landing pad into n, the new
scheme roughly needs (n-1) added `try`s and (n-1) added `end`s. (`try`s now
take an argument, so it may take a bit more space though.)
Easier Code Generation
Generating Wasm code is considerably easier for the new scheme. For our current
scheme, the code generation wouldn't be very hard if we attach a `catch`
instruction to every call that might throw, which boils down to a `try/catch`
block for every call. But it is clear that we shouldn't do this, and if we want
to optimize the number of split landing pads, we would need a nontrivial CFG
analysis to begin with.
There are also many cases that need separate ad-hoc handling. For example,
there can be a loop that has two calls that unwind to different landing pads
outside of the loop:
It is not clear how to solve this case, because a part of a `try` is already
inside an existing loop but the `catch` part is outside of the loop, and there
is even another call that jumps to a different landing pad that's also outside
of the loop.
There can be ways to solve this, but there are many more tricky cases. Here, the
point is, the code generation algorithm for the new scheme will be a lot easier
and very straightforward. Code generation for the new scheme can be very similar
to that of `block` marker placement in `CFGStackify`. We place `try` markers in
a similar way to placing `block` markers, and if there is a need to break out of
multiple contexts at once, we can wrap those calls in a nested `try N` context
with an appropriate depth `N`. Optimizing the number of newly added `try N`
markers will also be straightforward, because we can linearly scan the code to
check if any adjacent `try` blocks can be merged together.
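The linear scan mentioned above might look like the following sketch. The representation is hypothetical: a `try N` region is modelled as a `(depth, instructions)` tuple and any other instruction as a plain string. It only merges regions that are directly adjacent and carry the same depth; whether instructions lying between two regions could be absorbed is not covered here.

```python
from typing import List, Tuple, Union

TryRegion = Tuple[int, List[str]]  # (relative depth N, instructions inside)
Item = Union[str, TryRegion]       # a plain instruction or a try N region


def merge_adjacent_tries(items: List[Item]) -> List[Item]:
    """Merge directly adjacent `try N` regions that have the same depth N."""
    merged: List[Item] = []
    for item in items:
        previous = merged[-1] if merged else None
        if (
            isinstance(item, tuple)
            and isinstance(previous, tuple)
            and previous[0] == item[0]
        ):
            # Same depth and nothing in between: extend the previous region.
            merged[-1] = (previous[0], previous[1] + item[1])
        else:
            merged.append(item)
    return merged


code = [
    (1, ["call $a"]),
    (1, ["call $b"]),
    "i32.add",
    (1, ["call $c"]),
    (2, ["call $d"]),
]
print(merge_adjacent_tries(code))
```

In this input only the first two regions are merged: the third is separated from them by `i32.add`, and the last one has a different depth.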