Primer on low-level stack and control-flow primitives #105

Open
RossTate opened this issue Mar 24, 2020 · 8 comments

RossTate commented Mar 24, 2020

During all this conversation, I've been reading up on the various stack and control primitives out there. These are the low-level primitives that exceptions, stack tracing, gc-root marking, generators, resumable exceptions, and continuation delimiting can be and often are implemented with. They have been found to often work well across module boundaries and even across language boundaries (or at least to facilitate cross-boundary coordination). Given their low-level nature, they don't require introducing any new value types. So I thought I'd review them here for y'all and attempt to illustrate how they'd work in and contribute to WebAssembly.

Simple Stack Marking

Let's start with helping languages manage their own memory. One key part of garbage collection is finding all the roots, i.e. all the garbage-collectable references currently on the stack. Yes, you could do this by maintaining your own list, but this list gets changed a lot, and it's only needed relatively infrequently—only when the garbage collector runs. So that's not particularly efficient. What you really want is a way to walk the current stack and mark all the references on it on demand. (Here I'm assuming that code occasionally calls gc_collect, explicitly handing over control, rather than the gc firing at arbitrary times.)

We'll do this using a stack mark:
mark gcroot : [] -> [i32]
This declares a new kind of stack mark called gcroot that provides an i32 (with no input needed), which conceptually is a root on the stack. This declaration would be up in the events section, not at the instruction level.

Next we'll actually mark the stack:

void foo() {
    i32 newref = gcalloc(12);
    with-value-mark (gcroot newref) {
        ... // do stuff with this mark on the stack
    }
}

Here we allocate some garbage-collected memory and assign the address to newref. Now this address is sitting on foo's stack frame, and so we need to make sure its referenced memory isn't cleaned up until foo returns. The with-value-mark (gcroot newref) {...} construct places a gcroot mark with value newref on the stack (often using code regions to indicate which instructions are marked, imposing little to no run-time overhead) and keeps it there until the ... in the body is exited. In particular, any call to gc_collect while that ... body is still executing will see the value of newref on the stack as a gcroot mark. That leaves the question: how does gc_collect observe that mark?

Stack Walking

Now let's dive into gc_collect, or really its helper function collect_roots:

void collect_roots() {
    walk-stack {
        while (true) {
            next-mark gcroot {
                add_reachable(get-mark);
            } none {
                break;
            }
        }
    }
}

Here we have three new constructs: walk-stack, next-mark, and get-mark.
The construct next-mark is only usable within walk-stack, and the construct get-mark is only usable within next-mark.

What walk-stack does is store a pointer into the current stack, starting with the present point in the stack, indicating where we currently are in the stack walk. next-mark then walks up the stack looking for marks. Here it has just one mark type specified, gcroot, but in general there could be many mark types specified. Once next-mark finds a mark of one of the specified types, it updates the pointer into the stack, and then it enters the body corresponding to that tag. Within that body, whenever get-mark is executed it returns the payload of that mark onto the stack. If at some point next-mark is executed and unable to find a matching mark, it executes the none body.

So collect_roots effectively iterates through all the gcroot marks on the stack and calls add_reachable on each of the respective i32 addresses, returning when the top of the stack is reached.
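
Continuing the (purely illustrative) shadow-stack analogy, the walk itself is just an iteration from the innermost mark outward; hitting the end of the list plays the role of the none clause. None of these names come from the proposal.

```python
# Illustrative analogue of walk-stack / next-mark / get-mark over a
# shadow stack of (kind, payload) marks, innermost mark last.
def collect_roots(marks, add_reachable):
    # walk-stack: iterate from the current point toward the stack's base
    for kind, payload in reversed(marks):
        if kind == "gcroot":           # next-mark gcroot found a match
            add_reachable(payload)     # get-mark yields the mark's payload
    # running out of marks corresponds to entering the `none` clause


roots = []
collect_roots([("gcroot", 0x10), ("other", 0), ("gcroot", 0x20)], roots.append)
```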

Advanced Stack Marking

Now this is a little inefficient. Often a function references many roots at a time, and this will process them one by one. So let's optimize this example, first by understanding how with-value-mark is actually a shorthand.

Remember that the declaration mark gcroot : [] -> [i32] looks a lot like an input-output operation, not just a value. That's because it is. with-value-mark is just shorthand for "whenever someone does get-mark on this mark, provide this value".

So with that in mind, let's expand foo above:

void foo() {
    i32 newref = gcalloc(12);
    mark gcroot {
        push newref;
    } within {
        ... // do stuff with this mark on the stack
    }
}

Here we see that the body of mark gcroot { push newref; } is just a block of type [] -> [i32], matching the corresponding declaration. Whenever get-mark is called, it hands the current pointer into the stack to this block so that the block can access its stack frame while executing on top of the overall stack (rather than creating a new stack). Once this block terminates, its frame is popped off the stack. So this gives us a way to temporarily execute blocks from stack frames up the stack.

So let's take advantage of that and redesign our mark to be gcaddroots : [] -> [], and consider a new function foo2 with multiple roots:

void foo2() {
    i32 newref1 = gcalloc(12);
    i32 newref2 = gcalloc(24);
    mark gcaddroots {
        add_reachable(newref1);
        add_reachable(newref2);
    } within {
        ... // do stuff with this mark on the stack
    }
}

Here gcaddroots directly takes care of adding both references to the reachable set. This in turn makes collect_roots simpler:

void collect_roots() {
    walk-stack {
        while (true) {
            next-mark gcaddroots {
                get-mark;
            } none {
                break;
            }
        }
    }
}

So with lower-level stack primitives and control-flow primitives, we can enable much better support for manual memory management. Hopefully it's clear how these same primitives could enable one to implement get_current_stack_trace() and even to place stack-trace marks that indicate how WebAssembly code corresponds to source code, so that the result of get_current_stack_trace() is something that a non-wasm expert could actually use to debug their source code.
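
The generalization from with-value-mark to mark ... within can be sketched in the same illustrative Python model: a mark's payload is now a block (here a Python callable) that runs on top of the walker, closing over the marking frame's locals. All names here are hypothetical.

```python
# Illustrative only: marks whose payloads are blocks, not values.
def foo2(marks, gc_alloc, add_reachable, body):
    newref1 = gc_alloc(12)
    newref2 = gc_alloc(24)

    def gcaddroots():              # mark body of type [] -> []
        add_reachable(newref1)     # closes over foo2's "stack frame"
        add_reachable(newref2)

    marks.append(("gcaddroots", gcaddroots))
    try:
        return body()              # the `within { ... }` body
    finally:
        marks.pop()


def collect_roots(marks):
    for kind, block in reversed(marks):
        if kind == "gcaddroots":
            block()                # get-mark runs the mark's block
```

The with-value-mark form is then just the special case where the block is a thunk returning a constant.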

Stack Unwinding

Now for the second set of stack primitives used to implement stack unwinding. Stack unwinding is the process of cleaning up state as one pops frames off the stack. This cleanup takes the form of destructors in C++ and finally clauses in many other languages. The most familiar instigators of cleanup are function returns and thrown exceptions. As we'll see, exceptions are really many primitives put together. I'll use C# because it makes it easiest to see how to break exceptions down into these primitives.

Escaping

Consider the following C# function:

int bar() {
    try {
        return some_func();
    } catch {
        return uhoh();
    }
}

It uses the unqualified catch clause to catch both C++ and C# exceptions (at the cost of not being able to inspect the exception).

Here's how we would compile this to lower primitives (ignoring C++'s nested rethrow for now):

mark cpp_handler : [i32, i32] -> [] // rtti and value
mark csharp_handler : [csharpexn] -> []

int bar() {
    escape $target {
        mark cpp_handler(i32 rtti, i32 val) {
            escape-to $target;
        } within {
            mark csharp_handler(csharpexn exn) {
                escape-to $target;
            } within {
                return some_func();
            }
        }
    } hatch {
        return uhoh();
    }
}

void cpp_throw(i32 rtti, i32 val) {
    walk-stack {
        while (true) {
            next-mark cpp_handler {
                get-mark(rtti, val);
            }
        }
    }
}

void csharp_throw(csharpexn exn) {
    walk-stack {
        while (true) {
            next-mark csharp_handler {
                get-mark(exn);
            }
        }
    }
}

Here we introduce two new primitives: the escape ... hatch construct and escape-to. The body of escape $target executes normally unless/until an escape-to $target is reached, at which point all of the stack between those two points is unwound (starting from escape-to) and then redirected to the hatch clause. This in particular means the hatch clause is never executed if escape-to $target is never executed. It also means that the wasm program has control over when the stack is unwound and at the same time guarantees nothing is ever executed on an invalid stack.

If we look at the implementations of cpp_throw/csharp_throw, we see they work by walking up the stack until they find a handler. Note that cpp_throw/csharp_throw do not cause the stack to be unwound. Instead, as you can see in the lowering of bar, the handler is responsible for this, which is important for implementing more flexible exceptions (discussed next). If the handler doesn't do this, then cpp_throw/csharp_throw continue on to the next handler until they reach the top of the stack. Notice that next-mark doesn't have a none case here, which means cpp_throw/csharp_throw trap if they reach the top of the stack. Conveniently, all of the stack is still intact, so a debugger can kick in to let the programmer inspect the intact state of the program and cause of the exception, or the host can collect the stack-trace marks to report the trace of the exception.
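
This two-phase structure (search first, unwind only when a handler decides to) can be roughly modeled in Python. The sketch below is illustrative and imprecise: Python's raise unwinds as the exception propagates, so the phases are conflated, but the key property survives: the handler runs with the full stack intact and is the one that initiates unwinding. All names are hypothetical.

```python
# Rough model of escape/hatch plus throw-by-stack-walk. Handlers run
# *before* any unwinding; an accepting handler triggers the unwind
# itself by raising a private control-flow exception.
class _EscapeTo(Exception):
    def __init__(self, target, payload=None):
        self.target, self.payload = target, payload


def csharp_throw(handlers, exn):
    # walk up the stack, innermost handler first, without unwinding
    for handler in reversed(handlers):
        handler(exn)          # an accepting handler raises _EscapeTo
    # no handler accepted: the whole stack is still intact here, so a
    # debugger could inspect the program state (modeled as an error)
    raise RuntimeError("unhandled exception; stack intact")


def bar(handlers, some_func, uhoh):
    target = object()         # plays the role of the $target label
    def handler(exn):
        raise _EscapeTo(target, exn)   # escape-to $target: unwind now
    handlers.append(handler)
    try:
        return some_func()
    except _EscapeTo as e:
        if e.target is not target:
            raise             # destined for some outer hatch
        return uhoh(e.payload)         # the hatch clause
    finally:
        handlers.pop()
```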

Handing Off State

Okay, before we get into filtered exceptions, let's first illustrate how to hand off state to the escape-hatch:

int bar2() {
    try {
        return some_func();
    } catch (Exception e) {
        return uhoh(e);
    }
}

lowers to

int bar2() {
    escape $target {
        mark cpp_handler(i32 rtti, i32 val) {
            if (RuntimeCompatibility.WrapNonExceptionThrows)
                escape-to(new RuntimeWrappedException(rtti, val)) $target;
        } within {
            mark csharp_handler(csharpexn exn) {
                escape-to(exn) $target;
            } within {
                return some_func();
            }
        }
    } hatch (csharpexn e) {
        return uhoh(e);
    }
}

Here we see that escape-to can hand off values to the hatch. We also see a case of a handler not escaping in cpp_handler. There's a flag in C# (nowadays set to true by default) that makes catch (Exception e) even catch (wrapped) C++ exceptions. If that flag is set, then the cpp_handler wraps the exception and forwards it to the hatch. If not, then the cpp_handler does nothing so that cpp_throw continues on to a "real" cpp_handler.

Filtering Exceptions

Because we have given the wasm program more control, without any more changes we can support C#'s filtered exceptions. Consider the following example:

class MyException : Exception { public int type; }
int bar3(int i) {
    try {
        return some_func();
    } catch (MyException e) when (e.type == i) {
        return uhoh(e);
    }
}

The when clause indicates that the exception should only be caught when that condition evaluates to true. This clause can be stateful and so is specified to be evaluated before any finally clauses or destructors. [Here](https://thomaslevesque.com/2015/06/21/exception-filters-in-c-6/)'s a blog post on why this is useful. (Hint: better debugging support is one reason.) As a consequence, we lower this program to the following:

int bar3(int i) {
    escape $target {
        mark csharp_handler(csharpexn exn) {
            if (exn is myexception && exn.type == i)
                escape-to(exn) $target;
        } within {
            return some_func();
        } 
    } hatch (myexception e) {
        return uhoh(e);
    }
}

Notice that no unwinding is done if the condition fails. We also didn't have to change csharp_throw to support any of this; csharp_throw just walks up the stack firing off potential exception handlers until one unwinds the stack (just like what a standard VM's implementation of exceptions does).
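 
Extending the earlier illustrative Python model, a filter is just a handler that declines by returning, letting the walk continue outward; only an accepting handler starts the unwind. Again, all names are hypothetical and this is a sketch of the idea, not the proposal's semantics.

```python
# Rough model of C#-style filtered handlers on a two-phase throw: the
# filter runs with the stack fully intact; declining means returning.
class _EscapeTo(Exception):
    def __init__(self, target, payload):
        self.target, self.payload = target, payload


def throw_to(handlers, exn):
    for handler in reversed(handlers):
        handler(exn)                      # may raise _EscapeTo or decline
    raise RuntimeError("unhandled")


def bar3(handlers, i, some_func, uhoh):
    target = object()
    def handler(exn):
        # the `when` clause: escape only if the filter holds
        if isinstance(exn, dict) and exn.get("type") == i:
            raise _EscapeTo(target, exn)  # unwinding starts here, not before
    handlers.append(handler)
    try:
        return some_func()
    except _EscapeTo as e:
        if e.target is not target:
            raise
        return uhoh(e.payload)
    finally:
        handlers.pop()
```

Note that throw_to needs no knowledge of filters at all, mirroring how csharp_throw above is unchanged.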

Stack Conventions

I say that we unwind the stack, but what does that mean? As a low-level primitive, it just means move the stack pointer and reclaim the stack. Technically that is it. But conventionally stack unwinding also means executing finally clauses and C++ destructors. Rather than baking in such a convention, let us modify escape/hatch to directly support it:

mark unwinder : [] -> [];

int bar4(int i) {
    escape $target {
        mark csharp_handler(csharpexn exn) {
            if (exn is myexception && exn.type == i)
                escape-to(exn) $target;
        } within {
            return some_func();
        } 
    } unwind unwinder {
        get-mark;
    } hatch (myexception e) {
        return uhoh(e);
    }
}

We have added an unwind clause to bar4. This forces escapes to $target to walk the stack as it is unwound, now frame by frame rather than all at once. Whenever the given mark is encountered, in this case unwinder, the corresponding unwind clause is executed (while the stack frame for the mark is still intact). If the escape-to clause provides a value of type t to pass to hatch, then the unwind clause technically has type [t] -> [t], giving it the chance to update this value, which can be useful in particular for collecting stack-trace marks while the stack is unwound. In the case of bar4, we don't care about updating the exception e, so we just use get-mark to execute the mark. In this way, the escape in bar4 effectively calls all unwinder marks as it unwinds the stack.

Another lesser-known convention exists in languages with more complex control flow. Consider how a stack walk essentially maintains the stack but shifts the control focus of the program up the stack. Sometimes it is useful for function calls on the stack to be able to track this focus. For example, a C++ program maintains its own stack in addition to the wasm stack, and so it might be helpful to also track what point within that separate stack corresponds to the current focal point of the stack walk.

Let's modify our garbage-collection helper collect_roots to permit such a convention:

void collect_roots() {
    walk-stack {
        while (true) {
            next-mark gcaddroots {
                get-mark;
            } focus-out {
                get-mark;
            } none {
                break;
            }
        }
        while (true) {
            prev-mark focus-in {
                get-mark;
            } none {
                break;
            }
        }
    }
}

This modification invokes focus-out marks on the way out so that they can observe that the stack walk has moved out of that part of the stack. After the walk reaches the top of the stack, it then walks back and invokes focus-in marks to conceptually revert any changes that were made by the focus-out marks.

Especially with these conventions, this pattern of simply invoking marks as one passes them is extremely common. So let's use walk-stack [next-pass*] [prev-pass*] and escape [unwind-pass*] as shorthand for simply invoking get-mark whenever a next-mark/prev-mark/unwind passes a mark in the respective lists. This shorthand lets us abbreviate collect_roots to just the following:

void collect_roots() {
    walk-stack [gcaddroots, focus-out] [focus-in] {
        next-mark none {}
        prev-mark none {}
    }
}

Nested Rethrow

At this point we have all the core functionality needed to implement a variety of features. I need to get to bed, so I'll edit this with some more non-exception examples later, but first let me illustrate how this supports C++'s nested rethrow. Actually, I'll show it supports Python's nested rethrow because, unlike C++, Python actually has a stack-tracing convention that's worth demonstrating.

To see this convention, if you run the following program:

def failure():
    raise ValueError, "Failure"
def refailure():
    raise
def fail():
    try:
        failure()
    except ValueError:
        refailure()
fail()

then you will get the following stack trace:

Traceback (most recent call last):
  File "main.py", line 10, in <module>
    fail()
  File "main.py", line 9, in fail
    refailure()
  File "main.py", line 7, in fail
    failure()
  File "main.py", line 2, in failure
    raise ValueError, "Failure"
ValueError: Failure

Notice that both failure and refailure are in the trace. This illustrates that Python builds the stack trace as it goes. Note: if you add prints, you'll also see that the stack for failure is unwound before refailure executes.

Here is how we can compile this (assuming for simplicity that all Python exceptions are ValueErrors and all we care about is the trace):

mark code_line : [] -> [string, i32, string, string]
mark python_handler : [pyref, string] -> []
mark python_except : [] -> [pyref, string]
void fail() {
    escape $target {
        mark python_handler(pyref exn, string trace_so_far) {
            if (exn == ValueError)
                escape-to(exn, trace_so_far) $target;
        } within {
            with-value-mark (code_line "main.py", 7, "fail", "failure()") {
                failure();
            }
        }
    } hatch(pyref exn, string trace_so_far) {
        with-value-mark (python_except exn trace_so_far) {
            with-value-mark (code_line "main.py", 9, "fail", "refailure()") {
                refailure();
            }
        }
    }
}
void raise_with_trace(pyref exn, string trace) {
    walk-stack {
        while (true) {
            next-mark python_handler {
                get-mark(exn, trace);
            } code_line {
                [string file, i32 line, string body, string code] = get-mark;
                trace = file + line + body + code + trace (with formatting);
            }
        }
    }
}
void raise_new(pyref exn) { raise_with_trace(exn, ""); }
void reraise() {
    pyref exn;
    String trace;
    walk-stack {
        next-mark python_except {
            exn, trace = get-mark;
        }
    }
    raise_with_trace(exn, trace);
}

There are three things to note here. First, raise_with_trace also observes the code_line marks in order to collect the stack trace as it walks up the stack to find a handler (which then is handed the stack trace up to that point). Second, in the lowering of fail the call to refailure is executed within a python_except mark. Third, the reraise function implementing Python's rethrow walks up the stack looking for such a python_except mark that provides the exception that is most recently caught. Altogether we get an implementation of Python's nested-reraise construct with its own semantics for stack tracing (that differs from both Java's and C#'s), and we didn't have to add anything new for it. Of course, the same technique can be done/adapted for C++'s nested-reraise construct.

Generators

A number of languages feature generators. There are a few variations on this concept, so I'll focus on C# and Python generators. These have a foreach (x in list) {...} construct that executes its body for each element "in" the list, and a generator-method construct in which yield value can be used to conveniently yield the elements "in" the list. Although foreach is often implemented by translating to for (IEnumerator enum = list.GetEnumerator(); enum.MoveNext(); ) { x = enum.Current; ...; } and by converting a generator-method into a state machine that dynamically allocates and updates the state to simulate control flow, the two features can be matched together to make for a much more efficient implementation of this common pattern. For simplicity, I'll assume every IEnumerable has a Generate() method that directly executes the yield statements rather than transforming them into a state machine.

Let's start by lowering the following C#-ish code:

void baz(IEnumerable list, String header) {
    foreach (Object elem in list) {
        print(header);
        println(elem);
    }
    println("done");
}

to the following:

mark foreach : [ref] -> [];
void baz(IEnumerable list, String header) {
    mark foreach(ref elem) {
        print(header);
        println(elem);
    } within {
        list.Generate();
    }
    println("done");
}

Next let's lower the body of some generating code:

void GenerateInts(int x, int y) {
    for (int i = 0; i < x; i++)
         yield new Integer(i);
    for (int j = 0; j < y; j++)
         yield new Integer(j);
}

to the following:

void GenerateInts(int x, int y) {
    walk-stack {
        next-mark foreach {
            for (int i = 0; i < x; i++)
                 get-mark(new Integer(i));
            for (int j = 0; j < y; j++)
                 get-mark(new Integer(j));
        }
    }
}

So the foreach sets up a foreach mark on the stack within which it calls list.Generate(), which let's say eventually calls GenerateInts(5, 4). Then GenerateInts walks up the stack to find that (first) foreach mark and executes its body, using get-mark to execute the foreach body with each yielded value. This pattern effectively implements the standard stack-sharing implementation of generators, again without needing to introduce any more primitives—without even references.
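
The stack-sharing pattern can be sketched in the same illustrative Python model: the consumer installs a foreach mark whose body is the loop body, and the producer finds that mark and invokes it once per yielded value, all on one shared stack. These function names are hypothetical stand-ins for the lowered baz/GenerateInts above.

```python
# Illustrative model of stack-sharing generators via a foreach mark.
def baz(marks, generate, out):
    def foreach(elem):                 # the foreach mark's body
        out.append(("item", elem))     # "print(header); println(elem);"
    marks.append(("foreach", foreach))
    try:
        generate(marks)                # e.g. list.Generate()
    finally:
        marks.pop()
    out.append(("done", None))


def generate_ints(marks, x, y):
    # next-mark foreach: find the innermost foreach mark on the stack
    body = next(m for kind, m in reversed(marks) if kind == "foreach")
    for i in range(x):
        body(i)                        # get-mark with the yielded value
    for j in range(y):
        body(j)
```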

Stack-Allocated Closures

Let's review for a sec. We have this mechanism for walking the stack, which gets us a pointer into the stack. We have this mechanism for getting a mark, which gets a code pointer. The two together effectively make a stack-allocated closure, and get-mark is just calling that closure. (Technically, this is slightly different from a stack-allocated closure, but the analogy is close enough for our purposes.) Because the closure is allocated on the stack, rather than on the heap, we have to make sure our reference to it does not outlive the stack frame it sits upon. This is why this pair is not a first-class value and why get-mark is restricted to being used within a walk-stack and next-mark.

So stack walking and marks give us a way to go up and get stack-allocated closures, but it's also useful to be able to send stack-allocated closures (i.e. code pointer and into-stack pointer pair) down. I won't go into how this can be used to improve performance of higher-level languages, since that's still unpublished, but I will demonstrate how this can be used to optimize generators further and to help implement continuations.

First, let's extend functions with higher-order parameters so that one can write declarations like (higher-param $sac (i32 i32) (i32 i32)) after parameter declarations like (param $p i32) and before (result ...). These higher-order parameters are stack-allocated closures and consequently are invocable, let's say via higher-local.invoke $sac.

Using this, let's rewrite the lowering of GenerateInts to use stack-allocated closures:

void GenerateInts(int x, int y, void foreach(ref)) {
    for (int i = 0; i < x; i++)
        higher-local.invoke foreach (new Integer (i));
    for (int j = 0; j < y; j++)
        higher-local.invoke foreach (new Integer (j));
}

Notice that GenerateInts no longer has to walk the stack, which was the slowest part of its implementation before. It just invokes its higher-order parameter foreach. Note, though, that it does not treat foreach as a value (i.e. there's no higher-order.get), which guarantees the lifetime of foreach will not exceed the lifetime of the call to GenerateInts.

Next let's rewrite the lowering of baz to instead assume IEnumerable.Generate takes a stack-allocated closure:

void baz(IEnumerable list, String header) {
    higher-order.let foreach(ref elem) be {
        print(header);
        println(elem);
    } in {
        list.Generate(foreach);
    }
    println("done");
}

Here we declare a new higher-order closure using higher-order.let (it's unsound to have a higher-local.set) and hand it off to list.Generate—no need for a stack mark anymore. This doesn't involve any dynamic allocation despite the fact that foreach closes over the local parameter header because that local parameter is sitting on the stack frame and the lifetime of foreach is guaranteed to last only as long as the stack frame. In other words, the stack frame is its closure environment.

So we can even more efficiently implement generators with higher-order parameters and stack-allocated closures, but we can also combine first-class stacks with higher-order parameters to implement one-shot continuations.

First-Class Stacks

So far the mental model has been that there is one stack for the program, but it turns out that everything described works even if we have multiple stacks calling into (or really returning to) each other. Such functionality lets us implement all sorts of interesting things like lightweight threads or even continuations.

Stack Type

The stack type is stack (param ti*) (result to*), where ti* is the inputs the stack is expecting and to* is the outputs the stack will produce when/if it eventually returns. These stacks are just big state machines. We can run this state machine to its completion to get values returned, but it turns out to be much more useful for the state machine to pause on occasion, and this pausing is much more safely done if it's done voluntarily by the state machine rather than forcefully by some external entity.

Stack Allocation

To allocate a stack, one uses stack.alloc $f : [t*] -> [(stack (param ti*) (result to*))] where $f is a func (param ti* t*) (higher-param () (ti*)) (result to*). This creates a stack whose initial stack frame is the function $f along with some of its inputs and a special higher-order input. The higher-param with no inputs is to be used by $f to pause the newly allocated stack. stack.alloc initializes the portion of the stack frame corresponding to that higher-param to be (the pair of) the host-level code pointer for pausing a stack and the pointer to the freshly created stack. Since a non-executing stack is always awaiting ti* inputs, this "pause" will return back to $f the inputs it was given.

One thing to note is that the t* could include mutable references, which $f can use to externally expose changes to the stack's internal state as it desires. So while these stacks might seem like a big black box, whoever created the stack has the ability to open up as much of that black box as it wants. Of course, this all assumes we have a way to make the stack change.

Stack Stepping

After we allocate a stack, it's just sitting there. To make it actually do something, just use stack.step $l : [(stack (param ti*) (result to*)) ti*] -> [] where $l is a block from to*. This provides the stack with the ti* inputs and causes it to progress until either the "pause" higher-param is called from within the unknown enclosed $f, in which case stack.step returns normally, or until the unknown enclosed $f returns, in which case control is transferred to $l with the returned values on the stack.
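
As a loose illustration (not the proposal's semantics), Python generators already behave like such pausable stacks: the generator object plays the role of the stack value, yield plays the "pause" higher-param, and stepping either pauses again or falls through to the on-return block. The names stack_alloc/stack_step and the doubler example are hypothetical.

```python
# Illustrative model of stack.alloc / stack.step using generators.
def stack_alloc(f, *args):
    g = f(*args)          # f is a generator function; yield == "pause"
    next(g)               # run to the first pause, now awaiting inputs
    return g


def stack_step(stk, value, on_return):
    try:
        stk.send(value)   # provide the ti* inputs; run to the next pause
        return None       # paused again: stack.step returns normally
    except StopIteration as done:
        return on_return(done.value)   # $f returned: run the block $l


def accumulate(initial):
    """Example stack body: pause repeatedly, summing stepped-in inputs,
    and return once the running total reaches 10."""
    total = initial
    while total < 10:
        total += (yield)  # pause; resumes with the next stepped-in input
    return total
```

For example, stepping a freshly allocated accumulate(0) stack with 4 pauses again, and stepping it with 7 then returns 11 through the on-return block.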

(Note that, for simplicity, I'm putting aside the complication that every stack needs to be guarded by a lock so that two threads cannot step the same stack at the same time. Later on I'll show how to adjust the instructions to better accommodate that constraint.)

Putting these together, we can implement lightweight cooperative worker threads:

mark yielder : [] -> [];
mark spawner : [$thread_state] -> [];
mark thread-locals : [] -> [$thread_state];
$result do_work($thread_state) {...} // might call yield, spawn, and get_thread_locals
$result join($result, $result) {...}
$result main($thread_state work, $result init) {
    stack_list = new List<(stack (param) (result $result))>();
    stack_list.add(new_worker(work));
    while (!stack_list.is_empty()) {
        worker = stack_list.dequeue();
        mark spawner($thread_state more_work) {
            stack_list.add(new_worker(more_work));
        } within {
            stack.step(worker) { // block to run on return
                product = pop; // pop $result off stack
                init = join(init, product);
                continue;
            }
        }
        stack_list.enqueue(worker);
    }
    return init;
}
(stack (param) (result $result)) new_worker($thread_state work) {
    local.get $work;
    stack.alloc $worker_body;
    return;
}
$result worker_body($thread_state state, void pause()) {
    mark thread-locals() {
        local.get $state;
    } within {
        mark yielder() {
            pause();
        } within {
            return do_work(state);
        }
    }
}
void yield() { walk-stack { next-mark yielder { get-mark(); }}}
void spawn($thread_state work) { walk-stack { next-mark spawner { get-mark(work); }}}
$thread_state get_thread_locals() { walk-stack { next-mark thread-locals { return get-mark(); }}}

Here main maintains a queue of workers, with each worker being a stack. It repeatedly pops a stack off the queue and makes it take a step, requeueing the stack if it doesn't finish and aggregating the result of its work with the ongoing result if it completes. Before it forces the stack to take a step, it sets up a spawner stack mark to catch any stack walks done by a call to spawn within the worker stack, adding any provided workers to the queue.

The function new_worker's job is solely to create the stack, which it does by bundling the given $thread_state with the function worker_body, whose body is really where all the interesting stuff happens. Looking inside worker_body, we see that it sets up two stack marks. The thread-locals stack mark simply provides the state of the thread. The yielder stack mark invokes the provided pause higher-order parameter. This means that calls to yield() within do_work will walk up the stack, execute this mark, thereby running pause, which stack.alloc has set up to cause the stack to pause. Note that, if the language compiling to wasm wants to support lightweight threads with very efficient yielding, i.e. not having to do a stack walk to find the "pause", then these primitives would also support another implementation strategy in which all compiled functions take a pause higher-order parameter and pass "pause" down to whomever they call. That is, we are not forcing a particular implementation strategy upon the language implementer; we are giving them the low-level primitives to choose their own.

Hopefully this illustrates how these low-level primitives combine together to support high-level patterns like algebraic effects. If the "event" of the algebraic effect has an input, the analog of worker_body would make its yielder mark update the $thread_state with that input. But we can also do a lot more than what algebraic effects support!

Stack Extension

In order to allocate a stack (param ti*) (result to*) we allocated a partial stack frame for a function that pauses and otherwise converts ti* into to*. Once we have such a stack, if it's not executing then we know that its current stack frame is awaiting some ti*s. So we can extend the stack with another stack frame that will return ti*s. In order to preserve the type of the stack, this stack frame must itself be awaiting some ti*s. This suggests the following primitive:

stack.extend $f : [(stack (param ti*) (result to*)) t*] -> []

where $f is a func (param ti* t*) (higher-param () (ti*)) (result ti*).

This turns out to be surprisingly useful, especially for doing stack inspection. For example, suppose we wanted to combine our lightweight threads with our collect_roots implementation for program-managed garbage collection. The problem is that main has a lot of stacks representing worker threads that refer to root addresses that need to be collected. We can use stack extension to address this.

$result main($thread_state work, $result init) {
    stack_list = new List<(stack (param) (result $result))>();
    mark gcaddroots() {
        foreach (worker in stack_list) {
            stack.extend(worker) $collect_worker_roots;
            stack-wall {
                stack.step(worker);
            }
        }
    } within {
        ... // same loop as before (maybe add call to gc_collect)
    }
    return init;
}
void collect_worker_roots(void pause()) {
    collect_roots();
    pause();
}

The revised main sets up its gcaddroots mark to go through each of the stacks in stack_list and collect their roots. It does so by adding a stack frame for collect_worker_roots onto each stack, stepping the stack to cause it to call collect_roots, which then walks that stack to run gcaddroots along it, and then to pause, thereby returning control back to main. The stack-wall construct is just an optimization that makes stack walks think they have hit the top of the stack at that point, so that the calls to collect_roots within the thread stacks don't each rewalk main's stack. stack-wall is also useful for security purposes, as it prevents callees from observing the marks on the caller's stack and even from observing the time it takes to walk the caller's stack.
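The shape of this pattern can be sketched with Python generators standing in for the suspended worker stacks: resuming a worker with a special request plays the role of extending it with collect_worker_roots and stepping it, and the worker reports its roots and immediately pauses again. Everything here (COLLECT, worker, the root lists) is illustrative:

```python
COLLECT = object()   # sentinel request standing in for the extend-and-step

def worker(roots):
    # a suspended worker thread holding gc roots; resuming it with COLLECT
    # makes it report its roots and pause again (the collect_worker_roots role)
    while True:
        req = yield
        if req is COLLECT:
            yield list(roots)
        else:
            roots.append(req)       # normal work: pick up a new root

w1, w2 = worker([1]), worker([2])
next(w1); next(w2)                  # run each worker to its first pause
w1.send(10)                         # w1 does some work, gaining a root

all_roots = []
for w in (w1, w2):
    all_roots += w.send(COLLECT)    # "extend and step": collect this worker's roots
    next(w)                         # step past the reporting pause
assert all_roots == [1, 10, 2]
```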

Notice that the above example does a stack.extend and then a stack.step. There is actually one primitive that combines these two together and is in fact slightly stronger and lower-level than the combination:

stack.extend_and_step $f $l : [(stack (param ti*) (result to*)) t*] -> []

where $f is a func (param t*) (higher-param () (ti*)) (result ti*) and $l is a block from to*. As the name suggests, this extends the stack with a frame for $f and steps the stack in one go. Notice that, unlike with stack.extend, $f does not take ti* as inputs, and that, unlike with stack.step, there need not be any ti* on the stack. In these terms, stack.extend essentially adds a stack frame that first pauses to get its ti* inputs and then calls the provided function, whereas stack.step essentially adds a stack frame that is handed the provided ti* inputs and simply returns them.

What is particularly useful about stack.extend_and_step, besides being more true to what actually happens at the low level, is that it gives you a way to step the stack without providing the expected inputs for the stack, as in the following variant where the worker threads each expect an i32.

$result main($thread_state work, $result init) {
    stack_list = new List<(stack (param i32) (result $result))>();
    mark gcaddroots() {
        foreach (worker in stack_list) {
            stack-wall {
                stack.extend_and_step(worker) $collect_worker_roots; // no i32 provided
            }
        }
    } within {
        ... // same loop as before (maybe add call to gc_collect)
    }
    return init;
}
void collect_worker_roots(i32 pause()) {
    collect_roots();
    return pause();
}

Stack Running

In both stack allocation and extension we were careful to make sure no local stack state could be caught in the first-class stack. The reason is that even if we were to immediately step the first-class stack, it could pause and outlive the local stack frame. But, although pausing is the key feature of first-class stacks, sometimes one reaches a point where one simply wants to run a first-class stack to completion, disabling pausing. This effectively fuses (the lifetime of) the first-class stack with (that of) the local stack, and as such we can safely permit local state to be infused into the first-class stack.

We achieve this functionality with the following primitive:

stack.run {
    ... // code to run on the end of the stack, returning ti* into the stack
} pausing {
    ... // code to run whenever the stack would have paused, returning ti* back into the stack
} : [(stack (param ti*) (result to*))] -> [to*]

This conceptually mounts the given first-class stack on the local stack, extends the first-class stack with the frame for the main body of stack.run, and then resumes the first-class stack, executing the pausing clause whenever the first-class stack would have paused, eventually returning the result values of the first-class stack. The trick with the pausing clause is most straightforwardly implemented by allocating space within each stack for a stack-allocated closure that is initially null; the stack-allocated closure originally passed to pause first checks this slot and defers to it if it is no longer null.
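The control flow of stack.run can be sketched with a Python generator standing in for the mounted stack and a callable standing in for the pausing clause (a simulation of the described semantics, not proposed syntax):

```python
def stack_run(stack, first_args, pausing):
    # drive a paused stack (a generator awaiting input) to completion;
    # whenever it would pause, run the pausing clause to obtain the
    # values with which to resume it
    args = first_args
    while True:
        try:
            stack.send(args)
        except StopIteration as fin:
            return fin.value          # the stack's to* results
        args = pausing()

def body():
    x = yield        # await the stack's parameter
    y = yield        # pause mid-computation, asking for one more value
    return x + y

stack = body()
next(stack)          # run to the first pause

assert stack_run(stack, 40, pausing=lambda: 2) == 42
```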

We can use this in our thread-workers example to forcibly abort threads. That is, suppose it is possible for the main loop to recognize early that the $result has already been determined, e.g. because it's computing a big conjunction and some worker has resulted in false, making it unnecessary to continue executing the remaining threads. Those threads might still be holding onto resources, though, so it is important to clean them up. We can do so by adding the following after the main loop:

foreach (worker in stack_list) {
    escape $target [unwinder] {
        stack.run(worker) {
            escape-to $target;
        } pausing {
            escape-to $target;
        }
    } hatch {}
}

Stack Acquiring

Earlier I deferred the issue that first-class stacks need to have associated locks to prevent someone from, say, trying to extend a stack with a frame while another program is stepping the stack. We could have every stack instruction acquire and release this lock, but that seems inefficient and doesn't address the fact that one thread might want to guarantee that a number of operations happen in direct succession. So instead we add the construct stack.acquire($s := expr) {...}. This pops off the result of expr, which should be a stack, acquires the lock on that stack, binds a local-variable-of-sorts $s to that stack, executes the ... body within which $s is in scope, and then releases the lock on the stack. Its input and output types are the same as the body ... plus an additional stack input.

Now $s is not really a local variable. It simply names a way to refer to the stack.acquire. So the second thing we do is modify all of the above instructions to refer to $s rather than take a stack as input. That way these instructions do not need to acquire/release the stack's lock because they know that the stack is already acquired.

Let's illustrate this new construct by showing how we can use it to append an entire stack, not just a single stack frame, onto a stack:

void stack_append((stack (param ti*) (result to*)) s, (stack (param ti*) (result ti*)) sapp) {
    stack.acquire($s := s) {
        local.get sapp;
        stack.extend_and_step $s $stack_append_helper;
    }
}
ti* stack_append_helper((stack (param ti*) (result ti*)) sapp, ti* pause()) {
    stack.acquire($sapp := sapp) {
        stack.run $sapp {
            pause();
        } pausing {
            pause();
        }
    }
}

There is a lot going on here. First, stack_append acquires the lock on s, the stack to be appended to. Then it extends s with the stack frame for stack_append_helper, capturing sapp as the argument to the corresponding parameter, which it then steps into, effectively starting the execution of stack_append_helper. This simply acquires the lock on sapp and then runs sapp, but immediately pauses execution to get the values to provide to sapp, restoring control to stack_append. The acquire on s then completes, releasing the lock, and stack_append returns. So by the end of this, no one is holding the lock on the stack s, but the stack frame added onto s is holding the lock on sapp. Furthermore, the next time someone steps s, the value will be fed to sapp instead, which will either run to completion or, due to the pausing clause, cause s to pause whenever sapp would have paused.

Altogether, semantically speaking, after stack_append completes it is as if the entirety of sapp has been mounted onto s and locked into place, combining the two into one. Furthermore, one design choice I made in the above primitives is that this is completely equivalent to permanently acquiring the lock on sapp and then directly copying each of the stack frames on sapp onto s one by one. In other words, unless you were the one to create a stack boundary, you cannot observe a stack boundary.

(Side note: you can use the technique above to also compose stacks of types [ti*] -> [t*] and [t*] -> [to*] into a stack of type [ti*] -> [to*], just like you can do with closures.)

Stack Duplication

Lastly, for stack duplication we mainly just add the instruction stack.duplicate $s : [] -> [(stack (param ti*) (result to*))] where $s references a stack.acquire of a stack (param ti*) (result to*). This primarily duplicates the stack via memcpy. However, there is one complication.

The stack being duplicated can be in the midst of a step or run of another stack, in which case it has acquired the lock on that stack. We cannot have two stacks holding onto the same lock. So we have to duplicate that stack as well. In other words, while a stack is stepping or running another stack, the two are temporarily conceptually one stack and so must both be duplicated.

As a convention, after duplicating a stack, one could walk the duplicate and run each mark duplicator : [] -> [] on it. These duplicator marks could update the stack frame to duplicate relevant resources (or trap if impossible) so that the duplicate is reasonably independent of the original. For example, the stack could be in the middle of a stack.acquire on the stack held in some local variable $x, and the duplicator may want to update $x to reference the duplicated stack. For this reason, we also add stack.get $s that returns the stack currently acquired by the referenced stack.acquire, which might not be the stack value that was first passed to the stack.acquire because a stack.duplicate might have happened. It is an odd corner case, but from what I can tell this works out and provides the right primitives for supporting multi-shot continuations.
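The multi-shot idea can be sketched by representing a stack as plain data, so that duplicating it is just a deep copy and both copies can be resumed independently (MiniStack here is an illustrative stand-in, not the proposal's representation — a real implementation would memcpy the stack memory):

```python
import copy

# Multi-shot resumption by duplicating explicit stack state: because the
# "stack" is plain data (a program counter and locals), copying it yields
# an independent resumable stack -- the essence of stack.duplicate.
class MiniStack:
    def __init__(self):
        self.pc = 0     # how far execution has progressed
        self.acc = 0    # a "local variable" on the stack

    def step(self, value):
        # resume with `value`; returns True once the stack has finished
        self.acc += value
        self.pc += 1
        return self.pc >= 2

s = MiniStack()
s.step(10)                 # run partway, then duplicate
s2 = copy.deepcopy(s)      # stack.duplicate: an independent copy
s.step(1)
s2.step(2)
assert (s.acc, s2.acc) == (11, 12)   # the two copies diverge independently
```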

Wrapping Up

So what do y'all think of these primitives? Obviously this is a lot, but we don't have to add it all at once. The first-class stack primitives in particular don't even make sense to add until after gc, since stacks will need to be memory managed. But I thought y'all would appreciate seeing a roadmap for this functionality and how it all fits together.

@binji (Member) commented Mar 30, 2020

I think stack walking is a pretty interesting primitive, and would generally be useful to add to WebAssembly.

There's a lot of functionality described here, so we'd want to think about which features would need to be in a first version. It seems that the basic functionality does overlap with the current exception-handling proposal, so I'm curious whether this primitive is intended to replace or complement what is currently specified in the exception-handling repo. I also wonder whether the initial primitives (walk-stack, next-mark, get-mark, mark/within, escape-to) have uses outside of implementing exception handling. Perhaps debugging?

Also, I didn't see much mention of how these primitives interact with host frames. Can you go into that a little bit?

@arjunguha

I've talked to Ross about this separately. I think this idea is really promising, and could enable garbage collectors written in WebAssembly to perform reasonably well.

I worked on a similar idea in the context of education several years ago:

http://blog.brownplt.org/2013/02/19/teaching-gc.html

In this project, the goal was to allow students to write garbage collection algorithms in Scheme (now Racket), for Scheme programs, without the need to understand low-level details of the machine and compiler. So, we came up with a high-level API for walking the stack and storing values in linear memory.

CC @shriram

@RossTate (Contributor, Author) commented Apr 6, 2020

There's a lot of functionality described here, so we'd want to think about which features would need to be in a first version.

Definitely. I like to work out a detailed roadmap and then carve out waypoints. Certainly I would not expect first-class stacks to be part of a first version. Stack-allocated closures I could see going either way, though I suspect they'd be pushed to a second stage. The "initial" primitives you mention seem to me like the likely first batch.

I also wonder whether the initial primitives have uses outside of implementing exception handling.

@arjunguha actually first had the idea for these primitives months ago in order to support custom garbage collection, such as through my collect_roots examples. Generators are another pattern that I've been wondering about for a while that these primitives address (plus a more efficient alternative if you add stack-allocated closures). Stack tracing itself is a feature that is not necessarily tied to exception handling, with many languages providing a function for getting the current stack trace, and these primitives give a module much more control over this feature.

But those are all examples listed above. Another common application of stack marks is for implementing dynamically scoped variables (e.g. for LaTeX, bash, and Emacs Lisp). You can also use stack-allocated closures (or simulate stack-allocated closures with stack walking) to implement C#'s out and ref parameters that let a callee modify local variables or interior fields in place. Another application is lazy argument passing, such as Scala's lazy keyword, for which stack-allocated closures are not necessary but are more efficient.
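Dynamically scoped variables in particular fall out of marks plus stack walking almost directly. A Python sketch of the pattern, with an explicit mark list standing in for marks on the real stack (all names invented for illustration):

```python
import contextlib

_marks = []   # innermost mark last; stands in for marks on the real stack

@contextlib.contextmanager
def mark(name, value):
    # `mark ... within { ... }` analogue: the mark lives for the block's extent
    _marks.append((name, value))
    try:
        yield
    finally:
        _marks.pop()

def lookup(name):
    # stack walk: find the nearest enclosing mark binding `name`
    for n, v in reversed(_marks):
        if n == name:
            return v
    raise KeyError(name)

with mark("indent", 2):
    assert lookup("indent") == 2
    with mark("indent", 4):          # inner binding shadows the outer one
        assert lookup("indent") == 4
    assert lookup("indent") == 2     # outer binding restored on exit
```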

Perhaps debugging?

There are actually a lot of interesting opportunities here. I can imagine a debugger-frame stack mark that the debugger can interact with to visualize and even update the stack frame in terms of the source code rather than the wasm code. You can imagine some community-developed protocol for getting the line of code, getting the names of local variables, getting a string representation of the values of those variables, and updating the value of a variable to match a new string representation. Since the stack mark has access to the local stack frame, all of this can be done within wasm, rather than by the host, in a safe manner. The debugger itself could even be implemented within wasm rather than being the responsibility of the host; a breakpoint would just be a call to debugger(), which would then walk the stack to get the debugger-frame stack marks. Of course, a host-supported debugger would be more efficient and powerful, but it is at least interesting to gauge what can be done with these primitives.

It seems that the basic functionality does overlap with the current exception-handling proposal, so I'm curious whether this primitive is intended to replace or complement what is currently specified in the exception-handling repo.

There definitely is overlap. Except for the allocation and tag-inspection of exnrefs, the exception-handling proposal can be translated into the primitives here. But it is certainly not my intent to replace the exception-handling proposal; I don't think that would really make sense. Rather, my current concern is that the two proposals would interfere with each other, so my intent with respect to exception handling is to encourage considering how to make the two compatible and ideally symbiotic.

Also, I didn't see much mention of how these primitives interact with host frames. Can you go into that a little bit?

Stack walking only observes (aside from side channels like timing) the specified stack marks. This means it skips over stack marks it does not understand and stack frames (like host frames) without the specified stack marks. Of course, the host is the one implementing stack walking, so it can have its own behind-the-scenes handlers for when focus goes up and down the stack, and likewise for when a stack is unwound.

@bitwalker

Just wanted to chime in and give my two cents on what aspects of this are of interest to at least one language implementation:

For context, I'm working on an ahead-of-time compiler for Erlang (and really BEAM languages in general that are able to compile to Erlang AST or Core representations). Wasm is really what motivated the project originally (and still is), but our current implementation targets x86_64, while we figure out the best approach to adding support for Wasm.

There are a few features whose common implementation techniques (on other architectures) are not easily lowered to Wasm:

  • Non-local escapes; exceptions are one form of that, but more generally I'm talking about control flow, e.g. continuations/stackful coroutines/backtracking
  • Green threads and preemptive scheduling (the type of scheduling isn't super important here, just an attempt to distinguish between generators/stackless coroutines and proper green threads)
  • Effects (this is out of my wheelhouse, but my understanding is that these share a lot in common with continuations, but are maybe better classed as non-local control flow)
  • Some GC strategies (namely using the stack as a young generation heap, Cheney-on-the-MTA-style)

For my own purposes, non-local control flow and green threads are of primary interest, and are the areas that present the most difficulty in lowering to Wasm versus other architectures.

For exceptions, it's acceptable in my case to use C++-style exceptions, even though that punishes Erlang programs a bit, since exceptions are used quite frequently for control flow (Erlang distinguishes between its three main "types" of exceptions: errors, exits, and throws; throws are used almost exclusively for control flow).

On the other hand though, creating green threads, suspending/yielding them, and scheduling are of critical importance - and because the scheduler is part of the runtime system, changes in how green threads are represented mean changing the scheduler, so it's extremely desirable to keep differences in representation as non-invasive as possible. The current way we lower green threads on non-Wasm architectures is by creating independent stacks, and suspending/resuming them using assembly at various pre-determined yield points in the compiler-generated code. Ideally then, Wasm would have primitives to create stacks, destroy them, and swap between them (or more precisely, suspend execution on one stack, resume on another). This would include treating the "original" stack as just another stack (sans destroy), since the scheduler is really just another green thread, albeit a privileged one.

Within the context of a single stack, either being able to use stack stepping/unwinding primitives, or being able to save/restore a point in the stack, would provide a straightforward way of implementing non-local control flow.

Put together, the above set of primitives would only require relatively minor adaptations in both the compiler backend and runtime system to support Wasm as a new target, and I suspect that applies to just about any language that has a similar set of requirements.

The only other problem is understanding how these features interact with debuggers and other such tooling. On x86_64, sticking to conventions and making use of CFI directives can at least make sure compiled programs play nicely with a debugger; but it's not clear to me how that translates to Wasm.

@RossTate (Contributor, Author) commented Apr 7, 2020

Thanks @bitwalker for engaging and for the detailed description of how this might help with Erlang (a particularly useful example since it wasn't one I had in mind when putting this together)! Would you mind educating us a bit more on Erlang's exceptions and either how they differ from C++ exceptions or how they would be aided by these primitives? Admittedly I know little about Erlang exceptions.

@bitwalker

@RossTate Sure!

C++ exceptions (or, to be more specific, the zero-cost exception handling model specified by the Itanium ABI) add no runtime overhead when not used, but are relatively expensive to raise. That expense comes from unwinding the stack frame by frame, executing search and cleanup code in each frame with a landing pad. The original Wasm exception handling proposal mostly follows this model, though if I recall correctly, it doesn't explicitly support the two-phase approach described by the Itanium ABI, as it doesn't do the cleanup phase, but otherwise is more or less the same.

Erlang has three kinds of "errors":

  • exit: raised by using the erlang:exit/1 builtin, and is more akin to a signal, but is used to exit an Erlang green thread (called a process) either normally or abnormally. It is possible to catch exits using Erlang's catch keyword, but they differ from other errors in their semantics. I only mention them here because, in order to catch exits, they have to piggyback on the same exception handling mechanism as the other errors.
  • error: raised by using the erlang:error/1 builtin, and is essentially equivalent to exceptions in most languages, i.e. used for truly exceptional conditions, things you don't expect callers to handle
  • throw: unsurprisingly, raised by use of the throw keyword, and commonly used for control flow, e.g. indirect branches/goto, by "throwing" control from some point down the stack up to a matching catch. Throws are meant to be caught by a caller and handled. Erlang has no explicit looping constructs, so it depends heavily on recursion. An example use case for throw can be found in parsers, which tend to be deeply recursive, so their implementation can be made a lot cleaner by allowing internal parsing functions to throw when EOF or invalid input is reached. This is done rather than returning in those cases, since that would force every function to explicitly handle them, resulting in a lot of unnecessarily duplicated code. The use of throw for control flow like this is so common that, if the exception handling model penalizes the use of exceptions, it has noticeable performance implications.

As implemented in the BEAM virtual machine, all three errors use the same basic mechanism: the VM searches up the stack for the nearest catch pointer (which is a pointer to the instruction where control will resume), and if one is found, it jumps to the instruction indicated in the pointer and resumes execution. If no catch pointer is found, and the start of the process's stack is reached, the throw is converted to a nocatch error. In the HiPE compiler for Erlang (as well as my own compiler), it is represented slightly differently, instead using essentially double-barreled continuations (i.e. a return continuation, and an escape continuation), so rather than searching up the stack, the catch pointer is already on hand in a register.

In short, Erlang exceptions don't need to unwind and potentially execute code in every frame along the way; instead control is transferred directly to the instruction indicated by the catch pointer, only needing to search the stack for that catch pointer if it isn't already on hand. The result is really cheap exceptions. Using them for control flow comes with very little overhead.
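Semantically (though not in cost — Python exceptions do unwind frame by frame under the hood), the pattern is an exception used purely as a non-local jump, with no per-frame handler code on the way up. A rough analogue of the parser-style use of throw, with all names invented for illustration:

```python
class Throw(Exception):
    # stand-in for Erlang's throw: the value is carried straight to the catch
    def __init__(self, value):
        self.value = value

def parse_item(tokens, i):
    # recursive-parser-style helper: throws on end-of-input rather than
    # threading an "eof" result back through every caller
    if i >= len(tokens):
        raise Throw(("eof", i))
    return tokens[i], i + 1

def parse_all(tokens):
    try:
        items, i = [], 0
        while True:
            item, i = parse_item(tokens, i)
            items.append(item)
    except Throw as t:             # the "catch pointer": control lands here
        assert t.value == ("eof", len(tokens))
        return items

assert parse_all(["a", "b"]) == ["a", "b"]
```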

Where things get tricky with Erlang though is in the implementation of processes (green threads). Specifically, transferring control from yield points in the generated code, to the scheduler; which then switches to another process and resumes control where that process left off (or in the case of a newly spawned process, its entry point). As it stands today, I'm not even sure how to do this in a way that doesn't require a completely different code generation approach and a separate scheduler implementation. For example, on the codegen side, one could convert from direct-style to continuation-passing style, split functions into many small ones at yield points and pass the next function to call as the continuation, then "yield" by unwinding all the way back to the scheduler, and resume execution again by calling the continuation function. While viable, it is entirely different from how codegen works on other platforms, and differs significantly in how generated code integrates with the scheduler. That's a pretty significant hurdle in my case, but I think it applies generally to any language implementation facing similar constraints, and which aims to support Wasm as a target, but isn't the only target.

I think it may be easiest to illustrate the fundamental primitives that I think solve the above problems by expressing them as functions (NOTE: these are not concrete proposals, just sketches):

// an opaque value that represents a continuation handle, can be passed around, stored, etc.
extern type Cont;
// an opaque value that represents a handle to a stack, can be passed around, stored, etc.
extern type Stack;
// an entry function takes no arguments and never returns
type Entry = fn() -> !;

/// Captures the current continuation and returns a handle to it.
///
/// Can be called with `call_cont`, but only once - subsequent attempts to 
/// call the continuation would trap.
///
/// Continuations are associated with the stack they are captured on
fn capture_cc() -> Cont;

/// Transfers control to the captured continuation, this "consumes" the continuation. 
///
/// Trying to call a continuation twice will trap.
/// Trying to call a continuation on a stack other than the one it was captured on will trap
fn call_cc(cont: Cont) -> !;

/// Creates a new stack, with a given size, and entry point.
///
/// The entry point is a function pointer which will be used as the continuation
/// when control is transfered to the stack for the first time.
///
/// Returns a handle to the newly created stack
fn create_stack(size: usize, entry: Entry) -> Stack;

/// Destroys a stack created via `create_stack`; traps if the given stack is the current stack
fn destroy_stack(stack: Stack);

/// Captures the current continuation to the current stack, 
/// swaps the current stack with the new stack, and resumes 
/// execution on the new stack using the continuation stored on the new stack
fn swap_stack(new_stack: Stack) -> !;

The main purpose of the capture_cc/call_cc primitives is to allow implementing continuation-based control flow without having to expose any low-level implementation details in Wasm itself; the host decides how to implement them. You could just as easily call these setjmp/longjmp, since semantically they are close to the same thing. Essentially though, they are first-class, opaque values, and with them you can implement all kinds of interesting control abstractions (including exceptions).

The other three primitives are designed to play well with continuations, in such a way that they are safe to use, avoid exposing too much low-level detail, while (I believe) still allowing you to implement all of the interesting things you'd want first-class stacks for. It may make sense for these to be more granular, but I think keeping the interface smaller provides more flexibility to host implementations without damaging their usefulness. I think there is value in some of the stack walking/etc. primitives discussed already in this thread, so I'm not rehashing them here.
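To make the intended behavior concrete, here is a rough simulation of cooperative green threads using Python generators as stand-ins for stacks (scheduler, worker, and the yield protocol are all just illustrative, not a proposal): each `yield` plays the role of swap_stack back to the scheduler, and exhaustion of a generator plays the role of destroy_stack.

```python
from collections import deque

def scheduler(threads):
    # round-robin scheduler: resume each thread to its next yield point,
    # requeue it if it is still runnable, drop it when it finishes
    ready = deque(threads)
    trace = []
    while ready:
        t = ready.popleft()
        try:
            tag = next(t)        # swap_stack: resume the thread's stack
            trace.append(tag)
            ready.append(t)      # still runnable: requeue it
        except StopIteration:
            pass                 # thread finished: its stack can be destroyed
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}{i}"       # yield point: swap back to the scheduler

trace = scheduler([worker("a", 2), worker("b", 2)])
assert trace == ["a0", "b0", "a1", "b1"]   # threads interleave at yield points
```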

Anyway, sorry this turned into a huge brain dump! I want to avoid making arguments based too much on the needs of my compiler, but hopefully knowing the specific problems I'm facing, it provides some useful information for the discussion :)

@RossTate (Contributor, Author) commented Apr 9, 2020

Thanks! That was useful. Definitely don't apologize for the brain dump; it's the sort of the thing we want a peek into. Now I want to talk to just a couple points that particularly stood out to me.

One is that, for Erlang, it is very useful to escape/hatch without unwinding. That's particularly important to know, because it suggests a practical constraint on how these primitives should be implemented as well as a formal constraint on how they are designed. In particular, above I speak about stack conventions like finally clauses, and ask whether these should be always enforced or simply conventions. In order to escape/hatch without unwinding, they would have to be just conventions; otherwise we'd have to walk the stack to make sure there are no finally clauses even though Erlang programs would not use them.

Another point I should make is that the primitives above do not support capture_cc in its "raw" form. Conceptually, you can think of a wasm program as starting on a host's stack. It's important that you not be able to capture the host's stack. Of course, it's easy enough to have the main of your program allocate a stack that the rest of your program then runs on. capture_cc would then capture the continuation up to that point. I imagine that still suits your needs; I just thought it would be worthwhile to point out the discrepancy and the rationale.

@bitwalker

In particular, above I speak about stack conventions like finally clauses, and ask whether these should be always enforced or simply conventions.

Yeah my take is that you definitely don't want to assume very much at all about the exception model - i.e., don't assume C++-style exceptions, or require that a given language build their exceptions on top of them.

Another point I should make is that the primitives above do not support capture_cc in its "raw" form. [snip] It's important that you not be able to capture the host's stack.

Yeah, definitely, my expectation would be that the primitives here are not operating on the host stack, but the Wasm stack. I would also expect that by default, a stack is created that your program runs on, so it isn't necessary to do anything by default, but as soon as you create a second stack, you can switch between them (this implies a current_stack primitive I suppose).

--

Another thing I think may be of interest: LLVM has an interesting set of primitives that I think could be worth comparing/contrasting against, and my understanding is that they are intended specifically to model most kinds of structured exceptions. Check out the link, but to summarize here:

  • invoke; invoke <callee> <args> to label <block> unwind label <block>: a call which can throw an exception, to label <block> indicates the block where control resumes when no exception is raised, and unwind label <block> indicates the block containing the landing pad for when an exception is raised
  • catchswitch; %cs1 = catchswitch within none [label %handler0, label %handler1] unwind to caller or %cs2 = catchswitch within %parenthandler [label %handler0] unwind label %cleanup: used to describe the set of possible catch handlers that may be executed by the exception handling personality function. This instruction transfers control to one of the handlers, or continues to unwind if the unwind label is given.
  • catchpad; <resultval> = catchpad within <catchswitch> [<args>*]: This instruction is used to specify that the containing block begins a catch handler where the personality function attempts to transfer control to catch an exception. When the call stack is being unwound because of an exception, the exception is compared against the args. If it doesn’t match, control will not reach the catchpad instruction. The args and their meaning are defined by the personality function used.
  • catchret; catchret from %catch label %continue: The first argument to a ‘catchret’ indicates which catchpad it exits. The second argument to a ‘catchret’ specifies where control will transfer to next.
  • cleanuppad; <resultval> = cleanuppad within <parent> [<args>*]: Specifies that the containing basic block is a cleanup block, where the personality function tries to transfer control to perform cleanup. The args are arguments to the personality function. The result is a token that is used to match the cleanuppad to corresponding cleanupret instructions. The parent is the token of a containing funclet, or none if not contained in a funclet. When the call stack is being unwound with an exception, the personality function transfers control to the cleanuppad, using the provided arguments.
  • cleanupret; cleanupret from %cleanup unwind to caller or cleanupret from %cleanup unwind label %continue: This instruction indicates to the personality function that one cleanuppad it transferred control to has ended. It transfers control to %continue or unwinds to the caller.
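
To see how the catch-side instructions fit together, here is a minimal hand-written sketch (the identifiers are illustrative, and the catchpad arguments are personality-defined, so this isn't verbatim compiler output) of a try/catch with a single handler and no cleanups:

```llvm
define i32 @g() personality i32 (...)* @__cxx_personality {
entry:
  invoke void @may_throw()
          to label %normal unwind label %dispatch

normal:                 ; no exception was raised (or it was caught below)
  ret i32 0

dispatch:               ; pick a handler, or keep unwinding into the caller
  %cs = catchswitch within none [label %handler] unwind to caller

handler:
  %catch = catchpad within %cs [i8* null]   ; args are personality-defined
  catchret from %catch to label %normal
}
```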

Here's an example from the docs for how C++-style exceptions would use these primitives:

```cpp
// Source code we're lowering
struct Cleanup {
  Cleanup();
  ~Cleanup();
  int m;
};
void may_throw();
int f() noexcept {
  try {
    Cleanup obj;
    may_throw();
  } catch (int e) {
    may_throw();
    return e;
  }
  return 0;
}
```

```llvm
define i32 @f() nounwind personality i32 (...)* @__cxx_personality {
entry:
  %obj = alloca %struct.Cleanup, align 4
  %e = alloca i32, align 4
  %call = invoke %struct.Cleanup* @"Cleanup"(%struct.Cleanup* nonnull %obj)
          to label %invoke.cont unwind label %lpad.catch

invoke.cont:                                      ; preds = %entry
  invoke void @"may_throw"()
          to label %invoke.cont.2 unwind label %lpad.cleanup

invoke.cont.2:                                    ; preds = %invoke.cont
  call void @"~Cleanup"(%struct.Cleanup* nonnull %obj) nounwind
  br label %return

return:                                           ; preds = %invoke.cont.3, %invoke.cont.2
  %retval.0 = phi i32 [ 0, %invoke.cont.2 ], [ %3, %invoke.cont.3 ]
  ret i32 %retval.0

lpad.cleanup:                                     ; preds = %invoke.cont
  %0 = cleanuppad within none []
  call void @"~Cleanup"(%struct.Cleanup* nonnull %obj) nounwind
  cleanupret from %0 unwind label %lpad.catch

lpad.catch:                                       ; preds = %lpad.cleanup, %entry
  %1 = catchswitch within none [label %catch.body] unwind label %lpad.terminate

catch.body:                                       ; preds = %lpad.catch
  %catch = catchpad within %1 [%rtti.TypeDescriptor2* @"??_R0H@8", i32 0, i32* %e]
  invoke void @"may_throw"()
          to label %invoke.cont.3 unwind label %lpad.terminate

invoke.cont.3:                                    ; preds = %catch.body
  %3 = load i32, i32* %e, align 4
  catchret from %catch to label %return

lpad.terminate:                                   ; preds = %catch.body, %lpad.catch
  cleanuppad within none []
  call void @abort()
  unreachable
}
```

I cleaned up a few of the names to make the example easier to read, but the key thing it illustrates is how a compiler can lower an ordinary try/catch while also handling cleanup actions (like destructor calls) that must run during unwinding. Conveniently, the cleanup instructions are pay-as-you-go: code with nothing to clean up never emits them. The main difference from WebAssembly is the personality function, which is responsible for walking the stack and deciding what to do in each frame; with your proposed primitives, it would be straightforward to define a personality function and associate it with each function lowered in that language.
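To make the personality-function idea concrete, here is a toy C++ model of a personality-driven unwinder. Everything here (Frame, Action, unwind) is invented for illustration; this is not the real Itanium ABI or libunwind interface, just a sketch of the control flow the instructions above delegate to the personality function:

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// What a frame's personality callback can ask the unwinder to do.
enum class Action {
  Continue,  // nothing to do in this frame, keep unwinding
  Cleanup,   // run this frame's cleanups, then keep unwinding
  Catch      // this frame handles the exception
};

// A hypothetical stack frame: a name plus a personality callback that
// inspects the in-flight exception's tag and picks an Action.
struct Frame {
  std::string name;
  std::function<Action(int tag)> personality;
};

// Walk the stack newest-frame-first. Cleanups run as frames are popped;
// the walk stops at the first frame whose personality claims the tag.
std::string unwind(std::vector<Frame>& stack, int tag) {
  while (!stack.empty()) {
    Frame& f = stack.back();
    switch (f.personality(tag)) {
      case Action::Catch:
        return f.name;  // control resumes in this frame
      case Action::Cleanup:
        std::printf("cleanup in %s\n", f.name.c_str());
        [[fallthrough]];
      case Action::Continue:
        stack.pop_back();  // discard the frame, keep unwinding
    }
  }
  return "";  // uncaught: a real unwinder would abort here
}
```

For example, with frames main, f, g (newest last), where g requests cleanup and f catches tag 1, `unwind(stack, 1)` prints "cleanup in g" and returns "f", leaving main and f on the stack.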

ioannad pushed a commit to ioannad/exception-handling that referenced this issue Jun 6, 2020