Continuations #1252
I believe these features will be implemented as an extension (generalization) of exception handling support, so that being completed is probably where the current focus is. I agree it would be nice to have more of a roadmap for what comes afterwards, yes. |
Some folks among us are currently working on an experimental design that generalises the current exception proposal with resumption, yielding the equivalent of effect handlers, which are a structured and typable form of delimited continuations. We have a basic semantics but it is far too early for a proposal. Next steps will be prototyping it in the reference interpreter and experimenting with the design and perhaps some producers (as a side product, this would yield a reference implementation and spec proxy for the exception proposal). But regardless of design specifics, the fundamental mechanism of creating, managing and switching between multiple execution stacks is not easily added to existing VMs, to say the least. This will be the far greater hurdle. There is some interest e.g. in the V8 team, but I expect that implementing it and evaluating performance viability will take considerable time. |
This comment got a bit long, but bear with me, I promise it is all useful: I'm currently working on an alternative implementation of Erlang that is based on ahead-of-time compilation rather than bytecode interpretation; the primary target is WebAssembly, but other targets will be supported eventually to support usage as a general native code compiler for BEAM languages. To implement the language and compiler, a few key requirements have to be met:
To build an ahead-of-time compiler that meets these requirements, I chose to use continuation-passing style as described in Cheney on the MTA which provides a solution that elegantly addresses each requirement. In short, here is a description of how it works and how it solves my problems as a language implementer:
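As a concrete illustration of the underlying machinery (my own toy sketch, not taken from the paper), the core idea is that each function can return a reified continuation as data instead of calling its successor, and a driver loop invokes continuations until one reports completion. The names `Cont`, `count_up`, and `run` are invented for this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative names (Cont, count_up, run) -- not from the paper. */
typedef struct Cont {
    struct Cont (*fn)(int); /* next function to run */
    int arg;                /* its (sole) live argument */
    int done;               /* nonzero once arg holds the final result */
} Cont;

/* Each step either finishes or describes the next call as data. */
static Cont count_up(int n) {
    Cont k;
    if (n >= 1000) { k.fn = NULL;     k.arg = n;     k.done = 1; }
    else           { k.fn = count_up; k.arg = n + 1; k.done = 0; }
    return k;
}

/* The driver ("trampoline"): each bounce pops the stack back down. */
static int run(Cont k) {
    while (!k.done)
        k = k.fn(k.arg);
    return k.arg;
}
```

In Cheney on the MTA proper, the driver is only re-entered when the stack runs low; this sketch reifies a continuation on every step purely to keep the mechanism visible.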
None of what I described above actually requires explicit continuation support from the machine - it only requires some means by which the stack size can be checked (which we already have solutions for today, if a little hackish) and something like

From the research I've done so far though, the use of

One alternative is to use trampolines - i.e. rather than

All this to say, I believe something is needed to make the implementation of languages with similar requirements both feasible and performant. Perhaps that means some kind of native continuation support, or perhaps it means something less robust, like supporting the

@rossberg Is there a place where I can review the rough sketch of what you are working on with the exception proposal + resumption? I'd like to dig in and see if it holds a solution for me or not. If so, and I can be of any help, I'm happy to try and lend a hand as well to keep it moving along.

If you have any questions (or even suggestions!) about the above, definitely let me know! |
@bitwalker What is not clear to me from the options you describe:
As an aside, "Cheney on the MTA" is indeed elegant - it's almost like a copying collector, but unlike a full GC, a list of stack frames is trivial to copy. Also love that Baker has been serving that paper in the same form from the same URL for >20 years :) |
@aardappel Sure, let me clarify :)
This is true on more platforms than just WebAssembly, but the trick is that you first define an arbitrary max stack size (say, the smallest size of all the platforms you support, if you don't want to conditionally compile with different constants), and then the very first thing done on entry is to take the address of something on the stack (so not technically the stack pointer, but it can essentially be used as one), or get the stack pointer itself. Given your max size, the stack pointer (approximated or not), and the knowledge of whether the stack grows up or down on a given target, you can check how much buffer you have between the maximum you have defined and the stack address you hold.
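A minimal native-code sketch of that check follows. This is illustrative only: the names and the 64 KB budget are invented, comparing addresses of locals across frames is technically undefined behaviour in C, and (as comes up later in the thread) a C-to-Wasm compiler would place these locals on the shadow stack rather than the engine's own stack:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Arbitrary budget for the sketch; a real runtime would pick the
 * smallest stack size among its supported targets. */
#define STACK_LIMIT (64 * 1024)

static uintptr_t stack_base; /* captured once at program entry */

/* Take the address of a local as a stand-in for the stack pointer. */
static void stack_init(void) {
    int marker;
    stack_base = (uintptr_t)&marker;
}

/* On a downward-growing stack, the bytes used so far are
 * base - here, so the remaining budget is the limit minus that. */
static ptrdiff_t stack_remaining(void) {
    int marker;
    uintptr_t here = (uintptr_t)&marker;
    return (ptrdiff_t)STACK_LIMIT - (ptrdiff_t)(stack_base - here);
}
```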
You pretty much nailed one major issue here - it destroys the ability to optimize register allocation, which may not be an issue with WebAssembly (?), given its stack machine semantics, but it is definitely something I would avoid unless I had no other choice. A bigger problem is that it prevents a whole host of optimizations in the compiler backend (I'm using LLVM, but I've heard the issue is the same with various compile-to-C language implementations).
It may help to understand how execution flows when scheduling. So assume we have some process (read: Erlang green thread) which is executing, and upon entry to the current function, and after checking if the stack needs to be reset, and whether a major GC is required, checks its reduction count (essentially an approximation of how much work it has done since the last yield) and sees that it needs to yield to the scheduler because it is going to exceed its quota:
So with that in mind, using a trampoline doesn't prevent this flow, it just imposes a costly overhead - rather than being able to call a continuation directly (really just a function pointer at this point) using arguments already on the stack, a continuation must be constructed, and live values moved to the heap, before returning to the trampoline. Doing that for every call is significantly slower than only constructing a continuation when scheduling, GC, or a stack reset occurs.
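To make the contrast concrete, here is a toy sketch of that flow: the common case is a plain direct call, and a continuation is only reified when the reduction quota runs out, at which point the whole call chain unwinds back to the scheduler in one go. The quota value and all names here are invented for illustration; BEAM's real reduction accounting is more involved:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Cont {
    struct Cont (*fn)(int);
    int arg;
    int done;
} Cont;

/* Invented quota; the real BEAM budget works differently. */
#define REDUCTION_QUOTA 2000

static int reductions;

static Cont work(int n) {
    Cont k;
    if (n >= 100000) {              /* finished: deliver the result */
        k.fn = NULL; k.arg = n; k.done = 1;
        return k;
    }
    if (++reductions >= REDUCTION_QUOTA) {
        /* Quota exhausted: yield by reifying a continuation, which
         * unwinds the whole call chain back to the scheduler. */
        reductions = 0;
        k.fn = work; k.arg = n + 1; k.done = 0;
        return k;
    }
    return work(n + 1);             /* common case: a plain call */
}

/* Scheduler loop: only entered when the process actually yields. */
static int schedule(void) {
    Cont k = work(0);
    while (!k.done)
        k = k.fn(k.arg);
    return k.arg;
}
```

The point of the design is that the continuation-construction cost is paid once per scheduling slice, not once per call.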
Using the stack as a young generation is a brilliant stroke, I thought. And it gets better! Since we're in CPS form, you don't need to copy stack frames; the only live values that need to be copied must be in, or referenced by, arguments to the current function - anything not reachable from those arguments is dead, and effectively collected by the jump :) Before I encountered the paper, I am sad to say that I wasn't familiar with his work, but Baker is brilliant, I'm a huge fan - especially since, as you say, he continues to host all of his papers after all these years, essentially unchanged! |
I want to point out as well that although there is some crossover here with the tail call proposal, I feel like my point is more about arguing for supporting continuations as a language implementation strategy, one way or another. Proper tail calls are essentially the catch, of course, but the key lesson behind Cheney on the MTA is that the lack of native support for tail calls can be worked around and remain relatively efficient, given the right tools. More to the point though, the current tail call proposal is flawed in my opinion, as it essentially describes a trampoline (one must incur the cost of a return + call). One of the key benefits of tail calls is that they are efficient, essentially equivalent to a local goto. Without that, I'm not sure what benefit a tail call instruction in the spec provides, beyond a compiler not needing to generate its own trampolines. Perhaps it will be possible for WebAssembly machines to optimize those instructions into "real" tail calls, but I haven't seen any discussion to that point, and it is a critical one. I'm not sure what the right solution is, but I do think a key indicator of whether WebAssembly has the right primitives is whether or not functional programs can be compiled to code which is essentially as performant as imperative counterparts (barring other issues, such as immutable data), which boils down to supporting tail calls efficiently, one way or the other. |
@bitwalker, I'm afraid I don't follow. Why do you think that there has to be a trampoline? This is certainly meant to be a regular tail call. Re resumable exceptions / effect handlers: there are some out-of-date example sketches here. Other than that we currently only have a formulation of a formal semantics, without any usable textual explanation in writing. We should probably write something up eventually, but there hasn't been much activity. |
I apologize if I misinterpreted! I was referring to this section, where the execution semantics are said to behave like a
Thanks for the link! It seems to me that this could probably be used to express the kind of control flow that I am using

Either way, is there anything I can do to help? Is the issue getting a reference implementation done, are there still open questions that need research/discussion, or does it just need someone to get it through the process? |
Well, given that caller and callee can have completely different parameter lists, tail calls cannot generally avoid argument shuffling, even if it happens in registers. Of course, engines are expected to minimise that in cases where they match, but that isn't semantically observable. Nothing unusual here, that's how they usually work. @Danielhillerstrom has started hacking the experimental effect handler design into the reference interpreter as a first step, but I doubt that can be parallelised much. The next step would be preliminary experiments with targeting it, and that's where it could use all the help we can get, if people were trying to play with it, help iterating on the design, perhaps by writing an experimental compiler backend targeting it. But the hardest part by far, if we get there, will be an efficient implementation of multiple stacks / continuations (regardless of concrete design details) in a production engine, which is gonna be super tricky, given the utter complexity of existing JS/Wasm VMs. |
My main issue is with this line:
This sounds a lot like it requires performing the same work that a return + a call performs, even if the implementation isn't strictly a trampoline. The reason why I think the distinction is important is that a key feature of tail calls is that they incur no more overhead than a regular call, in fact less, since no stack frame is allocated. But if tail calls are actually doing twice as much work as a normal call, then that is a huge price to pay for languages where tail calls aren't just an optimization. I'm not necessarily saying that the proposal is requiring implementations to do it that way, but the way it is worded isn't ideal. Tail calls don't really "unwind" the stack, they overwrite their own stack frame, which I guess has the appearance of unwinding the stack, but is not the same thing. I didn't mean to derail the conversation here, but I think the spec should be explicit about these differences, unless the intention is to avoid requiring tail calls to be implemented as proper tail calls.
Awesome, is that work public currently, or is it still being hacked together?
Once it is available to play with, even if it takes some work to set up on my end, I'd be open to writing a separate WebAssembly backend for my compiler, rather than using LLVM, or perhaps even by customizing LLVM with some new intrinsics + associated lowering. WebAssembly is an important target for me, so whatever I can do to help.
Yeah, I can imagine. That said, I would think the major engines are all strongly motivated to make it work, if WebAssembly is to become as foundational as it promises to be. Have engineers on those engines commented on this yet? Is it the kind of thing that is feasible and just a matter of time, or fundamentally incompatible with how those engines work today? I have little perspective on how those engines are currently implemented, but having an idea of the constraints they are operating under would be useful context to have for sure. |
Ah, don't confuse abstract machine lingo with implementation. This refers to the operand stack within the current function, which is just a notion of the spec's abstract machine and not physically materialised in real engines. This basically says that unused operands are dropped when you tail-call. This is all handled at compile time. That said, there is no reason to believe that tail calls are faster than regular calls in general. That's a misunderstanding. Their purpose is reducing space, not time! They can well be somewhat more expensive in certain cases. (The tail call instructions in some commercial engines, which shall remain unnamed, are notorious for having been up to 100x slower -- I'm confident that we will do better. :) )
Of course. Everybody sees the need for some form of stack switching. That doesn't mean that everybody isn't scared... |
Apologies for missing that! That indeed clarifies why it is phrased that way. Thanks for your patience :) |
Except you can't take the address of anything on the wasm stack, by design. The moment you take the address of a local variable (in LLVM or e.g. C++), the wasm code emitted changes the local variable from a wasm local to something on a shadow stack. So you can't measure the wasm stack this way. And you already said you'd be unhappy having everything on a shadow stack. You could still emulate this, though: you could for each wasm function statically compute the total size of all its locals + args + max temps, and use that as a conservative approximation of stack size used, and then execute your stack cleanup (using longjmp or exceptions) once it exceeds some conservative bound you estimate will be an ok amount of wasm stack to be using on any engine. But wasm doesn't specify the min/max stack an implementation should provide, so this is very iffy. You could even try to detect what an implementation provides by running out of stack space on purpose for a variety of recursive functions, count the number of calls, catch the trap, and estimate a conservative stack space bound from that ;) Again, I think you're better off with a shadow stack, on the whole :) Frankly, a wasm module should be allowed to at least hint at the amount of stack space it wants the engine to provide, because without such a thing, you would never be able to even use recursive algorithms (that process user data, e.g. a parser) in wasm, if the stack can be arbitrarily small. Or backends would always need to force recursive algorithms to use the shadow stack. Would that make any sense @rossberg @titzer @binji ?
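The static-accounting idea might look roughly like the following, where a counter stands in for the stack pointer and the per-function frame size is a value the compiler would compute. `FRAME_SIZE` and `STACK_BUDGET` are invented numbers, and the bail-out just returns where a real runtime would longjmp or throw back to a trampoline:

```c
#include <assert.h>
#include <stddef.h>

/* Invented budget; a real backend would pick a conservative bound
 * it believes any engine's stack can accommodate. */
#define STACK_BUDGET (32 * 1024)

static size_t stack_used; /* maintained by compiler-emitted code */

static int deep(int n) {
    enum { FRAME_SIZE = 64 }; /* hypothetical size of this frame:
                                 locals + args + max temps */
    int result;
    stack_used += FRAME_SIZE;
    if (stack_used + FRAME_SIZE > STACK_BUDGET) {
        /* Budget exceeded: bail out. A real runtime would longjmp
         * (or throw) back to a trampoline here instead. */
        result = -n;
    } else if (n == 0) {
        result = 0;
    } else {
        result = 1 + deep(n - 1);
    }
    stack_used -= FRAME_SIZE; /* emitted on every function exit */
    return result;
}
```

Shallow recursion completes normally, while recursion deep enough to exceed the budget is cut off well before the (unknown) real engine limit.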
Ah ok, I was imagining you meant a trampoline for each call/callcc, but you mean it is ideally only used for yield and when you run out of stack space?
Yup, so many "creative" ideas, that are still relevant! |
It would be nice to provide some way to configure stack size, but I see two problems with the concrete suggestions:
Not sure how to address this. It would be nicer if the VM was just able to grow the stack. Then stack space would only be bounded by heap space. |
Well, perhaps an example is needed to illustrate what I mean. The following is a trivial C application that uses recursion to increment a counter until some arbitrary limit. It uses a trampoline, but only when the stack is going to blow, and it determines when to return to the trampoline by checking how much stack space it believes it has remaining:

```c
#include <emscripten.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

#define STACK_BUFFER 16384

// Checks whether the stack is overflowing
static bool stack_check(void);

// The address of the top of the stack when we first saw it.
// This approximates a pointer to the bottom of the stack, but
// may not necessarily _be_ the exact bottom. This is set by
// the entry point of the application
static void *stack_initial = NULL;
// Pointer to the end of the stack
static uintptr_t stack_max = (uintptr_t)NULL;
// The current stack pointer
static uintptr_t stack_ptr_val = (uintptr_t)NULL;
// Maximum stack size of 248k
// This limit is arbitrary, but is what works for me under Node.js when compiled with emscripten
static size_t stack_limit = 248 * 1024;

static bool stack_check(void) {
  // Get the address of the top of the stack
  stack_ptr_val = (uintptr_t)__builtin_frame_address(0); // could also use alloca(0)
  // The check fails once we are within STACK_BUFFER bytes of the limit.
  // The stack grows downwards, so compare against the low-water mark
  // directly; subtracting from an unsigned value and testing <= 0
  // would never be true
  return stack_ptr_val <= stack_max + STACK_BUFFER;
}

#define MAX_DEPTH 50000

static int depth;
static int trampoline;

int a(int val);
int b(int val);

int a(int val) {
  // Perform stack check and only recurse if we have stack
  if (stack_check()) {
    trampoline = 1;
    return val;
  } else {
    depth += 1;
    return b(val);
  }
}

int b(int val) {
  int val2;
  if (depth < MAX_DEPTH) {
    // Keep recursing, but again only if we have stack
    val2 = val + 1;
    if (stack_check()) {
      trampoline = 1;
      return val2;
    }
    return a(val2);
  } else {
    return val;
  }
}

int main() {
  stack_initial = __builtin_frame_address(0); // could also use alloca(0)
  stack_max = (uintptr_t)stack_initial - stack_limit;

  double ts1, ts2 = 0;
  ts1 = emscripten_get_now();
  depth = 0;
  trampoline = 0;
  int val = 0;
  while (true) {
    val = a(val);
    // This flag helps us distinguish between a return to reset the stack vs
    // a return because we're done. Not how you'd probably do this generally,
    // but this is just to demonstrate the basic mechanism
    if (trampoline == 1) {
      trampoline = 0;
      continue;
    }
    break;
  }
  ts2 = emscripten_get_now();
  printf("executed in %fms\n", ts2 - ts1);
  return 0;
}
```

To see for yourself, run with

Please forgive the awful code, but I sketched it out in a hurry. The basic idea should be clear I hope - if one chooses to bounce back to a trampoline only when stack is running out, this is possible in WebAssembly today, albeit with some constraints. So far in my various profiling of similar scenarios, it is basically a tie between the above approach and just an ordinary trampoline, but I suspect more realistic scenarios would show that the less frequently one has to return to the trampoline, the better. With such a small stack to work with though, the benefit is quite small unfortunately - I would expect to be working with a stack of at least 2mb, if not more; 248k is just ridiculous.

As for

As @rossberg said, I don't think a per-module stack size makes sense, it would need to be configurable globally, which is essentially the case today except you are configuring the global stack size for the engine itself. As far as I'm concerned, that may as well mean it isn't configurable at all, and that's fine - the actual size isn't the primary problem, though the current situation isn't ideal. Allowing the stack to grow arbitrarily large by growing into the heap seems like a viable option as well, but probably not ideal since it isn't really what anyone would expect. I would rather just see a more forgiving stack size (like I said above, at least 2mb).

From the perspective of the spec, I don't think one can require a specific stack size, but I do think that a few things could be added or clarified:
For my own project, I'm going to do more profiling, but based on what I'm seeing, I will probably just use a simple trampoline when targeting WebAssembly, until |
I should also clarify, I compiled the above example with |
@bitwalker |
@aardappel I did a fair amount of testing before posting ;)

To check the values returned at each step, I dumped all the stack-related values via

I did try an implementation with |
@bitwalker I think that this may be because you're compiling without optimizations. I took your code and compiled it with webassembly.studio, which by default compiles with |
@binji Yes, I'm compiling without optimizations, and that is intentional because of the size and simplicity of the example - it happens to be something a compiler can trivially convert into a loop in this case. Much larger programs would not be optimized this way because it would essentially require defining the whole program within a single function. The point of the example is not whether a compiler can optimize this one case into a non-recursive form, but that fundamentally one can check the available stack space and act on it to implement Cheney on the MTA. |
@bitwalker, as @aardappel said, the value produced by

Your code is exploiting undefined behaviour in C. It happens to work on common hardware, but not on Wasm. |
Just to be clear, I am running that code in a WebAssembly engine, specifically v8, not compiling to x86_64. I know I'm looking at the shadow stack in the example, but that's because I'm assuming that allocations I make on the stack (in WebAssembly code) are being allocated on the shadow stack, including stack frames for functions called in that code. In other words, the system stack is not something I need to even be aware of when writing WebAssembly; the shadow stack is the "real" stack in that context. If that isn't the case, then yes, you're right that the example works by happy accident - but that would seem to imply that there is no way for a compiler to prevent generating code that could overflow the system stack, which seems like a huge problem to me. Is that really the case? |
Wasm functions use the system stack like most other code. The engine implements stack guards, but other than that it uses the system stack like any other compiler and runtime do. And yes, it can overflow, but such overflows are caught. The shadow stack is something introduced by the C compiler, and it only contains locals that need to be in memory addressable from C. Clearly, the engine could not store return addresses or other internal pointers there without creating major security problems. |
The problem I'm getting at is that this distinction isn't useful if a compiler implementation targeting Wasm is unable to implement any runtime checks to prevent overflowing the stack, or to recover from imminent overflow, as is possible on other targets. I'm unclear on why it is not desirable to make available, at least on a read-only basis, the current system stack pointer, as well as the maximum stack size, in order to do so. Clearly putting everything on a shadow stack isn't enough to guarantee anything, as previously suggested, so the options for functional languages here seem really awful, frankly. My C example seems to have muddied the waters rather than help illustrate, so I apologize if it has been more trouble than it was worth. I'm neither using C, nor generating it, I chose it to keep the example simple. However, my compiler does use LLVM, and would use the intrinsics it provides for Wasm, which naturally rely on features defined in the spec. But it seems like there is no supported way for a language implementation based on Cheney on the MTA, such as mine, to target Wasm without significant refactoring, which in most cases I suspect is simply a non-starter. So the most important question to me is: what is the path forward for these languages/implementations? Likewise, what can the spec do to support them? I thought the biggest missing piece was lack of native tail call support, but it seems like the typical options for working around that are essentially unavailable, or impractical due to overhead. The one exception is using a simple trampoline - and if you aren't starting from scratch, redesigning your implementation around that is non-trivial at best. |
The system stack pointer is a physical hardware address. Clearly, it is out of the question to expose that to untrusted code on a security-sensitive platform like the web, even if it were read-only. In principle, Wasm could provide more abstract means for querying and manipulating stack sizes. But the details of how to provide such a mechanism are not obvious, see above. Especially because they would introduce a substantial degree of observable non-determinism. A more desirable route would be to have mechanisms for growing stacks in engines themselves rather than having each program hack around it. Once the tail call proposal becomes available, the problem should be reasonably minor in practice, though. What remains are fairly strict resource constraints and the fact that programs can die unexpectedly when they blow them, but that has always been the reality of the web. I think it is an exaggeration to say that this is prohibitive, and Wasm is surely not making it worse. |
I'm interested to know more about the specific vulnerability (or vulnerabilities) this represents, but I'm not arguing that real hardware addresses are necessary. Unless I'm missing something, the stack pointer address accessible to Wasm code need only be a useful proxy for the relative position in the system stack that the code is executing in (e.g., appearing to be an address between 0 and the maximum stack size).
Could you clarify what you mean by this? I'm specifically talking about being able to query stack size, I don't think manipulating it is a goal. Surely it is acceptable for different engines to have different stack sizes, so exposing that size would introduce no more non-determinism than is already fundamentally present.
That deals with one issue (the stack being too small and thus too easy to overflow), but is tangential to the issue I'm trying to raise.
I agree, for some subset of languages anyway, but that proposal still seems to be early in its cycle, and will be some time in coming. In the meantime, the lack of them and the lack of workarounds is a huge issue for languages that require them. The tail call proposal itself doesn't contain a solution for languages which use the stack for allocations and garbage collect by jumping, but the resumable exceptions idea would likely provide a solution for that - but that isn't being actively worked on as far as I'm aware, it is just theoretically possible once the exceptions proposal is accepted.
The constraints themselves are not prohibitive, and aren't really the issue. Resource constraints and associated risk exist outside the web as well, but there are tools to deal with them there - in the context of stack limits, one has

If I've fundamentally misunderstood or simply missed something, I'm definitely looking to learn and fix that. It just seems like this is something that has been overlooked, based on reading through the spec and looking through various issues/proposals in the org. If you have links to other discussions that are relevant here, definitely point me towards them so I can avoid rehashing anything that has already been discussed, or can at least make sure I'm bringing something new to the table. |
Currently, Wasm code cannot observe stack overflows, because it will be aborted. Its own outcome cannot depend on stack size, only success vs failure. That is quite different from being able to succeed with result A or with result B depending on the stack.
The proposal is at stage 3, so should be available relatively soon. Starting a whole different proposal from scratch seems very unlikely to go faster.
My point was that these tools have never been available on the Web, so it should perhaps not be surprising if they are not available in Wasm either. |
For sure, but now that Wasm opens up the possibility of these tools, shouldn't the spec strive to identify and address them? |
For now, use a shadow stack, much like what your C code does in an unoptimized build, and as I mentioned in my first reply. That need not be disastrously slow; you can still load variables into locals inside a single function, and get some benefits of register usage. Also, you now have super cheap stack manipulation, since you control all of it. Like @rossberg says, while theoretically possible, I don't see it as likely that wasm would provide enough access to the wasm stack such that you can implement all this functionality there. The number of languages out there that deviate in their execution model from the C-like model is endless; we're unlikely to support all of them directly. In fact we don't even fully support C directly :) Tail calls and resumable exceptions will help for some, but "rolling your own stack frames" will remain a popular option I predict. |
That's great news! I haven't seen much movement though, and it is still unclear to me whether the actual proposal has proper tail calls or something more restricted. But tail calls are only part of the problem; memory management is the other key aspect of this. Allocating on the stack when possible is always preferable, because on most systems, the stack is an area of memory for which access is optimized relative to main memory. In cases where it is not, it just functions like any other memory. So it follows that if the stack is mostly free, because all functions in your program are tail calls, and so consume no stack space, then you would want to use that free stack space for temporary allocations. In order to do that, you need to know whether an allocation will succeed or fail, by checking the amount of stack you have left. Furthermore, such a model simplifies garbage collection significantly, as one can easily garbage collect the stack space by either returning to a trampoline, or by manipulating the stack pointer. Which approach you use is not of any significance here; what matters is that you can tell when a GC needs to happen. As a language implementer, you also don't want to have two completely different ways of doing memory management based on whether you are targeting WebAssembly or not. Not having the stack be optimized is no big loss, but having to avoid the stack on one platform means you have to alter both your allocation code and garbage collection implementations significantly. This is a huge burden on compilers for which WebAssembly is only one possible target. In situations where it is unavoidable, you just have to eat that cost or avoid support for the platform; but that is only true of WebAssembly if no proposal to make that change succeeds.
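A toy sketch of the allocation model being described, with a fixed arena standing in for free stack space (all names are invented, and a real collector would scan everything reachable from the continuation's arguments rather than a single value):

```c
#include <assert.h>
#include <stddef.h>

/* A fixed arena stands in for free stack space. */
#define ARENA_SIZE 4096
static unsigned char arena[ARENA_SIZE];
static size_t arena_top;

/* Bump allocation: as cheap as moving the stack pointer. Checking
 * whether it will succeed requires knowing how much room is left. */
static void *young_alloc(size_t n) {
    if (arena_top + n > ARENA_SIZE)
        return NULL; /* nursery full: the caller must collect first */
    void *p = &arena[arena_top];
    arena_top += n;
    return p;
}

static int survivor; /* stand-in for the old generation / heap */

/* "Minor GC": evacuate only what the continuation's arguments can
 * reach, then reset the bump pointer -- everything else is dead and
 * is collected, in effect, by the jump itself. */
static void minor_gc(const int *live) {
    survivor = *live;
    arena_top = 0;
}
```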
I'm not surprised they aren't available right now, but I am surprised, and disheartened, to hear this kind of answer, especially from you - it seems to say that because something hasn't been available on the web historically, it isn't worth consideration. WebAssembly is already seeing use outside of the web, and such use cases are even mentioned in the spec/materials for WebAssembly itself! Not to mention that WebAssembly is already bringing tools to the table that were previously not available; tail calls being just one of them. Nothing about the web (that I'm aware of) fundamentally places a restriction on being able to query information about stack space - that's just an implementation detail of how the engines work today; it was never needed in JS, so why would it exist? If engine developers want to make it available to WebAssembly, they can, but I doubt they are inclined to do so without motivation. I'm advocating with the goal of trying to explain why it is necessary. To that end, I'm also working on getting in touch with a variety of functional language compiler engineers to gather more use cases and feedback, which I am hoping to have shared here soon. One of the most inspiring things about WebAssembly is that it has the potential to enable so much of what was previously not possible on the web, or was possible but at great effort and expense. I generally agree with the sentiment that WebAssembly is about the web first and foremost, but unless there is a fundamental problem with some proposal, I'm not sure that we should be shutting the door on ideas just because they aren't already present in some form. After all, isn't that what the proposal process here is for?
That's not viable as an option, as was pointed out previously, because there is no direct correspondence between the shadow stack and the system stack; this means that preventing overflows, which is the point, is not guaranteed. The guarantee is necessary for tail calls, as well as for deciding when to perform allocations on the stack vs the heap and/or trigger GC.
How are you suggesting I implement my own stack? Compiling to native code means using the native stack, in this case that means the Wasm shadow stack, over which we don't have full control. Compiling a program into a giant loop/switch dispatch is an awful solution, for a number of reasons. The only current alternative is to trampoline for every call, but as mentioned before, beyond the performance cost, it still leaves out any language that wishes to perform GC using a Cheney on the MTA style generational collector, or use the stack for allocations.
From what I've seen, virtually every functional language of consequence (Haskell, Lisp, ML, Prolog) has at least one compiler/implementation that generates C, and the models those languages represent cover essentially the entire space of functional programming languages. Erlang, for example, is quite similar to ML in its core representation. These languages all differ significantly from C semantically, yet they can all be translated down to something that fits C's model. I don't think we're talking about supporting every possible implementation detail; primitives to query stack usage are a cross-cutting concern that affects all kinds of languages, C included.
Do you have an example of a language compiler that targets Wasm, rolls its own stack frames, and is not an interpreted language? You say it is popular, but I'm not seeing it; maybe I'm just looking in the wrong places. I suspect this is also a problem for FFI, but I don't know enough about the proposed implementation to say. Perhaps I should open a new issue specifically for the stack primitives I think are needed? This thread is no longer really about continuations specifically, and it isn't really about tail calls either. It may help to hit the reset button, so to speak. |
At this point, yes, I believe you should open a separate issue. I think you should also make a list of exactly which stack primitives you're asking for. I know that aside from features like tail calls and built-in SJLJ that are already planned, you'd like to be able to predict when the internal call stack may overflow in advance. Is that all you need? |
Will do
Right, I am thinking to more or less mirror the primitives one would use elsewhere: a way to read the current stack pointer (a position) and a way to read the stack limit.
But these could be rolled up into a single "stack space remaining" primitive, which doesn't expose a limit or a position, if that is preferable. Since some things live on the Wasm shadow stack and some on the host stack, decisions have to take both into account rather than just one; as far as the host stack is concerned, I think it is only really necessary to know how much space remains available. Using these, a compiler could generate code that checks the stack remaining and determines whether a garbage collection/trampoline is required, then performs the stack allocations it needs, does any local computation, and proceeds by invoking a continuation (generally another call, but a local jump in some cases). In the case of a call, that continuation enters the new function, which again checks the remaining stack and decides how to proceed. An important part of that flow is knowing where things will be allocated, so that the correct limit is checked. I'm not 100% clear on what all may end up on the host stack, so I'm going to try to follow up on that today before I open a new issue. |
Considering that a language like @shriram 's Pyret draws from deep insight into the failings and successes of the top languages out there (including JavaScript), his comment carries weight: #919 (comment). JavaScript is a low bar; the main point in its favor is that it's ubiquitous, due to participating in content dissemination over point-to-point network systems. Data dissemination monopolies arise purely as an emergent behaviour of doing data dissemination over a point-to-point communication system. A language designed in 10 days rode the coattails of a heavy TCP/IP node destined to serve ads; it isn't special and didn't reach its current status through the virtue of its merits. We have an opportunity to right the wrongs and not require so many frameworks to half-heartedly reimplement continuations. TCO is a great step in the right direction. Continuations allow for proper concurrent functional programming languages which have reactors that broadcast data to multiple points (not just GAFAM servers!), especially when data becomes directly addressable and we don't need to obtain indexed data by addressing Google's nodes. It benefits Google to have a single-threaded execution environment, as it works well for shunting ads down an IP client/server pipe. Proper continuations give us better options. The browser can be so much more than the deliberately crippled advert-serving machine it is today. |
A bit late to this conversation, which I didn't know was happening, but @bitwalker may find it useful to take a look (maybe not any more!) at Stopify [site: https://www.stopify.org], [paper: https://cs.brown.edu/~sk/Publications/Papers/Published/bnpkg-stopify/], [repo: https://github.com/plasma-umass/Stopify]. It presents a generic JS->JS compiler that implements similar transformations, with optimizations, to provide continuations. It shows how several JS-transpiled languages can be automatically enriched with these transformations. While I would still like to see better native support (see our performance numbers), Stopify offers a good path from today's reality to that dream. In particular, it takes a ton of crud out of the run-time system, separating core language concerns from continuation-related concerns. |
As a follow-up to some of my comments above, I should probably link to my presentation from the Feb 2020 meeting, showing ideas for adding stack switching/coroutines/continuations/effect handlers to Wasm. |
Closing in favor of #1359 |
Some way of saving the WebAssembly execution state and later restoring it would be useful in order to port blocking APIs to platforms whose equivalent APIs are async, especially the Web itself, which relies heavily on async APIs. Only a very limited form of continuations would be necessary to open a wide range of possibilities.
Searching for "continuations" in the issue tracker brings up a few comments by @rossberg mentioning plans for "delimited continuations"/"stack switching" in a later revision of WebAssembly: #796 (comment), #919 (comment), #1171 (comment). There doesn't seem to be any mention of the feature on the website, nor a dedicated issue in this repository, so I'd like to ask -- what's the status here?