Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safe recursion #1006

Open
andrewrk opened this issue May 11, 2018 · 24 comments
Open

safe recursion #1006

andrewrk opened this issue May 11, 2018 · 24 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented May 11, 2018

Accepted Proposal


If we solve #157 then we'll know at compile-time the maximum stack usage of the entire call graph.

Except for recursion. Recursion - whether direct or indirect - represents a loop in the graph.

directed_graph _cyclic svg

Here you can see indirect recursion B->C->E->D->B...

Recursion is problematic because it can happen a number of times that is only known at runtime, whereas the stack size is determined at compile-time. If too much recursion happens, we get stack overflow. Apart from being an annoying bug, stack overflow is a real concern for embedded software, which typically must resort to heuristics to find out just how small of a stack size they can get away with using.

But recursion is very useful. It's an intuitive way to reason about control flow. @Hejsil wrote zig's self-hosted parser without recursion, and it is not as straightforward as recursive programming.

So what do we do?

We have our cake and eat it, too. That's what we do.

Looking at the graph above again, we can have every function call in the graph be normal, except for D calling B. That's the only entrance into the cycle. So we can make there be a compile error for a normal function call from D to B. To make a function call which starts a cycle, you have to use a builtin:

const new_stack = try allocator.alloc(u8, @stackSize(function));
defer allocator.free(new_stack);
_ = @recursiveCall(new_stack, function, args);

It is a compile error to use @recursiveCall when you could have used a normal function call, and it is a compile error to use a normal function call when you must use @recursiveCall. So the compiler always tells you what to do.

I've prototyped how this would be emitted to LLVM using inline assembly, and I am talking to the LLVM mailing list to find out what's the best way to accomplish this. http://lists.llvm.org/pipermail/llvm-dev/2018-May/123217.html

There's another, simpler way we can have safe recursion, by limiting the amount of recursion at compile-time. This would be another builtin, that would look like this:

var cycles_left: usize = undefined;
_ = @limitedRecursiveCall(999, &cycles_left, function, args);
if (cycles_left == 0) @panic("recursion limit exceeded");

This would work the same way except instead of allocating memory on the heap, we specify the maximum number of cycles, and the function call fails if the cycle limit is exceeded. Then when calculating the stack upper bound, we multiply the cycle stack size by the recursion limit.

So you then would decide at compile time how much stack space you're going to ask for from the OS.

The benefit of this idea is that it actually does not depend on any changes to LLVM and we could implement it even before #157 is finished.

Until #105 is done we would have to specify the Recursive or LimitedRecursive calling convention on the first node getting called in the cycle (B in the picture).

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label May 11, 2018
@andrewrk andrewrk added this to the 0.4.0 milestone May 11, 2018
@bnoordhuis
Copy link
Contributor

Interesting proposal. Do I understand right that both builtins work (and can only work) under a closed world assumption?

How does @recursiveCall deal with function pointers? I saw mention of computing the set of all possible values but that seems (P-)hard.

About @limitedRecursiveCall, how is the counter propagated and what decrements it?


FWIW, I've been thinking along similar lines as @limitedRecursiveCall:

fn f(@rec n: usize) { if (n > 0) g(n-1); }
fn g(@rec n: usize) { if (n > 0) f(n-1); }

The idea being that the compiler knows n is special and "trivially" reduces on every call. Doesn't give a stack upper bound (at least not without further analysis) but it does guarantee termination with direct and indirect function calls and mutual recursion.

I'm using "trivially" kind of hand-wavy because I'm not sure what the compiler can trivially infer. There is probably something clever that can be done with comptime expressions.

@bheads
Copy link

bheads commented May 11, 2018

This could be really cool if you can solve all the corner cases. How would the compiler handle allocating on the stack?

http://man7.org/linux/man-pages/man3/alloca.3.html

@andrewrk
Copy link
Member Author

How would the compiler handle allocating on the stack?

@alloca in zig was removed and is not a supported use case: #225 (comment)

We will have a builtin function to annotate extra stack usage for the places we need it. For example if you use inline assembly to modify the stack pointer directly then you may need to use @annotateStackUsage(1234) with the number of bytes possibly used.

External functions could change the stack pointer in this way as well; external functions will support annotations to specify the stack usage upper bound per function.

I'm thinking about writing a tool which analyses binaries and disassembles functions to automatically compute these annotations.

@andrewrk
Copy link
Member Author

About @limitedRecursiveCall, how is the counter propagated and what decrements it?

I didn't think this through very well. I'll have to re-think it. It would work for direct recursion where we can add secret parameters, but we wouldn't want to have to change the calling convention of every function in the cycle.

@kyle-github
Copy link

This is a hard problem to solve generally due to NP completeness.

That said, why not start small and work at this for a while?

For instance, rather than trying various constraints like recursion limits, just have checked stack allocation only for call graphs that do not have cycles.

Over time you either chip away at the various other cases or people start writing code for embedded systems that does not use recursion.

There is very little Zig code out there today and it seems like you may be trying to solve the problem without any solid data.

Put in the checks you need to double check actual usage in test blocks.

test " ensure stack size" { assert(@maxStack() < 1024); .... }

I am typing on a phone so I have no idea how this will get formatted!

The idea is that the built-in above would return usize and if there was any recursion at all would either panic or return the maximum usize value.

As more code is written and you see the patterns that are safe, then you can relax the constraints with more complicated and refined rules.

@andrewrk
Copy link
Member Author

I did a proof of concept of this and it's now available for use: @newStackCall

This proof of concept means that, even though we don't have all the features done yet for safe recursion, the ideas will work. So recursion is OK to use, it's not evil, and the compiler will help you remove the possibility of stack overflow in a future version of zig.

What's left to do for this issue:

andrewrk added a commit that referenced this issue May 13, 2018
@ghost
Copy link

ghost commented Sep 6, 2018

the other day I was reading On Recursion, Continuations and Trampolines

... and looked up CPS were It says:

As CPS and TCO eliminate the concept of an implicit function return, their combined use can eliminate the need for a run-time stack. Several compilers and interpreters for functional programming languages use this ability in novel ways.

my gut feeling is that this can't be true, otherwise everyone would just do it, or there is some other major drawback hiding

@andrewrk
Copy link
Member Author

andrewrk commented Sep 6, 2018

The drawback of TCO is that you can't always do it. See #694

We have CPS in the form of coroutines and indeed it solves the recursion issue. I plan to revisit this issue with that in mind.

@thejoshwolfe
Copy link
Contributor

If you eliminate the runtime stack, you have to replace it with some other allocator. The proponents of functional languages would probably advocate storing bound variables on the heap, which doesn't seem like an improvement.

@ghost
Copy link

ghost commented Sep 6, 2018

The drawback of TCO is that you can't always do it.

from https://eli.thegreenplace.net/2017/on-recursion-continuations-and-trampolines/

It turns out we can convert any function to use tail calls instead of recursion (direct or indirect) by applying the following recipe:

  1. Pass each function an extra parameter - cont.
  2. Whenever the function returns an expression that doesn't contain function calls, send that expression to the continuation cont instead.
  3. Whenever a function call occurs in a tail position, call the function with the same continuation - cont.
  4. Whenever a function call occurs in an operand (non-tail) position, instead perform this call in a new continuation that gives a name to the result and continues with the expression.

I don't claim I completely understand the issue because there should definitely still be a catch somehow, just reading though it seems as if it would work always and solve the issue.

@thejoshwolfe
Copy link
Contributor

That all assumes passing a continuation as a first class function encapsulates all the bound variables for the closure. The concepts in that discussion are definitely not possible without bound variables. Bound variables have to be allocated in memory somewhere.

Functional programming literature is hard for me to understand because it all seems to ignore the reality of computers. Where is the memory allocated? That's a really important question that you can't just gloss over.

@monouser7dig I think that's the catch you're looking for.

@ghost
Copy link

ghost commented Sep 6, 2018

Ok maybe that’s true, thanks for the pointers.

@hryx
Copy link
Contributor

hryx commented Jul 7, 2019

Just some thoughts over coffee ☕️

  • Should safe recursion be allowed at comptime? I suppose this would depend on:
    1. a comptime allocator (use case: comptime allocator #1291), but not out of necessity. For example, a fixed-size buffer could be used for the stacks.
    2. the ability to detect cycles in a comptime call graph. Maybe we get that for free? I'm not yet familiar with the implementation of comptime analysis/evaluation.
  • Let's take the parser as an exemplary use case of recursion. Some challenges I can foresee:
    1. Unlike AST nodes, which generally stick around once allocated, the stacks will come and go rapidly. Because of this, we might not want to use arena allocation for them.
    2. If the grammar changes, the call graph will change, and so might the point of recursion (D->B in the original example). I wonder how painful it would be to have to juggle @newStackCall around the source file in this case?

andrewrk added a commit that referenced this issue Aug 31, 2019
which heap allocate their own frames

related: #1006
andrewrk added a commit that referenced this issue Aug 31, 2019
 * `await @asyncCall` generates better code. See #3065
 * `@asyncCall` works with a real `@Frame(func)` in addition to
   a byte slice. Closes #3072
 * `@asyncCall` allows passing `{}` (a void value) as the result
   pointer, which uses the result location inside the frame.
   Closes #3068
 * support `await @asyncCall` on a non-async function. This is in
   preparation for safe recursion (#1006).
@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 20, 2019
@andrewrk
Copy link
Member Author

andrewrk commented Sep 24, 2019

The plan to solve this is to use async functions and @Frame to heap-allocate recursive function calls. Zig will force recursive calls to be async, and this will cause a compile error due to async frame dependency loop, which will show the call graph cycle. The cycle can then be broken by this pattern:

const frame = try allocator.create(@TypeOf(async func(a, b, c)));
defer allocator.destroy(frame);
await @asyncCall(frame, {}, func, a, b, c);

@anosovic
Copy link

anosovic commented Apr 18, 2020

  • if recursion depth is known at compile time, or if the call graph cycle is tail call optimized, will you need to allocate the function call like that?
  • with this change, will it really be impossible to stack overflow at runtime? or could you eg call a bunch of different functions in a row to try to get zig to stack overflow?

@marcthe12
Copy link

Ackerman function should be test as it very good in creating stack overflows and impossible to avoid recursion.

@ghost
Copy link

ghost commented Jul 30, 2020

One subtlety: when detecting cycles, we need to count cycles that involve tail calls, but those cycles are only a problem if they involve regular or stack async calls as well. Since converting any one of those calls to an allocated async call will break the cycle, an instance of recursion should be considered a property of a call graph rather than any individual call. Error reporting should list all functions in the cycle and whether they're regular calls, tail calls, or stack async calls -- due to the lazy nature of the compiler, the "entry point" of the cycle may not always be defined.

Anyway, that might be obvious, sorry if so.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 9, 2020
@Jarred-Sumner
Copy link
Contributor

Jarred-Sumner commented Apr 25, 2021

The plan to solve this is to use async functions and @Frame to heap-allocate recursive function calls. Zig will force recursive calls to be async, and this will cause a compile error due to async frame dependency loop, which will show the call graph cycle.

Does this mean that, after this change, recursive function calls will be slower in Zig than right now due to the heap allocation? Or would there be a way to allow recursion of up to N depth like in the earlier example with @limitedRecursiveCall and then @Frame could be heap allocated in subsequent calls?

If the goal is "prevent/handle stack overflows", I wonder if a stack overflow could be a required error to handle when recursively calling a function, like error.OutOfMemory but e.g. error.StackOverflow. I honestly don't know much about compilers though

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@matu3ba
Copy link
Contributor

matu3ba commented Oct 14, 2021

Since this may happen potentially along the complete program graph Johnsons algorithm sounds like the best solution.
Described in "Finding all the elementary circuits of a directed graph" by Donal B. Johnson. Other solutions have quadratic runtime, which scales bad for bigger programs.

Somebody tested Johnson, Tarjan etc and found another solution was more flexible: link to paper.

Enumerating Circuits and Loops in Graphs with Self-Arcs and Multiple-Arcs.
Hawick and H.A. James, In Proceedings of FCS. 2008, 14-20

This stuff should be properly benchmarked and not only being used in theoretical limbospace, because no paper so far published their benchmark data and test suite.

@dumblob
Copy link

dumblob commented Oct 14, 2021

I know it's a bit off-topic, but Nim guys have implemented a full-featured & seamless CPS for Nim (i.e. including support for arbitrary recursion). Of course it's based around captured variables (i.e. "closures"), but it works really well.

The only downside is it's currently somewhat slower - but there is a huge potential for optimization (as their implementation focuses on correctness and readability and not at all on optimization).

@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 20, 2021
@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@matu3ba
Copy link
Contributor

matu3ba commented Nov 3, 2022

The folks from mold are discussing/planning to utilize Tarjan or some related methods/parallelization (https://ppl.stanford.edu/papers/sc13-hong.pdf) to speed up mold rui314/mold#842, so we might be able to steal get inspired by their ideas (and experimental results on code).

Their relevant research question looks like, if the same method(s) are applicable on object code, which I suspect translates very good to source code execution semantics (besides comptime dead code elimination).

@dvmason
Copy link

dvmason commented May 21, 2023

CPS conversion is always possible - if you have a heap and garbage collection. So that isn't really an (automatic) option for Zig. However, I am using CPS conversion in the runtime I'm building, and using explicit tail-calls between the CPS functions. I use an explicit stack and have automatic promotion to the heap where necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
Status: To do
Development

No branches or pull requests