Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We need to stop relying on lower2cff #137

Open
Hugobros3 opened this issue Feb 27, 2023 · 4 comments
Open

We need to stop relying on lower2cff #137

Hugobros3 opened this issue Feb 27, 2023 · 4 comments

Comments

@Hugobros3
Copy link
Contributor

I wrote this sample to reduce a problem with my substitution_rework branch, where by accidentally making eta conversion weaker I made the compiler more or less explode. I think it's a good example of some larger conceptual issues at play though:

#[export]
fn test() {
    let mut x = 0;
    fn loop() -> () {
        x += 1;
        loop() // inner call
    }
    loop(); // outer call
    x
}

Full IR dump follows later in the document. Here are detailed explanations about what's going on:

The return continuation for the inner loop call (cont_75) jumps to the break point of the loop (cont_79)

Both of them are dead code but this is irrelevant because eliminate_params cannot see it: the loop header return parameter is 'used' by cont_75, and cont_75 itself looks live because loop_65 passes it to itself every further iteration. This is a separate defect though.

If eta reduction is working correctly, cont_75 reduces to loop_65.ret_67 (the return param of loop_65).
When lower2cff runs, loop_65's return parameter is eliminated cleanly (since loop_65 is not top-level, it may not have a ret param of order 2). this is the case because there is a substitution from ret_67 (the param) to cont_79 (the argument in the outer call), and so both calls into loop can be changed to the new mangled CFF-form loop.

If however, eta reduction is disabled, lower2cff will find the inner call to loop_65 uses a different def: cont_75. It thus proceeds to mangle cont_75 as well, effectively duplicating it - but that means the inner call is still calling the old version of loop_65 with the old signature, we've effectively only managed to peel one iteration!

This process goes on until either the entire loop is fully peeled, or the compiler blows up.

extern main_47: fn[mem, qs32, [[pu8]*]*, fn[mem, qs32]] = (main_48, argc_49, argv_50, ret_51) @(filter()) => {
    _56: [mem, frame] = enter(main_47.main_48))
    _58: mem = extract(_56, 0∷qu32))
    _59: frame = extract(_56, 1∷qu32))
    x_61: qs32* = slot(_59))
    _62: mem = store(_58, x_61, 0∷qs32))
    _81: !! = loop_65(_62, cont_79)@(filter()) // outer
}

loop_65: fn[mem, fn[mem]] = (loop_66, ret_67) @(filter()) => {
    _68: [mem, qs32] = load(loop_65.loop_66, x_61))
    _70: mem = extract(_68, 0∷qu32))
    _71: qs32 = extract(_68, 1∷qu32))
    _73: qs32 = add(1∷qs32, _71))
    _74: mem = store(_70, x_61, _73))
    _77: !! = loop_65(_74, cont_75)@(filter()) // inner
}

// post-outer loop call continuation
cont_79: fn[mem] = (cont_80) @(filter()) => {
    _82: [mem, qs32] = load(cont_79.cont_80, x_61))
    _83: mem = extract(_82, 0∷qu32))
    _84: qs32 = extract(_82, 1∷qu32))
    _85: !! = main_47.ret_51(_83, _84)@(filter())
}

// post-inner loop call continuation
cont_75: fn[mem] = (cont_76) @(filter()) => {
              // just calls the loop return param
    _78: !! = loop_65.ret_67(cont_75.cont_76)@(filter())
}

A valid follow-up question is: how come we need closure conversion at all ? Well, lower2cff can get stuck and recognize that fact (that it did not manage to change anything at all in the last iteration), and it quits then. There are also scenarios where a function can be stored to memory, which escapes any order-based type signature analyses. At least that's my best understanding of those scenarios.

It seems clear to me that this approach to lowering everything higher-order is not viable. It unpredictably blows up the code size and makes it possible for the compiler to diverge, even though staring at the PE filters suggests it would not.

I propose instead making higher-order functions a heuristic for inlining, giving them a higher budget (also scaling with opt level?) but stopping after a while. Instead we'd rely on closure conversion more often (possibly bringing back the need for a GC on the table).

@madmann91
Copy link
Contributor

madmann91 commented Feb 28, 2023

See #95. What you need here is to lift loop (looking at its free variables and making them arguments), then rewrite calls to it. Lower2cff should be rewritten to do that instead of specializing (ie. instead of beta-expanding/inlining).

@madmann91
Copy link
Contributor

To expand a little bit on that, you don't need to closure-convert loop, it is enough to just lift it, because it is not passed to another function or stored to memory (in other terms, all of the uses of loop are calls).

@leissa
Copy link
Member

leissa commented Mar 2, 2023

Yes correct. There is one annoying edge case though: What happens, if the loop body contains again a continuation as free variable? This may occur, e.g., if you capture the break continuation for an early exit. Then you really need to closure convert this - maybe implementing this under the hood with setjmp and friends.

@Hugobros3
Copy link
Contributor Author

Hugobros3 commented Mar 2, 2023

@madmann91 Thanks for the input! I have started a https://github.com/AnyDSL/thorin/commits/lift2cff branch where I implement this, it works on simple examples, but it's not actually very powerful on its own because of two reasons:

  • This is only really helpful for lifting continuations of order two into top-level functions, so things like loop bodies etc. Parameters of order >= 2 need either specialization or closure-conversion
  • I can only lambda lift continuations where they are used in callee position, again I need to resort to closure conversion otherwise.

Closure conversion seems very unhappy with all the new work it has to do, and swiftly proceeds to produce broken IR. Me and @m-kurtenacker have been plotting to nuke and rewrite it so that's not a big deal. Also this new lift2cff pass also takes care of what lift_builtins does, so we can probably simplify that all a fair bit.

@leissa Exactly, I have actually decided to use jmp_buf to implement the control/join mechanism in Shady - and I want to backport that mechanism in Thorin at some point. It's not super clear yet how that will look like, but I think we'll have two forms of calls in practice: regular function calls (incl. closures), and so-called 'join' calls which are generalized returns. Feel free to reach out in PM for more details

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants