Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monads vs. constructors #31

Closed
hrutvik opened this issue May 22, 2023 · 3 comments
Closed

Monads vs. constructors #31

hrutvik opened this issue May 22, 2023 · 3 comments
Labels
back end enhancement New feature or request

Comments

@hrutvik
Copy link
Collaborator

hrutvik commented May 22, 2023

This issue suggests modifying the specification of monadic operations to be distinct from constructors from at least ThunkLang onwards.

Motivation

Currently, constructors are compiled to the call-by-value ThunkLang by Delaying their arguments. This ensures the "deep" evaluation of constructor arguments in ThunkLang is compatible with the "shallow" weak-head evaluation in PureLang.

Arguments of monadic operations receive the same treatment because they are conflated with constructors - however, we don't morally compile monadic operations to call-by-value primitives until StateLang. In particular, we expect arguments to monadic operations to be Delayed right until we compile away monadic operations in StateLang - otherwise, call-by-value Thunk/Env/StateLang would evaluate monadically sequenced code prematurely and semantics-preservation from PureLang would not hold.

But this has negative effects on specification, implementation, and verification from ThunkLang onwards. In particular:

  • ThunkLang and EnvLang have unusual monadic intepreters: these force various arguments to monadic operations because they "know" these will be Delayed.
  • EnvLang semantics actually bakes in the assumption that arguments to monadic operations will be Delayed by enforcing it in exp_of. The thunk_to_env pass removes the compiler expression Delays, which are recovered on expansion to semantics expressions.
  • This forcing semantics makes verification of env_to_state quite subtle - though it "compiles away" the monad, it must insert various Delay operations (both in monadic arguments and around effectful operations) to remain compatible with ThunkLang/EnvLang behaviour, which requires the monadic arguments to be forced.
  • env_to_state also seems "inconsistent", in that Ret/Raise are treated quite differently to other monadic operations.
  • Worst of all, these Delays appear in the compiler implementation, and so produce odd-looking (and perhaps poor quality) code. A particular example is that all array updates and allocations actually store Delayed values into the array, slowing down array-based code significantly.
  • Generally, lots of special-casing and invariants on monadic operations must be maintained (e.g. mk_delay always Delays monadic arguments).

This issue argues that we should separate constructors and monadic operations when their semantics starts to differ, i.e. in ThunkLang. Constructors can be deeply evaluated in ThunkLang as they are now, but monadic operations should remain suspended (i.e. shallowly/weak-head evaluated) until we actually want to compile them away in StateLang.
In other words, in PureLang conflating the two is fine because they have the same weak-head semantics, but in ThunkLang constructors have call-by-value semantics and we want monadic operations to remain weak-head.

High-level implementation

Overall, we would need to implement the following:

  • Add new language constructs from at least ThunkLang onwards:
    mop ::= Return | Bind | ...
    exp ::= ... | monadic mop (exp list)
    value ::= ... | monadic mop (exp list)
    
    The new monadic operations always halt immediately to a weak-head style value, i.e.
    eval (monadic mop es) = monadic mop es
  • Compile PureLang monadic operations directly into these without Delay, e.g.:
    compile (Prim (Cons "Ret") [e]) = monadic Return [compile e]
  • Compile the new monadic operations straightforwardly through EnvLang:
    compile (monadic mop es) = monadic mop (MAP compile es)
  • Modify ThunkLang/EnvLang monadic interpretation to remove force, as it is no longer needed. ThunkLang semantics will still need to handle MkTick, but this is "optional" in the sense it is never required/expected to be anywhere, so shouldn't impose subtle invariants on code.
  • Update env_to_state (proofs and implementation) to remove the various Delay operations, as these are no longer expected as input nor required on output.

For completeness, we could also bubble up the change to PureLang - it might be more elegant than using reserved constructor names. But this is not necessary and would require updating equational reasoning/parsing/typing/demands/... so can certainly wait.

Pros and cons

This could potentially fix the issues discussed above. In particular, ThunkLang/EnvLang could have more natural semantics, env_to_state could be much simpler and produce better code, overall fewer invariants/special cases need to be carried through compilation. Moreover, compilation of monadic operations is easier to specify, verify, and understand: they are passed down the compiler essentially unchanged, until StateLang when they are all compiled away. Currently we have a change in pure_to_thunk which is cumbersome in later languages, followed by the actual compilation in StateLang. Overall, it feels like monadic operations could have a much cleaner compilation story with this change.

Of course, this would be a significant change to the compiler. But in a sense, we are doing the work already, but more messily - we enforce Delayed monadic arguments (modelling weak-head evaluation) until StateLang anyway with invariants and very careful compilation, but this change would effectively make it happen "for free" (in the sense that we don't need to assert or enforce it).

This new construct would require mutual recursion between expressions and values. Fortunately ThunkLang already has this - so EnvLang would be the biggest change, and no optimisation happens in EnvLang, minimising this burden.

@hrutvik
Copy link
Collaborator Author

hrutvik commented May 22, 2023

Context: when writing up StateLang in my thesis last week, I did not understand what was going on with Delay. @myreen explained its necessity, but I wondered if the approach above might be a possible solution. We agreed it could help in a call today, and @myreen suggests the following plan:

On a new branch:

  1. Update ThunkLang and EnvLang syntax/semantics with the new constructs in parallel
  2. Respecify and reverify pure_to_thunk relations, updating ThunkLang semantics as necessary
  3. Respecify and reverify env_to_state relations
  4. Reimplement pure_to_thunk/env_to_state and ensure all well-formedness proofs seem reasonable
  5. Update all the Thunk-to-Thunk proofs/implementations. Update thunk_to_env.
    This might be a good time to review the unused relations in ThunkLang and see if they could be put to use.
  6. (optional) bubble the change up to PureLang, and update equational theory/parsing/typing/demands/...

Steps 1-3 can be started alone, but the remainder will require a collective effort to update.

@myreen points out a key subtlety: arrays. These store expressions in PureLang, and so must store thunks in ThunkLang onwards. The ThunkLang semantics must therefore check it only stores thunks in arrays. We should also fix a current inefficiency: all array allocations store Delays in arrays - if storing a variable, this leads to Delay (Force (Var _)). In the approach above, we should ensure we can use mk_delay to avoid this when compiling into ThunkLang.

A more minor subtlety could be projections. In particular, projections are forced in the transition to ThunkLang, but if projecting out of a monadic operation this could now produce a type error. However, as projections out of monadic operations should be forbidden except in the contextual equivalence proofs, this should be fine - but may require some changes to relations.

@hrutvik
Copy link
Collaborator Author

hrutvik commented May 24, 2023

Initial work on this is on the monad-vs-cons branch.

  • Update ThunkLang and EnvLang syntax/semantics in parallel
  • Respecify/reverify cross-language compiler relations
    • Initial pure_to_thunk relation
    • Env_to_state relations
    • Remaining pure_to_thunk relations
  • Update cross-language implementations (and their verification)
    • pure_to_thunk
    • thunk_to_env
    • env_to_state
  • Update thunk_to_thunk
    • let_force
    • split_Delay_Lam
  • Update top-level compiler proofs + check translation

@hrutvik hrutvik added enhancement New feature or request back end labels Jun 21, 2023
@hrutvik
Copy link
Collaborator Author

hrutvik commented Jul 6, 2023

This is complete as of 2751040.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
back end enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant