Feature: scope context variables decorators (input, output, internal) #112

Closed
denismerigoux opened this issue Apr 30, 2021 · 10 comments
Labels
🔧 compiler Issue concerns the compiler ✨ enhancement New feature or request 💡 language Language design

Comments

@denismerigoux
Contributor

denismerigoux commented Apr 30, 2021

The problem

Scopes in Catala can have many context variables. But as the number of context variables grows, it is more and more difficult to figure out which of these variables are output, which are input and which ones are intermediate variables that are not relevant from outside the scope. Catala users have already started using comments to annotate scope context variable declarations with this classification.

While it is the essence of Catala that context variables are neither inputs nor outputs by default, since they can be redefined by a calling scope, we could benefit from user annotations to enable helper lints and better code generation in the different backends.

Specification of the decorations

A regular context variable declaration looks like this:

declaration scope Foo:
  context a content boolean

This proposal would allow replacing the context with the following keywords:

  • input: this scope variable has to be defined by the caller, cannot be defined in the scope
  • output: this scope variable cannot be defined by the caller, has to be defined in the scope
  • internal: this scope variable cannot be defined by the caller, has to be defined in the scope, and does not appear in the outputs
  • context: this scope variable can be defined by the caller, can be defined in the scope, and appears in the outputs

The classifications internal/input/context on the one hand, and internal/output on the other hand, are independent: the first describes the input behavior of a scope variable, the second its output behavior. Combining one choice among the 3 input options with one among the 2 output options yields 6 ways to fully qualify the input/output behavior of a scope variable:

  1. internal
  2. output
  3. input
  4. input output
  5. context
  6. context output
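
As a sanity check on the count, the six qualifiers can be enumerated as the product of the two classifications. This is only an illustrative Python sketch: the names `INPUT_KINDS`, `OUTPUT_KINDS` and `qualifier` are hypothetical and do not come from the Catala compiler.

```python
from itertools import product

# Assumption: the input side has three options and the output side is a yes/no flag.
INPUT_KINDS = ("internal", "input", "context")
OUTPUT_KINDS = (False, True)


def qualifier(input_kind: str, is_output: bool) -> str:
    """Render one combination with the keywords used in the list above."""
    if input_kind == "internal":
        # On the input side, "internal" adds no keyword next to "output".
        return "output" if is_output else "internal"
    return f"{input_kind} output" if is_output else input_kind


ALL_QUALIFIERS = [qualifier(i, o) for i, o in product(INPUT_KINDS, OUTPUT_KINDS)]
print(ALL_QUALIFIERS)
```

The enumeration comes out in the same order as the numbered list, which suggests reading a lone `output` as "internal on the input side, visible on the output side".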

This specification informally defines two permissiveness lattices between the kinds of scope variables. Here they are, with the most permissive kind at the top:

 CONTEXT
    | 
    |  
  INPUT                       OUTPUT
    |                            |
    |                            |
 INTERNAL                     INTERNAL

Linting

If we have these four keywords, we can enforce their specification with four lints.

  1. ERROR LINT: When calling a subscope Foo, we can ensure that all the variables of Foo redefined in the caller are either context or input, but not internal.
  2. ERROR LINT: When calling a subscope Foo, we can ensure that all the variables of Foo used (as outputs) in defining variables of the caller are output, but not internal
  3. WARNING LINT: Inside a scope, we can ensure all variables defined are either context or internal
  4. WARNING LINT: Inside a scope, we can check that all variables marked as output or internal have at least one definition.
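
The four lints can be sketched over a toy model of a scope declaration. Everything here is an assumption made for illustration: the `VarKind` record, the function names and the message wording are hypothetical, and lint 4 is read as applying to variables whose input side is internal (those spelled `internal` or `output`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarKind:
    input_kind: str  # "internal", "input" or "context"
    is_output: bool  # True when the variable is qualified with `output`


def lint_subscope_call(callee, redefined, used):
    """Lints 1 and 2 (errors), checked where a subscope is called."""
    errors = []
    for name in sorted(redefined):
        if callee[name].input_kind == "internal":
            errors.append(f"{name}: cannot be redefined by the caller")
    for name in sorted(used):
        if not callee[name].is_output:
            errors.append(f"{name}: not an output of the subscope")
    return errors


def lint_scope_body(decl, defined):
    """Lints 3 and 4 (warnings), checked inside the scope itself."""
    warnings = []
    for name in sorted(defined):
        if decl[name].input_kind == "input":
            warnings.append(f"{name}: input variable defined inside its own scope")
    for name in sorted(decl):
        if decl[name].input_kind == "internal" and name not in defined:
            warnings.append(f"{name}: expected at least one definition in the scope")
    return warnings


# Hypothetical scope Foo: `output a`, `input b`, `context output c`.
foo = {
    "a": VarKind("internal", True),
    "b": VarKind("input", False),
    "c": VarKind("context", True),
}
print(lint_subscope_call(foo, redefined={"a", "b"}, used={"b", "c"}))
print(lint_scope_body(foo, defined={"b"}))
```

Keeping the input side and the output flag as separate fields makes each lint a test on a single field, which mirrors the two independent classifications.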

Code generation

The decorations can also help us generate code with simpler signatures than the current compilation scheme, which exposes all context variables in both the output and input structs of the scope. More specifically:

  • The input struct should contain the (no keyword) and input variables, but not the internal or output variables
  • The output struct should contain the (no keyword) and output variables, but not the internal or input variables
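
A minimal sketch of this field selection, assuming each variable carries a single kind and that a variable with no keyword is recorded as `context`. The function names and the example scope are hypothetical, not the compiler's actual data structures.

```python
# Assumption: each variable maps to one of "context", "input", "output", "internal".
def input_struct_fields(decl):
    """Variables the caller may provide: no-keyword (context) and input ones."""
    return [name for name, kind in decl.items() if kind in ("context", "input")]


def output_struct_fields(decl):
    """Variables the caller may read: no-keyword (context) and output ones."""
    return [name for name, kind in decl.items() if kind in ("context", "output")]


# Hypothetical scope with one variable of each kind.
foo = {"x": "internal", "y": "output", "z": "context", "w": "input"}
print(input_struct_fields(foo))   # fields of the generated input struct
print(output_struct_fields(foo))  # fields of the generated output struct
```

With these rules an `internal` variable appears in neither struct, so it is free to change without affecting callers of the generated code.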

Implementation

The implementation of this feature will impact quite a lot of areas of the compiler:

  • Add the syntax keywords
  • Extend the surface, desugared and scopelang intermediate representations with the kind of each scope variable
  • Implement the lints presented above in the scopelang intermediate representation
  • Modify the dcalc and lcalc translations to use the variable kind information according to the specification above
  • Fix the OCaml backend
@denismerigoux denismerigoux added ✨ enhancement New feature or request 🔧 compiler Issue concerns the compiler 💡 language Language design labels Apr 30, 2021
@msprotz
Contributor

msprotz commented Apr 30, 2021

Thanks Denis for summarizing the proposal! Here are a few suggestions to simplify, or at least to get a first simplified design that can be refined incrementally.

  • Can we (for the time being) leave context out of the discussion since it's the default behavior?
  • Linting:
    • caller-redefined variables cannot be internal or output
    • caller-bound variables cannot be internal or input
    • points 3. and 4. become optional
  • Code-gen
    • input and internal variables do not appear in the output struct
    • output and internal variables do not appear in the input struct

What do you think?

@denismerigoux
Contributor Author

These are all good suggestions, I updated the design post above accordingly.

@denismerigoux denismerigoux added the 👪 help wanted Extra attention is needed label Apr 30, 2021
@EmileRolley
Collaborator

I don't really get the differences between the no keyword and the current context keyword.

@denismerigoux
Contributor Author

You read correctly, because there is none :) I propose we remove the context keyword for the default case after Jonathan's remark:

Can we (for the time being) leave context out of the discussion since it's the default behavior?

@EmileRolley
Collaborator

For me the keyword context is meaningful. Is there any reason not to simply keep context and add the keywords internal, input and output?

@denismerigoux
Contributor Author

Having no keyword for the context case is a nudge for programmers to clarify the role of their scope variables. Compare

declaration scope Foo:
  internal x content integer
  output y content boolean
  context z content date

with

declaration scope Foo:
  internal x content integer
  output y content boolean
  z content date

It is more obvious in the second version that something is missing to qualify z, which we want to encourage the programmer to do since it clarifies the use. Also in the case where the programmer has not yet labeled the scope parameters, it is more convenient to write

declaration scope Foo:
  x content integer
  y content boolean
  z content date

rather than

declaration scope Foo:
  context x content integer
  context y content boolean
  context z content date

All of these observations lead me to refine my proposal. I propose that we allow both (no keyword) and context, both having the same semantics.
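
To illustrate what "same semantics" could mean for a front end, here is a toy Python parser for a single declaration line in which a missing keyword defaults to `context`. It is a sketch under stated assumptions, not Catala's actual parser: `parse_var_decl` is a hypothetical name, only one keyword per line is handled, and types are single tokens.

```python
KEYWORDS = {"internal", "input", "output", "context"}


def parse_var_decl(line):
    """Parse '<keyword>? <name> content <type>' into (kind, name, type)."""
    tokens = line.split()
    kind = "context"  # default when no keyword is written
    if tokens and tokens[0] in KEYWORDS:
        kind, tokens = tokens[0], tokens[1:]
    if len(tokens) != 3 or tokens[1] != "content":
        raise ValueError(f"malformed declaration: {line!r}")
    return kind, tokens[0], tokens[2]


print(parse_var_decl("  z content date"))
print(parse_var_decl("  context z content date"))
print(parse_var_decl("  internal x content integer"))
```

Both spellings of `z` produce the same triple, so every later compilation stage only ever sees the explicit kind.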

@EmileRolley
Collaborator

Okay, I agree with you. Do you think I can handle the implementation?

@denismerigoux
Contributor Author

This is definitely more ambitious than the wildcard issue, since you have to go down the entire compilation stack. The general architecture is presented here https://catala-lang.org/ocaml_docs/catala/index.html, and the formalization is here https://hal.inria.fr/hal-03159939. I guess you can take a look at those, and we can schedule a call next week to sync up before you start. Is that good for you?

@EmileRolley
Collaborator

Yes, thanks. I guess I can start to look at it and write down some questions.

@denismerigoux
Contributor Author

Implemented in #185 and #189.
