Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Pattern matching wildcards and actions #996

Closed
brprice opened this issue May 16, 2023 · 7 comments
Closed

FR: Pattern matching wildcards and actions #996

brprice opened this issue May 16, 2023 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@brprice
Copy link
Contributor

brprice commented May 16, 2023

Description

Currently pattern matches/case statements are

  • "shallow": one can only match on the head constructor, and not deeper into the structure. Thus one can write something like case x of A u -> ...; B -> ...
    but not case (x,y) of (A,B) -> ... or case x of C (A B) u D -> ... (actually these last two non-examples are of the same flavor, since (... , ...) is just fancy syntax for a constructor of the binary product type).
  • "maximal"/"discrete": they have exactly one branch per constructor in the relevant type.
  • they only work for ADTs, and not primitives. This is partly a implementation issue: primitives are inhabited by literals which are not quite the same as constructors. However it is partly a result of the previous point: for a large type like Char or an infinite type like Int, there is no (practical) way to exhaustively list every option.
  • "order independent": since all matches are disjoint, the order of branches does not matter

We would like to relax these restrictions. In particular the "no primitives" one: we definitely want to be able to pattern match on primitives (rather than have to build an if-tree with a primitive eqInt etc). The "maximal" one would be nice (and necessary) to relax, at least to allow a final "catch all" pattern.

We do not at this time propose to allow deep matching. This is both to keep this FR small and because it is not yet clear whether it would be pedagogically better to allow the more expressive but more complex construct. However, we do talk about deep matching a bit, to contrast it and to illuminate some corner cases.

Note that we will (or at least, may want to) lose the property of "order independent" if we allow wildcards (especially with deep matching): case x of 1 -> True; _ -> False is not the same as case x of _ -> True; 1 -> False (in Haskell -- see below for options here).

Dependencies

None

Spec

Note that for clarity in this spec I will talk about ADTs, and not primitives, everything applies with minor modifications to primitives also.

Simple wildcards

(Abstract) Syntax

The only part of the ast (i.e. the Expr' and Type' definitions) that will change is the auxillary CaseBranch.
Recall (eliding metadata), Expr = ... | Case Expr [CaseBranch], data CaseBranch = Branch ValConName [LVarName] Expr, where Case e [Branch C1 [x,y] t1, Branch C2 [] t2] means roughly the same as the Haskell case e of C1 x y -> t1; C2 -> t2.
The variables x, y listed in a branch are binders, and scope over the expression t1 in that branch.

The first part of this proposal is to add an optional fallback branch (binding no variables), essentially Expr = ... | Case Expr [CaseBranch] (Maybe Expr).
This extra branch Case e brs (Just r) is a "wildcard" match, similar to case e of ... ; _ -> r.

Typing rules

The rule for a case without the extra branch is as before:

e ∈ S a  (any number of parameters)
S is a type constructor
[Ci,...] is a list of all the constructors of the type `S`,
For each constructor S p = ... | Ci Aij{p} we have
  xij : Aij{a}, ... ⊢ T ∋ ti
---
T ∋ Case e [Branch Ci [xij...] ti, ...] Nothing

Here curly braces denote (meta-level/syntactic) dependency/substitution/scoping: if we had data T p = C (A p) B p, then C0 = C, A00 = A p, A01 = B and A02 = p -- note that the Aij depend on p (A01 does so trivially); and A00{X} would mean A X.
For consistency, our implementation requires that the constructors/branches appear in the same order in both the type declaration and the branches. This is not necessary though.
(For simplicity we have not formally stated two additional requirements: firstly that each branch has the correct number of binders (i.e. xij and Aij have the same range of j); secondly that all binders in one branch {xij : j ∈...} are distinct, i.e. we only allow linear patterns).

The new rule is when the extra branch exists:

e ∈ S a
S is a type constructor
[Ci,...] is a strict subset of the constructors of the type `S`,
For each constructor S p = ... | Ci Aij{p} we have
  xij : Aij{a}, ... ⊢ T ∋ ti
T ∋ t*
---
T ∋ Case e [Branch Ci [xij...] ti, ...] (Just t*)

i.e. it is treated as another branch which does not bind any variables.
Note that in this case we do not need to give a (named/old-style) branch for every constructor, but only a subset of them. In fact, we require a strict subset, so that there are some terms that will hit the fallback branch.

Notice however, that these rules only cover the case when the scrutinee synthesises a type which is headed by a constructor.
Matching against something of arrow type (for instance) is ill-typed, and "smartholes" will wrap the scrutinee in a hole.
Thus the only other interesting case is where we synthesise a hole.
Since this will never reduce, there is no need for any branches, and since there are no matching constructors we should in fact enforce being no constructors.
Thus the rule is

e ∈ ?
---
T ∋ Case e [] Nothing

Evaluation rules

The rule for case-of-known-constructor is modified so that if the constructor does not match a branch, then the new, fallback, branch is taken.

case (Ci s... : T A) of ..., C y... -> r{y...}, ...   ~>   r{s...}
case (C s... : T A) of ..., Just r   ~>   r     (if the constructor C does not appear in any named branch)

Note that we only reduce to the fallback branch if the scrutinee is in weak head normal form and the constructor is apart from each other branch. In particular we do not reduce with a stuck scrutinee (e.g. hole, or variable if we are evaluating under a binder): otherwise having more information to "unstick" the scrutinee could lead to a different reduct.

If there is only a fallback branch, and no named branches, then there is a choice whether to enforce the scrutinee being in WHNF.
This actually changes the laziness of the language: consider case (letrec x : A = x in x) of _ -> True (note the scrutinee is non-terminating). Should this be allowed to evaluate to True?
Some options (EDIT: we chose the "maximally lazy" version):

  • maximally lazy: this is most permissive, where if there is solely a wildcard branch we do not evaluate the scrutinee at all, and if there is at least one constructor branch we evaluate to WHNF
  • whnf-strict: most predictable, we always go to WHNF, no matter what the branches are (thus adding/removing branches will not change eval behaviour)
  • fully-strict: also predictable, but less liberal, and terminates less often: always go to a fully normal form, as deeply as possible.
    I think that whnf might be more predictable for shallow matching, but I am not sure. However, in case we plan to implement deep matching, then note that it does not make sense in that setting.

Note that for a normal-order reduction, we evaluate the scrutinee as little as possible before reducing the case to one of the branch RHSs. We never evaluate inside a RHS if we could first evaluate either the scrutinee or case.

Actions

At a low level, we need some way to choose what branches we wish to have.
We already have a "make a Case" action, and we have a free choice what branches that brings into being: the two sensible ones are "all the branches, no wildcard" or "no branches except wildcard". The particular choice does not matter, as this low-level api is intended for internal use (and for clients wishing to see low-level details), but not intended to be ergonomic for a human. However, for infinitely-branching primitives, creating all branches is impossible (unless we have something like a "not-pattern", which I do not propose adding); thus for consistency we should only offer the "only wildcard" action. (EDIT: we actually chose that ADTs get all branches, primitives and holes get just a fallback)
Whilst this deals with most scrutinees, it is somewhat lacking for those with no constructors: empty types and hole types (which we can deal with identically).
These are required to have no branches, not even a wildcard.
Thus the above rule needs to be slightly refined: "no branches except perhaps a wildcard".

We also need to be able to add and delete (or, merge into a wildcard) branches.
Thus two actions: "add a branch for constructor C" and "delete the branch for constructor C".
The first has the obvious interpretation: if such a branch exists, do nothing/error out; if no such exists, insert one in the appropriate place (assuming branches are ordered the same as datatype, otherwise at the end, just before the wildcard) with correct number of binders (auto-generated names) and a RHS of an empty hole (EDIT: we actually chose to copy the RHS of the wildcard branch).
This could be offered either on the Case as a whole, on as an action on the wildcard branch "split a constructor out".
One subtlety is when the wildcard only covers one alternative (this is a valid state, and may be intentional, as future-proofing for adding more constructors to a type).
In this case, when adding an explicit branch for the last constructor we must delete the wildcard branch (is it is now unreachable).
This will also delete its RHS: note that this is a low-level action, and we will make a different choice at high-level.

The second is slightly more tricky: what should happen to the RHS?
However, this is a low-level action, so for simplicity we will just delete it.
Note that if the case was exhaustive it now will not be, so a wildcard branch must be created in this case; the new (wildcard) RHS will be a copy of the deleted branch's RHS, with holes for the now-out-of-scope variables (i.e. those bound by the deleted branch's pattern).

At a high level, the basic actions will be similar, but we may want to tweak the details for the sake of ergonomics/pedagogy.
The choices we need to make are "what branches should exist on a new case", what is on the RHS of a new branch, and "what happens to RHSs of removed branches".
There seem to be merits in both "start with all branches and wittle down" (except this does not work for primitives) and "start with no branches and select them". There is also an argument for "ask what branches to start with" (e.g. exposed to a human as check boxes when initially creating a case, or maybe just offering both "all branches" and "no branches" options). For the sake of simplicity (and a more concrete proposal) I suggest only offering one action, "case with wildcard only". (EDIT: we decided to create all branches)
For creating a new branch, we will copy the RHS of the wildcard branch, since it is much easier to delete unwanted code than to recreate it.
Note that this means when creating the final branch the effect will be to change the wildcard pattern into a constructor pattern (as we add a new branch, delete the redundant wildcard, and also preserve the RHS of the now-deleted wildcard branch).
For deleting branches, in the case of no wildcard, one could do

case ... of ...; C x -> r{x}   ~> case ... of ...; _ -> t{?}

i.e. create a wildcard branch with the same RHS, with bound variables replaced by holes, or

case ... of ...; C x -> r{x}   ~> case ... of ...; _ -> ?

i.e. delete the RHS entirely.
In the case where a wildcard already exists, case ... of ...; C x -> r{x}; _ -> s then one must "merge" r and s. There are a few cases to consider here: what if neither/one/both of r and s are holes, what if they are the same, what if they have the same few outer layers but then differ, what if they are completely different. I think the only choices simple enough to be sensible to expose to a student are "always delete r{?} and keep s" or "if s is a hole, keep r{?}, otherwise keep s" (EDIT: we chose to keep the existing fallback RHS)

case ... of ...; C x -> r{x}; ...; _ -> s   ~> case ... of ...; _ -> MERGE(r{?},s)

where

MERGE(r,s) = s

or

MERGE(r,?) = r
MERGE(_,s) = s

Actions

At a low level, we need some way to choose what branches we wish to have.
We already have a "make a Case" action, and we have a free choice what branches that brings into being: the two sensible ones are "all the branches, no wildcard" or "no branches except wildcard". The particular choice does not matter, as this low-level api is intended for internal use (and for clients wishing to see low-level details), but not intended to be ergonomic for a human. However, for infinitely-branching primitives, creating all branches is impossible (unless we have something like a "not-pattern", which I do not propose adding); thus for consistency we should only offer the "only wildcard" action.
Whilst this deals with most scrutinees, it is somewhat lacking for those with no constructors: empty types and hole types (which we can deal with identically).
These are required to have no branches, not even a wildcard.
Thus the above rule needs to be slightly refined: "no branches except perhaps a wildcard".

We also need to be able to add and delete (or, merge into a wildcard) branches.
Thus two actions: "add a branch for constructor C" and "delete the branch for constructor C".
The first has the obvious interpretation: if such a branch exists, do nothing/error out; if no such exists, insert one in the appropriate place (assuming branches are ordered the same as datatype, otherwise at the end, just before the wildcard) with correct number of binders (auto-generated names) and a RHS of an empty hole.
This could be offered either on the Case as a whole, on as an action on the wildcard branch "split a constructor out".
One subtlety is when the wildcard only covers one alternative (this is a valid state, and may be intentional, as future-proofing for adding more constructors to a type).
In this case, when adding an explicit branch for the last constructor we must delete the wildcard branch (is it is now unreachable).
This will also delete its RHS: note that this is a low-level action, and we will make a different choice at high-level.

The second is slightly more tricky: what should happen to the RHS?
However, this is a low-level action, so for simplicity we will just delete it.

At a high level, the basic actions will be similar, but we may want to tweak the details for the sake of ergonomics/pedagogy.
The choices we need to make are "what branches should exist on a new case", what is on the RHS of a new branch, and "what happens to RHSs of removed branches".
There seem to be merits in both "start with all branches and wittle down" (except this does not work for primitives) and "start with no branches and select them". There is also an argument for "ask what branches to start with" (e.g. exposed to a human as check boxes when initially creating a case, or maybe just offering both "all branches" and "no branches" options). For the sake of simplicity (and a more concrete proposal) I suggest only offering one action, "case with wildcard only".
For creating a new branch, we will copy the RHS of the wildcard branch, since it is much easier to delete unwanted code than to recreate it.
Note that this means when creating the final branch the effect will be to change the wildcard pattern into a constructor pattern (as we add a new branch, delete the redundant wildcard, and also preserve the RHS of the now-deleted wildcard branch).
For deleting branches, in the case of no wildcard, one could do

case ... of ...; C x -> r{x}   ~> case ... of ...; _ -> t{?}

i.e. create a wildcard branch with the same RHS, with bound variables replaced by holes, or

case ... of ...; C x -> r{x}   ~> case ... of ...; _ -> ?

i.e. delete the RHS entirely.
In the case where a wildcard already exists, case ... of ...; C x -> r{x}; _ -> s then one must "merge" r and s. There are a few cases to consider here: what if neither/one/both of r and s are holes, what if they are the same, what if they have the same few outer layers but then differ, what if they are completely different. I think the only choices simple enough to be sensible to expose to a student are "always delete r{?} and keep s" or "if s is a hole, keep r{?}, otherwise keep s"

case ... of ...; C x -> r{x}; ...; _ -> s   ~> case ... of ...; _ -> MERGE(r{?},s)

where

MERGE(r,s) = s

or

MERGE(r,?) = r
MERGE(_,s) = s

Interaction with other actions

As well as each AST node having actions of their own, they also need to respond to actions taken elsewhere (mostly due to types of subterms changing).
In the implementation this is dealt with mostly via "smartholes".
The relevant effect here is when the type of the scrutinee changes.
Note that this may well happen often during writing a program: one may start with a hole, then insert a pattern match (giving a hole scrutinee), then fill in the scrutinee (changing its type).

The general problem is different types have different constructors and thus require different branches.
In line with the current implementation (which requires every branch to exist, and does not have wildcards), we will simply remove all current branches and re-create the correct ones.
For consistency with creating a Case in the first place, this will either be zero branches (hole type or empty type) or one wildcard branch (everything else).
However, since wildcards do not bind anything, we have the option to preserve the RHS of wildcard branches (if they exist in both the before and after state) -- it is as yet unclear whether we want to do this.
This can destroy interesting parts of the program on the RHS, but this is also the state today so it is at least not a step backward.
I suspect (but have no concrete data) that almost every instance of a scrutinee changing type will be during initial program writing when nothing has been put on the RHS of branches, and almost all the rest will be mistakes which will be quickly undone.

There is also another problem: the type of the scrutinee may stay the same, but that type's definition may be edited.
There are three cases here: adding a new constructor, deleting an old constructor, and changing parameters of an existing constructor.
The first two are dealt with simply: add a new wildcard branch if there is not one already (implementation choice: alternatively could add an explicitly-patterned branch) or delete that branch (respectively).
Changing arguments is similar: if add a argument we can update the patterns to bind another variable; if removing a argument we update and also remove mention of it on the RHS (replacing with a hole); if changing the type of a argument we put every mention of it in a hole (as otherwise the program will be ill-typed).

Ordering

I suggest that the wildcard comes last, thus the order of other branches does not matter (in the sense of not changing the result of evaluation).
(This is true because we do not have "deep" matching, and thus all patterns other than wildcard are pairwise disjoint.)
For consistency we could enforce ordering matching the type definition.
If not, e.g. if "insert" adds the branch just before the wildcard, then we should also add an action to reorder branches -- in particular, one would not then have to think ahead to split/insert in the correct order to get the final outcome desired.
One argument for making ordering changable is for aesthetics/documentation: it may be preferable to put branches in a particular order for reasons other than evaluation. (Potentially we could even allow the wildcard to be reordered, if we didn't rigidly tie priority for evaluation to syntactic ordering, though I suspect that may be confusing.)

Coverage checking

I do not propose to take long-distance-information into consideration, as in (from "Lower your guards" 4.1):

f True = 1
f x = case x of 
   False -> 2
   True -> UNREACHABLE

Thus coverage checking is trivial for these shallow patterns, and is subsumed by the typing rules above.

Implementation details

There are some choices to make, which have been flagged up (EDIT: and answered) inline above.

These choices are:

  • when adding a branch, should we copy wildcard's RHS into it?
  • where to offer actions: adding a branch could be offered either on the Case as a whole, on as an action on the wildcard branch "split a constructor out".
  • change type of scrutinee -- should we preserve RHS of wildcard (if it exists)?
  • if editing a type adds a new constructor, and there is a case with all constructors explicitly matched on we could either add a new wildcard branch or add an explicitly-patterned branch.
  • if editing a type drops an argument to a constructor we can either remove mentions of it on the RHS (replacing with a hole) or create a let-binding (binding it to a hole)?
  • what ordering (if any) to enforce on branches

Not in spec

Or-patterns

Whilst a main motivation for this FR is pattern matching on large primitive types, which "or-patterns" won't help with, they are a refinement of the "wildcard" idea, at least for finite types.
I briefly describe them here, with a view to deciding "these do not fit our vision" or "let's spec this out more fully". (Note that they unfortunately do not subsume wildcards, since they will only match a finite number of constructors.)

Consider the type of weekdays data Day = Monday | ... | Sunday, and the function isWeekend.
Before this FR, writing isWeekend is rather tedious: it requires:

  • make a new definition with the right name
  • insert a lambda
  • insert a case (I'll assume this inserts )
  • insert 7 branches
  • for two branches, insert True
  • for five branches, insert False
    Let us assume that the branch bodies (i.e. RHSs) are more complex, so they take most of the effort.

We can here get away with having two named branches and a wildcard. However, there is still duplication in the two branches, and (depending on our choices for what happens when editing types) if we added a new day it would automatically and silently be a non-weekend, which may be surprising!

An "or-pattern" is a way to have a branch match multiple constructors, and reduce to the same RHS (see ML, or https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0522-or-patterns.rst).
One can think of a wildcard as an or-pattern which lists every constructor that has not been mentioned in a named branch.
In this example we could write something like

isWeekend : Day -> Bool
isWeekend = λd . case d of
  Monday | Tuesday | Wednesday | Thursday | Friday -> False
  Saturday | Sunday -> True

There are some choices to be made when constructors have arguments: for data D = A Int | B Bool Int | C
the following are obviously nonsense

case e of
  A x | B x y -> ...
  C -> ...

case e of
  A x | B x _ -> ...
  C -> ...

case e of
  A x | C -> ...
  B x y -> ...

but the following may be accepted (is easy with plain ADTs, is harder for Haskell with GADTs introducing type equalities) (using an _ to mean "unnamed/ignored variable"), but we do not have to accept them:

case e of
  A x | B _ x -> ...
  C -> ...

case e of
  A _ | C -> ...
  B x y -> ...

There is a choice whether to allow or-patterns nestedly, or only at top level.

With or-patterns, there should be an option to "merge" two patterns (keeping only the common variables, considering name and type). However, there is the question of how to merge the RHS -- perhaps only offer if one of them is a hole?

Deep matching

We briefly describe "deep" matching. If this seems like something we wish to include, it deserves its own FR.
A deep match is matching on more than one level of the scrutinee.
Thus patterns are recursive: pat ::= con pat ... pat | var.
There is a design choice whether to allow a plain variable (note a wildcard would then be a plain variable which is ignored) or whether to enforce the top level being a constructor.

The typing rules are essentially the same, except that more variables can be brought into scope (their types are easy to calculate, since we know the type of the root pattern) and the coverage checks change:
above, where we say "[Ci,...] is a list of all the constructors of the type `T` " we should say something like "the patterns cover every possibility, and none of them are redundant".
This is somewhat tricky to state formally, I shall simply refer to prior work for now: http://moscova.inria.fr/~maranget/papers/warn/warn.pdf https://www.cl.cam.ac.uk/~nk480/pattern-popl09.pdf https://people.cs.kuleuven.be/~tom.schrijvers/Research/papers/icfp2015.pdf https://www.microsoft.com/en-us/research/uploads/prod/2020/03/lyg_ext.pdf

For both evaluation, the order of branches now matters, since multiple patterns may match the same scrutinee. As a simple example,

case C True False of
C True x -> 1
C x False -> 2

where the result depends on the priority of the branches.
More subtly, if the scrutinee was C (letrec x:A=x in x) False, then the priority of the branches would affect the termination of the program.
Similarly (see GMTM, section 8.1) the order of patterns in one line matters (at least for a lazy case):

case C (letrec x:A=x in x) False of
C True True -> 1
C x False -> 2
C False True -> 3

If we attempt to evaluate and first ask "does this match the first branch, in the first argument" we will attempt to evaluate a divergent term to WHNF. If we instead asked "does this match the first branch, in the second argument" we would see clearly not, and then (say) move on to "does this match the second branch, in the ... argument" and see it does, in both, returning 2.

What order should be chosen?
The classical choice is "branches top-to-bottom, arguments left-to-right, as written in the source".
One possible alternative (that I don't know of prior work on) is to enforce the existance of meets: i.e. whenever multiple patterns match a term, then there is a unique "most specific one", and this is the one that is matched. One drawback (other than complexity) is that this forces more branches to exist, giving bigger programs, however "or-patterns" can minimise this problem.

The evaluation rule is now case e of ... ; p{xs} -> r{xs} ; ... ~> r{as} where p is a pattern that binds the variables xs and is the highest-priority pattern which matches e, giving assignments of terms as to xs.
Here "highest priority" is discussed above, when considering ordering.
"Matches, giving assignments" can be formalised by considering patterns as a subset of terms and saying "there is a substitution of the terms as for the variables xs such p{as} = e"
I think when having deep matching, the semantics should be either fully lazy ("evaluate as much as needed") or fully strict ("always fully evaluate"), but not head-nomalise ("force one layer of constructors").

For actions, depending on our choices about ordering, there may need to be an action to reorder.
There would also need to be an action to make patterns: when focussed on one variable in one pattern, it should split one option out, leaving a variable to catch all the other cases or the sole remaining constructor if there is only one option (e.g. change x into Just y and Nothing, or d into Wednesday and d'), duplicating the rest of the branch (if that variable is used in the RHS, it would need to be substituted or let-bound).
Conversely, an action for deleting or merging is needed, however the precise spec here is not yet clear.
In both cases, there is the issue of what to do with a "clash": consider

case e of
  C True y -> ...
  C x    False -> ...

and then deleting True, or splitting False from y and True from x.

Discussion

None

Future work

See "not in spec"

@brprice brprice added enhancement New feature or request triage This issue needs triage labels May 16, 2023
@brprice brprice self-assigned this May 16, 2023
@dhess dhess removed the triage This issue needs triage label May 17, 2023
@dhess
Copy link
Member

dhess commented May 17, 2023

Thanks for putting this together.

My initial thoughts:

Wildcard branches

  • For non-primitive types, I would like to elide wildcards from match-related Actions in beginner mode(s). I think it makes sense to scaffold wildcards. Initially, always be explicit and require that the student write an output for every value constructor in a type (i.e., no wildcards). Later, point out that in practice, quite often many branches in a given match expression have the same output value. This will help motivate why wildcards are useful, and also hopefully reduce the cognitive load by eliminating an extra concept (wildcards) from the initial learning about match expressions.
  • On the other hand, for primitive types, they're unavoidable. Personally, I think I would try to avoid talking about primitives altogether until students have a mastery of student-defined types, but as we've seen, some teachers will want to start with Int and Char, so wildcards will be required even in beginner modes for primitive type match expressions.
    • Therefore, for primitive match Actions, the match with Action should probably just insert a wildcard.
    • Alternatively, or additionally, we could annotate all primitive types with a required base case value, to be used by the match with Action to insert along with the wildcard pattern at the end. For example, Int would have a base case of 0, Char would have a or A, etc. Then the match with Action for a subject of type Int would insert the 0 and * branches in the initial match with expression tree. Students could, of course, delete the base case and leave just the wildcard.

Eval rules and ordering

  • I think that eventual support for deep matching is a certainty, so presumably that should eliminate WHNF-strict eval order from the discussion.
  • I think it makes sense that the initial implementation enforces order by inserting constructors in the order in which they're defined in the typedef, so long as we can reorder all occurrences when the student changes the order in the typedef. (For example, in order to preserve the student's existing work, it would be nice to have "move before," "move after," "move to first," and "move to last" actions on value constructors in typedefs, which would need to find all occurrences of this typedef in match expressions and do the same. Practically speaking, this will have to be left for future work, in any case. In our initial implementation, they'll probably have to first delete the constructor they want to reorder and then re-insert it, losing any existing work in the process.)
    • (On the other hand, once we introduce deep patterns, we probably don't want to change the order of existing match expression branches when the student re-orders constructors in the typedef? Hmm....)
  • I think we should teach that branch order matters from the very start, since we'll eventually want to add deep matching, and I think it would not be helpful to say initially, "Don't worry about the order, it doesn't matter," and then change the story once we introduce deep pattern matching in the curriculum. Most likely students will naturally think that left-to-right "trees" are ordered, anyway, and I believe that was one of the motivations for drawing match expressions this way in the first place.
  • Once we have deep pattern-matching, I think we should add "move before," "move after," "move to first," and "move to last" actions on branches, and that will be sufficient. Presumably any occurrence of the bare wildcard pattern * should be position-immutable: it will always go last, and "move to last" would really mean, "move to last (except for wildcard)."
    • In the frontend, these actions could be mapped to the left- and right-arrow keys, and Home and End (for first/last) on extended keyboards, as well.
    • Later we could introduce drag & drop, as well.

Or patterns

Overall, I'm not inclined to implement this:

  • Short of the benefit of handling the "adding a new constructor means your existing patterns may be inexhaustive" case, this proposal seems to me like pure ergonomics. Historically speaking, we're not particularly concerned with ergonomics at the programming language level.
  • Therefore I think it would be more consistent with our design principles to implement this as a macro Action, and not at the language level.
    • i.e., write an Action where you can select multiple value constructors and tie their output values together. (I haven't given this much thought yet.)
  • We should be able to handle the "adding a new constructor leads to inexhaustive patterns" scenario by automatically inserting a new match with branch to all existing uses of the affected type: simply set the newly-inserted branch pattern to the new value constructor, and its output to ?.

Deep matching

  • Let's flesh this out enough in this FR so that we preserve a path forward by ensuring that we don't need to alter the shallow pattern-matching semantics once we implement deep matching.
    • In other words, make sure we can add it later in a backwards-compatible way; e.g., dictate in the initial design that branch order matters.
  • But obviously, we're not going to do this now, and probably not for Primer 1.0, either, unless it falls out really naturally from this FR and the work @georgefst is doing on building typedefs on the canvas.

@brprice brprice changed the title [WIP] FR: Pattern matching wildcards and actions FR: Pattern matching wildcards and actions May 24, 2023
@brprice
Copy link
Contributor Author

brprice commented Jun 1, 2023

The implementation of this is started at #1049. It needs another pass when we finalise this FR to ensure that the implementation and spec line up.

@dhess
Copy link
Member

dhess commented Aug 8, 2023

Where did this end up wrt. #1049?

@dhess dhess added this to the Primer (language) 0.8 milestone Aug 8, 2023
@brprice
Copy link
Contributor Author

brprice commented Aug 9, 2023

Thanks for digging through the old issues!

We have implemented wildcards (and thus matching on primitives). I have edited the FR above to clarify what choices we made with the implementation details.


WRT your comment above:

For non-primitive types, I would like to elide wildcards from match-related Actions in beginner mode(s).

This has not been done, but would be easy to add

[thoughts about branch ordering]

This has indeed all been left for future work

Or-patterns

Agreed not to implement

Deep matching

As you mentioned, this is future work.

@georgefst
Copy link
Contributor

While we're on the topic, one behaviour I've noticed, which I couldn't quite tell whether it's intentional or not (I don;t think I see it in the spec, but the relevant section is quite complex), is that the initial branches in a match depend on the order of actions:

  • Starting with a match node, then filling the scrutinee, results in a single wildcard branch.
  • Inserting a match node in a non-hole puts that expression in the scrutinee and generates the full set of non-wildcard branches.

I would think the latter behaviour is more useful, and we should only default to wildcards for primitive scrutinees.

@brprice
Copy link
Contributor Author

brprice commented Aug 15, 2023

While we're on the topic, one behaviour I've noticed, which I couldn't quite tell whether it's intentional or not (I don;t think I see it in the spec, but the relevant section is quite complex), is that the initial branches in a match depend on the order of actions:

* Starting with a `match` node, then filling the scrutinee, results in a single wildcard branch.

* Inserting a `match` node in a non-hole puts that expression in the scrutinee and generates the full set of non-wildcard branches.

This is intentional-in-the-implementation. The critical interactions are (a) creating a new match creates all branches and (b) changing the (synthesised) type of the scrutinee ensures that the branches are consistent. To be explicit about the mechanics of the current implementation: the two sequences are

  • match first, filling in the scrutinee-hole after (perhaps better to think of as doing a match on a hole, then changing the scrutinee)
    • firstly the action will create get case ? of with no branches
    • now the typechecker will fix up the branches: we expect just a wildcard branch, so get case ? of _ -> ?
    • the next action will create case x of _ -> ?
    • the typechecker runs (let's assume x ∈ Bool) and requires branches consistent with {True; False}. Since we have just a wildcard match, this is ok and no changes happen.
  • fill in the hole first, then match
    • we start with x, and let's assume x ∈ Bool.
    • the match action creates case x of True -> ?; False -> ?

I would think the latter behaviour (i.e. filling in a hole-scrutinee will update the branches) is more useful, and we should only default to wildcards for primitive scrutinees.

I think I have a mild preference agreeing with you, but this could do with more thought about pedagogy and ergonomics-for-a-beginner. However, I am not sure of the precise spec that you would like here. Let's enumerate a few scenarios:

  • Pure construction (essentially your two above)
    • start with hole, create case, atomically fill in hole
    • start with hole, create case, fill in hole in a few actions which don't modify the type (after the initial filling in the scrutinee-hole e.g. top-down creation of not (null l) is non-type-changing, but bottom-up is not)
    • start with completed thing-you-want-to-scrutinise, then do a match on it
  • Editing, changing type (each of three cases: has only a wildcard match (with non-trivial RHS); has some ADT-branches and a wildcard (with non-trivial RHS); has only ADT-branches)
    • construction of a complex scrutinee where the type changes, e.g. ? ; Cons ? ?; null (Cons ? ?)
    • type permanently changes: e.g. 1 to even 1
    • type temporarily changes: e.g. 1 to even 1 to if (even 1) then 2 else 3
  • Editing, preserving type
    • True to not True

What would you like to see in each of these cases? What is the general rule, and is it consistent with "just do all branch modifications in the typechecker"?


For the result of changing a scrutinee, the relevant part of the spec is

In the implementation this is dealt with mostly via "smartholes".
The relevant effect here is when the type of the scrutinee changes.
Note that this may well happen often during writing a program: one may start with a hole, then insert a pattern match (giving a hole scrutinee), then fill in the scrutinee (changing its type).

The general problem is different types have different constructors and thus require different branches.
In line with the current implementation (which requires every branch to exist, and does not have wildcards), we will simply remove all current branches and re-create the correct ones.

This is perhaps poorly-written, and technically differs slightly from the implementation (indeed, your example is actually called out specifically here!). The issue is that the implementation does not actually react to "the scrutinee('s type) changed", but simply typechecks the resulting program; since a case-of-hole has just a wildcard branch, and this is consistent with any type, we do not recreate the branches.

@brprice
Copy link
Contributor Author

brprice commented Dec 8, 2023

Closing as the main feature was implemented in #1049. There are other possible choices, enhancements and future work, but they would be best served by a new FR. Some future work is tracked in #334.

@brprice brprice closed this as completed Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants