Early version of RFC: Pattern matching ? #245

alehander92 · 2020-07-23T19:48:19Z

Pattern matching for Nim

Pattern matching can be a good addition to the stdlib/fusion. It seems it becomes more mainstream:

This is an early version of a RFC: we need to decide what to do before having a more complete document.

We already have several pattern matching libraries :

patty by Andrea Ferretti https://github.com/andreaferretti/patty
ast-pattern-matching by Arne Doring https://github.com/krux02/ast-pattern-matching
gara by myself https://github.com/alehander92/gara/ (not well maintained)
possibly others? maybe case statement macro dsl-s? please write a comment if you want to add something

A first question is, what do we want:

just keep it in third party libraries
standartize one and put it in fusion
invent a new dsl based on case stamenets/better research of other languages and put it in fusion
standartize one and put it in stdlib
invent a new dsl based on case stamenets/better research of other languages and put it in stdlib

Motivation

Easier to write complicated conditions
Easier to extract values
More flexible syntax

TODO

case a:
of Circle(r: @b):
  echo b
of c@Other:
  echo c
else:
  discard

Many more examples in gara, ast_pattern_matching and patty's docs: TODO when a general approach is decided on

Constraints

compile time: ast and other values : as @krux02's ast-pattern-matching
runtime for most nim types
extensible
zero/low overhead
exhaustiveness checking
maybe: make it easily swapable for other pattern matching lib
maybe: make it support binary matching like Erlang/Elixir

Research

It would be useful to research a bit more Rust, Haskell, OCaml, Python, Ruby, Scala, Erlang, Prolog, Elixir

We can also compare several possible designs like the Python folks did in their preparation for the feature.

Next steps

Decide on general direction
Offer 1 or more designs in a more detailed version of the RFC
Implement

Future directions

Not sure

Disclaimer

I am the author of one of the libs gara, so I might be biased.

pinging @krux02 and @andreaferretti which probably have something to say on the topic

andreaferretti · 2020-07-24T07:40:42Z

I have nothing in particular to add, except for the fact that pattern matching is often done on algebraic data types. Nim today has almost algebraic data types, via variant objects, but:

with an ugly syntax
with a limitation that two different variants cannot have fields with the same name

If pattern matching lands in stdlib, it would be nice if at least the second of these issues is solved. If I remember correctly there is another RFC for that.

Other than this, I would be happy with just putting gara in the stdlib, as it seems the most complete option today

Varriount · 2020-07-24T09:24:37Z

Ideally, pattern matching should work well with AST manipulation, since recognizing structural patterns in the AST tends to be 50% of a macro's job.

That being said, I've only ever needed the kind of pattern matching provided by something like py-good. I can't recall ever having needed it anywhere else.

haxscramper · 2020-09-20T12:18:29Z

Additional arguments in favor of pattern matching in stdlib

A lot of popular programming languages (almost all with exception of Go/C that position themselves as "simple" languages with very few functional features) either already have support for pattern matching or existing proposals.

rust
python
even java (proposal for openjdk)
C++ (proposal)
Kotlin (proposal)
C#
ruby
crystal
ada-spark
And of course any functional language such as haskell, ocaml etc.

I'm not a big functional programmer myself, but i think Nim is already a really good functional programming language - closures, sum types, optional types etc. Support for custom operators and UFCS makes transition even more seamless. Pattern matching is an important addition to the list if features that would allow nim to position itself as a functional programming language.

Use `caseStmtMacro`

I think pattern matching in stdlib should be implemented using caseStmtMacros that are currently experimental, but quite usable (I tried implementing pattern matching and there was no immediate issues). The reason - this way pattern matching would be just an extension of case statement and won't introduce any new concepts/keywords to the language. Can be added as another feature in sugar module.

Set of features should be relatively small

Second - I think it is important to keep list of supported features relatively small - e.g. no predicate validation (match field only if it satisfies some kind of predicate), unification etc. The reason - due to potential number of extensions to concept of pattern matching it is very easy to end up with feature creep where macro creates whole new semantics for case statement. Although some additions to pattern matching would certainly be useful. For example - allowing abitrary predicate extensions seems like a little too much for me, but checking if value is in set kind: in {enEE, enEE1} will certainly be useful in a lot of situations.

It is also important to not require any modifications to existing types - reuse existing len and `[]` overloads as well as kind fields. This makes certain assumptions about what functions are implemented for object, but I would argue this is a good thing. E.g. further promoting idea that all int-indexable (arrays, sequences etc.) shoud implement `[]` and len, all case objects should have kind etc. as it would only lead to more consistent API across all code.

Pattern matching on key-value pairs

Pattern matching on key-value tables (json for example) might be really practical in a lot of scenarios. Possible syntax:

    assertEq "000", case %{"hello" : %"world"}:
           of {"999": _}: "nice"
           of {"hello": _}: "000"
           else: "discard"

`NimNode`

One of the main use cases for pattern matching in nim is macros. NimNode probably the most used case object and large portion of macro is just re-parsing of input structures. Pattern matching should account for this.

In addition to making nim node pattern matching easier it will also make macros more declarative (for simple macros): it would be easier to understand what kind of input is expected and what will be generated from it.

My implementation

I'm currently working on my own pattern matching library based on case statement macros. You can see example tests here - https://github.com/haxscramper/hmisc/blob/master/tests/tMatching.nim

I originally wanted to PR it as a stdlib module but then decided to first ask for feedback on general idea/syntax. Right now it is a little over-engineered and has dependencies on other part of hmisc. If it can be accepted as stdlib module I will make it self-contained and PR it into stdlib.

Examples

    case [1,2,3,4]:
      of [_]: fail()
      of [_, 3, _]: fail()
      of [_, 2, 3, _]:
        discard

Support for regular elif branches in case stmts

    assertEq "default fallback", case (a: 22, b: 90):
           of (_, b: 91): "900999"
           elif "some other" == "check": "rly?"
           elif true: "default fallback"
           else: raiseAssert("#[ not possible ! ]#")

Todo list

There are some things that I still have on todo list such as

Trailing zero-or-more (*<patt>) for lists
Use *(<patt>) in non-trailing format - [$a, *_, $b] to match first and last element in sequence.
exact and prefix match for sequences: right now matching using [] pattern requires len to be equal - in a lot of situations it is necessary to just check first couple of items. Can be implemented using *_ with special treatment for this pattern (e.g. just don't check anything) but I would like to have something closer to [ , ] ... with trailing ellipsis. But that is just preference mostly.
List variables - similar to nimtrs where I have distinction between $scalarVariable that can only be assigned once and @listVariable which will store all appended elements.
Other suggestions?
- Pattern matching on data streams ?
- Maybe adapt some syntax ideas from rust pattern matching for macros: link. Pattern matching for NimNode is a most common use case, so it might be a good idea to adapt some solutions from rust.
- Another potential source of syntax ideas are term rewriting macros

juancarlospaco · 2020-09-21T01:53:49Z

More programming languages using Pattern Matching:

Scala, Scala-Native, ScalaJS.
Swift.
F Sharp.
ML.
Haskell.
D.

alehander92 · 2020-09-21T06:24:24Z

@haxscramper cool! but please take a look at impl-s of patty/ast-pattern-matching/gara . They do some of those, so this might be cool. Also please keep in mind the goal of zero overhead : expanded code should be usually similar to hand-written condition/extraction(not always happening in those libs tho)

Awesome!

alehander92 · 2020-09-21T06:25:35Z

@haxscramper thanks and Also: can you write down a bit more detailed test spec/proposal here or in your own rfc : that was one of the goals of the rfc in my mind, to discuss a more official dsl

haxscramper · 2020-09-21T11:13:05Z

@alehander92

After some thinking I came to conclusion it is not necessary to specifically annotate captured variables with prefix since it really differs from any other nim dsl in stdlib. I started with $capturedVar but now I think it is not really necessary. Type of matched variable can be inferred based on pattern context - e.g. variable is assigned (=) if used in regular pattern and appended (.add) if it is inside of repetition *(<pattern>) although I've done this in nimtrs and current version of matching I think it is not that good of a solution in general.

Capture and submatch can be done using infix is, equality comparison (any other prefix?) will be rewritten in infix notation, e.g. fld: == SomeVal to match field against value will be rewritten into ... .fld == SomeVal. Capture and match can be written as fld: captured == SomeVal for infix operator checks (in, ==) or as fld: capt is SomePattern( ... )

Although I think this is a really slippery slope because there will be a lot of different behaviors of the match depending on written expression (e.g. == Val vs var == Val vs var is Patt) so it is necessary to be really careful here. As I already mentioned it is really easy to just sit for a while and come up with dozens of possible nice things that can be added (regex pattern extraction, custom unpackers, predicatess, regex-like pattern matching for sequences (like in nimtrs) and so on).

One thing I think that is really necessary is to make it look closer to pattern matching in functional language and not enum case-of v2.0. Default nim convention for enum naming is to have abbrValueName and ObjectName. I think it should be possible to omit the abbr prefix. Eg. I want to write IfStmt instead of nnkIfStmt etc.

More examples

Create value to compare against

    macro e(body: untyped): untyped =
      case body[0]:
        of ForStmt([$ident, _, $expr]):
          quote do:
            9
        of ForStmt([$ident, Infix([%ident(".."), $rbegin, $rend]),
                    $body]):
          quote do:
            `rbegin` + `rend`
        else:
          quote do:
            90


    let a = e:
      for i in 10 .. 12:
        echo i

    assertEq a, 22

I would prefer to write `expr` instead of %expr but right now it seems it is not possible to add arbitrary expressions inside of quoted part. Syntax is just taken from nimtrs. Since it is not possible to differentiate newValue() from KindOfCaseObject() I decided to just prefix it with % for now.

Iflet macro

    macro ifLet2(head: untyped,  body: untyped): untyped =
      case head[0]:
        of Asgn([$lhs is Ident(), $rhs]):
          quote do:
            let expr = `rhs`
            if expr.isSome():
              let `lhs` = expr.get()
              `body`
        else:
          head[0].assertNodeKind({nnkAsgn})
          head[0][0].assertNodeKind({nnkIdent})
          head[0].raiseCodeError("Expected assgn expression")

    ifLet2 (nice = some(69)):
      echo nice

    ifLet2 (`nice` = some(69)):
      echo nice

It uses some of my own node kind assertions, but this can be removed when design is finalized. Although due to complexity of pattern matching I think having something like this to show errors would be nice. But that is a topic for another RFC/PR.

Unexpected node kind. Expected one of {nnkIdent} but found nnkAccQuoted

 112          echo nice
 113    
 116:12     ifLet2 (`nice` = some(69)):
                    ^~~~~~
                    |
                    nnkAccQuoted

Non-goals

Support for variable unification - even though it is really nice it have in some cases it adds too much complexity to the implementation (e.g. when you have alternatives that each bind the same variable). I think it is a good idea to completely disallow
High extensibility - pattern matching is complex enough on it's own and I would prefer to have more simple implementation that forces end user to provide all additional checks either via if/else or using nested pattern matches. Some things are used often enough (check if in set or equal to some specific value) so they should be supported out of the box.

Other goals

make it easily swapable for other pattern matching lib

If case statement macro will be moved to 'stable' part of the language (e.g. no longer hidden behind experimental switch) this can be done quite easily I think.

Some comments on other libraries

Patty

I don't think matching non-tuple types by position only is a good idea as it makes matches really fragile in case new fields are added (which is much more likely to happen than with regular anonymous tuple).

ast-pattern-matching

Supports only nim nodes

haxscramper · 2020-09-21T11:17:48Z

@alehander92 just addition to whatever I wrote above to RFC comment for pattern matching - I think it is better to just take as much potential ideas for syntax and features and then sort through them to determine if this is really necessary. But what I'm really certain about several things

Pattern matching should be an extensio of case
Unification is overkill in most cases
Special syntax for variable captures should be carefully considered - is it really necessary or will it only make code less readable
Omitting abbreviated enum name when matching kind is a huge plus for readability.

Araq · 2020-09-21T11:22:21Z

The only thing I really care about is supporting if conditions with the pattern like so:

case n.kind
of foo and n.len == 4:
  actionA
else:
  actionB

This might require a language extension as a macro can only support it well via code duplication.

haxscramper · 2020-09-21T11:27:29Z

@Araq, This is already supported thanks to UFCS - I just generate access path to the field in object, without differentiating between fields and function calls

    macro e(body: untyped): untyped =
      expandMacros:
        case body:
          of Bracket(len: in {3 .. 6}):
            newLit(expr.toStrLit().strVal() & " matched")
          else:
            newLit("not matched")

    echo e([2,3,4])
    echo e([3, 4])

Generated code

block:
  let expr = body
  if kind(expr) == nnkBracket and contains({3..6}, len(expr)): newLit(
      strVal(toStrLit(expr)) & " matched")
  else:
    newLit("not matched")

alehander92 · 2020-09-21T11:29:41Z

@haxscramper thanks!

Pattern matching should be an extensio of case

sounds good

Unification is overkill in most cases

iirc it seemed not hard to implement, but this is not a top goal indeed

Special syntax for variable captures should be carefully considered - is it really necessary or will it only make code less readable

makes sense: I'll look at the bigger comment later

Omitting abbreviated enum name when matching kind is a huge plus for readability.

yep!

alehander92 · 2020-09-21T21:18:07Z

please keep in mind that probably nim nodes should work in a similar way to other variant types(even if nim nodes are defined as a more special type): we don't want to overspecialize for one case !

Also, exhaustiveness checking would be useful even if not for all possible situations 👍

haxscramper · 2020-09-21T21:25:50Z

@alehander92

There is zero specific distinction for Nim node in a way I currently implement it - I have a little hack that just takes .kind type implementation and determines correct prefix that should be added if necessary.

https://github.com/haxscramper/hmisc/blob/f58838040170547e19c01bd6b9f3e42a16811f91/src/hmisc/macros/matching.nim#L44

Exhaustiveness check is possible in theory but only for simpler patterns and it would make things more complicated as it would be necessary to either write custom checker or somehow lift kind fields and do more sophisticated rewrite to take advantage of case checks. Right now this is a simple rewrite to series of if branches with no specific checks.

alehander92 · 2020-09-21T21:27:37Z

iirc exhaustiveness checking isn't really that hard . Also we want to take advantage of generation of case if possible: pattern matching should be zero overhead. Of course those are not top priorities, so probably they don't need to be implemented now

alehander92 · 2020-09-21T21:28:13Z

basically for compound patterns you need to compose the simpler checks and if you already have those: this might not be too hard, but this is just old brainstorming, so I might be very wrong about it

haxscramper · 2020-09-21T21:36:17Z

tl;dr - maybe I'm missing something, but to me it seems like extremely difficult task.

Whether or not it is possible to check for exhaustiveness is a question that should be delayed until after we have finalized set of supported features. For example set checks in {3..5} would require answering questions like "does this level of pattern tries all values or something is not matched yet". Things like ("hello", _), [2 | 3, _] will also make it necessary to handle arrays, tuples etc. Due to how matching is done for function (e.g. example with len) it will be even more complicated.

alehander92 · 2020-09-21T21:40:27Z

disclaimer: don't really look hard into this: maybe you're right and it's not so useful for Nim !

hm, I might not understand all the details .. but this means that it might be good to design the dsl in a way that it makes it a bit more prone to exhaustiveness checking.

but what's the problem with set / array checks? we would have just several data structures to check for, I guess

and yeah, as it's not only about the shape, the analysis can always be a bit more conservative, but a useful thing would be to just autogenerate a case which is not being handled as a warning: this is out of scope tho, so I am just discussing ideas

haxscramper · 2020-09-22T10:49:18Z

More todo

Remove $a for variable captures and replace it with just a for variable, add non-trailling pattern matches.
Support for set checks.
- {2} means len(set) == 1 and contains(set, 2),
- {2, *@others} means len(set) >= 1 and contains(set, 2) followed by for val in set: if val != 2: other.incl val
Disallow multiple uses for the same variable in single pattern

haxscramper · 2020-09-23T10:05:30Z

Replacing special prefixes with keywords?

any instead of .._
any @var, until Patt() after variadic instead of *@var, Patt()
any @var, incl Patt() instead of **@var, Patt()

    # With special keyworkds like `add`, `until`, `incl`
    [any @leading, until @middle is "d", any @trailing]
    [any, Patt(), any]
    # With prefix annotations like `*`
    [*@leading, @middle is "d", *@trailing]
    [.._, Patt(), .._]

Variadic match would use any, search-until is until or incl. For optional match use opt. Capture wll be in form <kwd> @<var> or <kwd> @<var> <op> <rhs> (@middle is "d" for example)

And just in general would you rather prefer DSL to introduce keywords or work with special symbols like ?@, *@, .._ ?

We will still have opt @var ? default and other things, but in general it would be <kwd> @<var> or <kwd> @<var> <infix> <rhs>

And <fld>: <prefix> <rhs> or <fld> @<var> <infix> <rhs> for field/kv-pair captures

haxscramper · 2020-09-23T11:11:37Z

Match expression

Match expression

Check if tuple element is in set: (in {1, 2, 3})
Tuple element equals to something: (== 123)
Field is in set (fld: in {2, 3, 4})
Field matches pattern (fld: is Patt())
Match and capture field (fld: @captured is Patt())

Tuple matching

case (true, false):
  of (true, true) | (false, false): 3
  else: 2

Object/named tuple matching

Matching of fields is performed using (fld: <expr>) for both objects and tuples.

To match case object you can either use Kind( ... ) to match only instances of single kind or (kind: in <set of values>) to test against multiple possible values. It is not necessary to add enum prefix - e.g. to match nnkIfStmt you can write IfStmt( ... ).

Object matches can contains either fld: ... matches or [elems] for cases when you need to check items accessible via [] operator.

Match for statement: ForStmt([@ident, _, @expr])
For statement with infix .. as expression: ForStmt([@ident, Infix([== ident(".."), @rbegin, @rend]), @body]) and capture loop variable, @rbegin .. @rend and @body

Matching optionas ((fld: Option[T]))

Get optional value from object field, otherwise set to default: fld: @opt ? "default"

Thanks to UFCS I can treat functions as regular fields, making it possible to have patterns like

macro e(body: untyped): untyped =
    case body:
      of Bracket([Bracket(len: in {1 .. 3})]):
        newLit("Nested bracket !")
      of Bracket(len: in {3 .. 6}):
        newLit(expr.toStrLit().strVal() & " matched")
      else:
        newLit("not matched")

echo e([2,3,4])
echo e([[1, 3, 4]])
echo e([3, 4])

[2, 3, 4] matched
Nested bracket !
not matched

Sequence matching

Sequence elements can be either matched exactly (for example [_] to match sequence with exactly one element) or using variadic expressions.

Right now I have two ideas for expression syntax - first one relies on use of symbols like * and the other one introduces more keywords. Example of uses

capture all elements in sequence: [any @elems] or [*@elems]
get all elements until (not including "d"): [any @leading, until "d"] or [*@leading, "d"]
All elements including all following: [any @leading, with "d"] or [**@leading, "d"]
Match first two elements and ignore the rest [_, _, any] or [_, _, .._]
Match optional third element [_, _, opt @trail] or [_, _, ?@trail]
Match third element and if not matched use default value [_, _, opt @trail ? "default"] or [_, _, "default" ?@trail]
Capture all elements until first separator: [add @leading, until @middle is "sep", add @trailing]
Extract all conditions from IfStmt: IfStmt([add ElseIf([@cond, _])])

I'm more in favor of keyword-heavy syntax because **@, *@, .._ is looks worse than any , until, any, with. Trailing ellipsis might be added as alternative to standalone any

Set matching

Check if only 2 and 3 is in set: {2, 3}
Set contains 2: {2, .._}
Check if set contains 9 and capture all other elements: {9, @other}

Key-value pairs matching

Match only if has "key": {"key": _}
Capture value from key: {"key": @val}

case %{"hello" : %"world"}:
  of {"999": _}: "nice"
  of {"hello": _}: "000"
  else: "discard"

metagn · 2020-09-23T11:58:23Z

of (true, true) | (false, false): 3

@haxscramper Just to clarify: Is this different from of (true, true), (false, false):, and if so, would the comma version still work?

haxscramper · 2020-09-23T12:18:34Z

@hlaaftana No it is not different and comma version should work the same way it works in regular case.

So

of (true, false), (false, true):

of (true, false) | (false, true):

Should be identical

haxscramper · 2020-09-23T13:10:51Z

Another use case for pattern matching is input data validation. One of the main use cases for pattern matching is macro implementation.

Use isnot as optional clause to execute when something does not match. It is not really useful for pattern matching with multiple branches (because not matched branch does not signal that input data is invalid), but should be really handy when doing basically anything related to external input validation (json, nim node etc.)

I withdraw my previous statement about support for custom predicates - it is not really that hard to implement, so there is no reason to avoid it, but it will be very useful for input validation.

@var.matchesCond() - check if element matches condition and bind it to var
_.matchesCond() - only check predicate

I think pattern matching (data destructuring) should at least have all features of data synthesis (object construction, json input, array comprehension etc), so support for validation is a good thing to have.

  {
    # Match pattern, otherwise execute `doError`
    "key": @a is ("2" | "3") isnot doError(),
    # execute regular code
    "key": @a is ("2" | "3") isnot (
      echo "Expected `2` or `3`"
    ),
    # Do nothing on fail
    "key": @a is ("2" | "3"),
    # Execute callback if match failed
    "key": "2" | "3" isnot doError(),
    # Check for match
    "key": "2" | "3",

    "key": _.isString() isnot (echo "Expected string for key"),
    "key": @a.isInt() isnot (echo "expected integer")
  }

  (
    fld: ForStmt([
      # First element must be an `nnkIdent("..")`
      == ident("..") isnot (
        echo "Expected infix `..` on ", it.lineInfoObj()
      ),

      # Second element should have kind `IntLit()` and will be bound
      # to injected vaeriable `a`
      @a is IntLit() isnot (
        echo "Expected integer literal `..`", it.lineInfoObj()
      ),

      @a is (
        IntLit() or StrLit()
      ) isnot (
        echo "Expected either integer literal or string but found " &
          $it.lineInfoObj()
      )
    ])
  )

https://github.com/Originate/lodash-match-pattern

is Patt(): check if matches pattern
in Set: is in set
== Val: Is equal to value
add @var: only in sequence patterns. Add all elements to var until input end reached or next pattern matched
add, until Expr: match all until frist Expr
add, incl Expr: match all including all Expr
opt @a: match optional element and bind it to a
opt @a ? "default": either match element to a or set a to "default"

haxscramper · 2020-09-23T19:01:58Z

Final draft of the syntax

Final draft of the syntax

To me it looks good enough, so most likely I will be implementing this or something really simiar to it.

Introduce stdlib module matching
Make caseStmtMacros a non-experimental feature

Overload for match macro with following prototype:

macro match*(
  n: tuple | object | ref object | seq | array | set
): untyped

New matches macro with following prottype:

macro matches*(
  n: tuple | object | ref object | seq | array | set,
  expr: untyped,
  makeDiagnostics: bool = false
      ): untyped

assertMatch template

template assertMatch*(input: typed, pattern: untyped): untyped

MatchDiagnostics objec

type
  MatchDiagnostics = object
    matchFails*: seq[tuple[path: Path, diagn: string]] ## Position of
    ## the the first match failure. If multiple alternatives are tried
    ## (`1 | 2`) may contain more than one path. `diagn` is a string
    ## describing match failure

matches is a version of match that works for single checks, evaluating either as true or false if makeDiangostics is false. If makeDiagnostics is true tuple[matchOk: bool, matchDiagn: MatchDiagnostics] is returned.
assertMatch is a wrapper on top of match with semantics similar to
```
template assertMatch(input: typed, pattern: untyped): untyped =
  let (matchRes, diagnostics) = matches(input, pattern, makeDiagnostics = true)
  if not matchRes:
    raise newMatchError(diagnostics)
```
All variables injected by matches are available in the scope, so you can write assertMatch(<user input>, [all @elems is JString()] and just use elems down the line, knowing this is a array of json strings.

Supported structures for pattersn

Similar to flexible serialization RFC #247, nimtrs several kinds of collections are supported which are differentiated based on pattern match syntax.

seqs - matched using [Patt1(), Patt2(), ..]. Must have len(): int and [int]: T defined. ¹
tuples - matched using (Patt1(), Patt2(), ..). ²
pairable - matched using {Key: Patt()}. Must have [Key]: T defined. Key is not a pattern - search for whole collection won't be performed.
set - matched using {Val1, Val2, .._}. Must have contains defined. If variable is captured then Val1 must be comparable and collection should also implement items and incl.
object - matched using (field: Val). Case objects are matched using Kind(field: Val) or, if you want to check agains multiple values for kind field (kind: in {set, of, values})

Element access

Where expr is a result of evaluation for case head. If case head is an identifier it will be used as-is in pat subsitution. Otherwise let expr = <case-head> will be injected in scope. expr is not gensymed, making it possible to access it when necessary.

(fld: "3"): Match field fld against "0". Generated access is expr.fld == "3".
["2"]: Match first element of expression agains patt. Generate acess expr[0] == "2"
("2"): Same as sequence
{"key": "val"}: First check "key" in expr and then expr["key"] == "val". No exception on missing keys, just fail match.

It is possible to have mixed assess for objects. Mixed object access via (gg: _, [], {}) creates the same code for checking. E.g ([_]) is the same as [_], ({"key": "val"}) is is identical to just {"key": "val"}. You can also call functions and check their values (like (len: _(it < 10)) or (len: in {0 .. 10})) to check for sequence length.

Checks

Any operator with exception of is is considered final comparison and just pasted as-is into generated pattern match code. E.g. fld: in {2,3,4} will generate expr.fld in {2,3,4}
(fld: is Patt()) - check if expr.fld matches pattern Patt()
(fld: _.matchesPredicate()) - if call to matchesPredicate(expr.fld) evaluates to true.

Notation: <expr> refers to any possible combination of checks. For example

fld: in {1,2,3} - <expr> is in {1,2,3}
[_] - <expr> is _
fld: is Patt() - <expr> is is Patt()

Examples

(fld: 12) If rhs for key-value pair is integer, string or identifier then == comparison will be generated.
(fld: == ident("33")) if rhs is a prefix of == then == will be generated. Any for of prefix operator will be converted to expr.fld <op> <rhs>. ³
(fld: in {1, 3, 3}) or (fld: in Anything) creates fld.expr in Anything. Either in or notin can be used.

Variable binding

Match can be bound to new varaible. All variable declarations happen via @varname syntax.

To bind element to variable without any additional checks do: (fld: @varname)
To bind element with some additional operator checks do:
- (fld: @varname <operator> Value) first perform check using <operator> and then add Value to @varname
  - (fld: @hello is ("2" | "3"))
Predicate checks: fld: @a.matchPredicate()
Arbitrary expression: fld: @a(it mod 2 == 0). If expression has no type it is considered true.

Bind order

Bind order: if check evaluates to true variable is bound immediately, making it possible to use in other checks. [@head, any @tail != head] is a valid pattern. First match head and then any number of @tail elements. Can use any _(if it != head: tail.add it) and declare tail externally.

Variable is never rebound. After it is bound, then it will have the value of first binding.

Bind variable type

Any variadics are mapped to sequence
Only once in alternative is option
Explicitly optional is option
Optional with default value is regular value
Variable can be used only once if in alternative

Pattern	Ijected variables
`[@a]`	`var a: typeof(expr[0])`
`{"key": @val}`	`var val: typeof(expr["key"])`
`[all @a]`	`var a: seq[typeof(expr[0])]`
`[opt @val]`	`var a: Option[typeof(expr[0])]`
`[opt @val or default]`	`var a: typeof(expr[0])`
`(fld: @val)`	`var val: typeof(expr.fld)`

Matching different things

Sequence matching

Input sequence: [1,2,3,4,5,6,5,6]

Pattern	Result	Comment
`[_]`	Fail	Input sequence size mismatch
`[.._]`	Ok
`[@a]`	Fail	Input sequence size mismatch
`[@a, .._]`	Ok, `a = 1`
`[any @a, .._]`	Error	⁴
`[any @a.(it < 10)]`	Ok, `a = [1..6]`	Capture all elements that match
`[until @a == 6, .._]`	Ok	All until first ocurrence of `6`
`[all @a == 6, .._]`	Ok `a = []`	All leading `6`
`[any @a.(it > 100)]`	Fail	No elements `> 100`
`[none @a.(it in {6 .. 10})]`	Fail	There is an element `== 6`

until: non-greedy. Match everything until <expr>
- until <expr>: match all until frist element that matches Expr
all: greedy. Match everything that matches <expr>
- all <expr>: all elements should match Expr
- all @val is <expr>: capture all elements in @val if <expr> is true for every one of them.
opt: Single element match
- opt @a: match optional element and bind it to a
- opt @a or "default": either match element to a or set a to "default"
any: greedy. Consume all sequence elements until the end and succed only if any element has matched.
- any @val is "d": capture all element that match is "d"
none: greedy. Consume all sequence elements until the end and succed only if any element has matched.

Greedy patterns match until the end of a sequence and cannot be followed by anything else.

For sequence to match is must either be completely matched by all subpatterns or have trailing .._ in pattern. ⁵.

Sequence	Pattern	Match result
`[1,2,3]`	`[1,2]`	Fail
	`[1, .._]`	Ok
	`[1,2,_]`	Ok

More use examples

capture all elements in sequence: [all @elems]
get all elements until (not including "d"): [until @a is "d"]
All leading "d": [all @leading is "d"]
Match first two elements and ignore the rest [_, _, .._]. ⁶
Match optional third element [_, _, opt @trail]
Match third element and if not matched use default value [_, _, opt @trail or "default"]
Capture all elements until first separator: [until @leading is "sep", @middle is "sep", all @trailing]
Extract all conditions from IfStmt: IfStmt([all ElseIf([@cond, _]), .._])

Tuple matching

Input tuple: (1, 2, "fa")

Pattern	Result	Comment
`(_, _, _)`	Ok	Match all
`(@a, @a, _)`	Fail	⁷
`(@a is (1 ｜ 2), @a, _)`	Error	⁸
`(1, 1 ｜ 2, _)`	Ok

Case object matching

Input AST

ForStmt
  Ident "i"
  Infix
    Ident ".."
    IntLit 1
    IntLit 10
  StmtList
    Command
      Ident "echo"
      IntLit 12

ForStmt([== ident("i"), .._]) Only for loops with i as variable
ForStmt([@a is Ident(), .._]) Capture for loop variable
ForStmt([@a.isTuple(), .._]) for loops in which first subnode satisfies predicate isTuple(). Bind match to a
ForStmt([_, _, (len: in {1 .. 10})]) between one to ten statements in the for loop body

KV-pairs matching

Input json string

{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}

Get input ["menu"]["file"] from node and

case inj:
  of {"menu" : {"file": @file is JString()}}:
    # ...
  else:
    raiseAssert("Expected [menu][file] as string, but found " & $inj)

Option matching

Some(@x) and None() is a special case that will be rewritten into (isSome: true, get: @x) and (isNone: true) respectively. This is made to allow better integration with optional types. ⁹.

Footnotes

¹ it might be possible to generate
input sequence if only items is defined

² you can,
technically match arrays using the same syntax since generated
access paths are the same

³ even though it opens possibilities for
various hacks it is advisable to use only comparison operators
like ==, or =~

⁴ consecutiveGreedy

⁵ Trailing .._
does not generate exchaustive checks for sequence match, so you can
use it to do things like pattern match next tokens in lexer for
example. This is one of the main use cases that I have in min for
sequence matching - you only need to have len and [], so it should
be pretty easy to adapt anything basically

⁶ actually you can also use (len: _(it >= 2)) - it will
generate call to expr.len which is perfectly valid. It is not
strictly necessary to only match sequences with [...] and objects
with (fld: )

⁷ First match for a has value 1 but second one should be equal to 2

⁸ Binding of variables to pattern that contains alternative is only supported if varible occurs once in the expression. Reason: there is no backtracking for match. First tuple element matches to 1, but then fails on second element. To make correct match we would have to backtrack, rebind @a to two and then try binding again.

⁹ if your
enum contains enSome and enNone you can match for elements using
enNone(fld: @val) or (kind: enNone, fld: @val). Option is much
more common case so it takes priority and special-cased

haxscramper · 2020-09-28T13:04:55Z

@alehander92 initial implementation as described in the draft is mostly complete (sets and predicate calls (@a.matches) are not finished & I have some other ideas that I need to test). I will be adding more unit test and searching for potential flaws in design/edge cases + improving documentation.

documentation - updated copy of the syntax draft
current tests
implementation

You can try current version by installing hmisc@#head - right now implementation depends on some other modules from the library; I will of course clean this up before PRing it, but now I just used it to save time. Minimal standalone use example

In addition to syntax described above I would also like to introduce two operators

template `:=`*(lhs, rhs: untyped): untyped = assertMatch(rhs, lhs)
template `?=`*(lhs, rhs: untyped): untyped = matches(rhs, lhs)

Which would allow sequence and nested tuple unpacking, similar to #194, unpack and definesugar module. rust-like iflet macro (even including some features from RFC) - all of that, basically for free (two-line implementation)

    (@a, (@b, @c), @d) := (1, (2, 3), 4)
    [all @head] := [1,2,3]

    if (Some(@x) ?= some("hello")) and
       (Some(@y) ?= some("world")):
      assertEq x, "hello"
      assertEq y, "world"
    else:
      discard

haxscramper · 2020-11-06T12:41:20Z

Implementation is completed (feature-wise) in nim-lang/fusion#33. Some additional notes in this comment, but everything else is complete (as far as I'm concerned)

juancarlospaco · 2021-01-14T00:02:28Z

This bug should be closed, pattern matching already exists and works.

konsumlamm · 2021-01-28T14:28:23Z

I'm not sure if it's already too late to change stuff, but here are some opinions (<pattern> means an arbitrary pattern):

Nested matching (and in general, the different kinds of patterns, like sequences, objects etc.) seems to have inconsistent syntax, for example to match for a nested pattern in objects, you need to use (field: is <pattern>), while in sequences, you can nest a pattern directly ([<pattern>]). (EDIT: seems like this is not the case and you can just leave out the is, so I don't see the point of having it)
Using operators (@a == 42), predicates (@a.matchesPredicate()) and trailing expressions with it (@a(it mod 2 == 0)) don't really fit in, it feels more like a tacked on, overloaded DSL. Instead, I think something like trailing guards (putting if <condition> or and condition after the pattern) ~~and as-patterns (a pattern where the value is bound to a name)~~ (EDIT: this already exists with is) would be better.
Some should allow matching for Some(<pattern>) (but I assume something like that is planned). I also don't really like that Option is special cased (it's in the stdlib, so it's not that bad, but still), with custom unpackers (see next point), that should be possible to solve.
An option to easily create new patterns would be nice (something like unpackers in gara or the unapply method in Scala). This should also allow to implement matching on Options in a library, without special casing it, by creating custom patterns for Some and None.

Don't get me wrong, I love pattern matching and I appreciate the effort you make, but I think we can do better (especially since this is in fusion now and will likely become the recommended pattern matching implementation for Nim).

Araq · 2021-01-28T19:20:20Z

It's not too late but further progress needs my upcoming "let/var inside expressions" RFC. IMHO.

haxscramper · 2021-04-11T11:21:30Z

I think this RFC can be closed, because there nothing else to add to it. "let expressions" should be discussed in a separate RFC (when it comes out), and particular details of pattern matching implementation should be discussed in fusion.

levovix0 · 2021-05-24T15:23:01Z

Is it possible to make capturing matched values to variables not so ugly?
for example

const b = 2

case some_seq
of [a, (b)]: ... # b is constant and = 2
of [a, b]: ... # b = some_seq[1]
else: ...

instead of

const b = 2

case some_seq
of [@a, b]: ... # b is constant and = 2
of [@a, @b]: ... # b = some_seq[1]
else: ...

haxscramper · 2021-05-24T15:27:54Z

Current implementation won't be changed, but when/if we get let expressions this part of the syntax could be revised. Drop the requirements for @ prefix and instead require use ==myVarToCompare when equality comparison is necessary.

al6x · 2021-06-02T09:05:26Z

A bit late to the party. It's strange that among other langauges the Elixir/Erlang was not mentioned. As it has one of the most powerfull and elegant pattern matching capabilities https://elixir-lang.org/getting-started/pattern-matching.html and https://elixir-lang.org/getting-started/case-cond-and-if.html

haxscramper · 2021-06-02T09:29:08Z

Prolog also wasn't mentioned, and it seems like elixir adopted a large portion of the list operations from its [head | tail] syntax. Variable pinning via ^ is probably better than current @var syntax. Patterns for function arguments is an interesting idea (and it can be implemented with fusion/matching matching easily (someone already asked about this, so I have a working example)):

import fusion/matching
import std/[macros, options]

{.experimental: "caseStmtMacros".}

macro matchCall(procs: untyped): untyped =
  var patterns: seq[tuple[pattern: NimNode, funcName: string]]

  result = newStmtList()

  var topParams: tuple[name: string, arg0, type0Name, returnType: NimNode]

  for idx, pr in pairs(procs):
    pr.assertMatch:
      ProcDef:
        Ident(strVal: @name)
        @trmTemplate # Term rewriting template
        @genParams # Generic params
        FormalParams:
          @returnType
          all @arguments

        @pragmas
        _ # Reserved
        @implementation

    topParams.name = name

    arguments[0].assertMatch: # FIXME handles only one-argument functions
      IdentDefs:
        @arg0Name
        CurlyExpr[@type0Name, @pattern]
        _

    topParams.returnType = returnType
    topParams.type0Name = type0Name
    topParams.arg0 = arg0Name

    let funcName = name & "impl" & $idx
    result.add nnkProcDef.newTree(
      ident(funcName),
      trmTemplate,
      genParams,
      nnkFormalParams.newTree(@[
        returnType, nnkIdentDefs.newTree(arg0Name, type0Name, newEmptyNode())
      ]),
      pragmas,
      newEmptyNode(),
      implementation
    )


    patterns.add((pattern, funcName))

  var dispatchImpl = nnkCaseStmt.newTree(topParams.arg0)
  for (patt, funcName) in patterns:
    dispatchImpl.add nnkOfBranch.newTree(
      patt, nnkReturnStmt.newTree(newCall(funcName, topParams.arg0)))

  result.add nnkProcDef.newTree(
    ident(topParams.name),
    newEmptyNode(),
    newEmptyNode(),
    nnkFormalParams.newTree(@[
      topParams.returnType,
      nnkIdentDefs.newTree(topParams.arg0, topParams.type0Name, newEmptyNode())
    ]),
    newEmptyNode(),
    newEmptyNode(),
    dispatchImpl
  )

  echo result.repr

matchCall:
  proc pattern(a: NimNode{Infix[Ident(strVal: == "+"), .._]}): NimNode =
    echo "Matches plus"
    result = newEmptyNode()

  proc pattern(a: NimNode{Infix[Ident(strVal: == "-"), .._]}): NimNode =
    echo "Matches minus"
    result = newEmptyNode()


macro usesPattern(body: untyped): untyped =
  return pattern(body)

usesPattern(12 + 2)
usesPattern(2 - 2)

gemath mentioned this issue Sep 29, 2020

Nested tuple unpacking #194

Closed

haxscramper mentioned this issue Oct 25, 2020

Pattern matching implementation nim-lang/fusion#33

Merged

konsumlamm mentioned this issue Feb 12, 2021

caseStmtMacros to support C#8-like Recursive Patterns matching (Deconstruction Pattern) #57

Open

haxscramper mentioned this issue Feb 15, 2021

Add pattern matching to nim stdlib nim-lang/Nim#17047

Closed

Araq added Accepted RFC implemented labels Apr 11, 2021

Araq closed this as completed Apr 11, 2021

haxscramper mentioned this issue Jul 6, 2021

Nimble improvements #398

Open

haxscramper mentioned this issue Sep 9, 2021

Allow to indent of/else/elif/except/finally for block commands #420

Closed

omentic mentioned this issue May 21, 2023

Proper sum types and structural pattern matching #525

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Early version of RFC: Pattern matching ? #245

Early version of RFC: Pattern matching ? #245

alehander92 commented Jul 23, 2020

andreaferretti commented Jul 24, 2020

Varriount commented Jul 24, 2020 •

edited

Loading

haxscramper commented Sep 20, 2020 •

edited

Loading

juancarlospaco commented Sep 21, 2020

alehander92 commented Sep 21, 2020

alehander92 commented Sep 21, 2020

haxscramper commented Sep 21, 2020

haxscramper commented Sep 21, 2020

Araq commented Sep 21, 2020

haxscramper commented Sep 21, 2020 •

edited

Loading

alehander92 commented Sep 21, 2020

alehander92 commented Sep 21, 2020

haxscramper commented Sep 21, 2020

alehander92 commented Sep 21, 2020

alehander92 commented Sep 21, 2020 •

edited

Loading

haxscramper commented Sep 21, 2020

alehander92 commented Sep 21, 2020 •

edited

Loading

haxscramper commented Sep 22, 2020 •

edited

Loading

haxscramper commented Sep 23, 2020 •

edited

Loading

haxscramper commented Sep 23, 2020

metagn commented Sep 23, 2020 •

edited

Loading

haxscramper commented Sep 23, 2020

haxscramper commented Sep 23, 2020

haxscramper commented Sep 23, 2020 •

edited

Loading

haxscramper commented Sep 28, 2020 •

edited

Loading

haxscramper commented Nov 6, 2020

juancarlospaco commented Jan 14, 2021

konsumlamm commented Jan 28, 2021 •

edited

Loading

Araq commented Jan 28, 2021

haxscramper commented Apr 11, 2021

levovix0 commented May 24, 2021 •

edited

Loading

haxscramper commented May 24, 2021 •

edited

Loading

al6x commented Jun 2, 2021 •

edited

Loading

haxscramper commented Jun 2, 2021 •

edited

Loading

Early version of RFC: Pattern matching ? #245

Early version of RFC: Pattern matching ? #245

Comments

alehander92 commented Jul 23, 2020

Pattern matching for Nim

Motivation

Constraints

Research

Next steps

Future directions

Disclaimer

andreaferretti commented Jul 24, 2020

Varriount commented Jul 24, 2020 • edited Loading

haxscramper commented Sep 20, 2020 • edited Loading

Additional arguments in favor of pattern matching in stdlib

Use caseStmtMacro

Set of features should be relatively small

Pattern matching on key-value pairs

NimNode

My implementation

Examples

Todo list

juancarlospaco commented Sep 21, 2020

alehander92 commented Sep 21, 2020

alehander92 commented Sep 21, 2020

haxscramper commented Sep 21, 2020

More examples

Non-goals

Other goals

Some comments on other libraries

haxscramper commented Sep 21, 2020

Araq commented Sep 21, 2020

haxscramper commented Sep 21, 2020 • edited Loading

alehander92 commented Sep 21, 2020

alehander92 commented Sep 21, 2020

haxscramper commented Sep 21, 2020

alehander92 commented Sep 21, 2020

alehander92 commented Sep 21, 2020 • edited Loading

haxscramper commented Sep 21, 2020

alehander92 commented Sep 21, 2020 • edited Loading

haxscramper commented Sep 22, 2020 • edited Loading

haxscramper commented Sep 23, 2020 • edited Loading

haxscramper commented Sep 23, 2020

Match expression

Tuple matching

Object/named tuple matching

Sequence matching

Set matching

Key-value pairs matching

metagn commented Sep 23, 2020 • edited Loading

haxscramper commented Sep 23, 2020

haxscramper commented Sep 23, 2020

haxscramper commented Sep 23, 2020 • edited Loading

Final draft of the syntax

Supported structures for pattersn

Element access

Checks

Examples

Variable binding

Bind order

Bind variable type

Matching different things

Sequence matching

Tuple matching

Case object matching

KV-pairs matching

Option matching

Footnotes

haxscramper commented Sep 28, 2020 • edited Loading

haxscramper commented Nov 6, 2020

juancarlospaco commented Jan 14, 2021

konsumlamm commented Jan 28, 2021 • edited Loading

Araq commented Jan 28, 2021

haxscramper commented Apr 11, 2021

levovix0 commented May 24, 2021 • edited Loading

haxscramper commented May 24, 2021 • edited Loading

al6x commented Jun 2, 2021 • edited Loading

haxscramper commented Jun 2, 2021 • edited Loading

Varriount commented Jul 24, 2020 •

edited

Loading

haxscramper commented Sep 20, 2020 •

edited

Loading

Use `caseStmtMacro`

`NimNode`

haxscramper commented Sep 21, 2020 •

edited

Loading

alehander92 commented Sep 21, 2020 •

edited

Loading

alehander92 commented Sep 21, 2020 •

edited

Loading

haxscramper commented Sep 22, 2020 •

edited

Loading

haxscramper commented Sep 23, 2020 •

edited

Loading

metagn commented Sep 23, 2020 •

edited

Loading

haxscramper commented Sep 23, 2020 •

edited

Loading

haxscramper commented Sep 28, 2020 •

edited

Loading

konsumlamm commented Jan 28, 2021 •

edited

Loading

levovix0 commented May 24, 2021 •

edited

Loading

haxscramper commented May 24, 2021 •

edited

Loading

al6x commented Jun 2, 2021 •

edited

Loading

haxscramper commented Jun 2, 2021 •

edited

Loading