Add regular expression support to language #130

Closed
jclark opened this issue May 18, 2019 · 2 comments
Labels: Area/Lang (Relates to the Ballerina language specification), Area/LangLib (Relates to lang.* libraries), status/inprogress (Fixes are in the process of being added), Type/Improvement (Enhancement to language design)

Comments


jclark commented May 18, 2019

This issue is for adding some kind of regexp support to the language, not merely as a library module.

We split the things that need defining into four parts:

  1. the regexp dialect that we are using: "Specify regular expression syntax and semantics" (#1125)
  2. the syntax and semantics of the regexp type in Ballerina: "Add regular expression type to language" (#1132)
  3. langlib functions that take a regexp value as a parameter: "Add langlib functions using regular expressions" (#1130)
  4. some minimal Unicode support in lang.string: "Add minimal Unicode support to lang.string" (#1129)

jclark added the Type/Improvement and Area/Lang labels on May 18, 2019
jclark added this to the 2019Rn milestone on May 18, 2019
jclark modified the milestones: 2020R1, 2020Rn on Aug 14, 2019
jclark modified the milestones: 2020R3, 2021Rn on Mar 4, 2020
jclark self-assigned this on May 14, 2022

jclark commented Jun 8, 2022

Swift has regex support in the language: https://github.com/apple/swift-evolution/blob/main/proposals/0350-regex-type-overview.md

The really interesting thing here is that it has a type parameter that represents the type of the captures.


jclark commented Jun 21, 2022

We need a way to enable m, s and i flags.

So one issue is: do we consider the i flag (and possibly others) as part of the regexp value, or is it an additional parameter to the operations that use a regexp value?

If we do make it part of the regexp value, then how do we construct such a regexp?
Possibilities are:

  1. Put the flag in the syntax. The normal syntax for this is to put e.g. (?i) at the beginning of the regexp.
  2. Use a function, e.g. r.ignoreCase() will construct a new regexp from r that is the same except that it ignores case. The problems with this are:
    • nothing sensible for toString() to do
    • no way to do this as a const

So I think the answer has to be (1).
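
To make the toString() concern concrete, here is a rough sketch using Python's re module as a stand-in (it is not the Ballerina design, and the variable names are just for illustration). When the flag lives in the syntax, the source text alone reconstructs the same regexp; when the flag is attached out of band, the source text loses it.

```python
import re

# Option 1: the flag is part of the pattern syntax.
in_syntax = re.compile(r"(?i)hello")

# Analogue of option 2: the flag is attached separately from the pattern text.
out_of_band = re.compile(r"hello", re.IGNORECASE)

# Both behave the same when matching.
same_behaviour = bool(in_syntax.fullmatch("HeLLo")) and bool(out_of_band.fullmatch("HeLLo"))

# Rebuild each regexp from its source text alone, as a toString() round trip would.
round_trip_in_syntax = re.compile(in_syntax.pattern)
round_trip_out_of_band = re.compile(out_of_band.pattern)

# Only the first round trip still ignores case.
keeps_flag = bool(round_trip_in_syntax.fullmatch("HELLO"))
loses_flag = round_trip_out_of_band.fullmatch("HELLO") is None
```

With the flag in the syntax, the string form is also something that could be written as a constant.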

ECMAScript defines the semantics for the i flag as part of the matching operation: the characters in the input string and the pattern are both case-folded, using Unicode simple (1-1) case mapping. As defined, it's thus an all-or-nothing flag: it does not support only part of the pattern ignoring case. But it's not a big delta to fix this.
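
As a minimal sketch of what "case-fold both sides with a 1-1 mapping" means, here is a literal-only matcher in Python. The canonicalize function is only an approximation of a simple case mapping (it uppercases a character and keeps the original if the result is not a single character); both function names are made up for this illustration and are not taken from the ECMAScript spec.

```python
def canonicalize(ch: str) -> str:
    """Approximate a simple (1-1) case mapping for a single character."""
    upper = ch.upper()
    # A 1-1 mapping cannot turn one character into several, so keep ch in that case.
    return upper if len(upper) == 1 else ch


def literal_match_ignore_case(pattern: str, text: str) -> bool:
    """Match a literal pattern against the whole text, folding both sides."""
    if len(pattern) != len(text):
        return False
    return all(canonicalize(p) == canonicalize(t) for p, t in zip(pattern, text))
```

Because the folding is applied to every character of the pattern and the input alike, there is no place in this scheme to say that only part of the pattern ignores case.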

This creates a problem if we consider the i flag as part of the regexp value: what happens when you insert a regexp that has the i flag set into a template? I think the only reasonable thing is to panic, but this is not good: the whole idea of template insertions is that you get compile-time syntax checking (so the type of the template literal does not need to include error).

Java (as well as Go, Rust and several other Perl-based regexp dialects) supports syntax for making i apply to part of the pattern, e.g. x(?i:y)z makes y ignore case but not x or z (this can be seen as part of non-capturing group syntax). I think the best solution is to support this.
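
A quick illustration of the scoped form, again using Python's re module as a stand-in (Python 3.6+ accepts the same (?i:...) syntax). The insert_ignoring_case helper is hypothetical; it only sketches how an ignore-case fragment could be placed inside a larger pattern without affecting the rest.

```python
import re

# Only the y inside the group ignores case; x and z stay case-sensitive.
scoped = re.compile(r"x(?i:y)z")
results = {s: bool(scoped.fullmatch(s)) for s in ["xyz", "xYz", "Xyz", "xyZ"]}


def insert_ignoring_case(prefix: str, fragment: str, suffix: str) -> "re.Pattern[str]":
    """Hypothetical helper: embed fragment so that only it ignores case."""
    return re.compile(f"{prefix}(?i:{fragment}){suffix}")


combined = insert_ignoring_case("id-", "[a-f]+", "!")
```

Since the flag is scoped to a group, a fragment that ignores case can be combined with a surrounding pattern that does not, which is the situation that arises with template insertions.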
