Wip/gmt/match find only text #5721

Merged
merged 16 commits into from
Feb 23, 2023
Changes from 14 commits
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -318,6 +318,8 @@
- [Moved regex functionality out of `Text.locate` and `Text.locate_all` into
`Text.match` and `Text.match_all`.][5679]
- [`File.parent` may return `Nothing`.][5699]
- [Removed non-regex functionality from `is_match`, `match`, and `match_all`,
and renamed them to `match`, `find`, `find_all` (respectively).][5721]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -488,7 +490,9 @@
[5645]: https://github.com/enso-org/enso/pull/5645
[5646]: https://github.com/enso-org/enso/pull/5646
[5656]: https://github.com/enso-org/enso/pull/5656
[5679]: https://github.com/enso-org/enso/pull/5679
[5699]: https://github.com/enso-org/enso/pull/5699
[5721]: https://github.com/enso-org/enso/pull/5721

#### Enso Compiler

@@ -1,5 +1,7 @@
import project.Data.Locale.Locale

from project.Data.Boolean import Boolean, True, False

polyglot java import org.enso.base.text.TextFoldingStrategy

type Case_Sensitivity
@@ -25,3 +27,11 @@ type Case_Sensitivity
Case_Sensitivity.Sensitive -> TextFoldingStrategy.unicodeNormalizedFold
Case_Sensitivity.Insensitive locale ->
TextFoldingStrategy.caseInsensitiveFold locale.java_locale

## PRIVATE
Is case insensitive.
is_case_insensitive : Boolean
is_case_insensitive self = case self of
Case_Sensitivity.Default -> False
Case_Sensitivity.Sensitive -> False
Case_Sensitivity.Insensitive _ -> True
@@ -13,6 +13,7 @@ import project.Data.Text.Encoding.Encoding
import project.Data.Text.Location
import project.Data.Text.Matching_Mode.Matching_Mode
import project.Data.Text.Regex
import project.Data.Text.Regex.Match.Match
import project.Data.Text.Regex.Regex_Mode.Regex_Mode
import project.Data.Text.Regex_Matcher.Regex_Matcher
import project.Data.Text.Span.Span
@@ -209,116 +210,77 @@ Text.characters self =
self.each bldr.append
bldr.to_vector

## ALIAS find

Matches the text in `self` against the provided `term`, returning the first
or last match if present or `Nothing` if there are no matches.
## Find the regular expression `pattern` in `self`, returning the first match
if present or `Nothing` if not found.

Arguments:
- term: The pattern to match `self` against. We recommend using _raw text_
to write your patterns.
- mode: This argument specifies whether the first or last match should be
returned.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.
- pattern: The pattern to match `self` against.
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

> Example
Find the first substring matching the regex.

example_match =
regex = "a[ab]c"
"aabbbbccccaabcaaaa".match regex == "abc"

! Last Match in Regex Mode
Regex always performs the search from the front and matching the last
occurrence means selecting the last of the matches while still generating
matches from the beginning. Regex does not return overlapping matches - it
will return a match at some position and then continue the search after that
match. This will lead to slightly different behavior for overlapping
occurrences of a pattern in Regex mode than in exact text matching mode
where the matches are searched for from the back.

> Example
Comparing Matching in Last Mode in Regex and Text mode

"aAa".match "aa" mode=Matching_Mode.Last matcher=Text_Matcher.Case_Insensitive == "Aa"
"aAa".match "aa" mode=Matching_Mode.Last matcher=(Regex_Matcher.Value case_sensitivity=Case_Sensitivity.Insensitive) == "aA"

Text.match : Text -> Matching_Mode -> (Text_Matcher | Regex_Matcher) -> Text | Nothing
Text.match self term mode=Matching_Mode.First matcher=Regex_Matcher.Value = case matcher of
_ : Text_Matcher ->
case_sensitivity = case matcher of
Text_Matcher.Case_Sensitive -> Case_Sensitivity.Sensitive
Text_Matcher.Case_Insensitive _ -> Case_Sensitivity.Insensitive
case self.locate term mode case_sensitivity of
Nothing -> Nothing
span -> span.text
_ : Regex_Matcher -> case mode of
Matching_Mode.First ->
case matcher.compile term . match self Matching_Mode.First of
Nothing -> Nothing
match -> match.span 0 . to_grapheme_span . text
Matching_Mode.Last ->
case matcher.compile term . match self Regex_Mode.All of
Nothing -> Nothing
matches -> matches.last.span 0 . to_grapheme_span . text

## ALIAS find_all
example_find =
## This matches `abc` @ character 11
"aabbbbccccaabcaaaa".find "a[ab]c"
example_find_insensitive =
## This matches `aBc` @ character 11
"aabbbbccccaaBcaaaa".find "a[ab]c" Case_Sensitivity.Insensitive
Text.find : Text -> Case_Sensitivity -> Match | Nothing ! Compile_Error
Text.find self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
case_insensitive = Case_Sensitivity.is_case_insensitive case_sensitivity
Regex.compile pattern case_insensitive=case_insensitive . match self Matching_Mode.First

Matches all occurrences text in `self` against the provided `term`, returning
a vector of matches.
## Finds all the matches of the regular expression `pattern` in `self`,
returning a Vector of matches, or an empty Vector if there are none.

Arguments:
- term: The pattern to match `self` against. We recommend using _raw text_
to write your patterns.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.
- pattern: The pattern to match `self` against.
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

> Example
Find all substrings matching the regex.

example_match =
regex = "a[ab]c"
"aabcbbccaacaa".match regex == ["abc", "aac"]
Text.match_all : Text -> (Text_Matcher | Regex_Matcher) -> Vector Text
Text.match_all self term=".*" matcher=Regex_Matcher.Value = case matcher of
_ : Text_Matcher ->
case_sensitivity = case matcher of
Text_Matcher.Case_Sensitive -> Case_Sensitivity.Sensitive
Text_Matcher.Case_Insensitive _ -> Case_Sensitivity.Insensitive
self.locate_all term case_sensitivity . map .text
_ : Regex_Matcher ->
case matcher.compile term . match self Regex_Mode.All of
Nothing -> []
matches -> matches.map m-> m.span 0 . to_grapheme_span . text
Find the substring matching the regex.

example_find_all =
## This matches `aabbbbc` @ character 0 and `aabc` @ character 10
"aabbbbccccaabcaaaa".find_all "a[ab]+c"
example_find_all_insensitive =
## This matches `aABbbbc` @ character 0 and `aaBC` @ character 10
"aABbbbccccaaBCaaaa".find_all "a[ab]+c" Case_Sensitivity.Insensitive
Text.find_all : Text -> Case_Sensitivity -> Vector Match ! Compile_Error
Text.find_all self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
case_insensitive = Case_Sensitivity.is_case_insensitive case_sensitivity
case Regex.compile pattern case_insensitive=case_insensitive . match self Regex_Mode.All of
Nothing -> []
matches -> matches
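
As a rough analogy for the `find` / `find_all` behavior documented above (first match or nothing; all non-overlapping matches or an empty collection), here is a minimal Python sketch built on the standard `re` module. It is not the Enso implementation: the helper names `find` and `find_all` and the `case_insensitive` flag are hypothetical and only mirror the names used in this diff.

```python
import re


def find(text, pattern, case_insensitive=False):
    """Return the first match of `pattern` in `text`, or None if absent."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, text, flags)


def find_all(text, pattern, case_insensitive=False):
    """Return every non-overlapping match, scanning left to right."""
    flags = re.IGNORECASE if case_insensitive else 0
    return list(re.finditer(pattern, text, flags))


first = find("aabbbbccccaabcaaaa", "a[ab]c")
print(first.group(0), first.start())  # abc 11

every = find_all("aabcbbccaacaa", "a[ab]c")
print([(m.group(0), m.start()) for m in every])  # [('abc', 1), ('aac', 8)]

print(find("xyz", "a[ab]c"))      # None
print(find_all("xyz", "a[ab]c"))  # []
```

Returning match objects rather than plain strings parallels the change in the signatures above from `Text` to `Match`: the caller can still get the matched text, but also its position.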

## ALIAS Check Matches

Checks if the whole text in `self` matches a provided `pattern`.

Arguments:
- pattern: The pattern to match `self` against. We recommend using _raw text_
to write your patterns.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.
- pattern: The pattern to match `self` against.
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

> Example
Checks if some text matches a basic email regex. NOTE: This regex is _not_
compliant with RFC 5322.
Checks if the whole text matches a basic email regex.

example_match =
regex = ".+@.+"
"[email protected]".is_match regex
Text.is_match : Text -> (Text_Matcher | Regex_Matcher) -> Boolean ! Compile_Error
Text.is_match self pattern=".*" matcher=Regex_Matcher.Value = case matcher of
Text_Matcher.Case_Sensitive -> self == pattern
Text_Matcher.Case_Insensitive locale -> self.equals_ignore_case pattern locale
_ : Regex_Matcher ->
compiled_pattern = matcher.compile pattern
compiled_pattern.matches self
regex = ".+ct@.+"
# Evaluates to true
"[email protected]".match regex
example_match_insensitive =
regex = ".+ct@.+"
# Evaluates to true
"[email protected]".match regex Case_Sensitivity.Insensitive
Text.match : Text -> Case_Sensitivity -> Boolean ! Compile_Error
Text.match self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
case_insensitive = Case_Sensitivity.is_case_insensitive case_sensitivity
compiled_pattern = Regex.compile pattern case_insensitive=case_insensitive
compiled_pattern.matches self
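
The renamed `match` checks the whole text against the pattern and returns a boolean. A minimal Python sketch of that semantics, using `re.fullmatch` as a stand-in, might look like the following. The helper name `match`, the `case_insensitive` flag, and the address `contact@example.com` are all hypothetical and chosen only for illustration; this is not the Enso code.

```python
import re


def match(text, pattern, case_insensitive=False):
    """Return True only if the entire `text` matches `pattern`."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.fullmatch(pattern, text, flags) is not None


print(match("contact@example.com", ".+ct@.+"))            # True
print(match("CONTACT@example.com", ".+ct@.+"))            # False
print(match("CONTACT@example.com", ".+ct@.+", True))      # True
print(match("see contact@example.com", "contact@.+"))     # False
```

The last call returns `False` because the pattern would have to cover the leading `see ` as well; a substring search is what `find` is for.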

## ALIAS Split Text

@@ -1379,5 +1341,4 @@ slice_text text char_ranges =
sb = StringBuilder.new
char_ranges.map char_range->
sb.append text char_range.start char_range.end
sb.toString

sb.toString
@@ -57,7 +57,7 @@ type Suite_Config
should_run_group self name =
regexp = self.only_group_regexp
case regexp of
_ : Text -> name.is_match regexp . catch Any (_->True)
_ : Text -> name.match regexp . catch Any (_->True)
_ -> True

should_output_junit self =