Add expression operator testing a string against a regular expression #4089

Open
lucaswoj opened this issue Feb 1, 2017 · 22 comments
Labels
cross-platform 📺 Requires coordination with Mapbox GL Native (style specification, rendering tests, etc.) feature 🍏 needs discussion 💬

Comments

@lucaswoj
Contributor

lucaswoj commented Feb 1, 2017

From @ansis on January 15, 2015 20:32

We've talked about this before but it was never implemented. How would it be specified?

Carto uses =~. We could also use regex.

Copied from original issue: mapbox/mapbox-gl-style-spec#233

@lucaswoj
Contributor Author

lucaswoj commented Feb 1, 2017

From @divya1c on March 15, 2016 21:26

Hi! Has the regex feature been added to Mapbox GL yet?

@lucaswoj
Contributor Author

lucaswoj commented Feb 1, 2017

From @tmcw on March 15, 2016 21:33

If the issue is open, the task isn't done yet.

@lucaswoj
Contributor Author

lucaswoj commented Feb 1, 2017

From @1ec5 on July 10, 2016 22:53

regex (or like) would be more discoverable/memorable, since ~ has wildly different meanings in every language.

@lucaswoj
Contributor Author

lucaswoj commented Feb 1, 2017

From @tmcw on November 22, 2016 21:6

To unpack what's necessary to implement this feature:

  1. Naming - regex or ~
  2. Compatibility across GL JS and GL Native

1 will likely be 10% of the work or less. Problem number 2 is more complex.

If the JavaScript port (GL JS) uses JavaScript's built-in RegExp object, then GL Native will need to include a compatible implementation of regular expressions in order to ensure that maps render exactly the same in a native environment. There are many particular flavors of regular expressions, so picking a feature-filled Native regular expression engine would mean that styles break in less feature-filled GL JS RegExp implementations.

There's also the question of whether GL Native should use platform-provided regular expression libraries, or bring its own at the C++ level. From diving into my handy V8 source checkout (get your own today! they're great), V8's implementation is Irregexp.

@lucaswoj
Contributor Author

lucaswoj commented Feb 1, 2017

From @1ec5 on November 22, 2016 21:59

@jfirebaugh points out that we can use std::regex’s ECMAScript regex support in gl-native to ensure compatibility between GL JS and the native SDKs. However, note that std::regex supports some features that browsers don’t, like [:alnum:].
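
A small sketch of the [:alnum:] discrepancy, assuming a JavaScript engine that follows the web-compatibility grammar (Annex B) and a pattern without the u or v flag. In that setting a POSIX-style class is not rejected; it parses as an ordinary bracket set followed by a literal "]", so the pattern silently matches something else. The variable names are illustrative only.

```javascript
// POSIX-style class, as someone used to std::regex might write it.
const posixStyle = /^[[:alnum:]]$/;

// Without the `u` flag, JavaScript reads this as the set "[:alnum" followed by a literal "]".
const matchesLetter = posixStyle.test('Z');   // a letter that is not in that set
const matchesOddPair = posixStyle.test('a]'); // one set member followed by "]"

// A spelling of the (ASCII-only) intent that avoids POSIX classes altogether.
const portable = /^[A-Za-z0-9]$/;
const portableMatchesLetter = portable.test('Z');

console.log(matchesLetter, matchesOddPair, portableMatchesLetter);
```

The point is that the failure is silent: no syntax error warns the style author that the two engines disagree.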

In any case, I would very much prefer that we use the platform-provided regex facilities in gl-native, similar to how we use the platform-provided facilities for uppercasing and lowercasing strings. (This does lead to minor discrepancies among the platforms: mapbox/DEPRECATED-mapbox-gl#21 (comment).) Using the platform-provided regex facilities means we don’t incur an increase in the SDK’s size, and it ensures that the runtime styling API is compatible with any other regex the developer uses in their application.

To illustrate my point, style specification filters are represented on iOS and macOS as NSPredicate objects. This is as natural as representing strings as NSString objects. NSPredicate format strings accept a SQL-like syntax, where the MATCHES operator is documented to support ICU regex syntax. However, if core code only supports ECMAScript syntax, then the iOS and macOS SDKs need to transform ICU regex to ECMAScript regex, rejecting any regex that doesn’t translate, and transform in the other direction as well when getting the predicate of a style layer. Without this SDK-level transformation, the SDK’s behavior would be perceived as a bug.

I recognize that bringing ICU regex to GL JS would be a challenging task, and that some Studio users could expect ECMAScript regex since the live preview is implemented using GL JS. Fortunately, there’s enough overlap between the two syntaxes that I think we should declare a common subset of ICU and ECMAScript to be the syntax we want to support for filters; anything else (like lookbehind or Unicode properties in ICU, or matching Turkish İ with [A-Z] in ECMAScript) comes with a caveat that it isn’t guaranteed to work across platforms. I think most Studio users would come to Studio without knowing that there are different regex syntaxes, and they’re just as likely to try an ICU or PCRE regex as they are to try an ECMAScript regex.

@lucaswoj
Contributor Author

lucaswoj commented Feb 1, 2017

From @1ec5 on November 22, 2016 22:27

In chat, @tmcw brought up a valid concern that a user might input a regex for a filter that appears to do the right thing in GL JS, but it happens to prevent the layer from showing up at all on iOS. There’s no substitute for testing, but I agree that we should aim to make the live preview in Studio as faithful to the rendered output as possible.

The thing is, I think the subset of features in ICU but not ECMAScript is pretty small, and the subset in ECMAScript but not ICU even smaller. If Studio could detect the use of these features and display a warning icon, that would essentially enforce the subset of regex that we do officially support.
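
A rough sketch of what such detection could look like. Everything here is hypothetical: the function name, the list of checks, and the choice of constructs (taken from the examples mentioned in this thread) are assumptions, not an agreed-upon subset or an existing Studio feature. It is a purely textual heuristic and does not parse the pattern, so an escaped backslash before one of these sequences would produce a false positive.

```javascript
// Hypothetical checks: each entry flags a construct this thread names as non-portable.
const NON_PORTABLE_CHECKS = [
  { name: 'lookbehind', test: /\(\?<[=!]/ },
  { name: 'Unicode property escape', test: /\\[pP]\{/ },
  { name: 'POSIX character class', test: /\[:[a-z]+:\]/ },
  { name: 'braced Unicode escape', test: /\\u\{/ },
];

// Returns the names of the flagged constructs found in a regex source string.
function findNonPortableFeatures(source) {
  return NON_PORTABLE_CHECKS
    .filter((check) => check.test.test(source))
    .map((check) => check.name);
}

// A pattern using lookbehind and a Unicode property, and one that uses neither.
const warnings = findNonPortableFeatures('(?<=St\\. )\\p{Lu}\\w+');
const clean = findNonPortableFeatures('^(Main|High) Street$');

console.log(warnings, clean);
```

An editor could show the warning icon whenever the returned list is non-empty, while still letting the style author keep the pattern.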

@1ec5 1ec5 added the cross-platform 📺 Requires coordination with Mapbox GL Native (style specification, rendering tests, etc.) label Feb 3, 2017
@nextstopsun
Contributor

Any progress on regex filters?

@drumttocs8

Is there currently any alternative to regex filters? Maybe not strong pattern matching, but perhaps where a string can be matched to see if it exists in any part of a property value?

@andrewharvey
Collaborator

> Is there currently any alternative to regex filters? Maybe not strong pattern matching, but perhaps where a string can be matched to see if it exists in any part of a property value?

If the regexes are static (and not determined at runtime), you can preprocess your data, applying each regex and storing the result in a new attribute.
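
A minimal sketch of that preprocessing step, assuming the data is available as GeoJSON before it is tiled or added to the map. The sample features, the helper name tagMatches, and the property name name_matches are all made up for illustration.

```javascript
// Hypothetical input data: GeoJSON features with a "name" property.
const collection = {
  type: 'FeatureCollection',
  features: [
    { type: 'Feature', geometry: null, properties: { name: 'Upper Lake' } },
    { type: 'Feature', geometry: null, properties: { name: 'Lake Upper' } },
    { type: 'Feature', geometry: null, properties: {} },
  ],
};

// Copies the collection, storing the regex result in a new boolean property.
// Use a regex without the `g` flag so that test() carries no state between calls.
function tagMatches(input, regex, property, flagName) {
  return {
    ...input,
    features: input.features.map((feature) => {
      const value = feature.properties[property];
      const matched = typeof value === 'string' && regex.test(value);
      return { ...feature, properties: { ...feature.properties, [flagName]: matched } };
    }),
  };
}

const tagged = tagMatches(collection, /^(Upper|Lower) /, 'name', 'name_matches');

// The style then filters on the precomputed flag; no regex support is needed at render time.
const filter = ['==', ['get', 'name_matches'], true];

console.log(tagged.features.map((f) => f.properties.name_matches), filter);
```

The trade-off is that every pattern needs its own attribute, which is why this only works for patterns known ahead of time.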

@kriscarle

In case it helps anyone else who is stuck on this (and only needs to use querySourceFeatures in JS), I've hacked this together https://github.com/maphubs/mapbox-gl-regex-query

Could this be done with custom pluggable filters? All my hack really does is add a custom operator to the filter compile method and give it a custom comparator function. That might offer the best of both worlds? Then the platforms just ignore any operators they don't know, kind of like browser-specific CSS rules. It would also still allow a more limited SQL-like syntax for a simpler cross-platform option, for Studio users etc.

filter: ['like', '%name%']

or for advanced users that want to use regex

filter: [
'all', 
['~js', '/.*name.*/g'],
['~ios', '...']
]
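
To make the "pluggable operator" idea concrete, here is a self-contained sketch. It is not the code from mapbox-gl-regex-query and not a Mapbox GL API: the registry, the compileAll function, and the clause shape [operator, property, pattern] (with an explicit property name, which the snippet above leaves out) are assumptions for illustration.

```javascript
// Hypothetical registry of custom operators known to this platform.
const customOperators = new Map([
  ['~js', (pattern) => {
    const regex = new RegExp(pattern);
    return (value) => typeof value === 'string' && regex.test(value);
  }],
]);

// Compiles ['all', [op, property, pattern], ...] into a predicate over feature properties.
// Clauses whose operator is not registered are skipped, i.e. treated as always true.
function compileAll(filter) {
  const [combinator, ...clauses] = filter;
  if (combinator !== 'all') throw new Error('this sketch only handles "all"');
  const predicates = clauses
    .filter(([op]) => customOperators.has(op))
    .map(([op, property, pattern]) => {
      const matches = customOperators.get(op)(pattern);
      return (properties) => matches(properties[property]);
    });
  return (properties) => predicates.every((predicate) => predicate(properties));
}

const isMatch = compileAll([
  'all',
  ['~js', 'name', 'Park$'],
  ['~ios', 'name', '...'], // unknown on this platform, so ignored
]);

console.log(isMatch({ name: 'Central Park' }), isMatch({ name: 'Parkway' }));
```

Note the consequence of ignoring unknown operators inside 'all': a platform that understands none of the clauses lets every feature through, so the per-platform patterns have to be kept equivalent by hand.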

@politvs

politvs commented Jun 14, 2017

@kriscarle can your hack be used without npm or yarn, and if so, how? It looks very promising!

@kriscarle

@politvs let's move that discussion here https://github.com/maphubs/mapbox-gl-regex-query/issues/1 so we don't spam the Mapbox team :)

@1ec5
Contributor

1ec5 commented Oct 26, 2017

Now that expressions have landed and can be used as filters, this is actually a request to add – wait for it – a regular expression expression operator. Specifically, there should be an operator that tests whether a string matches a regular expression.

@1ec5 1ec5 changed the title from 'Support "regex" filters' to 'Add expression function testing a string against a regular expression' Oct 26, 2017
@1ec5 1ec5 changed the title from 'Add expression function testing a string against a regular expression' to 'Add expression operator testing a string against a regular expression' Oct 26, 2017
@mkv123

mkv123 commented Feb 25, 2018

My two cents (and a pull request to get the discussion going): returning the match groups as an array, instead of just a boolean value, would provide a much more flexible base to build on.

Returning match groups would allow for:

  1. The basic checking if the expression matched or not (by checking if it returned null)
  2. Search and replace (by capturing the bits that should not be replaced, and combining them and the replacement string with "concat"); see Add find-and-replace expression operator #4100
  3. Extracting portions of a property value (this is my personal reason for wanting this, I have data I can't easily modify, and need to get some pieces of text extracted from property values to be shown).
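
The three uses listed above can be sketched in plain JavaScript. The regexGroups function is a hypothetical stand-in for the proposed operator, not the implementation from the pull request; string concatenation plays the role of "concat", and the sample value is made up.

```javascript
// Hypothetical stand-in for a "regex" operator that returns match groups, or null on no match.
function regexGroups(pattern, input) {
  const match = new RegExp(pattern).exec(input);
  return match === null ? null : Array.from(match);
}

const value = 'Route 66 North';
const groups = regexGroups('^(Route) (\\d+)(.*)$', value);

// 1. Boolean test: did the expression match at all?
const matched = groups !== null;

// 2. Search and replace: keep the captured parts and concatenate them with the replacement.
const replaced = groups === null ? value : 'Rte ' + groups[2] + groups[3];

// 3. Extraction: pull a single piece out of the property value.
const routeNumber = groups === null ? null : groups[2];

console.log(matched, replaced, routeNumber);
```

Index 0 of the returned array holds the whole match, so capture groups start at index 1, as with JavaScript's own RegExp results.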

@anandthakker
Contributor

I think the biggest open issue we need to resolve here is the one about cross-platform compatibility. Given @1ec5's point in #4089 (comment) that ICU and ECMAScript regexp syntaxes mostly overlap, would it be feasible to just only allow the common subset of both?

@mkv123

mkv123 commented Feb 27, 2018

Looking at the latest ECMAScript regex spec and the specification of ICU regex, it looks like ICU syntax is a superset of ECMAScript syntax.

I've checked all the basic operations as well as the syntax for things like non-capturing groups, lookaheads, etc., and everything in ECMAScript is also in ICU. However, I haven't been able to wrap my head around the Unicode-specific bits, so I can't really comment on those.

Flag handling does seem to differ more though:

  • No "g" flag in ICU but implementing it manually shouldn't be too difficult (actually a bit unclear what the default behavior is for matching multiple times in ICU)
  • The "u" flag causes ECMAScript to treat the pattern as Unicode. I couldn't find a reference to it in ICU, but I expect this is the default there.
  • "y" flag is present in ECMA but not ICU but also irrelevant in this implementation.

Other ECMA flags are present in ICU.

@mkv123

mkv123 commented Feb 27, 2018

Looks like the u flag also enables a unicode escape syntax not in ICU \u{1D306}
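
A quick check of the JavaScript side of this, assuming an engine with ES2015 regex support and the web-compatibility grammar (Annex B). It only demonstrates that the braced escape depends on the u flag in JavaScript; it says nothing about how ICU treats the same text.

```javascript
// U+1D306 written as a string literal (braced escapes in strings need no flag).
const tetragram = '\u{1D306}';

// With the `u` flag, \u{1D306} in the pattern denotes that single code point.
const withUnicodeFlag = /^\u{1D306}$/u.test(tetragram);

// Without the flag, the same pattern text is not read as a code point escape.
const withoutUnicodeFlag = /^\u{1D306}$/.test(tetragram);

console.log(withUnicodeFlag, withoutUnicodeFlag);
```

So a pattern that relies on this escape changes meaning depending on a flag, which is one more construct a common subset would probably have to exclude.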

@stdmn

stdmn commented Nov 21, 2018

@lucaswoj Any news on the implementation of this? There is an open PR (#6228) that has been around for a few months. This would be immensely useful.

@lucaswoj
Contributor Author

@stdmn Unfortunately I'm not a good person to ask about this. I haven't worked on the GL core team for some time. It looks like that PR is still stuck on some design decisions. I'm sure folks would be interested in continuing the discussion if you took ownership of the PR.

@joewoodhouse
Contributor

Any update on this one? It would be immensely useful to me.

@kkaefer
Contributor

kkaefer commented Oct 15, 2019

I closed #6228 because there are still many open questions:

  • Styles (and thus expressions) are executed on multiple platforms, and JavaScript is just one of them. Regex engines on those platforms support widely different sets of features, so it's easy to create regular expressions that work on one platform but not on another. We generally expect styles to work on all platforms, and platform-specific regular expressions would undermine this expectation.
  • JavaScript regular expressions are heavily flawed when processing text, which is what I imagine as the main application for a regex expression. @1ec5 explains more in Implement "regex" expression that returns an array of match groups or… #6228 (comment)

@palhal

palhal commented Oct 14, 2022

If regex isn't possible it would be very helpful to at least have startsWith and endsWith.
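
For the prefix and substring cases, one possible workaround is sketched below. It assumes a Mapbox GL JS version whose style specification includes the index-of operator and string support for in; availability on other platforms is not verified here. The helper function names are made up, and the helpers only build expression arrays. A suffix test is not covered.

```javascript
// Builds an expression that is true when the string property starts with `prefix`,
// i.e. when the first occurrence of `prefix` is at position 0.
function startsWithExpression(property, prefix) {
  return ['==', ['index-of', prefix, ['get', property]], 0];
}

// Builds an expression that is true when the string property contains `substring`.
function containsExpression(property, substring) {
  return ['in', substring, ['get', property]];
}

const prefixFilter = startsWithExpression('name', 'Lake ');
const substringFilter = containsExpression('name', 'Park');

console.log(JSON.stringify(prefixFilter), JSON.stringify(substringFilter));
```

Either array could then be passed wherever a filter expression is accepted, for example as a layer's filter.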
