feat: make regexp parser throw an error if the global flag is missing #55

ladihzey · 2023-01-03T10:26:27Z

In this issue I've learnt that the g flag must be used in order to make regexp parser work correctly.

Should regexp parser check itself whether the g flag was used? It would be a minor improvement but it will prevent many mistakes. I think there is always a chance to forget adding this flag even if you know about this rule.

Maybe the paragraph in the documentation explaining this requirement should be emphasized to make this moment more obvious. But I don't think that documentation itself would be enough and this rule should be validated programmatically.

Adding this check somewhere would be a great addition:

if (!re.global) throw new Error('regexp parser must use g flag to process input correctly');

sigma/src/parsers/regexp.ts

Line 17 in 567b6f9

export function regexp(re: RegExp, expected: string): Parser<string> {

I can submit a PR myself if you agree with me on this point!

The text was updated successfully, but these errors were encountered:

ladihzey · 2023-01-03T11:43:13Z

Or even add this flag automatically to not bother a library consumer with this constraint:

RegExp(re.source, re.global ? re.flags : re.flags + 'g')

norskeld · 2023-01-03T13:21:11Z

Or even add this flag automatically to not bother a library consumer with this constraint:
RegExp(re.source, re.global ? re.flags : re.flags + 'g')

While this option looks convenient, it has some shortcomings:

It hits performance. I made a simple benchmark and re-run existing ones, and it's a bit slower (-5% in average) with automatic flag injection.
It does some "magic" under the hood, so you anyway want to document such behaviour.

As for the first option... I'm generally not into the idea of throwing, because effects suck. So how about returning a failure and halting the parsing?

As a side note: I've been thinking about implementing some sort of "levels" for errors, like 'soft' (recoverable) and 'hard' (non-recoverable). This suggestion would fit into the category of non-recoverable errors. Unfortunately, implementing these "levels" would also increase the amount of branching in the code, which in turn would lead to performance degradation. Not sure how bad would it be, though, maybe it'd completely tolerable.

ladihzey · 2023-01-03T13:41:06Z

Hmm, I wonder how sigma handles parsers? I was thinking that this change will only affect the bootrstrap time since it only affects parser construction (at least I think so, correct me if I'm wrong!). I guess if it only affects parser construction time it should be tolerable. If it's not I don't want to decrease performance of this library by any means since it's one of the main features (unless you are OK with that).

"magic" under the hood

Magic indeed and I agree that it's not great, but from the other side I'd rather prefer this kind of magic than the chances of missing to use the global flag and spending time on debugging it. Making the regexp parser fail would be a good solution too. I didn't mean that documentation is not imporant and it should explain this moment for sure. I just wanted to point that it would be more convinient if this rule was applied or validated programatically in addition to the documentation notes.

Nevertheless it's not a major issue rather a question of opinion and convinience. Please let me know which approach is more preferable to you and I'm happy to implement it:

Make the regexp parser fail
Ensure use of the global flag automatically

If none of the approaches suit this library you can just close this issue.

norskeld · 2023-01-04T21:32:47Z

Hmm, I wonder how sigma handles parsers?

Sigma doesn't "handle" parsers per se; all of them are essentially just pure functions which call other parsers and combinators and pass the state further. The whole process is recursive and happens top-down, from the root parser, hence "recursive descent parsers".

Ensure use of the global flag automatically

I experimented a bit more with it and actually if you do something like this:

export function regexp(rs: RegExp, expected: string): Parser<string> {
  const re = rs.global ? rs : new RegExp(rs.source, rs.flags + 'g')

  return {
    parse(input, pos) {
      // snip
    }
  }
}

The overhead is barely noticeable, something like:

Without auto-inject average:  795.65 ops/sec
With auto-inject average:     787.95 ops/sec

Without auto-inject median:  797 ops/sec
With auto-inject median:     792 ops/sec

---

Diff average: 0.972%
Diff median:  0.629%

These are aggregated results over 20 runs x 100 samples per run. Looks okay, so I guess we could just auto-inject the 'g' flag, just as you've suggested. It needs to be mentioned in the docs as well. Feel free to make a PR.

(I can't commit this change myself right now, because I'm on a diverged branch with a hot experimental mess.)

norskeld added the feat New feature or request label Jan 4, 2023

TheFedaikin mentioned this issue Jan 14, 2023

feat(parsers/regexp): add injection of global flag if it's not present #60

Closed

TheFedaikin self-assigned this Jan 14, 2023

TheFedaikin mentioned this issue Jan 15, 2023

feat(parsers/regexp): add injection of global flag if it's not present #61

Merged

norskeld closed this as completed in #61 Jan 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: make regexp parser throw an error if the global flag is missing #55

feat: make regexp parser throw an error if the global flag is missing #55

ladihzey commented Jan 3, 2023 •

edited

Loading

ladihzey commented Jan 3, 2023

norskeld commented Jan 3, 2023 •

edited

Loading

ladihzey commented Jan 3, 2023 •

edited

Loading

norskeld commented Jan 4, 2023 •

edited

Loading

feat: make regexp parser throw an error if the global flag is missing #55

feat: make regexp parser throw an error if the global flag is missing #55

Comments

ladihzey commented Jan 3, 2023 • edited Loading

ladihzey commented Jan 3, 2023

norskeld commented Jan 3, 2023 • edited Loading

ladihzey commented Jan 3, 2023 • edited Loading

norskeld commented Jan 4, 2023 • edited Loading

ladihzey commented Jan 3, 2023 •

edited

Loading

norskeld commented Jan 3, 2023 •

edited

Loading

ladihzey commented Jan 3, 2023 •

edited

Loading

norskeld commented Jan 4, 2023 •

edited

Loading