Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make regexp parser throw an error if the global flag is missing #55

Closed
ladihzey opened this issue Jan 3, 2023 · 4 comments · Fixed by #61
Closed

feat: make regexp parser throw an error if the global flag is missing #55

ladihzey opened this issue Jan 3, 2023 · 4 comments · Fixed by #61
Assignees
Labels
feat New feature or request

Comments

@ladihzey
Copy link

ladihzey commented Jan 3, 2023

In this issue I've learnt that the g flag must be used in order to make regexp parser work correctly.

Should regexp parser check itself whether the g flag was used? It would be a minor improvement but it will prevent many mistakes. I think there is always a chance to forget adding this flag even if you know about this rule.

Maybe the paragraph in the documentation explaining this requirement should be emphasized to make this moment more obvious. But I don't think that documentation itself would be enough and this rule should be validated programmatically.

Adding this check somewhere would be a great addition:

if (!re.global) throw new Error('regexp parser must use g flag to process input correctly');

export function regexp(re: RegExp, expected: string): Parser<string> {

I can submit a PR myself if you agree with me on this point!

@ladihzey
Copy link
Author

ladihzey commented Jan 3, 2023

Or even add this flag automatically to not bother a library consumer with this constraint:

RegExp(re.source, re.global ? re.flags : re.flags + 'g')

@norskeld
Copy link
Owner

norskeld commented Jan 3, 2023

Or even add this flag automatically to not bother a library consumer with this constraint:

RegExp(re.source, re.global ? re.flags : re.flags + 'g')

While this option looks convenient, it has some shortcomings:

  • It hits performance. I made a simple benchmark and re-run existing ones, and it's a bit slower (-5% in average) with automatic flag injection.
  • It does some "magic" under the hood, so you anyway want to document such behaviour.

As for the first option... I'm generally not into the idea of throwing, because effects suck. So how about returning a failure and halting the parsing?

As a side note: I've been thinking about implementing some sort of "levels" for errors, like 'soft' (recoverable) and 'hard' (non-recoverable). This suggestion would fit into the category of non-recoverable errors. Unfortunately, implementing these "levels" would also increase the amount of branching in the code, which in turn would lead to performance degradation. Not sure how bad would it be, though, maybe it'd completely tolerable.

@ladihzey
Copy link
Author

ladihzey commented Jan 3, 2023

Hmm, I wonder how sigma handles parsers? I was thinking that this change will only affect the bootrstrap time since it only affects parser construction (at least I think so, correct me if I'm wrong!). I guess if it only affects parser construction time it should be tolerable. If it's not I don't want to decrease performance of this library by any means since it's one of the main features (unless you are OK with that).

"magic" under the hood

Magic indeed and I agree that it's not great, but from the other side I'd rather prefer this kind of magic than the chances of missing to use the global flag and spending time on debugging it. Making the regexp parser fail would be a good solution too. I didn't mean that documentation is not imporant and it should explain this moment for sure. I just wanted to point that it would be more convinient if this rule was applied or validated programatically in addition to the documentation notes.

Nevertheless it's not a major issue rather a question of opinion and convinience. Please let me know which approach is more preferable to you and I'm happy to implement it:

  1. Make the regexp parser fail
  2. Ensure use of the global flag automatically

If none of the approaches suit this library you can just close this issue.

@norskeld
Copy link
Owner

norskeld commented Jan 4, 2023

Hmm, I wonder how sigma handles parsers?

Sigma doesn't "handle" parsers per se; all of them are essentially just pure functions which call other parsers and combinators and pass the state further. The whole process is recursive and happens top-down, from the root parser, hence "recursive descent parsers".

Ensure use of the global flag automatically

I experimented a bit more with it and actually if you do something like this:

export function regexp(rs: RegExp, expected: string): Parser<string> {
  const re = rs.global ? rs : new RegExp(rs.source, rs.flags + 'g')

  return {
    parse(input, pos) {
      // snip
    }
  }
}

The overhead is barely noticeable, something like:

Without auto-inject average:  795.65 ops/sec
With auto-inject average:     787.95 ops/sec

Without auto-inject median:  797 ops/sec
With auto-inject median:     792 ops/sec

---

Diff average: 0.972%
Diff median:  0.629%

These are aggregated results over 20 runs x 100 samples per run. Looks okay, so I guess we could just auto-inject the 'g' flag, just as you've suggested. It needs to be mentioned in the docs as well. Feel free to make a PR.

(I can't commit this change myself right now, because I'm on a diverged branch with a hot experimental mess.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat New feature or request
Projects
None yet
3 participants