Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editorial: Extract operation 'ParsePattern' #1866

Merged
merged 1 commit into from
May 26, 2020
Merged

Conversation

jmdyck
Copy link
Collaborator

@jmdyck jmdyck commented Feb 5, 2020

... from common code in IsValidRegularExpressionLiteral and RegExpInitialize.


This is the refactoring I referred to over here.


I've split the PR into a series of smaller refactorings so that it's easier to follow. I expect these will be squashed together at merge, so I didn't bother prefixing their commit messages with "Editorial:". (Except for the final commit, whose commit message can serve as the message for the squash.)

IVREL = IsValidRegularExpressionLiteral, REI = RegExpInitialize


Note that if the parse fails, ParsePattern returns an abrupt completion. This is somewhat odd when it's being called by IsValidRegularExpressionLiteral, because the latter is static semantics. (Its only invocation is from an early error). IsValidRegularExpressionLiteral translates the abrupt completion into *false*, so the abrupt completion doesn't go any further, but you might object to it even existing during static semantics processing. If that's a sticking point, we can replace the abrupt completion with a SyntaxError object or a List of them (as in ParseScript or ParseModule).

spec.html Show resolved Hide resolved
@bakkot
Copy link
Contributor

bakkot commented Feb 5, 2020

you might object to it even existing during static semantics processing

Yeah, that seems a little unfortunate. What would you think about returning ~empty~ instead?

@jmdyck
Copy link
Collaborator Author

jmdyck commented Feb 5, 2020

What would you think about returning ~empty~ instead?

Well, any value that can be distinguished from a Parse Node would work, but I think it'd be better to follow the example of ParseScript and ParseModule
(a) for consistency, and
(b) because returning a List of errors is a way to communicate information to RegExpInitialize that it can use in the SyntaxError that it throws. (This is somewhat hand-wavy, since the SyntaxError isn't required to have any specific information in it, but of course it will in practice.)

@bakkot
Copy link
Contributor

bakkot commented Feb 9, 2020

for consistency

I think the inconsistency of returning a completion from an operation used in static semantics bothers me more.

because returning a List of errors is a way to communicate information to RegExpInitialize that it can use in the SyntaxError that it throws

Eh, I think implementations can figure out the appropriate error to give without using having to pass around error objects.

All considered I think I would still prefer returning ~empty~, or I guess ~failure~ (as is already used when performing regex matching).

@jmdyck
Copy link
Collaborator Author

jmdyck commented Feb 10, 2020

for consistency

I think the inconsistency of returning a completion from an operation used in static semantics bothers me more.

Huh? I mean, I understand that returning a completion record bothers you, but I'm not sure how that's relevant to the suggestion to return a List of errors.

All considered I think I would still prefer returning ~empty~, or I guess ~failure~ (as is already used when performing regex matching).

I'd advise against using ~failure~: it already has a defined meaning (the match failed), and that's not this.

@jmdyck
Copy link
Collaborator Author

jmdyck commented Mar 19, 2020

(force-pushed to resolve a merge conflict)

@bakkot bakkot added the editor call to be discussed in the next editor call label Mar 28, 2020
@bakkot
Copy link
Contributor

bakkot commented Apr 8, 2020

A List of Errors is fine by me and the other editors; let's go with that.

@ljharb ljharb removed the editor call to be discussed in the next editor call label Apr 8, 2020
@jmdyck
Copy link
Collaborator Author

jmdyck commented Apr 12, 2020

(force-pushed to resolve merge conflicts and change the return-type to list-of-errors)

spec.html Outdated Show resolved Hide resolved
spec.html Outdated
1. Let _patternText_ be the result of interpreting each of _P_'s 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
1. Let _patternCharacters_ be a List whose elements are the code unit elements of _P_.
1. Let _parseResult_ be ParsePattern(_patternText_, _u_).
1. If _parseResult_ is a List of *SyntaxError* objects, throw a *SyntaxError* exception.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it's a List of something else, or an empty List?

It seems like ParsePattern returns either a Parse Node, or not - should this be "is not a Parse Node"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns either a Parse Node or a List of SyntaxError objects. There's no other case.

We could follow this with an Assert: _parseResult_ is a Parse Node, if you think that's valuable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, it just seems too vague to me to assert that it's a List of syntax error objects (without also reading the abstract operation), because of the possibilities that implies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ljharb Would adding Assert: _parseResult_ is a Parse Node as the following line resolve your concern?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer changing the condition to "is not a Parse Node", as suggested by @ljharb. It's semantically equivalent, but easier on the reader: you don't need to look at the definition of ParsePattern to know that the only other possibility is that it's a Parse Node. (And we don't need to Assert that it is.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, should we say "throw one of the items in the list as an exception"? The impl has control of what's in the list anyways, so it wouldn't be normative, but would convey the intention you find important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a little bit of awkwardness there in that for that to be sensible the list has to be non-empty. Which it is guaranteed to be by the implementation of ParsePattern, but you have to look there to know that.

I'd be fine with that, I guess, but would still personally weakly lean towards the If _parseResult_ is a List of *SyntaxError* objects, throw a *SyntaxError* exception form + an assert.

But any of the suggestions here are adequate. I would mostly like us to just pick one and land it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's valuable to follow up this step with an assertion. I don't care which way around the test is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like any reasonable non-formalist reading of "a List of x objects" excludes the vacuous empty List case, but it doesn't hurt to qualify "a non-empty List of x objects", I guess.

And I agree with @bakkot on:

I kind of prefer writing it this way because it suggests that the error thrown could be one of the errors in the list

So I'd prefer directly talking about errors than asserting it's not a parse node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add I'm happy with an additional assert, but I like the condition for throwing checking for a list of errors over checking for not being a parse node.

Copy link
Contributor

@syg syg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review too closely to ensure correctness. I gladly trust @bakkot's existing review here that this is correct.

Editorially lgtm with rename for abstract closure.

spec.html Outdated
1. Let _patternText_ be the result of interpreting each of _P_'s 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements.
1. Let _patternCharacters_ be a List whose elements are the code unit elements of _P_.
1. Let _parseResult_ be ParsePattern(_patternText_, _u_).
1. If _parseResult_ is a List of *SyntaxError* objects, throw a *SyntaxError* exception.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like any reasonable non-formalist reading of "a List of x objects" excludes the vacuous empty List case, but it doesn't hurt to qualify "a non-empty List of x objects", I guess.

And I agree with @bakkot on:

I kind of prefer writing it this way because it suggests that the error thrown could be one of the errors in the list

So I'd prefer directly talking about errors than asserting it's not a parse node.

spec.html Outdated
1. Set _obj_.[[OriginalSource]] to _P_.
1. Set _obj_.[[OriginalFlags]] to _F_.
1. Set _obj_.[[RegExpMatcher]] to the abstract closure that evaluates the above parse by applying the semantics provided in <emu-xref href="#sec-pattern-semantics"></emu-xref> using _patternCharacters_ as the pattern's List of |SourceCharacter| values and _F_ as the flag parameters.
1. Set _obj_.[[RegExpMatcher]] to the abstract closure that evaluates _parseResult_ by applying the semantics provided in <emu-xref href="#sec-pattern-semantics"></emu-xref> using _patternCharacters_ as the pattern's List of |SourceCharacter| values and _F_ as the flag parameters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do a s/abstract closure/Abstract Closure/g. We recently capitalized it to be consistent with other spec types.

@jmdyck
Copy link
Collaborator Author

jmdyck commented May 10, 2020

force-pushed to:

  • resolve a merge conflict (capitalization of "Abstract Closure"),
  • insert "non-empty" before "List of SyntaxError objects", and
  • add an Assert re _parseResult_ being a Parse Node.

spec.html Outdated Show resolved Hide resolved
@ljharb
Copy link
Member

ljharb commented May 21, 2020

@jmdyck would you like to rebase this down to a single commit with a nicer commit message than me naively concatenating them? :-)

@jmdyck
Copy link
Collaborator Author

jmdyck commented May 25, 2020

Thanks for the offer. I decided that there wasn't much benefit to enumerating the refactoring steps in the final commit message, but I'll do so here:


IsValidRegularExpressionLiteral:

  • Introduce an 'Else'
  • Introduce _u_
  • Rename _pText_ as _patternText_ and introduce _patternText_ into the other arm of the if-else
  • Factor out the last sentence of each 'Parse' step

RegExpInitialize:

  • Replace _BMP_ with its complement _u_
  • Swap the order of the 2 arms of the 'If'
  • Rename _pText_ as _patternText_
  • Reword a step slightly
  • Split the 'If'
  • Factor out the last sentence of each 'Parse' step

Extract operation 'ParsePattern'

IsValidRegularExpressionLiteral:

  • Simplify

... from common code in IsValidRegularExpressionLiteral and RegExpInitialize.

(This was originally presented as a series of 12 small refactorings.
For more info, see the PR page.)
@ljharb ljharb merged commit d14a282 into tc39:master May 26, 2020
@jmdyck jmdyck deleted the ParsePattern branch May 26, 2020 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants