Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP 8.0 | Add support for match expressions #3037

Closed
jrfnl opened this issue Jul 26, 2020 · 21 comments · Fixed by #3226
Closed

PHP 8.0 | Add support for match expressions #3037

jrfnl opened this issue Jul 26, 2020 · 21 comments · Fixed by #3226
Milestone

Comments

@jrfnl
Copy link
Contributor

jrfnl commented Jul 26, 2020

PHP 8.0 introduces a new type of control structure: match expressions.

Ref: https://wiki.php.net/rfc/match_expression_v2

function matchWithDefault($i) {
    return match ($i) {
        1 => 1,
        2 => 2,
        default => 'default',
    };
}

I've been working on adding support for these to PHP_CodeSniffer and believe I am nearly finished, but, no surprise, I'm left with two three questions for which I'd like a second opinion from @gsherwood and whoever else has an opinion on it.

Double arrow handling/tokenizing

As can be seen in the above code sample, match expressions introduce yet another use for the double arrow.

I've ran some tests and code like the below appears to be supported:

$array = array(
    match ($test) { 1 => 'a', 2 => 'b' },
);

$array = [
    match ($test) { 1 => 'a', 2 => 'b' } => 'dynamic keys, woho!',
];

While I honestly and truly hope that I will never in my lifetime come across code such as the second sample in a real life codebase, it is valid in PHP 8.0+.

So, that brings me to my question: to prevent issues with array arrow alignment sniffs, should the double arrow when used in match expressions be tokenized differently ? I imagine a T_MATCH_ARROW custom token could be used.

Mind: if so, code like the below would need to tokenize correctly:

$array = [
    // In order: match arrow, array arrow, match arrow, array arrow, match arrow, array arrow, match arrow.
    match ($test) { 1 => [ 1 => 'a'], 2 => 'b' } => match ($test) { 1 => [ 1 => 'a'], 2 => 'b' },
];

Scope closer sharing with arrow functions

function matchInArrowFunction($x) {
    $fn = fn($x) => match(true) {
        1, 2, 3, 4, 5 => 'foo',
        default => 'bar',
    };
    
    return $fn;
}

For the above code sample, which would be valid in PHP 8, the scope closer of the arrow function and the scope closer of the match expression would both be the }, as things are at the moment.

So what should be the scope_condition for the close curly ? And what should be in the conditions array for the code between the match curly braces ?

I would personally find it more intuitive for the match to be the scope owner when writing sniffs.

So, what about if for the arrow function, the ; after the match expression is seen as the scope closer (if available) ?
That would line up with simpler arrow function code, like the below where the ; is also the scope closer for the arrow function.

$fn = fn($x) => 10 * $x;

Default case scope start/end

As the cases in a match expression have no break or return statement and default is treated as a scope and the tokenizer tries to assign it scope openers/closers, this becomes interesting.

The scope opener is clearly the T_DOUBLE_ARROW (or T_MATCH_ARROW if it would be named separately), the scope closer a T_COMMA or T_CLOSE_CURLY_BRACKET (shared with T_MATCH), where in both cases, we need to make sure that the scope closer doesn't get confused with comma's or curly braces from, for instance, a short array being returned or a nested match expression or other construct which also uses curly braces.

This also begs the question whether the T_DEFAULT in a T_MATCH should have its own token to prevent the scope setting for a "normal" switch default case getting confused by the different scope openers/closers allowed for the switch default and the match default.

Either way, I'd appreciate some input on this so I can finish off the PR.

@jrfnl
Copy link
Contributor Author

jrfnl commented Jul 27, 2020

FYI: I've edited the above issue to add a third question about the default case.

@jrfnl
Copy link
Contributor Author

jrfnl commented Aug 1, 2020

@gsherwood Any thoughts on the above ?

@jrfnl
Copy link
Contributor Author

jrfnl commented Aug 2, 2020

Initial inventory of affected sniffs:

  • Generic.CodeAnalysis.AssignmentInCondition
  • Generic.CodeAnalysis.EmptyStatement
  • Generic.PHP.LowerCaseKeyword
  • PEAR.ControlStructures.ControlSignature
  • PSR2.ControlStructures.ControlStructureSpacing
  • PSR12.ControlStructures.BooleanOperatorPlacement
  • PSR12.ControlStructures.ControlStructureSpacing
  • Squiz.Commenting.LongConditionClosingComment
  • Squiz.Commenting.PostStatementComment
  • Squiz.ControlStructures.ControlSignature
  • Squiz.ControlStructures.LowercaseDeclaration
  • Squiz.Formatting.OperatorBracket
  • Squiz.PHP.DisallowMultipleAssignments
  • Squiz.WhiteSpace.ControlStructureSpacing

I have commits ready for all sniffs with a checkmark and will pull those once the initial tokenizer changes have been pulled and merged.

Not claiming completeness, there may be more sniffs which need changes, but this should take care of the bulk of it.


Edit: Some more sniffs found in the mean time which I have adjusted and/or added tests to related to match structures:

  • PEAR.WhiteSpace.ScopeClosingBrace
  • Squiz.Objects.ObjectInstantiation
  • Squiz.WhiteSpace.ScopeClosingBrace

Edit: And yet some more:

  • Generic.Arrays.ArrayIndent
  • Generic.WhiteSpace.ScopeIndent (may still need more tests)
  • Generic.ControlStructures.DisallowYodaConditions

@gsherwood
Copy link
Member

Double arrow handling/tokenizing

If it makes life easier for the sniffs, then a new token for T_MATCH_ARROW makes sense. If the scope matching is solved, it's not going to be hard to skip match expressions when processing arrays or looking for T_DOUBLE_ARROW, so maybe it's not needed.

I normally don't add new tokens until I start repeating code in the sniffs and you've done a lot of work on them already, so I'm happy to go with whatever you think it most helpful for sniff developers. If you are unsure, leave it as T_DOUBLE_ARROW and we can review as more sniffs are worked on.

Scope closer sharing with arrow functions

For the case where an arrow function ends with scope closer, and there is a semicolon directly after that, I think we could safely say that the semicolon ends the scope. It gets a bit more strange when the arrow function is followed by a comma when used as a function argument, but that's already regarded as the scope closer for arrow functions (not technically correct, but close enough) so I think there is clear precedent for what you're suggesting.

So agree, the match expression should be the scope owner, and the semicolon/comma/whatever should close the arrow function.

Default case scope start/end

If I was personally working on this one, I'd try and get the scope closer for the T_DEFAULT to work without adding a new token, but I'd be expecting a lot of pain. I could see myself quickly switching to a T_MATCH_DEFAULT token just to stop the existing sniffs from getting confused, but it's still going to be a pain to get the scope closer matching properly. But at least the damage has been limited to new sniffs trying to enforce rules for match expressions.

Given you're working on this, I'll go with whatever you think is easiest. Although if you think a T_MATCH_ARROW token is useful, I think having a T_MATCH_DEFAULT to go alone with it would makes sense, so I would probably go straight down that path.

@jrfnl
Copy link
Contributor Author

jrfnl commented Aug 4, 2020

@gsherwood Thanks for responding.

I agree that if the scope condition setting, especially in combination with arrow functions, can be sorted properly, we may not need the T_MATCH_ARROW, but that's going to be a puzzle.

For the default keyword, I think the new token will probably make life a lot easier, as I wouldn't want to break the condition setting for the switch T_DEFAULT and adding more start/end tokens to that in the PHP::$scopeOpeners array has a ~95% risk of breaking the "end" finding for switch default cases, especially as there are no unit test for the existing code dealing with T_DEFAULT.

@rask
Copy link

rask commented Oct 27, 2020

Otherwise I'm seeing PHPCS work (as in not fail) with match, but Generic.WhiteSpace.ScopeIndent.IncorrectExact is off by 4 spaces when it tries to lint match blocks that contain a default. It thinks the indent is 4 spaces too deep.

Example code that gives me error:

$value = match ($value) {
    '' => null,
    'false' => false,
    'true' => true,
    default => $value, # this line is indented 4 spaces too deep according to PHPCS
};

The following code is linted and gives no error:

$value = match (true) {
    true => 1,
    false => 2,
};

@jrfnl jrfnl mentioned this issue Dec 7, 2020
@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 13, 2021

For the default keyword, I think the new token will probably make life a lot easier, as I wouldn't want to break the condition setting for the switch T_DEFAULT and adding more start/end tokens to that in the PHP::$scopeOpeners array has a ~95% risk of breaking the "end" finding for switch default cases, especially as there are no unit test for the existing code dealing with T_DEFAULT.

Finally back to working on the PR to fix this and the longer I'm working on, the more I can only conclude that a separate T_MATCH_DEFAULT token is needed as the breakage to the tokenization of T_SWITCH statements, caused by adding the T_COMMA and T_CLOSE_CURLY_BRACKET tokens to the PHP:$scopeOpeners['T_DEFAULT']['end'] array, would be too huge and has a severe ripple effect for the setting of all scope closers after a switch.

So for now, I'm moving forward with this by adding the T_MATCH_DEFAULT and T_MATCH_ARROW tokens.

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 20, 2021

Nearly there... got the scope setting working completely, including in combination with arrow functions, which was the hardest part.

Only thing left to do is finalize the T_MATCH_ARROW retokenization.

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 22, 2021

PR #3226 addresses this issue.

Once that PR has been merged, I will pull follow-up PRs to address supporting match expressions in individual sniffs.
For the sniffs mentioned above, I have the necessary changes prepared and ready to go, though there may be some more sniffs which need adjusting.

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 23, 2021

Pulled the first round of sniff specific PRs. Will pull a second round after I've had some sleep ;-)

@gsherwood
Copy link
Member

Pulled the first round of sniff specific PRs. Will pull a second round after I've had some sleep ;-)

Thanking you here for all these incoming PRs instead of on each one. I'll also put something in the changelog when they are done as well.

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 24, 2021

Sounds like a plan.

A few more PRs coming up, aside from the six currently still open.

I'm looking at the File::findEndOfStatement() code at this moment and I fear that will need some highly custom code to handle finding the end of a case/default expression within a match statement, though I suppose it depends on the interpretation of "end".

Given this piece of test/example code:

$match = match ($a) {
    1, 2, => $a * $b,
    default, => 'something'
};

For anything within this 1, 2, => $a * $b code snippet, IMO, the comma after the $b should be returned as the end of statement.
And for the default, => 'something' snippet, it should be the 'something'.

@gsherwood Do you agree ? If so, I'll see what I can come up with, but basically, it looks like we'd need to check if this is within a match expression and if so, ignore comma's except when after the T_MATCH_ARROW.

Will probably need to set up something similar then for File::findStartOfStatement(), though that method currently doesn't have any unit tests, so changing anything is risky.

@jrfnl
Copy link
Contributor Author

jrfnl commented Feb 24, 2021

Status: aside from the above notes about the File::findStart/EndOfStatement() methods, I've reviewed all sniffs/code which refers to one of the following T_DOUBLE_ARROW, T_SWITCH, T_DEFAULT, Tokens::$scopeOpeners and Tokens::$parenthesisOpeners and either dismissed them as not applicable, added tests to them or adjusted them including adding tests.

The only two sniffs which are still on the to do list are:

  • Squiz.Arrays.ArrayDeclaration - I expect this one only needs some tests to confirm it handles things correctly as with the new tokens in place, I don't expect there to be issues.
  • Generic.Metrics.CyclomaticComplexity - this one will need adjusting, but I'm not 100% sure how to handle multi-case expressions.

@gsherwood
Copy link
Member

@gsherwood Do you agree ?

Sorry, just got back to this. I absolutely agree - it makes it consistent with how a single case (and default with no trialing comma) works, and how arrow functions are working.

I've written some test cases for this but haven't tried your suggested change yet. Will give it a go and see if I can get these tests passing.

@gsherwood
Copy link
Member

I think that's working fine. All I do is detect that the start token is inside a match expression, and then advance the start token to the match arrow and let it continue from there.

@gsherwood
Copy link
Member

I'm going to take a look at findStartOfStatement as well, but need to write some tests for it first.

@jrfnl
Copy link
Contributor Author

jrfnl commented Mar 12, 2021

All I do is detect that the start token is inside a match expression, and then advance the start token to the match arrow and let it continue from there.

What about if the start token given is in the "return" part of the match case expression ?

@gsherwood
Copy link
Member

What about if the start token given is in the "return" part of the match case expression ?

The code I've got looks backwards for a previous arrow, then forwards from there, looking for a comma. If there was a previous arrow but no comma found, we must be after the arrow. Otherwise, we must be before it, so apply that skip logic.

The actual code I've added to findEndOfStatement is:

// If the start token is inside the case part of a match expression,
// advance to the match arrow and continue looking for the
// end of the statement from there so that we skip over commas.
$matchExpression = $this->getCondition($start, T_MATCH);
if ($matchExpression !== false) {
    $beforeArrow = true;
    $prevMatchArrow = $this->findPrevious(T_MATCH_ARROW, ($start - 1), $this->tokens[$matchExpression]['scope_opener']);
    if ($prevMatchArrow !== false) {
        $prevComma = $this->findNext(T_COMMA, ($prevMatchArrow + 1), $start);
        if ($prevComma === false) {
            // No comma between this token and the last match arrow,
            // so this token exists after the arrow and we can continue
            // checking as normal.
            $beforeArrow = false;
        }
    }

    if ($beforeArrow === true) {
        $nextMatchArrow = $this->findNext(T_MATCH_ARROW, ($start + 1), $this->tokens[$matchExpression]['scope_closer']);
        if ($nextMatchArrow !== false) {
            $start = $nextMatchArrow;
        }
    }
}//end if

I've got a few test cases that are working but I likely need more before I commit anything. I also haven't tried any alternative methods of getting that logic working.

@gsherwood
Copy link
Member

Just an update that I've had to make some changes in findStartOfStatement based on some unit tests (it didn't work the way I'd expect in some cases) and for match expressions.

I'm pretty sure I'll need to change the find prev/next code to skip over parenthesis to support function calls in match arms and conditions properly, but I haven't tested this yet.

@jrfnl
Copy link
Contributor Author

jrfnl commented Mar 15, 2021

skip over parenthesis to support function calls in match arms and conditions properly, but I haven't tested this yet.

I haven't looked at the code yet (not even sure if it's public yet), but yes, both before the arrow as well as after the arrow, there can be an expression, be it a function call, array declaration or closure declaration to name but a few, so any kind of brackets would need to be jumped over.

@gsherwood
Copy link
Member

I've committed the tests and changes to support match expressions in findStart/EndOfStatement here: ef80e53

I've got some test cases in there, but very happy for people to provide more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants