Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sniff to cover correct use of @var #121

Merged
merged 2 commits into from
Mar 19, 2024
Merged

Conversation

andrewnicols
Copy link
Contributor

@andrewnicols andrewnicols commented Mar 14, 2024

Based on Squiz\Commenting\VariableComment, but:

  • prefer bool, not boolean
  • prefer int, not integer
  • check scalar types in string[]
  • also checks the inline @var docblocks for promoted properties

Note: Based on #119 as they share some new util methods.

@andrewnicols andrewnicols requested a review from stronk7 March 14, 2024 08:48
Copy link

codecov bot commented Mar 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.81%. Comparing base (7c207d5) to head (f4c4202).

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #121      +/-   ##
============================================
+ Coverage     97.69%   97.81%   +0.11%     
- Complexity      778      827      +49     
============================================
  Files            34       35       +1     
  Lines          2302     2426     +124     
============================================
+ Hits           2249     2373     +124     
  Misses           53       53              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

andrewnicols added a commit to andrewnicols/moodle-local_moodlecheck that referenced this pull request Mar 14, 2024
@andrewnicols andrewnicols force-pushed the varSniff branch 3 times, most recently from 9ea01f8 to 2cd4563 Compare March 14, 2024 14:34
andrewnicols added a commit to andrewnicols/moodle-local_moodlecheck that referenced this pull request Mar 14, 2024
andrewnicols added a commit to andrewnicols/moodle-local_moodlecheck that referenced this pull request Mar 16, 2024
@stronk7
Copy link
Member

stronk7 commented Mar 16, 2024

(a rebase here would be welcome, just in case there is something in the latest #119 version making a difference)

@james-cnz
Copy link

Hi @andrewnicols,

I notice there's some overlap between my pull request #123 and this one. It looks like yours does a bunch of checks on variable comments, whereas mine mainly just checks types, but I think my type checking is probably more comprehensive overall (e.g. mine handles DNF types).

How do you feel about me taking out the few checks from mine that aren't specifcally related to type checking (checking there's a comment, that it has a var tag, and not more than one), and you taking out the checks from yours that are specifically related to type checking?

It looks like there are some things I would need to add to mine, though. Mine doesn't handle attributes. It looks like this is a PHP 8 feature. I've only tested against Moodle 4.1, so I guess I'll need to check if there's anything else in PHP 8.x I need to support.

Mine also doesn't check type short forms are used, and in lower case. I think I could probably add this, if it's important. Automatically fixing these could be tricky, though. My checker supports multi-line types. It extracts the type info from multiple T_DOC_COMMENT_STRING tags, and concats them together before doing type checking, so information about which tags the type is from, and where in the tag it is, is lost.

I notice your checker supports arrays in the form array(type1 => type2). This isn't a standard PHPDoc form, AFAIK. It doesn't seem to be supported by either PHPStan or Psalm, which I think are the two most commonly used PHP static checkers, and PHPStan in particular is pretty comprehensive, I think. I'd guess this is a Moodle idiom. I could probably add this to my checker, but I'm not sure it's a good idea to encourage it. I doubt there's much tooling support for it, so I think using it would probably really limit what the PHPDoc annotations are useful for.

@andrewnicols
Copy link
Contributor Author

Hi James.

I notice there's some overlap between my pull request #123 and this one. It looks like yours does a bunch of checks on variable comments, whereas mine mainly just checks types, but I think my type checking is probably more comprehensive overall (e.g. mine handles DNF types).

This sniff is intended to ensure:

  • each variable definition has a docblock
  • that the general format of that docblock is correct and meets one of the known variants (single-line, or multi-line)
  • that the types are, approximately, correct
  • to provide fixes where a clear fix is available

How do you feel about me taking out the few checks from mine that aren't specifcally related to type checking (checking there's a comment, that it has a var tag, and not more than one), and you taking out the checks from yours that are specifically related to type checking?

I'm fine with this in principle but I think it's the wrong approach in this instance.

I think we have two possible approaches here:

  1. Perform the type checking in your new Sniff and remove it from this one (your proposal)
  2. Perform the type checking in the var Sniff, but use the utilities from your patch to do so when that patch lands

The former has the benefit that all type checking is in one place, but the detractors that one place has to be aware of a variety of formatting options for a range of different tag types (@var, @param, @param-*, @return, etc.), and that all type checking is in one place and cannot be applied on a granular level.

The latter has the benefit that each usage check is with similar checks. That is to say that the Variable Sniff will check that a docblock is present for each variable, that it is correctly formatted, that there is only one mention, etc. whilst a ConstSniff will do the same for all const calls, and possibly define calls. The detractor is that in some cases there may be code duplication within the sniff or duplicated effort.

I'm more inclined to go for the latter approach where each variable check is with similar checks, but uses a library to check the value as this will be easier to manage and maintain, and easier to land these changes in a more progressive approach (not just big bang).

I've already tried to build my approach to thing along those lines with the intent of refactoring when the next part of the puzzle needs it. That is from:

protected static function  suggestTypes(string $type): string;

To a utility method:

public static function checkType(File $phpcsFile, int $stackPtr, string $type): string;

In this case what I would propose is that we initially introduce those checks in this issue and, since your change is significantly larger and likely to take longer to review and finalise and therefore likely to land after this one, we consume yours as a library instead of having it be a do-everything Sniff.

Doing it this way allows us to get the check moved over and to remove it from moodlecheck and possibly to perform a release so that these checks are more widely available earlier. It also allows people to filter and test different checks more easily.

It looks like there are some things I would need to add to mine, though. Mine doesn't handle attributes. It looks like this is a PHP 8 feature. I've only tested against Moodle 4.1, so I guess I'll need to check if there's anything else in PHP 8.x I need to support.

Yes - Moodle 4.3 onwards makes use of Attributes. and we are already planning to drop support for PHP 7.4 soon so your changes will need to support attributes.

Mine also doesn't check type short forms are used, and in lower case. I think I could probably add this, if it's important.

I feel these are fairly important as they are part of the coding style. They're also very easy with phpcs.

In this case, the basis for these came from an upstream Sniff from Squiz which was mostly suitable for our needs, but not quite.

Automatically fixing these could be tricky, though.

Ah - we should really aim to have anything which is auto-fixable do so. This is one of the many great benefits of phpcs over moodlechecker.

My checker supports multi-line types. It extracts the type info from multiple T_DOC_COMMENT_STRING tags, and concats them together before doing type checking, so information about which tags the type is from, and where in the tag it is, is lost.

I'm not sure that I follow there. Perhaps you can give an example. The phpcs parser already does much of this work for us and if you find the T_DOC_COMMENT_OPEN_TAG tag, you'll find that phpcs includes in its structure an array of tags named comment_tags whose value is the ptr for each of the @ tags in the docblock. For example:

<?php

/**
 * @covers \some_class
 */
class some_test extends \advanced_testcase {

If I fetch the classblock's open tag here, and get the tokens:

$tokens = $phpcsFile->getTokens();
while ($docPtr = $phpcsFile->findNext(T_DOC_COMMENT_OPEN_TAG, $stackPtr)) {
    $docblock = $tokens[$docPtr];
    foreach ($docblock['comment_tags'] as $tagPtr) {
        // The tag name is here:
        $tagName = $tokens[$tagPtr]['content'];

Unfortunately we still have to process the actual vars, but there should be no need to concat the docs together or anything like that.

I notice your checker supports arrays in the form array(type1 => type2). This isn't a standard PHPDoc form, AFAIK. It doesn't seem to be supported by either PHPStan or Psalm, which I think are the two most commonly used PHP static checkers, and PHPStan in particular is pretty comprehensive, I think. I'd guess this is a Moodle idiom.

No, it's actually something I copied over from the original phpcs check. I'm not a fan and I had already contemplated removing it, but now I definitely will.

Since phpdoc now supports the same syntax as psalm, I'll change it:

// From
/** @var array(int) */
// To
/** @var array<array-key, int> */

// And From:
/** @var array(int => some_object) */
// To
/** @var array<int, some_object> */

This is compliant with both psalm and phpdoc.
I've added an auto-fix rule for this too.

@james-cnz
Copy link

Hi Andrew,

Regarding doing type checking as part of specific construct checks (variable var, function param & return, not sure what you mean by param-* though) vs considered its own thing, I really think type checking should be considered it's own thing.

From an implementation perspective, I'm gathering namespace and use alias info for the type checking, and doing a preliminary pass over the file to gather class hierarchies. So although either approach would likely involve some duplication of code, I think having type checking spanning sniffs (at least in the case of doing the checking I am) would probably involve more time overhead.

From a usability perspective, I can only really think of two reasons why someone might want to disable type checking. For one, they don't want it at all, either because they're using some in-house style, or they're using types informally. In this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. I can't imagine anyone wanting, e.g. the standard style for function param and return types, but a custom in-house style for variable var types.

For the other, possibly they do want type checking eventually, but to add it progressively. Again, in this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. Ideally, code is separated into files because it's a cohesive unit. To figure out what the types should be, I think someone's likely to be focusing primarily on the code in a file at a time, to figure out what parameters it can accept, what it could return, and what it's going to store in and expect to retrieve from variables. I don't think they're likely to want to, e.g., look at a file and figure out the param and return types, then go away and look at param and return types in other files, then come back again later for var types.

If people want type checking in a particular file or directory, I think they're going to want it for all tags, and having the type checking spanning two sniffs would just mean they're going to have to enable or disable two sniffs to do what they want. I can imagine, though, people wanting to do some var checks, but not the type check.

Regarding comment tags, I'm being pedantic, and checking that the tags start on a new line. Technically, I think param, return, and var tags within a line don't count, so I'm not sure the list would help me. But the issue is with multi-line types like this:

    /** @var non-empty-array<\stdClass&object{type: string, namespace: string, uses: string[], templates: string[],
     *                                  classname: ?string, parentname: ?string, opened: bool, closer: ?int}>
     * file scope: classish, function, etc.  We only need a closer if we might be in a switch statement. */
    protected array $scopes;

I think (although don't quite me) the type starts with a T_DOC_COMMENT_STRING, then there's a T_DOC_COMMENT_WHITESPACE for the line break, a T_DOC_COMMENT_WHITESPACE at the start of the line, a T_DOC_COMMENT_STAR, another T_DOC_COMMENT_WHITESPACE after that, then the type continues in another T_DOC_COMMENT_STRING. I concat the T_DOC_COMMENT_STRINGs together with some whitespace between, and pass it to the type parser. I think it would take a bit of work to keep track of where the text came from, and figure out how to put the replacements back in the appropriate tags.

@andrewnicols
Copy link
Contributor Author

Regarding doing type checking as part of specific construct checks (variable var, function param & return, not sure what you mean by param-* though) vs considered its own thing, I really think type checking should be considered it's own thing.

I'm inclined to disagree because each tag can be used differently and can have different rules applied. For example:

  • @var on properties, and constants in form @var TYPE [optional description] or * Description\n* @var TYPE
  • @param on method docblocks in the form @param TYPE VARIABLE [optional description]
  • @param-read for params in properties returned by __get() in the form @param-read TYPE VARIABLE [optional description]
  • @param-write for params in properties settable by __set() in the form @param-read TYPE VARIABLE [optional description]
  • @return on method docblocks in the form @return TYPE [optional description]
  • Possibly others

The type checking should not be tightly linked to the specific usage, just the checking on a case-by-case basis.

From an implementation perspective, I'm gathering namespace and use alias info for the type checking, and doing a preliminary pass over the file to gather class hierarchies. So although either approach would likely involve some duplication of code, I think having type checking spanning sniffs (at least in the case of doing the checking I am) would probably involve more time overhead.

I agree, it's a balance we need to find.

From a usability perspective, I can only really think of two reasons why someone might want to disable type checking. For one, they don't want it at all, either because they're using some in-house style, or they're using types informally. In this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. I can't imagine anyone wanting, e.g. the standard style for function param and return types, but a custom in-house style for variable var types.

The third is that they want to fix things gradually. It's challenging to review large groups of automatic fixes.

Regarding comment tags, I'm being pedantic, and checking that the tags start on a new line. Technically, I think param, return, and var tags within a line don't count, so I'm not sure the list would help me. But the issue is with multi-line types like this:

Sorry, what is this in relation to? I'm not sure of the context of your reply.

    /** @var non-empty-array<\stdClass&object{type: string, namespace: string, uses: string[], templates: string[],
     *                                  classname: ?string, parentname: ?string, opened: bool, closer: ?int}>
     * file scope: classish, function, etc.  We only need a closer if we might be in a switch statement. */
    protected array $scopes;

I think (although don't quite me) the type starts with a T_DOC_COMMENT_STRING, then there's a T_DOC_COMMENT_WHITESPACE for the line break, a T_DOC_COMMENT_WHITESPACE at the start of the line, a T_DOC_COMMENT_STAR, another T_DOC_COMMENT_WHITESPACE after that, then the type continues in another T_DOC_COMMENT_STRING. I concat the T_DOC_COMMENT_STRINGs together with some whitespace between, and pass it to the type parser. I think it would take a bit of work to keep track of where the text came from, and figure out how to put the replacements back in the appropriate tags.

The above has the following AST, noting that the line nos are some other parts will be context-specific.

(
    [0] => Array
        (
            [content] => @var
            [code] => PHPCS_T_DOC_COMMENT_TAG
            [type] => T_DOC_COMMENT_TAG
            [line] => 45
            [column] => 9
            [length] => 4
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [1] => Array
        (
            [content] =>  
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 45
            [column] => 13
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [2] => Array
        (
            [content] => non-empty-array<\stdClass&object{type: string, namespace: string, uses: string[], templates: string[],
            [code] => PHPCS_T_DOC_COMMENT_STRING
            [type] => T_DOC_COMMENT_STRING
            [line] => 45
            [column] => 14
            [length] => 102
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [3] => Array
        (
            [content] => 

            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 45
            [column] => 116
            [length] => 0
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [4] => Array
        (
            [content] =>      
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 46
            [column] => 1
            [length] => 5
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [5] => Array
        (
            [content] => *
            [code] => PHPCS_T_DOC_COMMENT_STAR
            [type] => T_DOC_COMMENT_STAR
            [line] => 46
            [column] => 6
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [6] => Array
        (
            [content] =>                                   
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 46
            [column] => 7
            [length] => 34
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [7] => Array
        (
            [content] => classname: ?string, parentname: ?string, opened: bool, closer: ?int}>
            [code] => PHPCS_T_DOC_COMMENT_STRING
            [type] => T_DOC_COMMENT_STRING
            [line] => 46
            [column] => 41
            [length] => 69
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [8] => Array
        (
            [content] => 

            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 46
            [column] => 110
            [length] => 0
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [9] => Array
        (
            [content] =>      
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 47
            [column] => 1
            [length] => 5
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [10] => Array
        (
            [content] => *
            [code] => PHPCS_T_DOC_COMMENT_STAR
            [type] => T_DOC_COMMENT_STAR
            [line] => 47
            [column] => 6
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [11] => Array
        (
            [content] =>  
            [code] => PHPCS_T_DOC_COMMENT_WHITESPACE
            [type] => T_DOC_COMMENT_WHITESPACE
            [line] => 47
            [column] => 7
            [length] => 1
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [12] => Array
        (
            [content] => file scope: classish, function, etc.  We only need a closer if we might be in a switch statement. 
            [code] => PHPCS_T_DOC_COMMENT_STRING
            [type] => T_DOC_COMMENT_STRING
            [line] => 47
            [column] => 8
            [length] => 98
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [13] => Array
        (
            [content] => */
            [code] => PHPCS_T_DOC_COMMENT_CLOSE_TAG
            [type] => T_DOC_COMMENT_CLOSE_TAG
            [comment_opener] => 248
            [line] => 47
            [column] => 106
            [length] => 2
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

    [14] => Array
        (
            [type] => T_WHITESPACE
            [code] => 392
            [content] => 

            [line] => 47
            [column] => 108
            [length] => 0
            [level] => 1
            [conditions] => Array
                (
                    [18] => 333
                )

        )

The general approach to this would be:

  • identify the start and end of the type
  • store these, along with the concatenated type
  • replace the first type part with the new type
  • replace the rest of them with an empty string

That said, we want to strongly discourage complex objects unless they are a class in their own right. They may be syntactically valid, but they're not a good pattern.

In fact, the above is not recognised by phpDocumentor in any case. See the attached screenshot.

complexObjectDoc

In any case, I feel that the above should be discussed in your issue, not here.

@james-cnz
Copy link

I'm inclined to disagree because each tag can be used differently and can have different rules applied. For example:

From a type checking perspective, these are all basically of the form: tag, type, maybe var, and little else of interest, with a couple of wrinkles. The type checker needn't care whether there's comments before or after, or anything. After I've extracted the contents of the comment, I do the rest in one function parseTypeAndVar with a parameter $getwhat indicating what needs to be fetched. It's 95 lines including blank lines and comments. If it were seperate functions for each variation, I think it would probably be quite a bit longer, and the copy-paste detector would complain.

Regarding param-read & param-write, I guess you mean property, property-read, and property-write. Bother, I'd completely forgotten about those.

The third is that they want to fix things gradually. It's challenging to review large groups of automatic fixes.

As I said, again, in this case, I can imagine people wanting to enable or disable type checks on a file or directory basis, for different parts of a project, but not on a per-construct basis. Ideally, code is separated into files because it's a cohesive unit. To figure out what the types should be, I think someone's likely to be focusing primarily on the code in a file at a time, to figure out what parameters it can accept, what it could return, and what it's going to store in and expect to retrieve from variables. I don't think they're likely to want to, e.g., look at a file and figure out the param and return types, then go away and look at param and return types in other files, then come back again later for var types.

andrewnicols added a commit to andrewnicols/moodle-local_moodlecheck that referenced this pull request Mar 18, 2024
andrewnicols added a commit to andrewnicols/moodle-local_moodlecheck that referenced this pull request Mar 18, 2024
@james-cnz james-cnz mentioned this pull request Mar 18, 2024
@andrewnicols andrewnicols self-assigned this Mar 19, 2024
@stronk7
Copy link
Member

stronk7 commented Mar 19, 2024

This is compliant with both psalm and phpdoc.

Aside from everything else... since when are those generics compliant with PHPDoc? I remember something similar was in the PHPDoc (like 8 years ago), but it's not, anymore, in current version.

Ciao :-)

@andrewnicols
Copy link
Contributor Author

Aside from everything else... since when are those generics compliant with PHPDoc? I remember something similar was in the PHPDoc (like 8 years ago), but it's not, anymore, in current version.

Damnit... I could have sworn I'd found it in the phpdoc documentation, but now I can't find it.

I did check that it worked though, for example the following content:

    public function __construct(
        /** @var array<string, string> Examples of how it is done */
        public readonly array $examples,

        /** @var array<string, int> */
        public readonly array $names,

        /** @var array<int, Example> */
        public readonly array $foods,
    ) {}

Generates the following docs:
Screenshot 2024-03-19 at 21 34 00

It looks like it happened in this commit where they switched to phpstan/phpdoc-parser for type parsing, which brought with it the same support for generics as phpstan.

@andrewnicols
Copy link
Contributor Author

Okay, I've removed the generics support and I think I've addressed all other comments.

@andrewnicols
Copy link
Contributor Author

Rebased.

@stronk7
Copy link
Member

stronk7 commented Mar 19, 2024

I like current approach in this PR, avoiding to make any type / matching check and, instead, just verify that the the @var tags are well-formed, and provide the fixing of some long => short types.

I think that we can implement the type checks apart, maybe looking to @james-cnz stuff (I've not yet), let's see. But better keep this Sniff away from any type check between the phpdoc and the php code.

The only two points that, maybe, could be missing in this PR, that I can imagine are:

  1. Maybe it would make sense to also have null in the types, so they can be lowercased and things like that? It's valid phpdoc type, so far.
  2. The other case that I think that is not covered and it looks like it could be easy to implement, are the fixes when the type is a phpdoc union, I mean, things like boolean|INTEGER => boo|int and similar ones.
  3. This is a question, does phpdoc allow intersection (&) nomenclature too? If so, the above should apply also to them.

In any case, as said, I'd propose to create another issue for those cases, they are not critical for this PR that perfectly implements what we are trying to replace (from moodle-local_moodlecheck) and more.

So, all right, IMO.

Ciao :-)

@stronk7 stronk7 merged commit 24bc48a into moodlehq:main Mar 19, 2024
8 checks passed
stronk7 added a commit to moodlehq/moodle-local_moodlecheck that referenced this pull request Mar 19, 2024
@stronk7
Copy link
Member

stronk7 commented Mar 19, 2024

Answering to myself about point 3. above, it seems that phpdoc (the current proposal) does support intersection types:

https://github.com/php-fig/fig-standards/blob/master/proposed/phpdoc.md#details

So, all the comments above about union types do also apply to intersection ones (maybe also to (A|B)&C combos, I'd infer.

Re-ciao :-)

@andrewnicols andrewnicols mentioned this pull request Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants