Comments on empty blocks are dropped. #382

Closed
tsufeki opened this issue May 1, 2017 · 4 comments

Comments

@tsufeki

tsufeki commented May 1, 2017

$ php-parse --var-dump '<?php /* a */{}'
====> Code <?php /* a */{}
==> var_dump():
/home/.../bin/php-parse:86:
array(0) {
}

Non-empty blocks work fine. I'm using master.

@tsufeki
Author

tsufeki commented May 1, 2017

It seems that comments "in the middle" of compound statements are ignored too, for example:

<?php if /* a */ (7) {}

@tsufeki
Author

tsufeki commented May 1, 2017

One more thing: comments are often repeated, i.e. the same comment appears in multiple nodes. This can be seen in the tests, for example parser/blockComments.test. Is this intended? If so, what are the rules for repetition? Thanks for your work.

@7fe

7fe commented Jul 24, 2017

@tsufeki In my opinion, <?php if /* a */ (7) {} behaves as intended. I believe comment support is meant for typical documentation comments, which wouldn't appear in such positions.

nikic added a commit that referenced this issue Oct 1, 2017
@nikic
Owner

nikic commented Oct 1, 2017

There are three separate issues here:

  • Preservation of comments on empty blocks. This has been fixed by d418bf3.
  • Duplicate comments. This is a known bug, tracked in #253 ("Comments are assigned multiple times").
  • Comments in the middle of nodes. These are indeed not represented. As usual, it's still possible to extract them using token offsets and a bit of extra work. Assuming token offsets are enabled in the lexer, something along the lines of the following should work (not tested):
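<?php
use PhpParser\Comment;
use PhpParser\Node;

// $tokens from Lexer->getTokens(); the lexer must record the
// startTokenPos/endTokenPos attributes for getStartTokenPos()/getEndTokenPos().
function getInteriorComments(Node $node, array $tokens): array {
    $comments = [];
    $pos = $node->getStartTokenPos();
    foreach ($node->getSubNodeNames() as $name) {
        $subNode = $node->$name;
        // A sub-node may be a single node, an array of nodes (e.g. "stmts"), or null.
        $children = is_array($subNode) ? $subNode : [$subNode];
        foreach ($children as $child) {
            if (!$child instanceof Node) {
                continue;
            }
            // Stop before the comments directly preceding the sub-node;
            // those are already attached to the sub-node itself.
            $endPos = backwardsAdjust($tokens, $child->getStartTokenPos());
            $comments = array_merge($comments, extractComments($tokens, $pos, $endPos));
            $pos = $child->getEndTokenPos() + 1;
        }
    }
    $endPos = $node->getEndTokenPos() + 1;
    $comments = array_merge($comments, extractComments($tokens, $pos, $endPos));
    return $comments;
}

// Collects the comments in the half-open token range [$startPos, $endPos).
function extractComments(array $tokens, int $startPos, int $endPos): array {
    $comments = [];
    for ($pos = $startPos; $pos < $endPos; $pos++) {
        $token = $tokens[$pos];
        if (!is_array($token)) {
            continue; // single-character tokens such as "(" are plain strings
        }
        if ($token[0] === T_COMMENT) {
            $comments[] = new Comment($token[1]);
        } elseif ($token[0] === T_DOC_COMMENT) {
            $comments[] = new Comment\Doc($token[1]);
        }
    }
    return $comments;
}

// Moves $pos backwards over any comments and whitespace directly before it.
function backwardsAdjust(array $tokens, int $pos): int {
    for (; $pos > 0; $pos--) {
        $token = $tokens[$pos - 1];
        if (!is_array($token) || !in_array($token[0], [T_COMMENT, T_DOC_COMMENT, T_WHITESPACE], true)) {
            break;
        }
    }
    return $pos;
}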

This code fetches all comments that are inside the node but not inside any of its subnodes. An additional complication (handled by the backwardsAdjust function) is that we also want to skip comments directly preceding a subnode, as they will already be associated with that subnode. (This would have been somewhat easier if comments also stored their starting token offset; currently they only store their file offset.)
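The same idea can be tried without PHP-Parser on the raw token stream from PHP's built-in token_get_all(). This is a minimal sketch under stated assumptions, not PHP-Parser's API: node and sub-node token positions are located by hand with a hypothetical findToken() helper (standing in for getStartTokenPos()/getEndTokenPos()), and the hypothetical interiorCommentTexts() takes sub-node ranges as plain [start, end] pairs and returns comment texts rather than Comment objects. It assumes PHP 7.1+ for the [$a, $b] destructuring in foreach.

```php
<?php
// Sketch only: uses PHP's tokenizer (token_get_all), not PHP-Parser.
// findToken() and interiorCommentTexts() are hypothetical helpers.

// Returns the comment texts in the half-open token range [$startPos, $endPos).
function extractCommentTexts(array $tokens, int $startPos, int $endPos): array {
    $texts = [];
    for ($pos = $startPos; $pos < $endPos; $pos++) {
        $token = $tokens[$pos];
        if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
            $texts[] = $token[1];
        }
    }
    return $texts;
}

// Moves $pos backwards over comments and whitespace directly before it.
function backwardsAdjust(array $tokens, int $pos): int {
    for (; $pos > 0; $pos--) {
        $token = $tokens[$pos - 1];
        if (!is_array($token) || !in_array($token[0], [T_COMMENT, T_DOC_COMMENT, T_WHITESPACE], true)) {
            break;
        }
    }
    return $pos;
}

// Hypothetical helper: index of the first token whose text is exactly $text.
function findToken(array $tokens, string $text): int {
    foreach ($tokens as $i => $token) {
        if ((is_array($token) ? $token[1] : $token) === $text) {
            return $i;
        }
    }
    throw new RuntimeException("Token not found: $text");
}

// Comments inside [$nodeStart, $nodeEnd] that lie outside every sub-node range
// and do not directly precede a sub-node. $subNodeRanges holds [start, end]
// token positions in source order.
function interiorCommentTexts(array $tokens, int $nodeStart, int $nodeEnd, array $subNodeRanges): array {
    $texts = [];
    $pos = $nodeStart;
    foreach ($subNodeRanges as [$subStart, $subEnd]) {
        $endPos = backwardsAdjust($tokens, $subStart);
        $texts = array_merge($texts, extractCommentTexts($tokens, $pos, $endPos));
        $pos = $subEnd + 1;
    }
    return array_merge($texts, extractCommentTexts($tokens, $pos, $nodeEnd + 1));
}

$tokens = token_get_all('<?php if /* a */ (/* b */ 7) {}');
$ifStart = findToken($tokens, 'if'); // the if statement starts at the "if" keyword
$ifEnd = findToken($tokens, '}');    // ... and ends at the closing brace
$cond = findToken($tokens, '7');     // the condition sub-node is the single token "7"

$interior = interiorCommentTexts($tokens, $ifStart, $ifEnd, [[$cond, $cond]]);
echo implode(', ', $interior), "\n"; // prints: /* a */
```

Only `/* a */` is reported as interior: `/* b */` directly precedes the condition, so backwardsAdjust() stops the scan before it, treating it as belonging to the sub-node.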
