Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for interpolated strings to HEREDOC #121

Merged
merged 4 commits into from
Jun 22, 2022

Conversation

cfroystad
Copy link
Collaborator

@cfroystad cfroystad commented Jan 31, 2022

Adds support for:

  • interpolated strings to heredoc with a structure similar to encapsed_string
  • access to the heredoc and nowdoc tags
  • string_value nodes to allow getting the actual string value without the surrounding '|"

Issues

In order to not have text_interpolation kick in whenever ?> or <? is seen in heredocs (could be across heredocs),
I've had to add these as tokens in _interpolated_string_body_heredoc to be aliased as string_value. This feels ugly and
should be unneccessary, but I've yet to find a way around it with what limited time I'm able to devote at the moment. Any
suggestions would be most welcome!

Checklist:

  • All tests pass in CI
  • There are sufficient tests for the new fix/feature
  • Grammar rules have not been renamed unless absolutely necessary (0 rules renamed)
  • The conflicts section hasn't grown too much (1 new conflict)
  • The parser size hasn't grown too much (master: 2352, PR: 2543)

@cfroystad

This comment was marked as outdated.

@cfroystad cfroystad marked this pull request as draft February 3, 2022 07:52
@cfroystad

This comment was marked as outdated.

@cfroystad cfroystad removed the request for review from maxbrunsfeld February 3, 2022 12:16
@cfroystad cfroystad force-pushed the heredoc branch 2 times, most recently from 169fc16 to 42ecf1a Compare May 6, 2022 10:06
@cfroystad cfroystad marked this pull request as ready for review May 6, 2022 14:04
@cfroystad cfroystad requested a review from maxbrunsfeld May 9, 2022 16:03
@cfroystad cfroystad linked an issue May 10, 2022 that may be closed by this pull request
@Sjord
Copy link
Contributor

Sjord commented May 31, 2022

Empty heredocs don't seem to work:

<?php
$a = <<< EOF
EOF;
$ tree-sitter parse test.php
(program [0, 0] - [3, 0]
  (php_tag [0, 0] - [0, 5])
  (expression_statement [1, 0] - [2, 4]
    (assignment_expression [1, 0] - [2, 3]
      left: (variable_name [1, 0] - [1, 2]
        (name [1, 1] - [1, 2]))
      (ERROR [1, 5] - [2, 0]
        (heredoc_start [1, 8] - [1, 12]))
      right: (name [2, 0] - [2, 3]))))
test.php	0 ms	(ERROR [1, 5] - [2, 0])

@Sjord
Copy link
Contributor

Sjord commented May 31, 2022

Also, this fails to parse:

<?php
$a = <<<EOT
 \$
EOT;
$ tree-sitter parse test.php
(program [0, 0] - [4, 0]
  (php_tag [0, 0] - [0, 5])
  (expression_statement [1, 0] - [3, 4]
    (assignment_expression [1, 0] - [3, 3]
      left: (variable_name [1, 0] - [1, 2]
        (name [1, 1] - [1, 2]))
      right: (heredoc [1, 5] - [3, 3]
        identifier: (heredoc_start [1, 8] - [1, 11])
        value: (heredoc_body [1, 11] - [2, 3]
          (ERROR [2, 1] - [2, 2])
          (string_value [2, 2] - [2, 3]))
        end_tag: (heredoc_end [3, 0] - [3, 3])))))
test.php	0 ms	(ERROR [2, 1] - [2, 2])

@Sjord
Copy link
Contributor

Sjord commented May 31, 2022

Also, lines with only whitespace fail to parse:

<?php

echo <<<EOT
hello
 
world
EOT;

The line between "hello" and "world" contains a single space.

$ tree-sitter parse test.php
(program [0, 0] - [7, 0]
  (php_tag [0, 0] - [0, 5])
  (echo_statement [2, 0] - [6, 4]
    (ERROR [2, 5] - [5, 5]
      (heredoc_start [2, 8] - [2, 11])
      (string_value [3, 0] - [3, 5])
      (name [5, 0] - [5, 5]))
    (name [6, 0] - [6, 3])))
test.php	0 ms	(ERROR [2, 5] - [5, 5])

@cfroystad
Copy link
Collaborator Author

Thanks for testing, @Sjord

The issues has been fixed and your excellent edge cases has been added to the test suite.

@Sjord
Copy link
Contributor

Sjord commented Jun 9, 2022

Thanks. I have tested some more and came up with some more edge cases:

  • It currently fails to parse when there is a null-byte in the heredoc content.
  • It fails to parse carriage return followed by some letter.
  • It fails to parse when a line starts with a partial end tag:
<?php
echo <<< EOF
E
EOF;
<?php
echo <<< EOF
EO
EOF;

@cfroystad
Copy link
Collaborator Author

Nice finds! Testing is something I'm working on becoming better at, so thank you for the "lesson" 🙂

I'm unable to add a test for the carriage return version since git seems to be converting my line endings 🙁 However, all bugs has been fixed. I've added CR as a choice in_new_line since that's how Notepad++ interprets it and how Mac OS handled line endings until 2002.

@Sjord
Copy link
Contributor

Sjord commented Jun 10, 2022

Thanks. I have found some more nowdocs that fail to parse. In particular, a nowdoc with an empty line:

<?php
echo <<< 'EOF'

EOF;

Also, with \x0b, \x0c or \x0d on a line.

@cfroystad
Copy link
Collaborator Author

Thanks! Given that I've understood the issues correctly, they're all fixed in the latest commit.

@Sjord
Copy link
Contributor

Sjord commented Jun 21, 2022

Confirmed, everything works.

I see I was a little bit unclear with the hex values in my last message. With \x0b I meant the byte which hexadecimal representation is 0b, not literally \x0b. But I tested it and it works as expected.

@cfroystad cfroystad requested a review from aryx June 22, 2022 08:32
Copy link
Contributor

@aryx aryx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@aryx aryx merged commit 6ff54a9 into tree-sitter:master Jun 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants