Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

line:col positions in parser #8203

Merged
merged 8 commits into from
Nov 8, 2023
Merged

line:col positions in parser #8203

merged 8 commits into from
Nov 8, 2023

Conversation

kazcw
Copy link
Contributor

@kazcw kazcw commented Nov 1, 2023

Pull Request Description

Add line:column information to source code references produced by the parser. This information will be used by GUI2 as part of the solution to #8134.

Important Notes

  • parse_all_enso_files.sh has been used to ensure this doesn't affect tree structures.
  • parse_all_enso_files.sh now checks emitted locations for consistency, and has been used to verify that all line:col references match the values found by an independent scan of the source up to the given UTF8 position.

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • [] If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@kazcw kazcw added the CI: No changelog needed Do not require a changelog entry for this PR. label Nov 1, 2023
@kazcw kazcw self-assigned this Nov 1, 2023
@kazcw kazcw marked this pull request as ready for review November 1, 2023 14:32
@kazcw kazcw force-pushed the wip/kw/parser-line-col branch from 5acbcd2 to 5be9014 Compare November 1, 2023 14:39
Comment on lines 40 to 42
/// Offset from start of line, in Unicode characters.
#[reflect(hide)]
pub col: u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "unicode characters" mean here exactly? The amount of utf16 code units, or codepoints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be UTF16 code units. Fixed.

Comment on lines 37 to 39
/// Offset from the first line.
#[reflect(hide)]
pub line: u32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "offset" is not clear enough here. Also it is not really clear what is a "line" really. In LSP, the lines are strictly defined to be separated by a numer of possible EOL characters. Therefore the file line ending style doesn't influence the locations at all. We should probably adapt exactly the same semantics. https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it to match LSP (which matches the rest of the language syntax).

@kazcw kazcw marked this pull request as draft November 1, 2023 16:13
@kazcw kazcw marked this pull request as ready for review November 1, 2023 16:15
Copy link
Contributor

@farmaazon farmaazon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check lexer.rs later today.

lib/rust/parser/debug/src/lib.rs Outdated Show resolved Hide resolved
":spanLeftOffsetCodeLenUtf8",
":spanLeftOffsetCodeLenUtf16",
":spanLeftOffsetCodeLenNewlines",
":spanLeftOffsetCodeLenLineChars16",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the node has multiple lines, this is the length of the last line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a line:col length contains the information necessary to reach an end line:col location from a start location.

utf16: u32,
pub repr: StrRef<'s>,
#[reflect(flatten)]
offset: Location,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename the field as well, as it's now more than just the offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were overusing "offset"; renamed it to "start".

Comment on lines 268 to 271
if c == '\n' {
newlines += 1;
line_chars16 = 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere else we say about interpreting CRLF, CR and LF as newlines, but here we consider only LF.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and added testing.

@farmaazon farmaazon self-requested a review November 3, 2023 08:41
Comment on lines 1889 to 1890
//lex_and_validate_spans("Windows\r\n...");
//lex_and_validate_spans("Linux\n...");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it commented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Linux engine job is OK, but Mac and Windows are failing:

  • Linux - OK
  • Mac - probably line endings
  • Windows panic in native (probably Rust) code

@farmaazon
Copy link
Contributor

What is the status of this PR? Are the @Frizi comments addressed? Do we want to merge it or discard in favor of #8236 ?

@kazcw
Copy link
Contributor Author

kazcw commented Nov 8, 2023

I think it's an improvement regardless of #8236. In the future we might make use of it for a more edit-resilient metadata map format.

@Frizi Frizi added the CI: Ready to merge This PR is eligible for automatic merge label Nov 8, 2023
@mergify mergify bot merged commit ce04256 into develop Nov 8, 2023
33 of 35 checks passed
@mergify mergify bot deleted the wip/kw/parser-line-col branch November 8, 2023 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants