line:col positions in parser #8203
5acbcd2 to 5be9014
lib/rust/parser/src/source/code.rs
Outdated
/// Offset from start of line, in Unicode characters.
#[reflect(hide)]
pub col: u32,
What does "Unicode characters" mean here exactly? The number of UTF-16 code units, or of code points?
It should be UTF-16 code units. Fixed.
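To illustrate why the distinction matters, here is a minimal sketch (the helper names `col_utf16` and `col_codepoints` are hypothetical, not taken from the PR) showing that the two measures agree for ASCII but diverge for characters outside the Basic Multilingual Plane:

```rust
/// Width of `text` measured in UTF-16 code units.
/// (Hypothetical helper, for illustration only.)
fn col_utf16(text: &str) -> u32 {
    text.chars().map(|c| c.len_utf16() as u32).sum()
}

/// Width of `text` measured in Unicode code points.
/// (Hypothetical helper, for illustration only.)
fn col_codepoints(text: &str) -> u32 {
    text.chars().count() as u32
}

fn main() {
    // For ASCII text both measures agree.
    let ascii = "abc";
    assert_eq!(col_utf16(ascii), col_codepoints(ascii));

    // U+1F600 is one code point, but a surrogate pair (two code units) in UTF-16.
    let emoji = "a😀b";
    assert_eq!(col_codepoints(emoji), 3);
    assert_eq!(col_utf16(emoji), 4);
    // The UTF-8 byte length is different again.
    assert_eq!(emoji.len(), 6);
}
```

A column stored in UTF-16 code units can be compared directly with LSP positions when UTF-16 is the negotiated position encoding, which is the LSP default.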
lib/rust/parser/src/source/code.rs
Outdated
/// Offset from the first line.
#[reflect(hide)]
pub line: u32,
"Offset" is not clear enough here. It is also not clear what exactly a "line" is. In LSP, lines are strictly defined as being separated by one of a number of possible EOL sequences, so the file's line-ending style doesn't influence locations at all. We should probably adopt exactly the same semantics. https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments
Fixed it to match LSP (which matches the rest of the language syntax).
I'll check lexer.rs later today.
":spanLeftOffsetCodeLenUtf8",
":spanLeftOffsetCodeLenUtf16",
":spanLeftOffsetCodeLenNewlines",
":spanLeftOffsetCodeLenLineChars16",
If the node has multiple lines, this is the length of the last line?
Yes, a line:col length contains the information necessary to reach an end line:col location from a start location.
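As a sketch of that rule (the `Location` and `Length` types and the `end_of` function here are hypothetical stand-ins, not the PR's actual definitions): when the length spans no newlines, the column count is added to the start column; otherwise the line count is added to the start line and the column count becomes the end column outright.

```rust
/// A line:col position; `col16` is measured in UTF-16 code units.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Location {
    line: u32,
    col16: u32,
}

/// A line:col length: the number of newlines crossed, plus the number of
/// UTF-16 code units on the last line. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Length {
    newlines: u32,
    line_chars16: u32,
}

/// Compute the end location reached by advancing `len` from `start`.
fn end_of(start: Location, len: Length) -> Location {
    if len.newlines == 0 {
        // Same line: the columns simply add up.
        Location { line: start.line, col16: start.col16 + len.line_chars16 }
    } else {
        // Later line: the start column no longer matters.
        Location { line: start.line + len.newlines, col16: len.line_chars16 }
    }
}

fn main() {
    let start = Location { line: 2, col16: 5 };
    let single = Length { newlines: 0, line_chars16: 3 };
    let multi = Length { newlines: 2, line_chars16: 4 };
    assert_eq!(end_of(start, single), Location { line: 2, col16: 8 });
    assert_eq!(end_of(start, multi), Location { line: 4, col16: 4 });
}
```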
lib/rust/parser/src/source/code.rs
Outdated
utf16: u32,
pub repr: StrRef<'s>,
#[reflect(flatten)]
offset: Location,
I would rename the field as well, as it's now more than just the offset.
We were overusing "offset"; renamed it to "start".
lib/rust/parser/src/source/code.rs
Outdated
if c == '\n' {
    newlines += 1;
    line_chars16 = 0;
}
Elsewhere we talk about interpreting CRLF, CR, and LF as newlines, but here we consider only LF.
Fixed and added testing.
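For reference, one way such a scan could treat all three line endings is sketched below. This is not the code from the PR: `measure` is a hypothetical function, and it assumes the LSP convention that `\n`, `\r\n`, and `\r` each end exactly one line.

```rust
/// Scan `text` and return `(newlines, line_chars16)`: the number of line
/// breaks, and the number of UTF-16 code units after the last one.
/// (Hypothetical function, for illustration only.)
fn measure(text: &str) -> (u32, u32) {
    let mut newlines: u32 = 0;
    let mut line_chars16: u32 = 0;
    let mut prev_cr = false;
    for c in text.chars() {
        match c {
            // The LF of a CRLF pair: the line break was already counted at the CR.
            '\n' if prev_cr => {}
            // A lone LF or a CR starts a new line.
            '\n' | '\r' => {
                newlines += 1;
                line_chars16 = 0;
            }
            // Any other character advances the column by its UTF-16 width.
            _ => line_chars16 += c.len_utf16() as u32,
        }
        prev_cr = c == '\r';
    }
    (newlines, line_chars16)
}

fn main() {
    // The same text gives the same result regardless of line-ending style.
    assert_eq!(measure("Linux\n..."), (1, 3));
    assert_eq!(measure("Windows\r\n..."), (1, 3));
    assert_eq!(measure("OldMac\r..."), (1, 3));
    // A CR followed by a separate LF-terminated line is two breaks, not three.
    assert_eq!(measure("a\rb\nc\r\n"), (3, 0));
}
```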
lib/rust/parser/src/lexer.rs
Outdated
//lex_and_validate_spans("Windows\r\n...");
//lex_and_validate_spans("Linux\n...");
Why is this commented out?
Fixed
I think it's an improvement regardless of #8236. In the future we might make use of it for a more edit-resilient metadata map format.
Pull Request Description
Add line:column information to source code references produced by the parser. This information will be used by GUI2 as part of the solution to #8134.
Important Notes
parse_all_enso_files.sh has been used to ensure this doesn't affect tree structures.
parse_all_enso_files.sh now checks emitted locations for consistency, and has been used to verify that all line:col references match the values found by an independent scan of the source up to the given UTF8 position.
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
If GUI codebase was changed, the GUI was tested when built using ./run ide build.