-
Notifications
You must be signed in to change notification settings - Fork 790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: report line and column on requirements parser errors #2100
feat: report line and column on requirements parser errors #2100
Conversation
crates/requirements-txt/src/lib.rs
Outdated
break; | ||
} | ||
// This should work fine for both Windows and Linux line endings | ||
if char == '\n' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here, if we see \r\n
, I think we technically want to avoid incrementing column
after we see the \r
. I can't see how this would matter in practice for this use-case, but e.g., you can see an example of how that's done here in Ruff: https://github.com/astral-sh/ruff/blob/c9931a548ff07c031130571f3664343bea224026/crates/ruff_source_file/src/line_index.rs#L29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we technically want to avoid incrementing column after we see the \r
Agreed, I left this as-is on purpose but I really debated to either keep this for simplicity or track the prev_char reference for these types of checks. Luckily it can be changed easily whichever route we want to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to change it just for completeness, in case this gets reused elsewhere. Are you ok to modify it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if this is what you had in mind d0f7161
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was aiming for something more like: if we see \r
, check if the next character is \n
; if so, skip it. (That would correctly handle \r
, \n
, and \r\n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay (work 😆), hopefully this is closer 223fd99
crates/requirements-txt/src/lib.rs
Outdated
write!( | ||
f, | ||
"{message} in `{}` at position {location}", | ||
"{message} in `{}` at position {line}:{column}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make that format at <REQUIREMENTS_TXT>:<line>:<col>
? This format is support by many IDEs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in 223fd99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, mileage may vary, I noticed that for full-paths it does highlight on my IDE, but not relative paths like ./requirements.txt
. We could try to canonicalize always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using current_dir()
and joining the path on top of it is the best way, canonicalize could be a path outside the apparent project root (we used to canonicalize everything but rolled that back since it caused problems where software expected the "virtual" name of the link).
crates/requirements-txt/src/lib.rs
Outdated
let mut line = 1; | ||
let mut column = 1; | ||
|
||
for (index, char) in content.char_indices() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want byte indices rather than char indices here? CC @BurntSushi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char_indices
does actually return byte offsets. The index
is referring to the index in content
of where char
starts. For example, this code:
let content = "a💩b";
for (i, ch) in content.char_indices() {
eprintln!("{}:{}", i, ch);
}
Has this output:
0:a
1:💩
5:b
Since 💩
has a UTF-8 encoding that uses 4 bytes.
crates/requirements-txt/src/lib.rs
Outdated
let mut line = 1; | ||
let mut column = 1; | ||
|
||
for (index, char) in content.char_indices() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char_indices
does actually return byte offsets. The index
is referring to the index in content
of where char
starts. For example, this code:
let content = "a💩b";
for (i, ch) in content.char_indices() {
eprintln!("{}:{}", i, ch);
}
Has this output:
0:a
1:💩
5:b
Since 💩
has a UTF-8 encoding that uses 4 bytes.
crates/requirements-txt/src/lib.rs
Outdated
line += 1; | ||
column = 1; | ||
} else if char != '\r' { | ||
column += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest using the unicode-width
crate to compute the visual width of the codepoint via UnicodeWidthChar::width
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be using width
here. I would need to check again but what I remember is that editors use character offsets for column indices and not their width. Ruff also uses character offsets and not the width for diagnostics. What uv
output should match editors or shortcuts like go to position won't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that we should do whatever editors are likely to use here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think char_indices is okay here, but I would like to tweak the newline handling slightly in line with the Ruff reference above.
crates/requirements-txt/src/lib.rs
Outdated
@@ -317,6 +317,27 @@ pub struct RequirementsTxt { | |||
pub no_index: bool, | |||
} | |||
|
|||
/// Calculates column and line based on the cursor and content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you use unicode-width
per my suggestion below, can you just add a quick note here documenting that we define column in this context as the, "offset according to the visual width of each codepoint."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If unicode-width
is not the right thing to use, then just adding, "offset according to the number of Unicode codepoints."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it as a comment in 223fd99
}); | ||
|
||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a couple of tests that include non-ASCII codepoints. More concretely, one test with a non-ASCII codepoint like, say, 💩
. And then another test with grapheme cluster made up of more than one codepoint like à̖
(that's a\u{0300}\u{0316}
as a string literal in Rust).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 223fd99, included one with two codepoints and your example one with three.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
} | ||
match char { | ||
'\r' => { | ||
// If the next character is a newline, skip it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tweaked this a little because \r
on its own should be considered a newline (it's not used on (m)any modern platforms, but older macOS did use it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Np, I went a bit back and forth on that. Glad it's settled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for all the back-and-forth, I just have scars from Ruff where we didn't handle all the newline kinds and eventually hit panics in certain projects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no worries, makes sense, and feedback is always welcomed 💯
Summary
Closes #2012
This changes
RequirementsTxtParserError::Parser
to take a line, column instead of cursor location to improve reporting of parser errors. A new function was added to compute the line and column based on the content and cursor location when a parser error occurs for simplicity.Given
uv pip compile .\requirements.txt
of belowBefore:
After:
Open Question: Do we want to support
line:column
for other types of errors? I didn't look dig other potential error types where this might be desired.Test Plan
New test was added to
requirements-txt
crate with this example.