feat: report line and column on requirements parser errors #2100

samypr100 · 2024-03-01T03:32:35Z

Summary

Closes #2012

This changes RequirementsTxtParserError::Parser to take a line and column instead of a cursor location, which improves the reporting of parser errors. For simplicity, a new function computes the line and column from the content and cursor location when a parser error occurs.

Given uv pip compile .\requirements.txt with the file contents below:

numpy>=1,<2
  --borken
tqdm

Before:

error: Unexpected '-', expected '-c', '-e', '-r' or the start of a requirement in `.\requirements.txt` at position 14

After:

error: Unexpected '-', expected '-c', '-e', '-r' or the start of a requirement in `.\requirements.txt` at position 2:3

Open Question: Do we want to support line:column for other types of errors? I didn't dig into other error types where this might be desired.

Test Plan

A new test was added to the requirements-txt crate with this example.

charliermarsh · 2024-03-01T03:43:11Z

crates/requirements-txt/src/lib.rs

+            break;
+        }
+        // This should work fine for both Windows and Linux line endings
+        if char == '\n' {


So here, if we see \r\n, I think we technically want to avoid incrementing column after we see the \r. I can't see how this would matter in practice for this use-case, but e.g., you can see an example of how that's done here in Ruff: https://github.com/astral-sh/ruff/blob/c9931a548ff07c031130571f3664343bea224026/crates/ruff_source_file/src/line_index.rs#L29

I think we technically want to avoid incrementing column after we see the \r

Agreed. I left this as-is on purpose, but I debated whether to keep it for simplicity or to track a prev_char reference for these kinds of checks. Luckily, it can be changed easily whichever route we want to go.

I'd prefer to change it just for completeness, in case this gets reused elsewhere. Are you ok to modify it?

Let me know if this is what you had in mind d0f7161

I was aiming for something more like: if we see \r, check if the next character is \n; if so, skip it. (That would correctly handle \r, \n, and \r\n.)
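For readers following along, the approach described in the comment above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `line_and_column` and its signature are assumptions, columns are counted in Unicode codepoints, and `position` is assumed to be a byte offset into `content`.

```rust
/// Hypothetical helper (name and signature are assumptions for illustration).
/// Maps a byte offset in `content` to a 1-based (line, column) pair,
/// counting columns in Unicode codepoints.
fn line_and_column(content: &str, position: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;

    let mut chars = content.char_indices().peekable();
    while let Some((index, ch)) = chars.next() {
        // Stop once we reach the byte offset of the error.
        if index >= position {
            break;
        }
        match ch {
            '\r' => {
                // Treat `\r\n` as a single line break by consuming the `\n`.
                if matches!(chars.peek(), Some(&(_, '\n'))) {
                    chars.next();
                }
                line += 1;
                column = 1;
            }
            '\n' => {
                line += 1;
                column = 1;
            }
            // Any other codepoint advances the column by one.
            _ => column += 1,
        }
    }

    (line, column)
}
```

With the requirements.txt from the summary and LF line endings, byte offset 14 is the first `-` of `--borken`, which this sketch maps to line 2, column 3. A lone `\r` is also counted as a line break here.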

Sorry for the delay (work 😆), hopefully this is closer 223fd99

konstin · 2024-03-01T10:43:04Z

crates/requirements-txt/src/lib.rs

                write!(
                    f,
-                    "{message} in `{}` at position {location}",
+                    "{message} in `{}` at position {line}:{column}",


Could you format that as <REQUIREMENTS_TXT>:<line>:<col>? This format is supported by many IDEs.

Changed in 223fd99

Note that mileage may vary: my IDE highlights full paths, but not relative paths like ./requirements.txt. Should we always canonicalize?

I think using current_dir() and joining the path on top of it is the best way. canonicalize could return a path outside the apparent project root (we used to canonicalize everything but rolled that back, since it caused problems where software expected the "virtual" name of the link).
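A rough sketch of that suggestion, under stated assumptions: the helper names `absolutize` and `format_location` are hypothetical and not part of uv, and `base` stands in for whatever `std::env::current_dir()` returns at runtime. Unlike `std::fs::canonicalize`, `Path::join` does not touch the filesystem, so symlinks are not resolved.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: makes `path` absolute by joining it onto `base`
/// (e.g. the result of `std::env::current_dir()`), without resolving symlinks.
fn absolutize(base: &Path, path: &Path) -> PathBuf {
    // `Path::join` returns `path` unchanged when it is already absolute.
    base.join(path)
}

/// Hypothetical helper: renders a location as `<path>:<line>:<column>`.
fn format_location(path: &Path, line: usize, column: usize) -> String {
    format!("{}:{line}:{column}", path.display())
}
```

Note that joining `./requirements.txt` keeps the `.` component in the displayed path; whether an IDE still recognizes that form is not something this sketch verifies.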

konstin · 2024-03-01T10:46:02Z

crates/requirements-txt/src/lib.rs

+    let mut line = 1;
+    let mut column = 1;
+
+    for (index, char) in content.char_indices() {


I think we want byte indices rather than char indices here? CC @BurntSushi

char_indices does actually return byte offsets. The index is referring to the index in content of where char starts. For example, this code:

    let content = "a💩b";
    for (i, ch) in content.char_indices() {
        eprintln!("{}:{}", i, ch);
    }

Has this output:

    0:a
    1:💩
    5:b

Since 💩 has a UTF-8 encoding that uses 4 bytes.

BurntSushi · 2024-03-01T11:38:07Z

crates/requirements-txt/src/lib.rs

+            line += 1;
+            column = 1;
+        } else if char != '\r' {
+            column += 1;


I would suggest using the unicode-width crate to compute the visual width of the codepoint via UnicodeWidthChar::width.

I don't think we should be using width here. I would need to check again, but what I remember is that editors use character offsets for column indices and not their width. Ruff also uses character offsets and not the width for diagnostics. Whatever uv outputs should match editors, or shortcuts like "go to position" won't work.

Yeah I agree that we should do whatever editors are likely to use here.

Yeah, I think char_indices is okay here, but I would like to tweak the newline handling slightly in line with the Ruff reference above.

BurntSushi · 2024-03-01T11:39:12Z

crates/requirements-txt/src/lib.rs

@@ -317,6 +317,27 @@ pub struct RequirementsTxt {
    pub no_index: bool,
 }

+/// Calculates column and line based on the cursor and content.


Assuming you use unicode-width per my suggestion below, can you just add a quick note here documenting that we define column in this context as the, "offset according to the visual width of each codepoint."

If unicode-width is not the right thing to use, then just add, "offset according to the number of Unicode codepoints."

I left it as a comment in 223fd99

BurntSushi · 2024-03-01T11:40:52Z

crates/requirements-txt/src/lib.rs

+        });
+
+        Ok(())
+    }


Could you add a couple of tests that include non-ASCII codepoints? More concretely, one test with a non-ASCII codepoint like, say, 💩. And then another test with a grapheme cluster made up of more than one codepoint, like à̖ (that's a\u{0300}\u{0316} as a string literal in Rust).

Done in 223fd99, included one with two codepoints and your example one with three.
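To illustrate why these cases matter, here is a small sketch of codepoint-based column counting on a single line. The helper name `column_of` is an assumption for illustration and is not taken from the PR; `byte_offset` is assumed to be a byte offset into `line`.

```rust
/// Hypothetical helper: 1-based column of `byte_offset` within `line`,
/// counted in Unicode codepoints (not bytes, visual width, or graphemes).
fn column_of(line: &str, byte_offset: usize) -> usize {
    // Count the codepoints that start before the offset, then add one
    // because columns are 1-based.
    line.char_indices()
        .take_while(|(index, _)| *index < byte_offset)
        .count()
        + 1
}
```

Under this definition, 💩 occupies four bytes but advances the column by one, while a\u{0300}\u{0316} renders as a single grapheme but advances the column by three.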

charliermarsh

Awesome, thanks!

charliermarsh · 2024-03-01T21:40:52Z

crates/requirements-txt/src/lib.rs

+        }
+        match char {
+            '\r' => {
+                // If the next character is a newline, skip it.


Tweaked this a little because \r on its own should be considered a newline (it's not used on (m)any modern platforms, but older macOS did use it).

Np, I went a bit back and forth on that. Glad it's settled

Sorry for all the back-and-forth, I just have scars from Ruff where we didn't handle all the newline kinds and eventually hit panics in certain projects.

no worries, makes sense, and feedback is always welcomed 💯

feat: report line and column on RequirementsTxtParserError::Parser errors

20b85f2

samypr100 marked this pull request as ready for review March 1, 2024 03:38

charliermarsh reviewed Mar 1, 2024

fixup: stop column count on \r

d0f7161

konstin approved these changes Mar 1, 2024

BurntSushi reviewed Mar 1, 2024

samypr100 and others added 5 commits March 1, 2024 13:43

fixup: impl follow-up suggestions

223fd99

clippy clipping

d742401

Treat \r as a newline

b80d83c

Merge branch 'main' into requirements-parsing-line-column

5f33e06

Format

88bf1d8

charliermarsh approved these changes Mar 1, 2024

charliermarsh reviewed Mar 1, 2024

Revert back to calculate_

e47ca80

charliermarsh added the "error messages" (Messaging when something goes wrong) label Mar 1, 2024

charliermarsh enabled auto-merge (squash) March 1, 2024 21:41

charliermarsh merged commit c7c3aff into astral-sh:main Mar 1, 2024
7 checks passed

BrewTestBot mentioned this pull request Mar 4, 2024

uv 0.1.14 Homebrew/homebrew-core#165011

Merged

samypr100 deleted the requirements-parsing-line-column branch March 11, 2024 22:21
