Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider either reporting or accepting #noqa (without space) #5177

Closed
mairas opened this issue Jun 19, 2023 · 0 comments · Fixed by #5554
Closed

Consider either reporting or accepting #noqa (without space) #5177

mairas opened this issue Jun 19, 2023 · 0 comments · Fixed by #5554
Labels
bug Something isn't working suppression Related to supression of violations e.g. noqa

Comments

@mairas
Copy link

mairas commented Jun 19, 2023

I just tried to add a noqa comment on my Pydantic validator that doesn't get correctly introspected:

import pydantic


class Device(pydantic.BaseModel):
    serial: int

    @pydantic.validator("serial")
    def validate_serial(cls, v: int) -> int:  #noqa: N805
        return v

But as a typo, didn't add a space between the comment # mark and the noqa pragma. When running ruff check --fix, the noqa got ignored and ruff reported an error. black would fix these errors but it is run after ruff (due to --fix). Needless to say, this resulted in quite a bit of confused debugging. :-)

My first idea would be to implement a new rule that would report such errors. Or alternatively, ruff could accept noqa pragmas even without the preceding whitespace.

Ruff settings:

[tool.ruff] # https://github.com/charliermarsh/ruff
fix = true
ignore = [
  "B904",
  "B905",
  "BLE001",
  "E501",
  "ERA001",
  "N802",
  "N803",
  "PGH003",
  "PLR2004",
  "RET504",
  "S101",
  "S104",
]
ignore-init-module-imports = true
line-length = 88
select = [
  #"A",
  "B",
  "BLE",
  #"C4",
  #"C90",
  #"D",
  #"DTZ",
  "E",
  "ERA",
  "F",
  "G",
  "I",
  "INP",
  "ISC",
  "N",
  "NPY",
  "PGH",
  "PIE",
  "PLC",
  "PLE",
  "PLR",
  #"PLW",
  "PT",
  "PTH",
  "PYI",
  #"RET",
  "RSE",
  "RUF",
  "S",
  #"SIM",
  "T10",
  "T20",
  "TID",
  "UP",
  "W",
  "YTT",
]
target-version = "py310"
unfixable = ["ERA001", "F841", "T201", "T203"]

Ruff version: 0.0.265

@charliermarsh charliermarsh added question Asking for support or clarification rule Implementing or modifying a lint rule suppression Related to supression of violations e.g. noqa and removed rule Implementing or modifying a lint rule labels Jun 20, 2023
@charliermarsh charliermarsh added bug Something isn't working and removed question Asking for support or clarification labels Jul 4, 2023
charliermarsh added a commit that referenced this issue Jul 6, 2023
#5554)

## Summary

I'll write up a more detailed description tomorrow, but in short, this
PR removes our regex-based implementation in favor of "manual" parsing.

I tried a couple different implementations. In the benchmarks below:

- `Directive/Regex` is our implementation on `main`.
- `Directive/Find` just uses `text.find("noqa")`, which is insufficient,
since it doesn't cover case-insensitive variants like `NOQA`, and
doesn't handle multiple `noqa` matches in a single like, like ` # Here's
a noqa comment # noqa: F401`. But it's kind of a baseline.
- `Directive/Memchr` uses three `memchr` iterative finders (one for
`noqa`, `NOQA`, and `NoQA`).
- `Directive/AhoCorasick` is roughly the variant checked-in here.

The raw results:

```
Directive/Regex/# noqa: F401
                        time:   [273.69 ns 274.71 ns 276.03 ns]
                        change: [+1.4467% +1.8979% +2.4243%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  8 (8.00%) high mild
  4 (4.00%) high severe
Directive/Find/# noqa: F401
                        time:   [66.972 ns 67.048 ns 67.132 ns]
                        change: [+2.8292% +2.9377% +3.0540%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  8 (8.00%) high mild
  3 (3.00%) high severe
Directive/AhoCorasick/# noqa: F401
                        time:   [76.922 ns 77.189 ns 77.536 ns]
                        change: [+0.4265% +0.6862% +0.9871%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
Directive/Memchr/# noqa: F401
                        time:   [62.627 ns 62.654 ns 62.679 ns]
                        change: [-0.1780% -0.0887% -0.0120%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe
Directive/Regex/# noqa: F401, F841
                        time:   [321.83 ns 322.39 ns 322.93 ns]
                        change: [+8602.4% +8623.5% +8644.5%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Directive/Find/# noqa: F401, F841
                        time:   [78.618 ns 78.758 ns 78.896 ns]
                        change: [+1.6909% +1.8771% +2.0628%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Directive/AhoCorasick/# noqa: F401, F841
                        time:   [87.739 ns 88.057 ns 88.468 ns]
                        change: [+0.1843% +0.4685% +0.7854%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Directive/Memchr/# noqa: F401, F841
                        time:   [80.674 ns 80.774 ns 80.860 ns]
                        change: [-0.7343% -0.5633% -0.4031%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) low severe
  9 (9.00%) low mild
  1 (1.00%) high mild
Directive/Regex/# noqa  time:   [194.86 ns 195.93 ns 196.97 ns]
                        change: [+11973% +12039% +12103%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild
Directive/Find/# noqa   time:   [25.327 ns 25.354 ns 25.383 ns]
                        change: [+3.8524% +4.0267% +4.1845%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
Directive/AhoCorasick/# noqa
                        time:   [34.267 ns 34.368 ns 34.481 ns]
                        change: [+0.5646% +0.8505% +1.1281%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
Directive/Memchr/# noqa time:   [21.770 ns 21.818 ns 21.874 ns]
                        change: [-0.0990% +0.1464% +0.4046%] (p = 0.26 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe
Directive/Regex/# type: ignore # noqa: E501
                        time:   [278.76 ns 279.69 ns 280.72 ns]
                        change: [+7449.4% +7469.8% +7490.5%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Directive/Find/# type: ignore # noqa: E501
                        time:   [67.791 ns 67.976 ns 68.184 ns]
                        change: [+2.8321% +3.1735% +3.5418%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
Directive/AhoCorasick/# type: ignore # noqa: E501
                        time:   [75.908 ns 76.055 ns 76.210 ns]
                        change: [+0.9269% +1.1427% +1.3955%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Directive/Memchr/# type: ignore # noqa: E501
                        time:   [72.549 ns 72.723 ns 72.957 ns]
                        change: [+1.5881% +1.9660% +2.3974%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  10 (10.00%) high mild
  5 (5.00%) high severe
Directive/Regex/# type: ignore # nosec
                        time:   [66.967 ns 67.075 ns 67.207 ns]
                        change: [+1713.0% +1715.8% +1718.9%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
Directive/Find/# type: ignore # nosec
                        time:   [18.505 ns 18.548 ns 18.597 ns]
                        change: [+1.3520% +1.6976% +2.0333%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
Directive/AhoCorasick/# type: ignore # nosec
                        time:   [16.162 ns 16.206 ns 16.252 ns]
                        change: [+1.2919% +1.5587% +1.8430%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Directive/Memchr/# type: ignore # nosec
                        time:   [39.192 ns 39.233 ns 39.276 ns]
                        change: [+0.5164% +0.7456% +0.9790%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
Directive/Regex/# some very long comment that # is interspersed with characters but # no directive
                        time:   [81.460 ns 81.578 ns 81.703 ns]
                        change: [+2093.3% +2098.8% +2104.2%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
Directive/Find/# some very long comment that # is interspersed with characters but # no directive
                        time:   [26.284 ns 26.331 ns 26.387 ns]
                        change: [+0.7554% +1.1027% +1.3832%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
Directive/AhoCorasick/# some very long comment that # is interspersed with characters but # no direc...
                        time:   [28.643 ns 28.714 ns 28.787 ns]
                        change: [+1.3774% +1.6780% +2.0028%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Directive/Memchr/# some very long comment that # is interspersed with characters but # no directive
                        time:   [55.766 ns 55.831 ns 55.897 ns]
                        change: [+1.5802% +1.7476% +1.9021%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild
```

While memchr is faster than aho-corasick in some of the common cases
(like `# noqa: F401`), the latter is way, way faster when there _isn't_
a match (like 2x faster -- see the last two cases). Since most comments
_aren't_ `noqa` comments, this felt like the right tradeoff. Note that
all implementations are significantly faster than the regex version.

(I know I originally reported a 10x speedup, but I ended up improving
the regex version a bit in some prior PRs, so it got unintentionally
faster via some refactors.)

There's also one behavior change in here, which is that we now allow
variable spaces, e.g., `#noqa` or `# noqa`. Previously, we required
exactly one space. This thus closes #5177.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working suppression Related to supression of violations e.g. noqa
Projects
None yet
2 participants