Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add --exclude-regex and --no-make-paths-absolute to exclude specific file paths #115

Merged
merged 10 commits into from
Oct 30, 2022

Conversation

cosmicexplorer
Copy link
Contributor

Problem

.pyi files are not executed by a python interpreter, so they will often have different python version constraints than python source files (indeed, this is the reason they exist). However, vermin considers .pyi files python source code, so it will fail .pyi files for things like import typing (which is the sole reason .pyi files are used) if the rest of the project must still be compatible with 2.7, for example.

While the existing --exclude option can avoid checking files given their module path, this fails when .pyi files are used, since .pyi files must have the same module path as the .py source file they provide types for. If the existing --exclude option is used to exclude a specific .pyi file, that also excludes the corresponding .py file from vermin analysis.

In order to use .pyi files with vermin in spack/spack#32919, I couldn't figure out anything more elegant than to temporarily rename all .pyi files before running vermin in our CI script: https://github.com/spack/spack/pull/32919/files#diff-200bb42ff2b106936781e2d2fd613af62e85d789323dd12eb370a1ce946fab44.

Solution

  • Introduce [--exclude-regex <regex pattern>] ... to match file or directory paths which will be excluded from vermin analysis.
  • Introduce --no-make-paths-absolute to allow --exclude-regex patterns to be relative to the directory vermin is invoked from.

Alternatives Considered

Result

If I want to add the first .pyi file to a project using vermin that requires 2.7 compatibility, I can simply modify the pyproject.toml to add:

[tool.vermin]
exclusion_regex = [
  '\.pyi$',
]

without having to change any command lines.

NOTE: tests are broken here because [!/] doesn't work the way I thought it would. The next commit
will change the glob option to use a regex.
- tests pass now
- interface more familiar for windows users
@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Oct 5, 2022

I ran make test check-all as per CONTRIBUTING.md and fixed everything that seemed related to my change, along with several pre-existing flake8 failures that seemed easy to fix. However, there were quite a few errors from pylint that I wasn't sure how to fix, didn't seem to be from this change, and would have increased the size of this diff significantly. Everything else passed, though.

@netromdk
Copy link
Owner

netromdk commented Oct 5, 2022

Thank you for the PR! I will look into it after work today. From a birds eye view, it does make sense what you want to achieve, though.

Regarding the preexisting fixes of linters, you probably didn't use the virtual environment installed versions. So they'd be too new, is my guess.

@netromdk
Copy link
Owner

netromdk commented Oct 5, 2022

Yes, you've used "too new" linters. Those fixes I have in certain branches for version 1.6 release, while 1.5 release (the next one) will not yet include those. If you could run make setup and be sure to choose the environment before running tests and checks, like source .venv/bin/activate.

@cosmicexplorer
Copy link
Contributor Author

Looks like I didn't read CONTRIBUTING.md close enough; thanks so much for debugging that for me! I'll do that and make sure it passes tests.

@cosmicexplorer cosmicexplorer force-pushed the exclude-glob branch 2 times, most recently from 6cdedfc to da8b2ba Compare October 5, 2022 05:24
Add pylint overrides for tests which access the private memoized master regex. I think these tests
are necessary if we think memoizing the regex is a useful optimization, but this may well be
premature optimization.
@cosmicexplorer
Copy link
Contributor Author

Using the venv from the instructions in CONTRIBUTING.md worked immediately!

@cosmicexplorer
Copy link
Contributor Author

The test failure (https://github.com/netromdk/vermin/actions/runs/3187499365/jobs/5199138152) appears to be spurious:

Submitting coverage to coveralls.io...
Could not submit coverage: 422 Client Error: Unprocessable Entity for url: https://coveralls.io/api/v1/jobs
resubmitting with id None
Traceback (most recent call last):
  File "/home/runner/work/vermin/vermin/.venv/lib/python3.7/site-packages/coveralls/api.py", line 290, in submit_report
    response.raise_for_status()
  File "/home/runner/work/vermin/vermin/.venv/lib/python3.7/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://coveralls.io/api/v1/jobs

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/vermin/vermin/.venv/lib/python3.7/site-packages/coveralls/cli.py", line 95, in main
    result = coverallz.wear()
  File "/home/runner/work/vermin/vermin/.venv/lib/python3.7/site-packages/coveralls/api.py", line 257, in wear
    return self.submit_report(json_string)
  File "/home/runner/work/vermin/vermin/.venv/lib/python3.7/site-packages/coveralls/api.py", line 294, in submit_report
    'Could not submit coverage: {}'.format(e)) from e
coveralls.exception.CoverallsException: Could not submit coverage: 422 Client Error: Unprocessable Entity for url: https://coveralls.io/api/v1/jobs
Error: Process completed with exit code 1.

@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Oct 5, 2022

I want to highlight the final commit in this PR: 27302a3. It removes the straightforward loop over regex matches and replaces it with a memoized regex created by joining the individual --exclude-regex entries together. I think this would be a good idea if a user reports any performance concerns when using multiple --exclude-regex on a very large number of files, but as you can see in that commit it also requires additional testing. It should have the exact same behavior as looping over regex matches, but it might have different behavior in some edge case I haven't tested.

I'm not sure what approach is best for this project. I think unless I can show a benchmark demonstrating a speedup with the memoized master regex in 27302a3, then we should keep it; otherwise, we can revert it to make this change a bit easier to review.

@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Oct 5, 2022

Ok, when I create a directory with 1,000,000 files and run vermin with 10 separate --exclude-regex arguments, it produces a speedup, but I don't think the complexity is worth it, so I'm going to revert the memoized joined regex optimization.

The optimization improved the runtime of scanning 1 million files from 3.202 seconds to 2.654 seconds, a speedup of 0.548 seconds ~= 17.1%. However, scanning file paths appears to already be well optimized, and I strongly suspect that actually executing vermin over all those files would take much more time than scanning file paths, so I would much prefer to avoid doing fancy regex tricks to avoid having to fix bugs for weird edge cases when joining regexps.

My benchmark was (while on 27302a3):

; mkdir tmp/
; pushd tmp
; seq 1000000 | parallel -n200 -j16 touch {}
; popd
; time ./vermin.py -vvvv -t=2.7 -t=3.5- --no-make-paths-absolute --exclude-regex '1' --exclude-regex '2' --exclude-regex '3' --exclude-regex '4' --exclude-regex '5' --exclude-regex '6' --exclude-regex '7' --exclude-regex '8' --exclude-regex '9' --exclude-regex '0' tmp/
Detecting python files..
No files specified to analyze!
./vermin.py -vvvv -t=2.7 -t=3.5- --no-make-paths-absolute --exclude-regex '1'  2.38s user 0.33s system 102% cpu 2.654 total
; git checkout HEAD~1
; time ./vermin.py -vvvv -t=2.7 -t=3.5- --no-make-paths-absolute --exclude-regex '1' --exclude-regex '2' --exclude-regex '3' --exclude-regex '4' --exclude-regex '5' --exclude-regex '6' --exclude-regex '7' --exclude-regex '8' --exclude-regex '9' --exclude-regex '0' tmp/
Detecting python files..
No files specified to analyze!
./vermin.py -vvvv -t=2.7 -t=3.5- --no-make-paths-absolute --exclude-regex '1'  2.93s user 0.33s system 101% cpu 3.202 total

@netromdk
Copy link
Owner

netromdk commented Oct 5, 2022

I can see the need to solve this issue with .pyi files.

In order to use .pyi files with vermin in spack/spack#32919, I couldn't figure out anything more elegant than to temporarily rename all .pyi files before running vermin in our CI script: https://github.com/spack/spack/pull/32919/files#diff-200bb42ff2b106936781e2d2fd613af62e85d789323dd12eb370a1ce946fab44.

This is definitely not a nice solution, I agree!

Copy link
Owner

@netromdk netromdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great stuff! 💪🏻

Only a few small tweaks to be done.

The additions to vermin/config.py should also be added to the sample.vermin.ini, and be added as test cases, too, in tests/config.py like testing parsing of exclusion regexes and make paths absolute.

vermin/arguments.py Show resolved Hide resolved
vermin/arguments.py Outdated Show resolved Hide resolved
vermin/arguments.py Outdated Show resolved Hide resolved
tests/general.py Outdated Show resolved Hide resolved
@netromdk
Copy link
Owner

netromdk commented Oct 5, 2022

The optimization improved the runtime of scanning 1 million files from 3.202 seconds to 2.654 seconds, a speedup of 0.548 seconds ~= 17.1%. However, scanning file paths appears to already be well optimized, and I strongly suspect that actually executing vermin over all those files would take much more time than scanning file paths, so I would much prefer to avoid doing fancy regex tricks to avoid having to fix bugs for weird edge cases when joining regexps.

It's a nice optimization. We might do that later but, as you said, if you detect 1 million files, the analysis of them will be a lot more than the optimized-away 0.5 seconds.

@netromdk
Copy link
Owner

netromdk commented Oct 5, 2022

The test failure (https://github.com/netromdk/vermin/actions/runs/3187499365/jobs/5199138152) appears to be spurious

Yeah, I think so. I've never had that myself so it's a little weird. Maybe it'll go away soon.

@cosmicexplorer
Copy link
Contributor Author

This is really great stuff! 💪🏻

Thanks so much!! Vermin is a really really useful tool :D

The additions to vermin/config.py should also be added to the sample.vermin.ini, and be added as test cases, too, in tests/config.py like testing parsing of exclusion regexes and make paths absolute.

I will add this now! Please also take a look at my response to your comment about the --help text: #115 (comment)

No rush to reply!

- also add 'yes'/'no' test cases for other boolean config flags
netromdk
netromdk previously approved these changes Oct 7, 2022
Copy link
Owner

@netromdk netromdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! And also nice you added the extra cases with yes and no. 🤝

@netromdk
Copy link
Owner

netromdk commented Oct 7, 2022

Oh, right. Wrt. errors like https://github.com/netromdk/vermin/actions/runs/3206864588/jobs/5242230923#step:7:45

Instead of testing for the compiled patterns, maybe we check the patterns before compiling them instead?

Like perhaps:

--- a/tests/config.py
+++ b/tests/config.py
@@ -345,21 +345,22 @@ exclusion_regex =
 """, []],
     [u"""[vermin]
 exclusion_regex = \\.pyi$
-""", [re.compile(r"\.pyi$")]],
+""", [r"\.pyi$"]],
     [u"""[vermin]
 exclusion_regex = \\.pyi$
   ^a/b$
-""", [re.compile(r"\.pyi$"), re.compile(r"^a/b$")]],
+""", [r"\.pyi$", r"^a/b$"]],
     [u"""[vermin]
 exclusion_regex =
   ^a/b$
   \\.pyi$
-""", [re.compile(r"\.pyi$"), re.compile(r"^a/b$")]],
+""", [r"\.pyi$", r"^a/b$"]],
   ])
   def test_parse_exclusion_regex(self, data, expected):
     config = Config.parse_data(data)
     self.assertIsNotNone(config)
-    self.assertEqual(config.exclusion_regex(), expected)
+    patterns = [regex.pattern for regex in config.exclusion_regex()]
+    self.assertEqual(patterns, expected)

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

I don't know the coverage API keeps failing all of a sudden. I'm going to test it myself and then merge when they succeed.

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

There we go:

Have to handle paths on Windows with \\ instead of /: https://github.com/netromdk/vermin/actions/runs/3209301979
Note that if you need it I have the decorator skipUnlessPlatform(platform) that can be used for only running on Windows or you can simply make the pattern locally, in the tests, using the platform.

@cosmicexplorer
Copy link
Contributor Author

I think I fixed it in 661738f! Please let me know if you'd prefer me to use os.path.join() over os.path.sep.

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

Thanks. That's fine but there's a problem:

 FAIL: test_exclude_regex_relative (tests.general.VerminGeneralTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\vermin\vermin\tests\general.py", line 382, in test_exclude_regex_relative
    self.assertEqual(paths, ["a{0}code.py".format(os.path.sep)])
AssertionError: Lists differ: [] != ['a\\code.py']

https://github.com/netromdk/vermin/actions/runs/3209342269/jobs/5245969385#step:7:71

@cosmicexplorer
Copy link
Contributor Author

I tried to fix it using a combination of os.path.join() and re.escape() -- we'll see if that succeeds.

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

Sorry, now it seems to be getting close: https://github.com/netromdk/vermin/actions/runs/3209386859/jobs/5246055387#step:7:71
Weird that the pyi is included now:

AssertionError: Lists differ: ['a\\code.py', 'a\\code.pyi'] != ['a\\code.py']

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

The problem is that "^{0}$".format(re.escape("a\\b")) results in re.compile('^a\\\\b$') and not re.compile('^a\\b$')

@cosmicexplorer cosmicexplorer force-pushed the exclude-glob branch 2 times, most recently from 0ffbc5f to 4b5ba31 Compare October 8, 2022 06:24
@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

There's a stray 'x':

self.config.add_exclusion_regex("^a{0}.+pyi$".format(os.path.sep)x)

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

Okay, same: https://github.com/netromdk/vermin/actions/runs/3209473950/jobs/5246216990#step:7:71

Need another approach here. The problem is something else.

@cosmicexplorer
Copy link
Contributor Author

I agree! May have to call it quits for tonight, it's quite late here...thanks so much for your prompt attempts to help! I can finish this up this weekend.

@netromdk
Copy link
Owner

netromdk commented Oct 8, 2022

No worries. Thank you for all the hard work! :)
I have to get going soon as well. I think the next step is actually testing comparisons with some Windows paths. Doesn't have to happen on Windows but just creating some in string form and comparing with compiled regex patterns.

@netromdk
Copy link
Owner

hey @cosmicexplorer . should we try to get this one done? :)

@cosmicexplorer
Copy link
Contributor Author

Hello @netromdk: yes absolutely, and thanks so much for continuing to follow up here! I had a very eventful couple weeks but will page back in now. Thanks again for the ping.

@cosmicexplorer
Copy link
Contributor Author

Ok, I finally have a repro for the test failure on a google cloud windows vm! Will definitely be able to figure this out quickly now.

@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Oct 28, 2022

Ok, using re.escape(os.path.sep) correctly constructs the regex for windows!

I was worried that this would somehow mean windows users would have to add extra backslashes to their command lines, but I just tried it and there's no issue! See:

PS C:\Users\danieldmcclanahan\vermin> mkdir tmp
PS C:\Users\danieldmcclanahan\vermin> cd tmp
PS C:\Users\danieldmcclanahan\vermin> echo "print('this is code')" > wow.py
PS C:\Users\danieldmcclanahan\vermin> mkdir a
PS C:\Users\danieldmcclanahan\vermin> echo "print('this is code')" > wow.py
PS C:\Users\danieldmcclanahan\vermin> cd ../..
PS C:\Users\danieldmcclanahan\vermin> cmd /c .\vermin.py -vvv -t=2.7 -t=3.5- tmp
Detecting python files..
Analyzing 2 files using 16 processes..
...
(without any exclusions, it detects 2 files)
PS C:\Users\danieldmcclanahan\vermin> cmd /c .\vermin.py -vvv -t=2.7 -t=3.5- --no-make-paths-absolute --exclude-regex "^^tmp\\a\\[^^\\]+\.py$" tmp
Detecting python files..
Analyzing using 16 processes..
...
(this output on windows means it only sees one file)
PS C:\Users\danieldmcclanahan\vermin> cmd /c .\vermin.py -vvv -t=2.7 -t=3.5- --no-make-paths-absolute --exclude-regex "^^tmp\\.+\.py$" tmp
Detecting python files..
No files specified to analyze!

I was thinking about adding additional help examples for windows users, but since they don't have to do anything special except escape the \ and ^ chars for powershell, I think our examples with --help using / as a directory separator are sufficient!

Copy link
Owner

@netromdk netromdk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thank you, @cosmicexplorer :)

@netromdk netromdk merged commit 1acef9e into netromdk:master Oct 30, 2022
@netromdk
Copy link
Owner

There was some weirdness to the tests, as you know, but I've verified it for myself and your changes are merged!

Thanks again! Since Python 3.11 has been released I'm going to release Vermin 1.5 very soon now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants