Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve coverage of argparse module #103558

Closed
sobolevn opened this issue Apr 15, 2023 · 12 comments
Closed

Improve coverage of argparse module #103558

sobolevn opened this issue Apr 15, 2023 · 12 comments
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@sobolevn
Copy link
Member

sobolevn commented Apr 15, 2023

Since none of currently active core-devs have deep expertise in argparse, but we still need to fix bugs in this module. The best way to start is to add more tests and increase the coverage of this module.

Right now it has a pretty solid coverage of 97%
Снимок экрана 2023-04-15 в 13 15 01

You can get this yourself:

# compile python, create and activate venv, then:
./python.exe -m pip install coverage
coverage run --pylib --branch --source=argparse -m test -v test_argparse
coverage html

Here's the initial (current) state report. argparse_coverage.tar.gz

Any help is welcome :)

Linked PRs

@sobolevn sobolevn added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir labels Apr 15, 2023
@gaogaotiantian
Copy link
Member

I'll work on this.

@AlexWaygood
Copy link
Member

I think that for something like argparse, "idiom coverage" is more important than line coverage or branch coverage. By "idiom coverage", I mean "Are there any reasonably common uses of argparse that aren't currently tested in the test suite?"

The reason why it's currently so difficult to make changes to argparse is that many users have found ways of using the module that weren't anticipated when the code was written, and have grown to assume that these uses are supported because of the fact that the code has remained unchanged for so long. (Obligatory reference to Hyrum's Law.) Seemingly innocuous bugfixes to argparse often end up breaking these "unintended APIs", causing a lot of pain for users and making development of the module difficult.

Unfortunately, "idiom coverage" is quite difficult to measure compared to line coverage or branch coverage. One possible way of working on improving this might be to look at popular PyPI projects that use argparse, and see if any of them use the module in unusual/advanced ways that aren't currently exercised in the argparse test suite.

@sobolevn
Copy link
Member Author

@AlexWaygood this is the next step :)

@gaogaotiantian
Copy link
Member

gaogaotiantian commented Apr 16, 2023

From my experience working on this, "idiom coverage" without proper tools would be extremely difficult. argparse has like 3000+ lines and the current test has 5000+. None of the original author is actively working on this (as far as I know). The module is not easy to follow and it takes a lot of time to figure out what's a piece of code is doing.

To find gaps between the current PyPI project usage and our test suite, we need to know:

  • What feature/behavior they depend on
  • What feature/behavior we have tested

Neither is trivial to know. Do we manually understand them using our human brains? That seems overwhelming. Their test coverage could be poor - the actual corner case is highly possibly not tested. No one knows exactly we have tested.

So if we want to do this, we need advanced tools. Line coverage and branch coverage is not golden but it's something that we can rely on, more importantly, it's feasible. We can do another step forward - stack coverage. Record all the appeared stack states (inside argparse only) for the PyPI projects and our tests and compare the difference. Another step forward would be code block coverage - an even smaller unit compared to stack.

This method would still depend on the fact that the PyPI projects have decent tests, but combining a lot of them, we should have a pretty good coverage (compared to reading the docs and interpreting them).

The only problem is - we don't have the tool yet.

@AlexWaygood
Copy link
Member

To be clear, I think working on improving line coverage and branch coverage is very useful! And I completely agree that "idiom coverage" is much harder. It's fine to work on line/branch coverage as a first step. But as the stats show, argparse's line coverage is already very high -- and from the history of bug reports relating to this module, I think "idiom coverage" is really what's lacking here.

@gaogaotiantian
Copy link
Member

About the stack coverage I mentioned above. I did a really quick prototype on it. I ran our current test suites, then I tried a recent issue #103498. The results shows that this stack(among with some others) appeared in the code given in the prototype, but not in our test suite:

['parse_args:1869', 'parse_known_args:1906', '_parse_known_args:2142']

And that is exactly where the problem is - we never tested the combination of self.exit_on_error == False + required_actions == True. We tested them separately, but not together.

With a larger test base from other projects, we will have a much better idea of the stacks that we have never tested before, we can even do a frequency label on the uncovered stack to check if there's a common combination that we missed.

Of course, this is super hacky now and not completely ready to use even not for production, but I guess at least this shows a possibility - even for other complicated and heavily used libraries.

@sobolevn
Copy link
Member Author

sobolevn commented Apr 17, 2023

Found use-cases to cover

(Feel free to ping me, so I will update the list)

  • argparse.REMAINDER with different types of arguments pos, --flags, etc (legacy, ref)

@gaogaotiantian
Copy link
Member

gaogaotiantian commented Apr 17, 2023

Actually, I realized that we don't even need the outside tests to find missing stacks.

Our code coverage is high, which means almost all the lines are covered somehow. So we have a local calling relation - say f():123 can call into g():234. We used those relations that are already knew with our current test cases, and we analyze stacks - basically we build a tree with all "possible call relations", then we check missing paths.

For example, with a small prototype, I realized ('ArgumentParser.parse_args:1878', 'ArgumentParser.parse_known_args:1915') has a lot of missing paths(the line number is not main because of the changes). Going back there it's easy to realize it's because we did not test self.exit_on_error == False nearly as much as self.exit_on_error == True. So even though self._parse_known_args is covered pretty well, it's almost only executed under the case self.exit_on_error == True, and that's a missing piece.

        if self.exit_on_error:
            try:
                namespace, args = self._parse_known_args(args, namespace)
            except ArgumentError as err:
                self.error(str(err))
        else:
            namespace, args = self._parse_known_args(args, namespace)

Also found some false alarms :)

It's not perfect and it still requires some human analysis, but I think it's at least a tool to move forward, as compared to we just try to look at 5000+ lines of testing code and figure out what's missing.

@hpaulj
Copy link

hpaulj commented Apr 18, 2023

exit_on_error is a relatively recent addition. I can think of many (or at least some) where patches like this have unintended consequences. The default case may pass all the unit tests, but we (devs and others who follow the issues), don't test the new cases nearly as well as we should. And that is basically a function of how much time and interest we devote to proposed changes.

In the case of exit_on_error, placing its test in parse_known_args is a clean implementation. catching all ArgumentError cases. But that doesn't catch all the calls to self.error. An ArgumentError is raised only when the error is linked to one specific Action. In _parse_known_args there are two required cases that call self.error directly. And in parse_args, unrecognized arguments trigger a error call. Maybe exit_on_error should have been implemented in parser.error, as opposed to trying to catch all calls to that method.

allow_abbrev is another buggy addition.

Changing how required actions was tested and reported let subparsers fall through the cracks. We had to add the required option, and correct its error reporting.

I don't know if correcting coverage as proposed here will cover these kinds of problems.

@hugovk
Copy link
Member

hugovk commented Nov 10, 2023

Thanks!

@hugovk hugovk closed this as completed Nov 10, 2023
@github-project-automation github-project-automation bot moved this from Bugs to Doc issues in Argparse issues Nov 10, 2023
@AlexWaygood
Copy link
Member

We tackled the low-hanging fruit here, but AFAIK we never got round to any of the more advanced ways of analysing coverage, as floated in #103558 (comment). #103558 (comment) still has an unchecked box, as well. So maybe there's still some stuff to do here.

Having said that, I don't know if @sobolevn or @gaogaotiantian are still interested in working on this (I probably won't be ;). If neither has any plans to work on this, we may as well leave this closed for now.

@gaogaotiantian
Copy link
Member

I believe the major issue with argparse is that all the bugs are features now. There are a lot of reports about unexpected behaviors of it, but we can't do anything because it won't be backward compatible. This affects increasing the coverage as well - we don't know whether to test the "expected behavior" or "current behavior". That being said, I'm not currently working on this so I'm totally okay for closing this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
Projects
Status: Doc issues
Development

No branches or pull requests

5 participants