
Start moving to pytest #1723

Closed · wants to merge 20 commits

Conversation

refi64 (Contributor) commented Jun 18, 2016

Ref. #1673.

Sorry this took so long! I was busy (ACT)/distracted (E3)/everything else...

In addition, I ended up moving over the command-line tests and semanal tests as well as the check tests, since it was pretty trivial.

Also, I noticed the comment in runtests.py saying that the command-line tests took longer than regular unit tests. Why is that? They only take 7s, vs around 45s for everything else. Not a significant difference.

Unfortunately, I can't actually try this in parallel because I think I keep hitting pytest-dev/pytest#596...

refi64 (Contributor, Author) commented Jun 18, 2016

Actually, I got xdist to work (see pytest-dev/pytest-xdist#72). Finished all the tests in 32 seconds across 3 cores!
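
For reference, a parallel run like that is just pytest plus the xdist plugin's -n flag; roughly (the core count here is only an example, and pytest-xdist has to be installed separately):

$ pip install pytest-xdist
$ py.test -n 3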

gnprice (Collaborator) commented Jun 20, 2016

Awesome, thanks! Really looking forward to completing this move and also to running the unit tests in parallel.

One thing that's useful both in the final commit message (for people reading the change later to understand it) and in the process of code review (for reviewers to understand the changes) is to write up in one place a description of everything that's changing and why. If you look at recent commits, most of them that aren't trivial changes are good examples; in particular see these by Guido, me, David, and Reid.

The way these work logistically in our GitHub workflow is that when one of us is ready to hit the "Merge pull request" button, we'll get a textarea right there for the final commit message. It'll be pre-filled with all your commit messages from the branch, and typically we'll take the relevant parts of that and also copy in relevant paragraphs from your PR message and any comments. So the message when first opening the PR is a great place to put this description, and when you've already opened a PR a comment is a fine place to add it. Would you write one now?

I'm also just about to read through the code.


$ ./runtests.py pytest -t -k -t NestedListAssignment
$ py.test -k NestedListAssignment

Collaborator:

"To run these, you must use pytest instead", sounds like these tests might not run if you just say ./runtests.py. They do, right?

Maybe a clearer description here would be something like "To run these unit tests individually, you can run py.test directly, or pass inferior arguments with -t." (I think you mean -t rather than -k -- is that right or am I misreading the example?)

Collaborator:

👍

gnprice (Collaborator) commented Jun 20, 2016

(Going afk for a bit, will read the rest later. Thanks again!)

@@ -209,3 +206,5 @@ def parse_flags(self, program_text: str) -> List[str]:
            return m.group(1).split()
        else:
            return []

TestTypeCheck.setup_tests()
Collaborator:

Hmm, typically the magic of pytest makes it so you don't need to make this kind of call up at toplevel. Is there another way we can prepare these?

Also, "setup" in the pytest context usually refers to something you do just before running some tests in order to set up some state that the test itself, or the application code it's testing, expect in order for the test to run. It's not normally something you'd want to do at toplevel like this. Is "collect" or "test collection" more what this step is about?

Collaborator:

Nice!

gnprice (Collaborator) commented Jun 21, 2016

GitHub is indicating in the status box at the bottom of this PR page that there are "conflicts". I think these are easy -- Git actually handles everything automatically when I locally try a rebase or merge from your branch, and the only thing it even mentions having to do anything about is that there were some changes in mypy/testcheck.py, which you renamed.

So you'll want to rebase eventually, but there's no rush -- actually I think it's best if you hold off to rebase until the PR is otherwise nearly ready to merge, because if you do I think GitHub will get confused into thinking all the comments on specific lines are on "obsolete" versions of your changes and will hide them.

gnprice (Collaborator) commented Jun 21, 2016

As it happens, I spent a number of hours last week becoming more familiar with the internals of pytest and with how to configure it -- two subjects which are unfortunately hard to separate from each other, because one thing pytest really is not good at is documentation -- than I'd ever really desired to be, for the sake of some non-mypy-related stuff at work. So I may be able to make some pretty specific comments and suggestions on how to manipulate pytest into doing exactly what we want here as smoothly as possible. Let's see.

gnprice (Collaborator) commented Jun 21, 2016

BTW, I really like how you've set things up so that the various mypy/test/testfoo.py test drivers need only small scattered changes in order to move over to pytest. That helps a lot in making this diff easy to read and understand (both now for review and in the future for others reading the history), and it's going to help a lot also in converting over the remaining test drivers.

gnprice (Collaborator) commented Jun 21, 2016

Let's zoom out a bit to discuss what we'd like this all to look like in pytest terms. (Once we have a clear picture we're happy with, this would be a great thing to explain in the commit message! Or in fact, the parts of it that are directly about the code as it ends up, rather than the relationship to myunit, will be good for a long comment and/or docstring somewhere, perhaps around PytestSuite.)

As you might already know from whatever study you've made of how pytest works, it operates basically in two phases: first it "collects" a bunch of test "items", then it runs those items. Collection is a recursive process: it descends the filesystem tree recursively looking for files that appear to be test files, and invokes appropriate collectors for each of those; for a Python file, the usual collector will import it as a module and then recursively collect promising functions and classes in the module, and the usual collector on a class will recurse further. Every step of this process can be hooked into and modified.

What we want here is to hook into the point where pytest tries to collect one of our PytestSuite classes. The main model here for comparison is how pytest's built-in plugin for handling unittest.TestCase classes does it:
https://github.com/pytest-dev/pytest/blob/master/_pytest/unittest.py#L14

So it adds an implementation of the hook called pytest_pycollect_makeitem -- which you can do in a conftest.py file. If that hook returns any non-None value, that will be the answer and pytest will move on without trying any other implementations of the hook.

Here, I think you'd basically want that hook implementation to end up calling your setup_tests (probably with a name more like collect, as I mentioned above.) Or rather, it should have a layer of indirection much like the example I linked to: the hook returns a new collector, an instance of a class you write, with that class inheriting from pytest.Class and adding its own collect implementation which is basically a version of your setup_tests.

That will fit in better with the normal pytest flow of things, and will let you avoid calling setup_tests at the module toplevel.
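
A rough sketch of the hook shape being described, assuming a conftest.py and the PytestSuite base class from this PR (the import path and collector name here are made up; the actual collect logic is discussed further down the thread):

# conftest.py -- sketch only, not the final implementation
import inspect

import pytest

from mypy.test.testhelpers import PytestSuite  # import path assumed


class DataSuiteCollector(pytest.Class):
    def collect(self):
        # Would create one pytest item per data-driven test case,
        # i.e. roughly what setup_tests does today.
        return []


def pytest_pycollect_makeitem(collector, name, obj):
    # A non-None return value tells pytest this hook handled collection
    # for `obj`, so no other hook implementations are tried.
    if inspect.isclass(obj) and issubclass(obj, PytestSuite):
        return DataSuiteCollector(name, parent=collector)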

gnprice (Collaborator) commented Jun 21, 2016

Also, I think I see now why your perform parameter/attribute ends up with a hard-to-type signature. I think the solution is probably for DataDrivenTestCase in the new pytest-based world to just stop taking responsibility for actually running the code, so that the run method is something that is only invoked in the myunit-based world and there's no need for the perform parameter/attribute in the pytest world.

Then in the pytest world, when your setup_tests aka collect method takes responsibility for setting up some kind of pytest "item" to run the given test case, instead of using a perform parameter it relies on some convention you establish for a method that a PytestSuite needs to implement which does the job of perform. All these methods and functions have different names now, and some are methods on the suite while others are standalone functions, but that's an accident that can be changed and I think we'll make things easier for ourselves by changing it to a standard convention (and it won't make the conversion diffs much bigger, either.)

gnprice (Collaborator) commented Jun 21, 2016

Haven't yet read the runtests.py changes in detail, or the other two drivers other than testcheck.py. Otherwise, that's what I have for this round. I'll be eager to see the next version! Running these unit tests in parallel, and getting to use pytest directly, are going to be great.

gvanrossum (Member) commented:

Is this stalled? Maybe it would be easier to get this moving if it wasn't such a massive set of diffs?

refi64 (Contributor, Author) commented Jun 30, 2016

@gvanrossum Sorry for the delay! I just fixed most of the issues mentioned.

@gnprice Two things:

  1. Since I usually end up squashing the commits anyway, I'll probably just edit in a message then.
  2. I'm sorry; could you elaborate a bit on what you said in the last comment for pytest not having anything to do with perform? You lost me at "takes responsibility for setting up some kind of pytest "item" to run the given test case"... Are you saying that collect_tests would just create pytest items, like the unittest example you linked? And what exactly do you mean by "All these methods and functions have different names now"? Sorry; my brain can't fit all that in at once... :/

gnprice (Collaborator) commented Jul 3, 2016

Thanks for the update! Reading in detail now.


def pytest_pycollect_makeitem(collector, name, obj):
    if (inspect.isclass(obj) and issubclass(obj, PytestSuite) and
            obj is not PytestSuite):
Collaborator:

Can you avoid the and obj is not PytestSuite part here? A special case like this always makes me a little uneasy that there's a bug, or at least a gotcha in the API, hiding somewhere.

Ah, in fact I think what it would take to make this work cleanly without that condition is just to add a cases method right on the PytestSuite base class that just returns an empty list. Will continue in an inline comment over there for better context.

gnprice (Collaborator) commented Jul 3, 2016

So here's some reading, and thoughts, on how to do this test collection using the pytest API as documented. As I said above, I think the existing core logic will do fine and it shouldn't be too complicated to put it inside that API.

There are two relevant examples in the pytest codebase itself, for doctest and unittest (i.e. the stdlib's unittest module).
https://github.com/pytest-dev/pytest/blob/master/_pytest/doctest.py
https://github.com/pytest-dev/pytest/blob/master/_pytest/unittest.py

The unittest module seems to have a bunch of mess in it at this point, most of that apparently due to wanting to support twisted.trial.unittest within the same code, but here's a much simpler version from earlier in history:
https://github.com/pytest-dev/pytest/blob/929291775/_pytest/unittest.py
There've probably been API changes in pytest and improvements in this module since that version, so I'd take the current version as the one to copy snippets from, but I'd take the early version as the guide to what snippets and structure you actually need. (And even some of that, like addError and so on, is specific to unittest.)

The doctest example is perhaps simpler -- in DoctestItem, it demonstrates using pytest.Item rather than the more complex pytest.Function. There's a lot of stuff in repr_failure there, but I think that's all specific to supporting particular features of doctest nicely and not anything you'd need to worry about.

So in brief, I think it'd go like this:

  • pytest_pycollect_makeitem returns an instance of, say, DataDrivenSuite;
  • DataDrivenSuite inherits from pytest.Class and has a collect method, which invokes cases and for each case makes an instance of say DataDrivenCase and yields that;
  • DataDrivenCase inherits from pytest.Item and has a runtest which actually runs the test case.
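
Putting those three bullets into code, a rough sketch (case.name, the cases classmethod, and a per-suite run_test method are assumptions carried over from this PR and the discussion below, not settled API):

import pytest

# The pytest_pycollect_makeitem hook would return an instance of
# DataDrivenSuite for each PytestSuite subclass it finds.


class DataDrivenSuite(pytest.Class):
    def collect(self):
        # self.obj is the PytestSuite subclass being collected.
        for case in self.obj.cases():
            yield DataDrivenCase(case.name, parent=self, case=case)


class DataDrivenCase(pytest.Item):
    def __init__(self, name, parent, case):
        super().__init__(name, parent)
        self.case = case

    def runtest(self):
        # Delegate the actual work to the suite class (see the run_test
        # convention discussed below in the thread).
        self.parent.obj.run_test(self.case)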

gnprice (Collaborator) commented Jul 3, 2016

I'm sorry; could you elaborate a bit on what you said in the last comment for pytest not having anything to do with perform? You lost me at "takes responsibility for setting up some kind of pytest "item" to run the given test case"... Are you saying that collect_tests would just create pytest items, like the unittest example you linked? And what exactly do you mean by "All these methods and functions have different names now"? Sorry; my brain can't fit all that in at once... :/

Sure! I guess this is especially relevant after my last comment. :-)

Yep, as I said in that comment just now, I'd have a method like collect_tests just create pytest items. I think the place that method lives will be a little different from where you currently have it -- more details in that comment just above this one. Happy to say more if that comment isn't clear.

About cutting out perform, and the method and function names:

Let's take our favorite example: test_check. In your current version -- which follows in this respect the way the myunit version of things works -- the cases method passes parse_test_cases a callback to actually run the test, which in turn gets stuck onto the DataDrivenTestCase instances as an attribute. This is what parse_test_cases and DataDrivenTestCase call perform, and in this particular driver it's cls.run_test i.e. TestTypeCheck.run_test. The tests are run by invoking run on the DataDrivenTestCase instances, which turns straight around and invokes that callback.

Another way this could be done is: just as PytestSuite requires its subclasses to have a cases classmethod, it requires them to have a run_test classmethod. (Or instance method, but I think it can just be a classmethod.) Then instead of ever calling DataDrivenTestCase.run, we just call that method directly.

This saves a layer of indirection in running the test. It also saves a parameter in the rather complicated signature of parse_test_cases. And it saves us from overloading that perform parameter and attribute, which saves us from having to give it a type involving Any.

Some of the other drivers don't currently have a run_test method; instead they have a method with some other name, or a toplevel function, but in every case they have a method or function that plays this role. That's what I meant by "All these methods and functions have different names now" -- they're differently named now, so you'll need to rename those methods and functions all to be methods with a single name (like run_test). Which is probably good for consistency in any event.
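
To make that convention concrete, one possible shape for the base class and a driver (a sketch under the naming assumptions above; the DataDrivenTestCase import path is a guess):

from typing import List

from mypy.test.data import DataDrivenTestCase  # import path assumed


class PytestSuite:
    @classmethod
    def cases(cls) -> List[DataDrivenTestCase]:
        # Base class collects nothing, which also removes the need for the
        # "obj is not PytestSuite" special case in the conftest hook.
        return []

    @classmethod
    def run_test(cls, testcase: DataDrivenTestCase) -> None:
        raise NotImplementedError


class TestTypeCheck(PytestSuite):
    @classmethod
    def cases(cls) -> List[DataDrivenTestCase]:
        # Parse the .test files as today, but with no perform callback.
        ...

    @classmethod
    def run_test(cls, testcase: DataDrivenTestCase) -> None:
        # Body of what is currently TestTypeCheck.run_test.
        ...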

Does that help? Are there aspects of this that still don't make sense? Happy to discuss further.

Looking forward to seeing this migrated!

JukkaL (Collaborator) commented Jul 13, 2016

I finally got around to trying this. The general py.test workflow looks great, thanks for working on this! There are a few details that probably need some more work, discussed below. Some of them were also discussed previously, but I'm repeating them here as I may have a slightly different take on things.

In general I'd like to get the most important details right before merging this. The previous big test runner revamp from 2015 introduced a bunch of regressions and slowed down productivity for a long while, and some of the regressions still haven't been fixed. If we merge a not-quite-fully finished workflow, it has a tendency to become the norm and developers just get used to it, and nobody has enough incentive or time to further improve things. I've seen this happen over and over again :-( Just preserving existing functionality and important UX niceties is good enough -- no need to implement new stuff such as unified context diffs.

1) Displaying actual/expected output

I want to preserve an (almost) verbatim (lightweight diff markers are okay) actual/expected output in the py.test output, even when running without -v. (-v feels too verbose to me when running more than a handful of tests, and I find it difficult to find the relevant output without wasting time scrolling around.)

Specifics:

  • The actual vs expected output is the most important result in a typical test case run, so I want the actual output to be easily visible by default and without scrolling, even if I'm on my 13 inch laptop which can't show many lines in a terminal.
  • With the legacy test runner, I can easily copy the actual output to a test case description. A common workflow is to write a new test case with a dummy output (such as XXX), run the test, and copy&paste the actual output from the test runner to the test case description. I generally need to deindent by 2 spaces and remove (diff) lines, but that's already muscle memory to me. The py.test output doesn't make this easy. For some cases the test output can be quite long, so doing a line-by-line copy-paste can be frustrating. Example:
E         Full diff:
E         - ['xxx']
E         + ['main:1: error: "int" not callable',
E         +  'main:2: error: "int" not callable',
E         +  'main:3: error: "int" not callable',
E         +  'main:4: error: "int" not callable',
E         +  'main:5: error: "str" not callable']

Editing this to a clean test output requires a few too many steps to my liking (also, the non-verbose output isn't very useful).

2) Parallel tests

I can't get the xdist plugin to work for running tests in parallel. This is what I get when I run py.test -n 2:

============================= test session starts ==============================
platform darwin -- Python 3.5.1, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/jukka/src/pytest-mypy, inifile: pytest.ini
plugins: xdist-1.14
gw0 I / gw1 I
gw0 [1854] / gw1 [1854]

scheduling tests via LoadScheduling

==================================== ERRORS ====================================
_____________________________ ERROR collecting gw1 _____________________________
Different tests were collected between gw0 and gw1. The difference is:
--- gw0

+++ gw1

@@ -1,1854 +1,1854 @@

+mypy/test/test_check.py::TestTypeCheck::()::testPow
+mypy/test/test_check.py::TestTypeCheck::()::testAttributeInGenericTypeWithTypevarValues1
+mypy/test/test_check.py::TestTypeCheck::()::testReturnWithCallExprAndIsinstance
+mypy/test/test_check.py::TestTypeCheck::()::testMethodOverridingAcrossDeepInheritanceHierarchy1
+mypy/test/test_check.py::TestTypeCheck::()::testClassAttributeInitialization-skip
[... a LOT of output trimmed ...]

3) Stack trace formatting

I'd prefer the short stack trace format by default, which just shows the line where the failure happened instead of the full function body. I have this enabled on other projects I work on that use py.test, and I think the short-format stack trace is better, at least for running tests interactively. In Travis CI it may make sense to include the long stack trace format for more context, but I'm not sure about that. This is particularly relevant for mypy, as for some of the most common test failures (unexpected test case output only) the stack trace isn't very useful.
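
For what it's worth, pytest's standard --tb option can make the short format the default, e.g. via a pytest.ini (whether and where to set this is of course up for discussion):

[pytest]
addopts = --tb=short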

4) -i and -u options

These options automatically change test case descriptions based on actual output, and they have saved a lot of work in the past when doing large-scale changes to error messages. Not sure if they still work, but it would be really nice to keep them.

gnprice (Collaborator) commented Jul 27, 2016

This PR has been quiet for a few weeks -- @kirbyfan64, I assume you've become busy for a bit -- and I wouldn't want this work to go to waste. So I spent a few hours today putting together a new attempt, #1944. It takes ideas from @kirbyfan64's original PR here, with adjustments following my suggestions above, and also taking into account @JukkaL's request just above to pay close attention to keeping the output's information and verbosity similar to what myunit produces.

I believe if we merge that it will largely replace this PR, but there'll be plenty of follow-on work to do to complete #1673.

gnprice added a commit that referenced this pull request Jul 27, 2016
…over to it (#1944)

This is a step toward #1673 (switching entirely to pytest from myunit
and runtests.py), using some of the ideas developed in @kirbyfan64's
PR #1723.

Both `py.test` with no arguments and `py.test mypy/test/testcheck.py`
work just as you'd hope, while `./runtests.py` continues to run all
the tests.

The output is very similar to the myunit output.  It doesn't spew
voluminous stack traces or other verbose data, and it continues using
`assert_string_arrays_equal` to produce nicely-formatted comparisons
when e.g. type-checker error messages differ.  On error it even
includes the filename and line number for the test case itself, which
I've long wanted, and I think pytest's separator lines and coloration
make the output slightly easier to read when there are multiple
failures.

The `-i` option from myunit is straightforwardly ported over as
`--update-data`, giving it a longer name because it feels like the
kind of heavyweight and uncommon operation that deserves such a name.
It'd be equally straightforward to port over `-u`, but in the presence
of source control I think `--update-data` does the job on its own.

One small annoyance is that if there's a failure early in a test run,
pytest doesn't print the detailed report on that failure until the
whole run is over.  This has often annoyed me in using pytest on other
projects; useful workarounds include passing `-x` to make it stop at
the first failure, `-k` to filter the set of tests to be run, or
(especially with our tests where errors often go through
`assert_string_arrays_equal`) `-s` to let stdout and stderr pass
through immediately.  For interactive use I think it'd nearly always
be preferable to do what myunit does by immediately printing the
detailed information, so I may come back to this later to try to get
pytest to do that.

We don't yet take advantage of `xdist` to parallelize within a
`py.test` run (though `xdist` works for me out of the box in initial
cursory testing.)  For now we just stick with the `runtests.py`
parallelization, so we set up a separate `py.test` command for each
test module.
gnprice (Collaborator) commented Jul 27, 2016

Closing as #1944 has now landed. Still plenty to do to complete #1673, though.
