
Start moving to pytest #1723

Closed · wants to merge 20 commits

Conversation

refi64 (Contributor) commented Jun 18, 2016

Ref. #1673.

Sorry this took so long! I was busy (ACT)/distracted (E3)/everything else...

In addition, I ended up moving over the command-line tests and semanal tests as well as the check tests, since it was pretty trivial.

Also, I noticed the comment in runtests.py saying that the command-line tests took longer than regular unit tests. Why is that? They only take 7s, vs around 45s for everything else. Not a significant difference.

Unfortunately, I can't actually try this in parallel because I think I keep hitting pytest-dev/pytest#596...

refi64 (Contributor, Author) commented Jun 18, 2016

Actually, I got xdist to work (see pytest-dev/pytest-xdist#72). Finished all the tests in 32 seconds across 3 cores!
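
For reference, a parallel run like that is just pytest plus the xdist plugin's -n flag; roughly (the core count here is only an example, and pytest-xdist has to be installed separately):

$ pip install pytest-xdist
$ py.test -n 3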

gnprice (Collaborator) commented Jun 20, 2016

Awesome, thanks! Really looking forward to completing this move and also to running the unit tests in parallel.

One thing that's useful both in the final commit message (for people reading the change later to understand it) and in the process of code review (for reviewers to understand the changes) is to write up in one place a description of everything that's changing and why. If you look at recent commits, most of them that aren't trivial changes are good examples; in particular see these by Guido, me, David, and Reid.

The way these work logistically in our GitHub workflow is that when one of us is ready to hit the "Merge pull request" button, we'll get a textarea right there for the final commit message. It'll be pre-filled with all your commit messages from the branch, and typically we'll take the relevant parts of that and also copy in relevant paragraphs from your PR message and any comments. So the message when first opening the PR is a great place to put this description, and when you've already opened a PR a comment is a fine place to add it. Would you write one now?

I'm also just about to read through the code.


$ ./runtests.py pytest -t -k -t NestedListAssignment
$ py.test -k NestedListAssignment

Collaborator:

"To run these, you must use pytest instead", sounds like these tests might not run if you just say ./runtests.py. They do, right?

Maybe a clearer description here would be something like "To run these unit tests individually, you can run py.test directly, or pass inferior arguments with -t." (I think you mean -t rather than -k -- is that right or am I misreading the example?)

Collaborator:

👍

gnprice (Collaborator) commented Jun 20, 2016

(Going afk for a bit, will read the rest later. Thanks again!)

@@ -209,3 +206,5 @@ def parse_flags(self, program_text: str) -> List[str]:
            return m.group(1).split()
        else:
            return []

TestTypeCheck.setup_tests()
Collaborator:

Hmm, typically the magic of pytest makes it so you don't need to make this kind of call up at toplevel. Is there another way we can prepare these?

Also, "setup" in the pytest context usually refers to something you do just before running some tests in order to set up some state that the test itself, or the application code it's testing, expect in order for the test to run. It's not normally something you'd want to do at toplevel like this. Is "collect" or "test collection" more what this step is about?

Collaborator:

Nice!

gnprice (Collaborator) commented Jun 21, 2016

GitHub is indicating in the status box at the bottom of this PR page that there are "conflicts". I think these are easy -- Git actually handles everything automatically when I locally try a rebase or merge from your branch, and the only thing it even mentions having to do anything about is that there were some changes in mypy/testcheck.py, which you renamed.

So you'll want to rebase eventually, but there's no rush -- actually I think it's best if you hold off to rebase until the PR is otherwise nearly ready to merge, because if you do I think GitHub will get confused into thinking all the comments on specific lines are on "obsolete" versions of your changes and will hide them.

gnprice (Collaborator) commented Jun 21, 2016

As it happens, I spent a number of hours last week becoming more familiar with the internals of pytest and with how to configure it -- two subjects which are unfortunately hard to separate from each other, because one thing pytest really is not good at is documentation -- than I'd ever really desired to be, for the sake of some non-mypy-related stuff at work. So I may be able to make some pretty specific comments and suggestions on how to manipulate pytest into doing exactly what we want here as smoothly as possible. Let's see.

gnprice (Collaborator) commented Jun 21, 2016

BTW, I really like how you've set things up so that the various mypy/test/testfoo.py test drivers need only small scattered changes in order to move over to pytest. That helps a lot in making this diff easy to read and understand (both now for review and in the future for others reading the history), and it's going to help a lot also in converting over the remaining test drivers.

gnprice (Collaborator) commented Jun 21, 2016

Let's zoom out a bit to discuss what we'd like this all to look like in pytest terms. (Once we have a clear picture we're happy with, this would be a great thing to explain in the commit message! Or in fact, the parts of it that are directly about the code as it ends up, rather than the relationship to myunit, will be good for a long comment and/or docstring somewhere, perhaps around PytestSuite.)

As you might already know from whatever study you've made of how pytest works, it operates basically in two phases: first it "collects" a bunch of test "items", then it runs those items. Collection is a recursive process: it descends the filesystem tree recursively looking for files that appear to be test files, and invokes appropriate collectors for each of those; for a Python file, the usual collector will import it as a module and then recursively collect promising functions and classes in the module, and the usual collector on a class will recurse further. Every step of this process can be hooked into and modified.

What we want here is to hook into the point where pytest tries to collect one of our PytestSuite classes. The main model here for comparison is how pytest's built-in plugin for handling unittest.TestCase classes does it:
https://github.com/pytest-dev/pytest/blob/master/_pytest/unittest.py#L14

So it adds an implementation of the hook called pytest_pycollect_makeitem -- which you can do in a conftest.py file. If that hook returns any non-None value, that will be the answer and pytest will move on without trying any other implementations of the hook.

Here, I think you'd basically want that hook implementation to end up calling your setup_tests (probably with a name more like collect, as I mentioned above.) Or rather, it should have a layer of indirection much like the example I linked to: the hook returns a new collector, an instance of a class you write, with that class inheriting from pytest.Class and adding its own collect implementation which is basically a version of your setup_tests.

That will fit in better with the normal pytest flow of things, and will let you avoid calling setup_tests at the module toplevel.
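
A rough sketch of the hook shape being described, assuming a conftest.py and the PytestSuite base class from this PR (the import path and collector name here are made up; the actual collect logic is discussed further down the thread):

# conftest.py -- sketch only, not the final implementation
import inspect

import pytest

from mypy.test.testhelpers import PytestSuite  # import path assumed


class DataSuiteCollector(pytest.Class):
    def collect(self):
        # Would create one pytest item per data-driven test case,
        # i.e. roughly what setup_tests does today.
        return []


def pytest_pycollect_makeitem(collector, name, obj):
    # A non-None return value tells pytest this hook handled collection
    # for `obj`, so no other hook implementations are tried.
    if inspect.isclass(obj) and issubclass(obj, PytestSuite):
        return DataSuiteCollector(name, parent=collector)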

gnprice (Collaborator) commented Jun 21, 2016

Also, I think I see now why your perform parameter/attribute ends up with a hard-to-type signature. I think the solution is probably for DataDrivenTestCase in the new pytest-based world to just stop taking responsibility for actually running the code, so that the run method is something that is only invoked in the myunit-based world and there's no need for the perform parameter/attribute in the pytest world.

Then in the pytest world, when your setup_tests aka collect method takes responsibility for setting up some kind of pytest "item" to run the given test case, instead of using a perform parameter it relies on some convention you establish for a method that a PytestSuite needs to implement which does the job of perform. All these methods and functions have different names now, and some are methods on the suite while others are standalone functions, but that's an accident that can be changed and I think we'll make things easier for ourselves by changing it to a standard convention (and it won't make the conversion diffs much bigger, either.)

gnprice (Collaborator) commented Jun 21, 2016

Haven't yet read the runtests.py changes in detail, or the other two drivers other than testcheck.py. Otherwise, that's what I have for this round. I'll be eager to see the next version! Running these unit tests in parallel, and getting to use pytest directly, are going to be great.

gvanrossum (Member) commented:

Is this stalled? Maybe it would be easier to get this moving if it wasn't such a massive set of diffs?

refi64 (Contributor, Author) commented Jun 30, 2016

@gvanrossum Sorry for the delay! I just fixed most of the issues mentioned.

@gnprice Two things:

  1. Since I usually end up squashing the commits anyway, I'll probably just edit in a message then.
  2. I'm sorry; could you elaborate a bit on what you said in the last comment for pytest not having anything to do with perform? You lost me at "takes responsibility for setting up some kind of pytest "item" to run the given test case"... Are you saying that collect_tests would just create pytest items, like the unittest example you linked? And what exactly do you mean by "All these methods and functions have different names now"? Sorry; my brain can't fit all that in at once... :/

gnprice (Collaborator) commented Jul 3, 2016

Thanks for the update! Reading in detail now.


def pytest_pycollect_makeitem(collector, name, obj):
    if (inspect.isclass(obj) and issubclass(obj, PytestSuite) and
            obj is not PytestSuite):
Collaborator:

Can you avoid the and obj is not PytestSuite part here? A special case like this always makes me a little uneasy that there's a bug, or at least a gotcha in the API, hiding somewhere.

Ah, in fact I think what it would take to make this work cleanly without that condition is just to add a cases method right on the PytestSuite base class that just returns an empty list. Will continue in an inline comment over there for better context.

gnprice (Collaborator) commented Jul 3, 2016

So here's some reading, and thoughts, on how to do this test collection using the pytest API as documented. As I said above, I think the existing core logic will do fine and it shouldn't be too complicated to put it inside that API.

There are two relevant examples in the pytest codebase itself, for doctest and unittest (i.e. the stdlib's unittest module).
https://github.com/pytest-dev/pytest/blob/master/_pytest/doctest.py
https://github.com/pytest-dev/pytest/blob/master/_pytest/unittest.py

The unittest module seems to have a bunch of mess in it at this point, most of that apparently due to wanting to support twisted.trial.unittest within the same code, but here's a much simpler version from earlier in history:
https://github.com/pytest-dev/pytest/blob/929291775/_pytest/unittest.py
There've probably been API changes in pytest and improvements in this module since that version, so I'd take the current version as the one to copy snippets from, but I'd take the early version as the guide to what snippets and structure you actually need. (And even some of that, like addError and so on, is specific to unittest.)

The doctest example is perhaps simpler -- in DoctestItem, it demonstrates using pytest.Item rather than the more complex pytest.Function. There's a lot of stuff in repr_failure there, but I think that's all specific to supporting particular features of doctest nicely and not anything you'd need to worry about.

So in brief, I think it'd go like this:

  • pytest_pycollect_makeitem returns an instance of, say, DataDrivenSuite;
  • DataDrivenSuite inherits from pytest.Class and has a collect method, which invokes cases and for each case makes an instance of say DataDrivenCase and yields that;
  • DataDrivenCase inherits from pytest.Item and has a runtest which actually runs the test case.
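
Putting those three bullets into code, a rough sketch (case.name, the cases classmethod, and a per-suite run_test method are assumptions carried over from this PR and the discussion below, not settled API):

import pytest

# The pytest_pycollect_makeitem hook would return an instance of
# DataDrivenSuite for each PytestSuite subclass it finds.


class DataDrivenSuite(pytest.Class):
    def collect(self):
        # self.obj is the PytestSuite subclass being collected.
        for case in self.obj.cases():
            yield DataDrivenCase(case.name, parent=self, case=case)


class DataDrivenCase(pytest.Item):
    def __init__(self, name, parent, case):
        super().__init__(name, parent)
        self.case = case

    def runtest(self):
        # Delegate the actual work to the suite class (see the run_test
        # convention discussed below in the thread).
        self.parent.obj.run_test(self.case)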

gnprice (Collaborator) commented Jul 3, 2016

I'm sorry; could you elaborate a bit on what you said in the last comment for pytest not having anything to do with perform? You lost me at "takes responsibility for setting up some kind of pytest "item" to run the given test case"... Are you saying that collect_tests would just create pytest items, like the unittest example you linked? And what exactly do you mean by "All these methods and functions have different names now"? Sorry; my brain can't fit all that in at once... :/

Sure! I guess this is especially relevant after my last comment. :-)

Yep, as I said in that comment just now, I'd have a method like collect_tests just create pytest items. I think the place that method lives will be a little different from where you currently have it -- more details in that comment just above this one. Happy to say more if that comment isn't clear.

About cutting out perform, and the method and function names:

Let's take our favorite example: test_check. In your current version -- which follows in this respect the way the myunit version of things works -- the cases method passes parse_test_cases a callback to actually run the test, which in turn gets stuck onto the DataDrivenTestCase instances as an attribute. This is what parse_test_cases and DataDrivenTestCase call perform, and in this particular driver it's cls.run_test i.e. TestTypeCheck.run_test. The tests are run by invoking run on the DataDrivenTestCase instances, which turns straight around and invokes that callback.

Another way this could be done is: just as PytestSuite requires its subclasses to have a cases classmethod, it requires them to have a run_test classmethod. (Or instance method, but I think it can just be a classmethod.) Then instead of ever calling DataDrivenTestCase.run, we just call that method directly.

This saves a layer of indirection in running the test. It also saves a parameter in the rather complicated signature of parse_test_cases. And it saves us from overloading that perform parameter and attribute, which saves us from having to give it a type involving Any.

Some of the other drivers don't currently have a run_test method; instead they have a method with some other name, or a toplevel function, but in every case they have a method or function that plays this role. That's what I meant by "All these methods and functions have different names now" -- they're differently named now, so you'll need to rename those methods and functions all to be methods with a single name (like run_test). Which is probably good for consistency in any event.
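
To make that convention concrete, one possible shape for the base class and a driver (a sketch under the naming assumptions above; the DataDrivenTestCase import path is a guess):

from typing import List

from mypy.test.data import DataDrivenTestCase  # import path assumed


class PytestSuite:
    @classmethod
    def cases(cls) -> List[DataDrivenTestCase]:
        # Base class collects nothing, which also removes the need for the
        # "obj is not PytestSuite" special case in the conftest hook.
        return []

    @classmethod
    def run_test(cls, testcase: DataDrivenTestCase) -> None:
        raise NotImplementedError


class TestTypeCheck(PytestSuite):
    @classmethod
    def cases(cls) -> List[DataDrivenTestCase]:
        # Parse the .test files as today, but with no perform callback.
        ...

    @classmethod
    def run_test(cls, testcase: DataDrivenTestCase) -> None:
        # Body of what is currently TestTypeCheck.run_test.
        ...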

Does that help? Are there aspects of this that still don't make sense? Happy to discuss further.

Looking forward to seeing this migrated!

JukkaL (Collaborator) commented Jul 13, 2016

I finally got around to trying this. The general py.test workflow looks great, thanks for working on this! There are a few details that probably need some more work, discussed below. Some of them were also discussed previously, but I'm repeating them here as I may have a slightly different take on things.

In general I'd like to get the most important details right before merging this. The previous big test runner revamp from 2015 introduced a bunch of regressions and slowed down productivity for a long while, and some of the regressions still haven't been fixed. If we merge a not-quite-fully finished workflow, it has a tendency to become the norm and developers just get used to it, and nobody has enough incentive or time to further improve things. I've seen this happen over and over again :-( Just preserving existing functionality and important UX niceties is good enough -- no need to implement new stuff such as unified context diffs.

1) Displaying actual/expected output

I want to preserve an (almost) verbatim (lightweight diff markers are okay) actual/expected output in the py.test output, even when running without -v. (-v feels too verbose to me when running more than a handful of tests, and I find it difficult to find the relevant output without wasting time scrolling around.)

Specifics:

  • The actual vs expected output is the most important result in a typical test case run, so I want the actual output to be easily visible by default and without scrolling, even if I'm on my 13 inch laptop which can't show many lines in a terminal.
  • With the legacy test runner, I can easily copy the actual output to a test case description. A common workflow is to write a new test case with a dummy output (such as XXX), run the test, and copy&paste the actual output from the test runner to the test case description. I generally need to deindent by 2 spaces and remove (diff) lines, but that's already muscle memory to me. The py.test output doesn't make this easy. For some cases the test output can be quite long, so doing a line-by-line copy-paste can be frustrating. Example:
E         Full diff:
E         - ['xxx']
E         + ['main:1: error: "int" not callable',
E         +  'main:2: error: "int" not callable',
E         +  'main:3: error: "int" not callable',
E         +  'main:4: error: "int" not callable',
E         +  'main:5: error: "str" not callable']

Editing this to a clean test output requires a few too many steps to my liking (also, the non-verbose output isn't very useful).

2) Parallel tests

I can't get the xdist plugin to work for running tests in parallel. This is what I get when I run py.test -n 2:

============================= test session starts ==============================
platform darwin -- Python 3.5.1, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /Users/jukka/src/pytest-mypy, inifile: pytest.ini
plugins: xdist-1.14
gw0 I / gw1 I
gw0 [1854] / gw1 [1854]

scheduling tests via LoadScheduling

==================================== ERRORS ====================================
_____________________________ ERROR collecting gw1 _____________________________
Different tests were collected between gw0 and gw1. The difference is:
--- gw0

+++ gw1

@@ -1,1854 +1,1854 @@

+mypy/test/test_check.py::TestTypeCheck::()::testPow
+mypy/test/test_check.py::TestTypeCheck::()::testAttributeInGenericTypeWithTypevarValues1
+mypy/test/test_check.py::TestTypeCheck::()::testReturnWithCallExprAndIsinstance
+mypy/test/test_check.py::TestTypeCheck::()::testMethodOverridingAcrossDeepInheritanceHierarchy1
+mypy/test/test_check.py::TestTypeCheck::()::testClassAttributeInitialization-skip
[... a LOT of output trimmed ...]

3) Stack trace formatting

I'd prefer the short stack trace format by default, which just shows the line where the failure happened instead of the full function body. I have this enabled on other projects I work on that use py.test, and I think the short-format stack trace is better, at least for running tests interactively. In Travis CI it may make sense to include the long stack trace format for more context, but I'm not sure about that. This is particularly relevant for mypy, as for some of the most common test failures (unexpected test case output only) the stack trace isn't very useful.
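
For what it's worth, pytest's standard --tb option can make the short format the default, e.g. via a pytest.ini (whether and where to set this is of course up for discussion):

[pytest]
addopts = --tb=short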

4) -i and -u options

These options automatically change test case descriptions based on actual output, and they have saved a lot of work in the past when doing large-scale changes to error messages. Not sure if they still work, but it would be really nice to keep them.

gnprice (Collaborator) commented Jul 27, 2016

This PR has been quiet for a few weeks -- @kirbyfan64, I assume you've become busy for a bit -- and I wouldn't want this work to go to waste. So I spent a few hours today putting together a new attempt, #1944. It takes ideas from @kirbyfan64's original PR here, with adjustments following my suggestions above, and also taking into account @JukkaL's request just above to pay close attention to keeping the output's information and verbosity similar to what myunit produces.

I believe if we merge that it will largely replace this PR, but there'll be plenty of follow-on work to do to complete #1673.

gnprice added a commit that referenced this pull request Jul 27, 2016
…over to it (#1944)

This is a step toward #1673 (switching entirely to pytest from myunit
and runtests.py), using some of the ideas developed in @kirbyfan64's
PR #1723.

Both `py.test` with no arguments and `py.test mypy/test/testcheck.py`
work just as you'd hope, while `./runtests.py` continues to run all
the tests.

The output is very similar to the myunit output.  It doesn't spew
voluminous stack traces or other verbose data, and it continues using
`assert_string_arrays_equal` to produce nicely-formatted comparisons
when e.g. type-checker error messages differ.  On error it even
includes the filename and line number for the test case itself, which
I've long wanted, and I think pytest's separator lines and coloration
make the output slightly easier to read when there are multiple
failures.

The `-i` option from myunit is straightforwardly ported over as
`--update-data`, giving it a longer name because it feels like the
kind of heavyweight and uncommon operation that deserves such a name.
It'd be equally straightforward to port over `-u`, but in the presence
of source control I think `--update-data` does the job on its own.

One small annoyance is that if there's a failure early in a test run,
pytest doesn't print the detailed report on that failure until the
whole run is over.  This has often annoyed me in using pytest on other
projects; useful workarounds include passing `-x` to make it stop at
the first failure, `-k` to filter the set of tests to be run, or
(especially with our tests where errors often go through
`assert_string_arrays_equal`) `-s` to let stdout and stderr pass
through immediately.  For interactive use I think it'd nearly always
be preferable to do what myunit does by immediately printing the
detailed information, so I may come back to this later to try to get
pytest to do that.

We don't yet take advantage of `xdist` to parallelize within a
`py.test` run (though `xdist` works for me out of the box in initial
cursory testing.)  For now we just stick with the `runtests.py`
parallelization, so we set up a separate `py.test` command for each
test module.
gnprice (Collaborator) commented Jul 27, 2016

Closing as #1944 has now landed. Still plenty to do to complete #1673, though.
