Migrate integration tests to some Makefile/shell based format #2797
+1, if you (or someone else) could make this happen it would be great. I guess a feasible way to go here would be to put this new system in place and not bother converting the existing test cases? (I may be sufficiently motivated to do so because I may need to add some PackageTest cases soonish.) EDIT: Removed a bit of nonsense; I wasn't really familiar enough with the way cabal-install's PackageTests currently work. |
Given that one of my "contributions" had to be reverted a few days ago (basically due to an error caused by a lack of automated tests), I think I'll take a stab at an initial implementation this weekend. @ezyang: Just to make sure we're not duplicating effort: are you actively working on this, or...? |
Nope, have at it! |
So, I think I have a basic test runner set up and working now:
The file layout is like this:
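A hypothetical sketch of what such a tree might look like (directory and file names here are illustrative guesses, not the actual ones in the branch):

cabal-install/tests/IntegrationTests/
  exec/
    should_run/
      basic.sh
      basic.out        # expected stdout (optional)
      basic.err        # expected stderr (optional)
    should_fail/
      bad-flag.sh
  freeze/
    should_run/
      ...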
(I'll probably rename to something other than IntegrationTests, but you get the idea.) Each of the shell scripts can have an accompanying ".out" and ".err" file alongside it which will be compared against the stdout/stderr of running the script. (At the moment I'm not actually running the shell scripts nor comparing stdout/stderr, but that should be pretty easy to do -- I'm mostly just working on the overall setup wrt. locations of test cases/data/etc. at the moment.) A few things to think about:
Comments/thoughts? |
For should-fail tests:
sh-3.2$ ! true
sh-3.2$ echo $?
1
sh-3.2$ ! false
sh-3.2$ echo $?
0 |
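So a should-fail test script can simply negate the command it expects to fail; a minimal hypothetical sketch (the $CABAL/$CABAL_ARGS convention is the one discussed further down, and the concrete flag is made up):

#!/bin/sh
# Hypothetical should-fail test: "!" inverts the exit status, so the
# script exits 0 only if the configure step fails as expected.
! $CABAL $CABAL_ARGS configure --with-compiler=/no/such/ghc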
You can always crib the relevant design decisions from GHC's test suite:
|
Here's the work so far: https://github.com/BardurArantsson/cabal/tree/gh-2797 (also available as a single commit). The code uses the usual "tasty" infrastructure, so we get color, timing information and failure/success counts for free.
I still haven't rewritten any of the actual existing PackageTest tests, so it remains to be seen if I've forgotten anything crucial. |
Oh, yes, here's an example of the output (sans color):
(I auto-add the "ignoring ..." bit just to make sure that it's possible to see at a glance if the discovery code is finding the .out and .err files as expected. I'm not sure if we can make it even more in-your-face or if we need to.) |
I've pushed an update which now reads the sub-process stdout/stderr asynchronously.

Yeah, I'll just go with "bash" for the shell, I think. Since it's only a requirement for development, I guess the MSYS requirement isn't too onerous -- and, as I mentioned, it seems to be a de-facto requirement for the existing test suite already due to its invocation of "bash".

About the wrapper script: I can see your point, but I should think it would be a good thing to avoid a) excess boilerplate, and, perhaps more importantly, b) accidentally calling the system-wide "cabal" executable -- it's easy to forget that you have to use "special syntax" in test suites when reviewing code, etc. Hence I'd want to shadow the name "cabal" in test scripts anyway -- perhaps just to provide an appropriate error message about using $CABAL $CABAL_ARGS instead. What do you think about the latter approach? |
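A minimal sketch of what that shadowing could look like, assuming the runner sources a common prelude into every test script (the function body is purely illustrative):

# Hypothetical prelude sourced into each test script: shadow the name
# "cabal" so that accidentally calling the system-wide executable fails
# loudly instead of silently testing the wrong binary.
cabal() {
  echo 'error: use "$CABAL $CABAL_ARGS" instead of calling cabal directly' >&2
  exit 1
}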
Pushed a further little update:
Next up:
I think that should about do it. |
Pushed a little update:
Please have a look and see what you think! This has uncovered a new potential source of problems, namely "unpredictable" file system state created by cabal-install and the need to do proper cleanup in a simple and reliable way. In the case of "cabal freeze" it's the "cabal.config" file that gets created in the working directory. |
You mean "loss" :) Can't you just set the |
Please use either |
I think it might just be better to use Make, in this case? |
That would probably be ideal; we wouldn't have to worry about the shell interpreter at all. I just wanted to make sure that if we do use a shell, we call it in a standards-compliant way. :) |
@ttuegel: Yes, the plan was to use "/usr/bin/env" with "bash". AFAIUI the "sh" emulation in "bash" isn't nearly strict enough to ensure that people would only write valid "sh" code (i.e. bash accepts many bash-isms even in "sh mode"), so it's unrealistic to expect people to write proper "sh" code anyway.

@ezyang: I don't understand why you'd want to involve "make" here. Make just calls out to the system shell to execute commands, and I don't see what "make" buys us. (Plus, it's a dependency which I'm pretty sure you don't need to have to hack on Cabal/cabal-install, but I could be mistaken about that. "bash" may also not be strictly necessary, but as of today it's already effectively required, and it's also generally much more likely to be included in a "base" install of e.g. MSYS -- though I don't know for sure if it's actually required to be installed with MSYS.)

(@ezyang: I'll respond to your other comment separately. Don't want a wall of text -- rather keep the "threads" separate. :)) |
@BardurArantsson: Based on a recent discussion on this very thing (https://phabricator.haskell.org/D1267), GHC doesn't have a bash dependency because they don't want to force BSD users to install bash to run the test suite. Honestly, I think the biggest reason to use make is that you can put multiple tests in one file, which helps reduce the overhead of writing tests. |
@ezyang: "cabal.config" is just an example of a larger issue (see below). Btw, I'm already giving cabal the "--config-file" argument, but "cabal freeze" is actually creating a file named "cabal.config" regardless. Could this actually be a bug in "cabal freeze"? I understand and share your concerns about debuggability, and I certainly want to have the option of preserving whatever files are generated by the tests, etc. -- perhaps doing so by default if a test fails. (And pointing the developer towards them in test output, etc.) Ideally, it should also be possible to re-run only that specific test -- I'm not sure if "tasty" has support for that out of the box, but I'd definitely want that as a feature. However, there's a real danger here with accidentally introducing dependencies between tests in the same "group" simply by forgetting to do proper cleanup. (This is exacerbated by the need to have each test preemptively clean up everything Trust me, you really don't want to end up in a situation where you have these accidental dependencies between tests because it makes tests incredibly fragile and frustrating to work on[2]... a situation we're trying to avoid as much as possible with this work. The only realistic way of mitigating this issue that I've found works somewhat in practice is to forcibly randomize the order in which tests run[1]. However, this also carries a real risk of becoming enormously frustrating since it's not guaranteed to root out dependencies, so you may end up having to fix an issue that somebody else introduced. (Which, by sheer luck, happened to survive their own testing and the pre-merge tests.). It happens a lot less frequently, but it does happen from time to time. I still haven't come up with a great solution for this issue, so this is all just tossing ideas around. I think my really ideal solution would be something like a layered virtual file system where the tests are all running in their directory with read-only access to everything, but where all writes transparently go into a separate isolated file system layered on top (at the OS layer, so no "escaping" the sandbox). That, however, isn't practical for a multitude of reasons, so we'll have to come up with something else. Btw, calling the moral equivalent of As usual, any ideas around this welcome. I think it's a problem that needs a solution, but I also don't want to let 'perfect' be the enemy of 'good', so... (Sorry about the wall of text. Let me know if anything's unclear in the above.) [1] Given that [2] Yeah, I was often "the cleanup guy" in these situations. |
I don't disagree with using "sh" in principle; I just see this as a practical matter, since those of us on Linux (the vast majority, probably?) or OS X are practically guaranteed to accidentally introduce bashisms. (Is the /bin/sh on OS X actually a real sh? Anybody know?) |
So, with certain driver tests in GHC, this can be a problem, but most of us run the test suite in parallel (e.g. with 12 tests running simultaneously), so we tend to notice when things clobber each other. Yes, occasionally I have to fix someone clobbering something, but it's not that frequent. I also object to the claim that there is "too much ambient state". If this is true, then our application is designed poorly and we should fix it. (This is confounded slightly by the fact that you've been working on cabal-install tests and not Cabal tests; I think things should be a lot better for Cabal.) Like, if it is THAT hard to convince cabal-install to not rely on some unexpected state, cabal-install is broken and should be fixed. But it seems like the easy answer to this problem is to just force each test case to live in its own directory? |
Cabal isn't an executable, so I don't understand how you'd do this kind of testing using shell scripts (or make or whatever) for that :). Oh, and cabal-install is designed poorly... or rather, it's extremely ad-hoc, since it has basically evolved to where it is now in fits and starts by a lot of different people with no coherent vision. That doesn't tend to produce well-factored designs. :/ EDIT: Almost everything of any consequence happens in IO, for example. What I wouldn't give for a Free/Operational monad version of cabal-install :).
Changing the test framework to make tests more reliable and easier to write is -- in my conception of the world, at least -- a prerequisite for being able to fix the problem without introducing new ones. Getting rid of non-relocatable state in cabal-install is a highly non-trivial undertaking.
Yup, I considered that too, but that leads to duplication of the .cabal files that a lot of the test cases share. I suppose that could be worked around by dropping symlinks in there, but then I'd be worried about Windows support[1]. [1] But perhaps it's a non-issue on Windows with MSYS? I don't know. |
Perhaps separate directories with symlinks for .cabal files is the 80% solution that can get us to where we want to be. I'll try a little experiment later to see what the other downsides are, if any. |
Yeah, I suspected your real problem is that it's really annoying to set up Cabal projects. Perhaps we should define some format for easily specifying Cabal projects in a compact way. Could be something as simple as:
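Purely as a hypothetical illustration (not a format anyone actually proposed), a compact spec that a test framework could expand into a .cabal file plus stub modules might read:

name: my
version: 1.0
exe my { main-is: My.hs, other-modules: Foo }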
etc. Should make it less unpleasant to do this? |
BTW, it is awesome that you are working on this, and I would really like to see even an imperfect version of this go in. It will make the dev experience a lot better. |
So, I've been converting a few more test cases... and I'm not sure having ".out" and ".err" files is actually worth it. It sounds like a good idea to standardize it, but I find myself having to do "blah > /dev/null" quite a lot. Checking for the presence of some particular text in the output is mostly also pretty easy: in "should_run" tests you just do "blah | grep foo". (This changes a bit if you want to check for multiple things on different lines, etc., but...) In particular, the output from auto-configure also necessitates always doing an explicit "cabal configure > /dev/null" before actually running the command(s) of interest.

I probably won't get around to trying the "directory-per-test-case" thing today, but I'll try to get to it tomorrow. However, I'm currently quite confident that it'll be... cumbersome. (Turns out some tests share a lot more than just a .cabal file... so more symlinks would be necessary. I think it'll be pretty annoying to have to maintain manually.)

EDIT: As an example, the "exec" tests share "my.cabal", "My.hs", "Foo.hs" and a "subdir" directory (which is empty at the start; not sure if it's actually necessary since I haven't converted all the test cases yet). |
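For example, a converted should_run test in this style might boil down to something like the following sketch (the executable name and expected text are placeholders, not taken from the actual tests):

#!/bin/sh
set -e
# Hypothetical should_run test: silence the noisy configure step, then
# check that the command under test prints the text we expect.
$CABAL $CABAL_ARGS configure > /dev/null
$CABAL $CABAL_ARGS exec my-exe | grep "expected output"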
Oh, and a further observation: it seems to take roughly 0.60s to execute each test case at a bare minimum. This surprised me -- I'm not sure if it's purely "cabal" start-up overhead or what it is, but I think some work may be needed to bring it down significantly. A slow-running test suite is always a pain. |
Re out/err: it's interesting that in the GHC test suite (which has Cabal tests), Setup (which is how we test the Cabal library) is usually called with -v0, and then only the output of a few commands that query what the resulting system looks like is actually compared. This might be a good fit? I think we want to avoid lots of grepping... sometimes it's necessary, but it's pretty annoying, and then you have to manually update tests if output changes. If directory-per-test-case is going to be cumbersome, let's not do it. I'm OK with the original plan of copying things to a separate directory and running them, as long as it's obvious from the test runner how this setup is done. Re length: parallelization should help, right? I think that is a must-have feature. |
-v0: Might be workable -- I'll have to try it out on a few test cases, but I could imagine the auto-configure thing being a bit of an issue here, in that you probably want to selectively switch off the configure output when auto-configure kicks in. Either way, we can just leave the .out and .err support code in -- it doesn't actually do anything unless you explicitly create .out and .err files, so it's not really harmful as such. (EDIT: And you can always combine usage of "> /dev/null" or "-v0" on carefully selected commands with usage of .out/.err.) Yeah, parallelization should help. I'm not seeing particularly high CPU usage (~10% of one CPU on an i7-4470K), so the time taken per test doesn't seem to be CPU bound, at least. I'm not sure if we can get parallel test runs from tasty, but I'd certainly hope so. (Of course this is predicated on being able to isolate test cases sufficiently from one another, but I think the "copy + run somewhere else" thing should be able to achieve sufficient isolation for that.) |
I dunno, I don't feel like I ever really care about the output of configure. 🤷 |
Indeed not :). Other than perhaps when testing "configure" :). TODO list to self:
|
Pushed the converted "exec" tests just now. |
Alright, parallelism is built into "tasty". I'm not sure if we can do that by default, though, since the ARM build disables the "-threading" flag (for good reasons, I presume). Either way it's not that big a deal, since running in parallel is a much bigger win if you keep re-running the test suite, and for that you can easily supply the RTS flags directly on the command line. Btw, time to run went from 14s to <4s, but that's probably unfair, since I'm not sure if the tests actually ran to completion (it's likely that at least some of them didn't). Will implement "copy+run" next to see if I can get parallel runs to be stable and repeatable. (Will probably also add randomization of test order to try to prevent accidental dependencies between test cases.) |
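For reference, a parallel run along those lines might look like this (the test-suite name "integration-tests" is a placeholder, and it assumes the suite was built with a threaded RTS and rtsopts enabled):

cabal test integration-tests --test-options='+RTS -N -RTS'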
Just a little note to self: As was mentioned on #2664, line endings may be a problem for "err" and "out" files. |
Alright, I've pushed what I think could be considered Release Candidate 1 of this. See https://github.com/BardurArantsson/cabal/tree/gh-2797. There are still a couple of minor issues marked with XXX, but nothing major. One is the use of the 'temporary' package -- this is just a bit of laziness on my part, since Distribution.Compat.TempFile.createTempDirectory is all that's needed, but it's currently not an "exposed" module. However, the comments in the TempFile.hs source lead me to think that the best approach here might actually be to replace the module by adding a dependency on "temporary" instead. Any thoughts on this?

Running in parallel, all the tests take 5.5 seconds on my machine. All previous PackageTests have been converted to the new test framework -- and I think this validates the approach. There's a lot less boilerplate and it's much easier to see exactly what commands get run.

The overall strategy for running the tests is now this:
Review and comments would be very welcome. Hopefully, I'll be able to submit a final pull request this weekend. |
I tested it in parallel on Windows. Everything passed after I fixed the four issues that I encountered with the previous version (#2664 (comment)). I really like how quickly I can modify and run the tests. |
Ship it!!! |
Pushed RC2 (I guess) with a few updates. Still not quite sure about "sh" vs. "/bin/sh" and need to handle the Unix/Windows line endings somehow. |
Pull request submitted: #2864 Let's see what Travis says... |
Currently, the process for package tests in Cabal is something like:
This really sucks, for a few reasons:
The way GHC does these tests is that it just has a test runner which runs some shell scripts (really a Makefile) which run a test, and then diffs the output against some expected output. There's a bit of fuzz to make it all work out, but it is quite nice. Cabal should grow something similar; maybe that would encourage more tests! (Maybe there is a good out-of-the-box thing we can use here.)
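The core of that approach is small enough to sketch in a couple of lines of shell (this is just the general idea, not GHC's actual harness):

# Run one test and compare its output against the recorded expectation;
# a non-empty diff means the test fails.
./some-test.sh > actual.out 2> actual.err
diff -u expected.out actual.out && diff -u expected.err actual.err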