Use CPU time for --warn-long and a low default limit for non-long tests #32981

orlitzky · 2021-12-06T12:58:24Z

As discussed in #32973, the default value of --warn-long is roughly a minute, independent of the --long flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:

https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments

This ticket addresses two problems:

The timing information is often unavailable. We should still warn about long tests in that case.
The default of 60 CPU-equivalent seconds is much too high when --long is not in use.

Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high --warn-long limit partially address that.

So in targeting our two issues, we first aim to switch the timing reports and --warn-long flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time and CFLAGS; those will be addressed in #33022). With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of --warn-long to 10 CPU-seconds when --long was not given.

The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't wait() for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.

CC: @vbraun @tornaria

Component: doctest framework

Work Issues: cputime by PID on macOS

Author: Michael Orlitzky

Branch/Commit: u/mjo/ticket/32981 @ d7d2cff

Reviewer: https://github.com/sagemath/sage/actions/runs/1992568379

Issue created by migration from https://trac.sagemath.org/ticket/32981

The text was updated successfully, but these errors were encountered:

orlitzky · 2021-12-06T13:18:14Z

comment:1

Volker, any idea why we used wall time for the limit rather than CPU time? It's making it a bit awkward to pick a multiplier when no timing information is available (which is usually the case for me).

orlitzky · 2021-12-06T13:18:14Z

Branch: u/mjo/ticket/32981

orlitzky · 2021-12-06T13:18:14Z

Commit: 6c68187

orlitzky · 2021-12-06T13:18:14Z

Author: Michael Orlitzky

sagetrac-git · 2021-12-08T14:37:50Z

Changed commit from 6c68187 to 2959d09

sagetrac-git · 2021-12-08T14:37:50Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`2959d09`	`Trac #32981: decrease --warn-long limit for non-long tests.`

orlitzky · 2021-12-08T14:38:45Z

comment:3

This is a bit too lenient IMO... allowing 15s for a non-long test when no timing information is available. Still, it's an improvement.

orlitzky · 2021-12-08T17:42:19Z

comment:4

I'm just going to put in the effort to switch this to CPU time, which makes a lot more sense for a time measurement that needs to be consistent across machines.

sagetrac-git · 2021-12-08T21:08:57Z

Changed commit from 2959d09 to 66c4cc8

sagetrac-git · 2021-12-08T21:08:57Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`0de97ac`	`Trac #32981: use CPU time for --warn-long.`
`4b9b07b`	`Trac #32981: decrease --warn-long limit for non-long tests.`
`66c4cc8`	`Trac #32981: use a faster file to test the doctest framework.`

orlitzky · 2021-12-08T21:09:43Z

New commits:

`0de97ac`	`Trac #32981: use CPU time for --warn-long.`
`4b9b07b`	`Trac #32981: decrease --warn-long limit for non-long tests.`
`66c4cc8`	`Trac #32981: use a faster file to test the doctest framework.`

mwageringel · 2022-02-08T21:47:17Z

comment:9

One reason for using wall time instead of CPU time might be that it works better for computations that use external processes like Pexpect:

sage: R = PolynomialRing(GF(101), 'x', 8)
....: J = sage.rings.ideal.Cyclic(R)
....: %time gb = J.groebner_basis(algorithm='singular')
CPU times: user 917 ms, sys: 52.6 ms, total: 969 ms
Wall time: 33.2 s

jhpalmieri · 2022-02-08T22:05:41Z

comment:10

Maybe measure both and pay attention to wall time only if it is much longer than CPU time. I don't know if "much longer" should be a raw time (5 seconds longer) or a multiple (10 times as long). We want informative messages, so this doesn't have to be flawless.

orlitzky · 2022-02-08T22:15:24Z

comment:11

Well.. that's embarrassing. I'll have to think about it.

Our --warn-long flag used during testing currently takes "wall time" as its argument. This is somewhat nice from a user perspective, since a normal user is quite likely to know what what wall time is, but it's also a bit nonsensical given the purpose of the flag. The flag's intent is to warn developers about tests that take too long, so whether or not it triggers should not depend on the developer's machine, or what else he's doing while the tests are running -- both of which do affect the wall time. CPU time provides a more reliable measurement. (In fact, when no value for --wall-time was given, we were already using a heuristic to mimic CPU time.) This commit changes the --warn-long flag to use CPU time instead of wall time, obviating the need for the heuristic. Issue: sagemathGH-32981

Update the documentation to reflect that we are now using CPU time (and not wall time) to measure the length of a test. Issue: sagemathGH-32981

The --warn-long flag now takes CPU seconds (not wall seconds) as its argument. Here we update some command-line output to be explicit about this fact. Issue: sagemathGH-32981

The default value of 60 cpu seconds for a normal test is too high: the documentation suggests that a non-long test should complete in about five seconds. So, in this commit we change the default --warn-long value to 10 cpu seconds if the --long flag was not given. Issue: sagemathGH-32981

Since we're now warning about tests that take too many cpu-seconds (as opposed to too many wall-seconds), it makes sense to report the time taken in cpu-seconds as well. This allows you to better judge how close a test is to being a "long" test. The output format remains unchanged, for example, ok [1.02 s] but now the "s" refers to cpu-seconds instead of wall-seconds. A subsequent commit will disambiguate the two types of "s". Issue: sagemathGH-32981

After switching the doctests to report cputime instead of walltime, the running-time measurements for tests using pexpect interfaces are off because the time spent in subprocesses is not accounted for. Fixing this turns out not to be easy: the cputime() function in sage.misc.timing can try to account for them, but it is essentially asking the subprocess to compute its own cputime. This has faults: 1. Not every pexpect interface implemented (or can implement) such a method. 2. The pexpect interface needs to be in good working condition before we can ask the subprocess to do anything. That's often not the case at the end of a "sage: ..." line in a doctest. Particularly it's not true in all of the tests for low-level details of our pexpect interface itself. Instead, this commit parses /proc/<pid>/stat to obtain the cputime numbers for pexpext processes. This works well with one caveat: that information is only available on Linux and BSD. Having a solution that only works on some platforms is not actually too problematic. The motivation for measuring these times in the doctest framework is that someone should notice an outrageously long test and fix or report it. So, it's enough that the measurements be accurate on only some platforms -- particularly where the CI is run. Furthermore, the information is only both missing and potentially useful for long-running pexpect tests, which are relatively rare. It would be possible to implement the same thing for macOS using its own platform-specific interface if anyone is motivated to do so. It would also make sense to use psutil to obtain this information once we have upgraded to a version of ipykernel that reintroduces it as a dependency (see sagemathGH-33772). Issue: sagemathGH-32981

We need to report the CPU time for tests that run afoul of --warn-long, because --warn-long is now specified in CPU time. But, we are still reporting the wall time in several places, such as the per-file summary. So to keep things as consistent as possible, we should report the wall time for --warn-long failures as well. This changes the output of a too-long test from e.g. Test ran for 1.23 cpu seconds, check ran for 2.34 cpu seconds to, Test ran for 1.23s cpu, 2.50s wall Check ran for 2.34s cpu, 3.12s wall Issue: sagemathGH-32981

Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the doctest runner's per-file report, which outputs (for example) [14 tests, 0.26 s] after testing a file with 14 tests that took 0.26 wall seconds. This commit changes that output to [14 tests, 0.26s wall] which is explicit about the type of measurement used. Issue: sagemathGH-32981

Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the verbose test runner output, which outputs (for example) ok [3.83 s] after a test that took 3.83 wall seconds. This commit changes that output to ok [3.83s wall] which is unmistakable. Closes: sagemathGH-32981