-
-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use CPU time for --warn-long and a low default limit for non-long tests #32981
Comments
comment:1
Volker, any idea why we used wall time for the limit rather than CPU time? It's making it a bit awkward to pick a multiplier when no timing information is available (which is usually the case for me). |
Branch: u/mjo/ticket/32981 |
Commit: |
Author: Michael Orlitzky |
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
|
comment:3
This is a bit too lenient IMO... allowing 15s for a non-long test when no timing information is available. Still, it's an improvement. |
comment:4
I'm just going to put in the effort to switch this to CPU time, which makes a lot more sense for a time measurement that needs to be consistent across machines. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
comment:9
One reason for using wall time instead of CPU time might be that it works better for computations that use external processes like Pexpect:
|
comment:10
Maybe measure both and pay attention to wall time only if it is much longer than CPU time. I don't know if "much longer" should be a raw time (5 seconds longer) or a multiple (10 times as long). We want informative messages, so this doesn't have to be flawless. |
comment:11
Well.. that's embarrassing. I'll have to think about it. |
Our --warn-long flag used during testing currently takes "wall time" as its argument. This is somewhat nice from a user perspective, since a normal user is quite likely to know what what wall time is, but it's also a bit nonsensical given the purpose of the flag. The flag's intent is to warn developers about tests that take too long, so whether or not it triggers should not depend on the developer's machine, or what else he's doing while the tests are running -- both of which do affect the wall time. CPU time provides a more reliable measurement. (In fact, when no value for --wall-time was given, we were already using a heuristic to mimic CPU time.) This commit changes the --warn-long flag to use CPU time instead of wall time, obviating the need for the heuristic. Issue: sagemathGH-32981
Update the documentation to reflect that we are now using CPU time (and not wall time) to measure the length of a test. Issue: sagemathGH-32981
The --warn-long flag now takes CPU seconds (not wall seconds) as its argument. Here we update some command-line output to be explicit about this fact. Issue: sagemathGH-32981
The default value of 60 cpu seconds for a normal test is too high: the documentation suggests that a non-long test should complete in about five seconds. So, in this commit we change the default --warn-long value to 10 cpu seconds if the --long flag was not given. Issue: sagemathGH-32981
Since we're now warning about tests that take too many cpu-seconds (as opposed to too many wall-seconds), it makes sense to report the time taken in cpu-seconds as well. This allows you to better judge how close a test is to being a "long" test. The output format remains unchanged, for example, ok [1.02 s] but now the "s" refers to cpu-seconds instead of wall-seconds. A subsequent commit will disambiguate the two types of "s". Issue: sagemathGH-32981
After switching the doctests to report cputime instead of walltime, the running-time measurements for tests using pexpect interfaces are off because the time spent in subprocesses is not accounted for. Fixing this turns out not to be easy: the cputime() function in sage.misc.timing can try to account for them, but it is essentially asking the subprocess to compute its own cputime. This has faults: 1. Not every pexpect interface implemented (or can implement) such a method. 2. The pexpect interface needs to be in good working condition before we can ask the subprocess to do anything. That's often not the case at the end of a "sage: ..." line in a doctest. Particularly it's not true in all of the tests for low-level details of our pexpect interface itself. Instead, this commit parses /proc/<pid>/stat to obtain the cputime numbers for pexpext processes. This works well with one caveat: that information is only available on Linux and BSD. Having a solution that only works on some platforms is not actually too problematic. The motivation for measuring these times in the doctest framework is that someone should notice an outrageously long test and fix or report it. So, it's enough that the measurements be accurate on only some platforms -- particularly where the CI is run. Furthermore, the information is only both missing and potentially useful for long-running pexpect tests, which are relatively rare. It would be possible to implement the same thing for macOS using its own platform-specific interface if anyone is motivated to do so. It would also make sense to use psutil to obtain this information once we have upgraded to a version of ipykernel that reintroduces it as a dependency (see sagemathGH-33772). Issue: sagemathGH-32981
We need to report the CPU time for tests that run afoul of --warn-long, because --warn-long is now specified in CPU time. But, we are still reporting the wall time in several places, such as the per-file summary. So to keep things as consistent as possible, we should report the wall time for --warn-long failures as well. This changes the output of a too-long test from e.g. Test ran for 1.23 cpu seconds, check ran for 2.34 cpu seconds to, Test ran for 1.23s cpu, 2.50s wall Check ran for 2.34s cpu, 3.12s wall Issue: sagemathGH-32981
Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the doctest runner's per-file report, which outputs (for example) [14 tests, 0.26 s] after testing a file with 14 tests that took 0.26 wall seconds. This commit changes that output to [14 tests, 0.26s wall] which is explicit about the type of measurement used. Issue: sagemathGH-32981
Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the verbose test runner output, which outputs (for example) ok [3.83 s] after a test that took 3.83 wall seconds. This commit changes that output to ok [3.83s wall] which is unmistakable. Closes: sagemathGH-32981
Our --warn-long flag used during testing currently takes "wall time" as its argument. This is somewhat nice from a user perspective, since a normal user is quite likely to know what what wall time is, but it's also a bit nonsensical given the purpose of the flag. The flag's intent is to warn developers about tests that take too long, so whether or not it triggers should not depend on the developer's machine, or what else he's doing while the tests are running -- both of which do affect the wall time. CPU time provides a more reliable measurement. (In fact, when no value for --wall-time was given, we were already using a heuristic to mimic CPU time.) This commit changes the --warn-long flag to use CPU time instead of wall time, obviating the need for the heuristic. Issue: sagemathGH-32981
Update the documentation to reflect that we are now using CPU time (and not wall time) to measure the length of a test. Issue: sagemathGH-32981
The --warn-long flag now takes CPU seconds (not wall seconds) as its argument. Here we update some command-line output to be explicit about this fact. Issue: sagemathGH-32981
The default value of 60 cpu seconds for a normal test is too high: the documentation suggests that a non-long test should complete in about five seconds. So, in this commit we change the default --warn-long value to 10 cpu seconds if the --long flag was not given. Issue: sagemathGH-32981
Since we're now warning about tests that take too many cpu-seconds (as opposed to too many wall-seconds), it makes sense to report the time taken in cpu-seconds as well. This allows you to better judge how close a test is to being a "long" test. The output format remains unchanged, for example, ok [1.02 s] but now the "s" refers to cpu-seconds instead of wall-seconds. A subsequent commit will disambiguate the two types of "s". Issue: sagemathGH-32981
After switching the doctests to report cputime instead of walltime, the running-time measurements for tests using pexpect interfaces are off because the time spent in subprocesses is not accounted for. Fixing this turns out not to be easy: the cputime() function in sage.misc.timing can try to account for them, but it is essentially asking the subprocess to compute its own cputime. This has faults: 1. Not every pexpect interface implemented (or can implement) such a method. 2. The pexpect interface needs to be in good working condition before we can ask the subprocess to do anything. That's often not the case at the end of a "sage: ..." line in a doctest. Particularly it's not true in all of the tests for low-level details of our pexpect interface itself. Instead, this commit parses /proc/<pid>/stat to obtain the cputime numbers for pexpext processes. This works well with one caveat: that information is only available on Linux and BSD. Having a solution that only works on some platforms is not actually too problematic. The motivation for measuring these times in the doctest framework is that someone should notice an outrageously long test and fix or report it. So, it's enough that the measurements be accurate on only some platforms -- particularly where the CI is run. Furthermore, the information is only both missing and potentially useful for long-running pexpect tests, which are relatively rare. It would be possible to implement the same thing for macOS using its own platform-specific interface if anyone is motivated to do so. It would also make sense to use psutil to obtain this information once we have upgraded to a version of ipykernel that reintroduces it as a dependency (see sagemathGH-33772). Issue: sagemathGH-32981
We need to report the CPU time for tests that run afoul of --warn-long, because --warn-long is now specified in CPU time. But, we are still reporting the wall time in several places, such as the per-file summary. So to keep things as consistent as possible, we should report the wall time for --warn-long failures as well. This changes the output of a too-long test from e.g. Test ran for 1.23 cpu seconds, check ran for 2.34 cpu seconds to, Test ran for 1.23s cpu, 2.50s wall Check ran for 2.34s cpu, 3.12s wall Issue: sagemathGH-32981
Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the doctest runner's per-file report, which outputs (for example) [14 tests, 0.26 s] after testing a file with 14 tests that took 0.26 wall seconds. This commit changes that output to [14 tests, 0.26s wall] which is explicit about the type of measurement used. Issue: sagemathGH-32981
Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the verbose test runner output, which outputs (for example) ok [3.83 s] after a test that took 3.83 wall seconds. This commit changes that output to ok [3.83s wall] which is unmistakable. Closes: sagemathGH-32981
Our --warn-long flag used during testing currently takes "wall time" as its argument. This is somewhat nice from a user perspective, since a normal user is quite likely to know what what wall time is, but it's also a bit nonsensical given the purpose of the flag. The flag's intent is to warn developers about tests that take too long, so whether or not it triggers should not depend on the developer's machine, or what else he's doing while the tests are running -- both of which do affect the wall time. CPU time provides a more reliable measurement. (In fact, when no value for --wall-time was given, we were already using a heuristic to mimic CPU time.) This commit changes the --warn-long flag to use CPU time instead of wall time, obviating the need for the heuristic. Issue: sagemathGH-32981
Update the documentation to reflect that we are now using CPU time (and not wall time) to measure the length of a test. Issue: sagemathGH-32981
The --warn-long flag now takes CPU seconds (not wall seconds) as its argument. Here we update some command-line output to be explicit about this fact. Issue: sagemathGH-32981
The default value of 60 cpu seconds for a normal test is too high: the documentation suggests that a non-long test should complete in about five seconds. So, in this commit we change the default --warn-long value to 10 cpu seconds if the --long flag was not given. Issue: sagemathGH-32981
Since we're now warning about tests that take too many cpu-seconds (as opposed to too many wall-seconds), it makes sense to report the time taken in cpu-seconds as well. This allows you to better judge how close a test is to being a "long" test. The output format remains unchanged, for example, ok [1.02 s] but now the "s" refers to cpu-seconds instead of wall-seconds. A subsequent commit will disambiguate the two types of "s". Issue: sagemathGH-32981
After switching the doctests to report cputime instead of walltime, the running-time measurements for tests using pexpect interfaces are off because the time spent in subprocesses is not accounted for. Fixing this turns out not to be easy: the cputime() function in sage.misc.timing can try to account for them, but it is essentially asking the subprocess to compute its own cputime. This has faults: 1. Not every pexpect interface implemented (or can implement) such a method. 2. The pexpect interface needs to be in good working condition before we can ask the subprocess to do anything. That's often not the case at the end of a "sage: ..." line in a doctest. Particularly it's not true in all of the tests for low-level details of our pexpect interface itself. Instead, this commit parses /proc/<pid>/stat to obtain the cputime numbers for pexpext processes. This works well with one caveat: that information is only available on Linux and BSD. Having a solution that only works on some platforms is not actually too problematic. The motivation for measuring these times in the doctest framework is that someone should notice an outrageously long test and fix or report it. So, it's enough that the measurements be accurate on only some platforms -- particularly where the CI is run. Furthermore, the information is only both missing and potentially useful for long-running pexpect tests, which are relatively rare. It would be possible to implement the same thing for macOS using its own platform-specific interface if anyone is motivated to do so. It would also make sense to use psutil to obtain this information once we have upgraded to a version of ipykernel that reintroduces it as a dependency (see sagemathGH-33772). Issue: sagemathGH-32981
We need to report the CPU time for tests that run afoul of --warn-long, because --warn-long is now specified in CPU time. But, we are still reporting the wall time in several places, such as the per-file summary. So to keep things as consistent as possible, we should report the wall time for --warn-long failures as well. This changes the output of a too-long test from e.g. Test ran for 1.23 cpu seconds, check ran for 2.34 cpu seconds to, Test ran for 1.23s cpu, 2.50s wall Check ran for 2.34s cpu, 3.12s wall Issue: sagemathGH-32981
Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the doctest runner's per-file report, which outputs (for example) [14 tests, 0.26 s] after testing a file with 14 tests that took 0.26 wall seconds. This commit changes that output to [14 tests, 0.26s wall] which is explicit about the type of measurement used. Issue: sagemathGH-32981
Now that cpu seconds are used for --warn-long, we should disambiguate wherever wall seconds are still used. One such place is in the verbose test runner output, which outputs (for example) ok [3.83 s] after a test that took 3.83 wall seconds. This commit changes that output to ok [3.83s wall] which is unmistakable. Closes: sagemathGH-32981
Finally address sagemath#32981. Using wall time to measure which tests are "long" doesn't make much sense. Wall time is subject to every conflating factor: 1. Faster hardware runs tests in less wall time 2. Faster compiler flags makes code that runs in less wall time 3. A busy machine will run tests slower than an unburdened one 4. Operating system priority can increase or decrease the time taken by a process 5. etc. Ideally, everyone would agree on how long is too long for a test to run. Attaining that ideal is not so easy, but the first step is to switch`--warn-long` away from using wall time and towards using CPU time. A second phase (sagemath#33022) will normalize the CPU time. ### Examples #### --warn-long output The user-facing portion of this can be tested with a low `--warn-long` value, ``` $ sage -t --warn-long=1 src/sage/rings/qqbar.py ... Doctesting 1 file. sage -t --warn-long --random- seed=309742757431591512726158538133919135994 src/sage/rings/qqbar.py ********************************************************************** File "src/sage/rings/qqbar.py", line 511, in sage.rings.qqbar Warning, slow doctest: z, = [r for r in p4.roots(QQbar, False) if r in ival] Test ran for 1.51s cpu, 1.51s wall Check ran for 0.00s cpu, 0.00s wall ********************************************************************** File "src/sage/rings/qqbar.py", line 524, in sage.rings.qqbar Warning, slow doctest: z2 = QQbar.polynomial_root(p4, ival) Test ran for 1.07s cpu, 1.07s wall Check ran for 0.00s cpu, 0.00s wall ********************************************************************** File "src/sage/rings/qqbar.py", line 808, in sage.rings.qqbar.AlgebraicField_common._factor_multivariate_polynomial Warning, slow doctest: F = QQbar._factor_multivariate_polynomial(p) Test ran for 2.38s cpu, 2.42s wall Check ran for 0.00s cpu, 0.00s wall ... ``` #### pexpect accounting Two different methods are used to account for the CPU time, one for linux, and one for everything else. This GAP command uses pexpect and should chew up a good bit of CPU time: ``` sage: from sage.doctest.util import Timer sage: t = Timer() sage: _ = t.start(); t.stop() # no CPU time used between start/stop {'cputime': 0.0, 'walltime': 7.605552673339844e-05} sage: _ = t.start(); gap.eval('a:=List([0..10000],i->WordAlp("a",i));; IsSortedList(a);'); t.stop() # lots of CPU time used 'true' {'cputime': 30.089999999999996, 'walltime': 36.292088747024536} ``` URL: sagemath#36226 Reported by: Michael Orlitzky Reviewer(s): Gonzalo Tornaría, John H. Palmieri, Michael Orlitzky, Tobias Diez
As discussed in #32973, the default value of
--warn-long
is roughly a minute, independent of the--long
flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments
This ticket addresses two problems:
--long
is not in use.Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high
--warn-long
limit partially address that.So in targeting our two issues, we first aim to switch the timing reports and
--warn-long
flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time andCFLAGS
; those will be addressed in #33022). With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of--warn-long
to 10 CPU-seconds when--long
was not given.The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't
wait()
for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.CC: @vbraun @tornaria
Component: doctest framework
Work Issues: cputime by PID on macOS
Author: Michael Orlitzky
Branch/Commit: u/mjo/ticket/32981 @
d7d2cff
Reviewer: https://github.com/sagemath/sage/actions/runs/1992568379
Issue created by migration from https://trac.sagemath.org/ticket/32981
The text was updated successfully, but these errors were encountered: