Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CPU time for --warn-long and a low default limit for non-long tests #32981

Closed
orlitzky opened this issue Dec 6, 2021 · 57 comments
Closed

Comments

@orlitzky
Copy link
Contributor

orlitzky commented Dec 6, 2021

As discussed in #32973, the default value of --warn-long is roughly a minute, independent of the --long flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:

https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments

This ticket addresses two problems:

  1. The timing information is often unavailable. We should still warn about long tests in that case.
  2. The default of 60 CPU-equivalent seconds is much too high when --long is not in use.

Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high --warn-long limit partially address that.

So in targeting our two issues, we first aim to switch the timing reports and --warn-long flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time and CFLAGS; those will be addressed in #33022). With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of --warn-long to 10 CPU-seconds when --long was not given.

The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't wait() for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.

CC: @vbraun @tornaria

Component: doctest framework

Work Issues: cputime by PID on macOS

Author: Michael Orlitzky

Branch/Commit: u/mjo/ticket/32981 @ d7d2cff

Reviewer: https://github.com/sagemath/sage/actions/runs/1992568379

Issue created by migration from https://trac.sagemath.org/ticket/32981

@orlitzky orlitzky added this to the sage-9.5 milestone Dec 6, 2021
@orlitzky
Copy link
Contributor Author

orlitzky commented Dec 6, 2021

comment:1

Volker, any idea why we used wall time for the limit rather than CPU time? It's making it a bit awkward to pick a multiplier when no timing information is available (which is usually the case for me).

@orlitzky
Copy link
Contributor Author

orlitzky commented Dec 6, 2021

Branch: u/mjo/ticket/32981

@orlitzky
Copy link
Contributor Author

orlitzky commented Dec 6, 2021

Commit: 6c68187

@orlitzky
Copy link
Contributor Author

orlitzky commented Dec 6, 2021

Author: Michael Orlitzky

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 8, 2021

Changed commit from 6c68187 to 2959d09

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 8, 2021

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

2959d09Trac #32981: decrease --warn-long limit for non-long tests.

@orlitzky
Copy link
Contributor Author

orlitzky commented Dec 8, 2021

comment:3

This is a bit too lenient IMO... allowing 15s for a non-long test when no timing information is available. Still, it's an improvement.

@orlitzky
Copy link
Contributor Author

orlitzky commented Dec 8, 2021

comment:4

I'm just going to put in the effort to switch this to CPU time, which makes a lot more sense for a time measurement that needs to be consistent across machines.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 8, 2021

Changed commit from 2959d09 to 66c4cc8

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 8, 2021

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0de97acTrac #32981: use CPU time for --warn-long.
4b9b07bTrac #32981: decrease --warn-long limit for non-long tests.
66c4cc8Trac #32981: use a faster file to test the doctest framework.

@orlitzky

This comment has been minimized.

@orlitzky
Copy link
Contributor Author

orlitzky commented Dec 8, 2021

New commits:

0de97acTrac #32981: use CPU time for --warn-long.
4b9b07bTrac #32981: decrease --warn-long limit for non-long tests.
66c4cc8Trac #32981: use a faster file to test the doctest framework.

@orlitzky orlitzky changed the title Reduce --warn-long limit for normal (not long-time) test runs Use CPU time for --warn-long and a low default limit for non-long tests Dec 8, 2021
@orlitzky

This comment has been minimized.

@mkoeppe mkoeppe modified the milestones: sage-9.5, sage-9.6 Jan 10, 2022
@mwageringel
Copy link

comment:9

One reason for using wall time instead of CPU time might be that it works better for computations that use external processes like Pexpect:

sage: R = PolynomialRing(GF(101), 'x', 8)
....: J = sage.rings.ideal.Cyclic(R)
....: %time gb = J.groebner_basis(algorithm='singular')
CPU times: user 917 ms, sys: 52.6 ms, total: 969 ms
Wall time: 33.2 s

@jhpalmieri
Copy link
Member

comment:10

Maybe measure both and pay attention to wall time only if it is much longer than CPU time. I don't know if "much longer" should be a raw time (5 seconds longer) or a multiple (10 times as long). We want informative messages, so this doesn't have to be flawless.

@orlitzky
Copy link
Contributor Author

orlitzky commented Feb 8, 2022

comment:11

Well.. that's embarrassing. I'll have to think about it.

orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
Our --warn-long flag used during testing currently takes "wall time"
as its argument. This is somewhat nice from a user perspective, since
a normal user is quite likely to know what what wall time is, but it's
also a bit nonsensical given the purpose of the flag. The flag's
intent is to warn developers about tests that take too long, so
whether or not it triggers should not depend on the developer's
machine, or what else he's doing while the tests are running -- both
of which do affect the wall time.

CPU time provides a more reliable measurement. (In fact, when no value
for --wall-time was given, we were already using a heuristic to mimic
CPU time.) This commit changes the --warn-long flag to use CPU time
instead of wall time, obviating the need for the heuristic.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
Update the documentation to reflect that we are now using CPU time
(and not wall time) to measure the length of a test.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
The --warn-long flag now takes CPU seconds (not wall seconds) as its
argument. Here we update some command-line output to be explicit about
this fact.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
The default value of 60 cpu seconds for a normal test is too high: the
documentation suggests that a non-long test should complete in about
five seconds. So, in this commit we change the default --warn-long
value to 10 cpu seconds if the --long flag was not given.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
Since we're now warning about tests that take too many cpu-seconds (as
opposed to too many wall-seconds), it makes sense to report the time
taken in cpu-seconds as well. This allows you to better judge how
close a test is to being a "long" test. The output format remains
unchanged, for example,

  ok [1.02 s]

but now the "s" refers to cpu-seconds instead of wall-seconds. A
subsequent commit will disambiguate the two types of "s".

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
After switching the doctests to report cputime instead of walltime,
the running-time measurements for tests using pexpect interfaces are
off because the time spent in subprocesses is not accounted
for. Fixing this turns out not to be easy: the cputime() function in
sage.misc.timing can try to account for them, but it is essentially
asking the subprocess to compute its own cputime. This has faults:

  1. Not every pexpect interface implemented (or can implement)
     such a method.

  2. The pexpect interface needs to be in good working condition
     before we can ask the subprocess to do anything. That's often
     not the case at the end of a "sage: ..." line in a doctest.
     Particularly it's not true in all of the tests for low-level
     details of our pexpect interface itself.

Instead, this commit parses /proc/<pid>/stat to obtain the cputime
numbers for pexpext processes. This works well with one caveat: that
information is only available on Linux and BSD.

Having a solution that only works on some platforms is not actually
too problematic. The motivation for measuring these times in the
doctest framework is that someone should notice an outrageously long
test and fix or report it. So, it's enough that the measurements be
accurate on only some platforms -- particularly where the CI is
run. Furthermore, the information is only both missing and potentially
useful for long-running pexpect tests, which are relatively rare.

It would be possible to implement the same thing for macOS using its
own platform-specific interface if anyone is motivated to do so. It
would also make sense to use psutil to obtain this information once
we have upgraded to a version of ipykernel that reintroduces it as
a dependency (see sagemathGH-33772).

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
We need to report the CPU time for tests that run afoul of
--warn-long, because --warn-long is now specified in CPU time. But, we
are still reporting the wall time in several places, such as the
per-file summary. So to keep things as consistent as possible, we
should report the wall time for --warn-long failures as well. This
changes the output of a too-long test from e.g.

  Test ran for 1.23 cpu seconds, check ran for 2.34 cpu seconds

to,

  Test ran for 1.23s cpu, 2.50s wall
  Check ran for 2.34s cpu, 3.12s wall

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
Now that cpu seconds are used for --warn-long, we should disambiguate
wherever wall seconds are still used. One such place is in the doctest
runner's per-file report, which outputs (for example)

  [14 tests, 0.26 s]

after testing a file with 14 tests that took 0.26 wall seconds. This
commit changes that output to

    [14 tests, 0.26s wall]

which is explicit about the type of measurement used.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Apr 16, 2024
Now that cpu seconds are used for --warn-long, we should disambiguate
wherever wall seconds are still used. One such place is in the verbose
test runner output, which outputs (for example)

  ok [3.83 s]

after a test that took 3.83 wall seconds. This commit changes that
output to

  ok [3.83s wall]

which is unmistakable.

Closes: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
Our --warn-long flag used during testing currently takes "wall time"
as its argument. This is somewhat nice from a user perspective, since
a normal user is quite likely to know what what wall time is, but it's
also a bit nonsensical given the purpose of the flag. The flag's
intent is to warn developers about tests that take too long, so
whether or not it triggers should not depend on the developer's
machine, or what else he's doing while the tests are running -- both
of which do affect the wall time.

CPU time provides a more reliable measurement. (In fact, when no value
for --wall-time was given, we were already using a heuristic to mimic
CPU time.) This commit changes the --warn-long flag to use CPU time
instead of wall time, obviating the need for the heuristic.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
Update the documentation to reflect that we are now using CPU time
(and not wall time) to measure the length of a test.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
The --warn-long flag now takes CPU seconds (not wall seconds) as its
argument. Here we update some command-line output to be explicit about
this fact.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
The default value of 60 cpu seconds for a normal test is too high: the
documentation suggests that a non-long test should complete in about
five seconds. So, in this commit we change the default --warn-long
value to 10 cpu seconds if the --long flag was not given.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
Since we're now warning about tests that take too many cpu-seconds (as
opposed to too many wall-seconds), it makes sense to report the time
taken in cpu-seconds as well. This allows you to better judge how
close a test is to being a "long" test. The output format remains
unchanged, for example,

  ok [1.02 s]

but now the "s" refers to cpu-seconds instead of wall-seconds. A
subsequent commit will disambiguate the two types of "s".

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
After switching the doctests to report cputime instead of walltime,
the running-time measurements for tests using pexpect interfaces are
off because the time spent in subprocesses is not accounted
for. Fixing this turns out not to be easy: the cputime() function in
sage.misc.timing can try to account for them, but it is essentially
asking the subprocess to compute its own cputime. This has faults:

  1. Not every pexpect interface implemented (or can implement)
     such a method.

  2. The pexpect interface needs to be in good working condition
     before we can ask the subprocess to do anything. That's often
     not the case at the end of a "sage: ..." line in a doctest.
     Particularly it's not true in all of the tests for low-level
     details of our pexpect interface itself.

Instead, this commit parses /proc/<pid>/stat to obtain the cputime
numbers for pexpext processes. This works well with one caveat: that
information is only available on Linux and BSD.

Having a solution that only works on some platforms is not actually
too problematic. The motivation for measuring these times in the
doctest framework is that someone should notice an outrageously long
test and fix or report it. So, it's enough that the measurements be
accurate on only some platforms -- particularly where the CI is
run. Furthermore, the information is only both missing and potentially
useful for long-running pexpect tests, which are relatively rare.

It would be possible to implement the same thing for macOS using its
own platform-specific interface if anyone is motivated to do so. It
would also make sense to use psutil to obtain this information once
we have upgraded to a version of ipykernel that reintroduces it as
a dependency (see sagemathGH-33772).

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
We need to report the CPU time for tests that run afoul of
--warn-long, because --warn-long is now specified in CPU time. But, we
are still reporting the wall time in several places, such as the
per-file summary. So to keep things as consistent as possible, we
should report the wall time for --warn-long failures as well. This
changes the output of a too-long test from e.g.

  Test ran for 1.23 cpu seconds, check ran for 2.34 cpu seconds

to,

  Test ran for 1.23s cpu, 2.50s wall
  Check ran for 2.34s cpu, 3.12s wall

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
Now that cpu seconds are used for --warn-long, we should disambiguate
wherever wall seconds are still used. One such place is in the doctest
runner's per-file report, which outputs (for example)

  [14 tests, 0.26 s]

after testing a file with 14 tests that took 0.26 wall seconds. This
commit changes that output to

    [14 tests, 0.26s wall]

which is explicit about the type of measurement used.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 11, 2024
Now that cpu seconds are used for --warn-long, we should disambiguate
wherever wall seconds are still used. One such place is in the verbose
test runner output, which outputs (for example)

  ok [3.83 s]

after a test that took 3.83 wall seconds. This commit changes that
output to

  ok [3.83s wall]

which is unmistakable.

Closes: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
Our --warn-long flag used during testing currently takes "wall time"
as its argument. This is somewhat nice from a user perspective, since
a normal user is quite likely to know what what wall time is, but it's
also a bit nonsensical given the purpose of the flag. The flag's
intent is to warn developers about tests that take too long, so
whether or not it triggers should not depend on the developer's
machine, or what else he's doing while the tests are running -- both
of which do affect the wall time.

CPU time provides a more reliable measurement. (In fact, when no value
for --wall-time was given, we were already using a heuristic to mimic
CPU time.) This commit changes the --warn-long flag to use CPU time
instead of wall time, obviating the need for the heuristic.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
Update the documentation to reflect that we are now using CPU time
(and not wall time) to measure the length of a test.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
The --warn-long flag now takes CPU seconds (not wall seconds) as its
argument. Here we update some command-line output to be explicit about
this fact.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
The default value of 60 cpu seconds for a normal test is too high: the
documentation suggests that a non-long test should complete in about
five seconds. So, in this commit we change the default --warn-long
value to 10 cpu seconds if the --long flag was not given.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
Since we're now warning about tests that take too many cpu-seconds (as
opposed to too many wall-seconds), it makes sense to report the time
taken in cpu-seconds as well. This allows you to better judge how
close a test is to being a "long" test. The output format remains
unchanged, for example,

  ok [1.02 s]

but now the "s" refers to cpu-seconds instead of wall-seconds. A
subsequent commit will disambiguate the two types of "s".

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
After switching the doctests to report cputime instead of walltime,
the running-time measurements for tests using pexpect interfaces are
off because the time spent in subprocesses is not accounted
for. Fixing this turns out not to be easy: the cputime() function in
sage.misc.timing can try to account for them, but it is essentially
asking the subprocess to compute its own cputime. This has faults:

  1. Not every pexpect interface implemented (or can implement)
     such a method.

  2. The pexpect interface needs to be in good working condition
     before we can ask the subprocess to do anything. That's often
     not the case at the end of a "sage: ..." line in a doctest.
     Particularly it's not true in all of the tests for low-level
     details of our pexpect interface itself.

Instead, this commit parses /proc/<pid>/stat to obtain the cputime
numbers for pexpext processes. This works well with one caveat: that
information is only available on Linux and BSD.

Having a solution that only works on some platforms is not actually
too problematic. The motivation for measuring these times in the
doctest framework is that someone should notice an outrageously long
test and fix or report it. So, it's enough that the measurements be
accurate on only some platforms -- particularly where the CI is
run. Furthermore, the information is only both missing and potentially
useful for long-running pexpect tests, which are relatively rare.

It would be possible to implement the same thing for macOS using its
own platform-specific interface if anyone is motivated to do so. It
would also make sense to use psutil to obtain this information once
we have upgraded to a version of ipykernel that reintroduces it as
a dependency (see sagemathGH-33772).

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
We need to report the CPU time for tests that run afoul of
--warn-long, because --warn-long is now specified in CPU time. But, we
are still reporting the wall time in several places, such as the
per-file summary. So to keep things as consistent as possible, we
should report the wall time for --warn-long failures as well. This
changes the output of a too-long test from e.g.

  Test ran for 1.23 cpu seconds, check ran for 2.34 cpu seconds

to,

  Test ran for 1.23s cpu, 2.50s wall
  Check ran for 2.34s cpu, 3.12s wall

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
Now that cpu seconds are used for --warn-long, we should disambiguate
wherever wall seconds are still used. One such place is in the doctest
runner's per-file report, which outputs (for example)

  [14 tests, 0.26 s]

after testing a file with 14 tests that took 0.26 wall seconds. This
commit changes that output to

    [14 tests, 0.26s wall]

which is explicit about the type of measurement used.

Issue: sagemathGH-32981
orlitzky added a commit to orlitzky/sage that referenced this issue Oct 12, 2024
Now that cpu seconds are used for --warn-long, we should disambiguate
wherever wall seconds are still used. One such place is in the verbose
test runner output, which outputs (for example)

  ok [3.83 s]

after a test that took 3.83 wall seconds. This commit changes that
output to

  ok [3.83s wall]

which is unmistakable.

Closes: sagemathGH-32981
vbraun pushed a commit to vbraun/sage that referenced this issue Oct 26, 2024
    
Finally address sagemath#32981.

Using wall time to measure which tests are "long" doesn't make much
sense. Wall time is subject to every conflating factor:

1. Faster hardware runs tests in less wall time
2. Faster compiler flags makes code that runs in less wall time
3. A busy machine will run tests slower than an unburdened one
4. Operating system priority can increase or decrease the time taken by
a process
5. etc.

Ideally, everyone would agree on how long is too long for a test to run.
Attaining that ideal is not so easy, but the first step is to
switch`--warn-long` away from using wall time and towards using CPU
time. A second phase (sagemath#33022)
will normalize the CPU time.


### Examples

#### --warn-long output

The user-facing portion of this can be tested with a low `--warn-long`
value,

```
$ sage -t --warn-long=1 src/sage/rings/qqbar.py
...
Doctesting 1 file.
sage -t --warn-long --random-
seed=309742757431591512726158538133919135994 src/sage/rings/qqbar.py
**********************************************************************
File "src/sage/rings/qqbar.py", line 511, in sage.rings.qqbar
Warning, slow doctest:
    z, = [r for r in p4.roots(QQbar, False) if r in ival]
Test ran for 1.51s cpu, 1.51s wall
Check ran for 0.00s cpu, 0.00s wall
**********************************************************************
File "src/sage/rings/qqbar.py", line 524, in sage.rings.qqbar
Warning, slow doctest:
    z2 = QQbar.polynomial_root(p4, ival)
Test ran for 1.07s cpu, 1.07s wall
Check ran for 0.00s cpu, 0.00s wall
**********************************************************************
File "src/sage/rings/qqbar.py", line 808, in
sage.rings.qqbar.AlgebraicField_common._factor_multivariate_polynomial
Warning, slow doctest:
    F = QQbar._factor_multivariate_polynomial(p)
Test ran for 2.38s cpu, 2.42s wall
Check ran for 0.00s cpu, 0.00s wall

...
```

#### pexpect accounting

Two different methods are used to account for the CPU time, one for
linux, and one for everything else. This GAP command uses pexpect and
should chew up a good bit of CPU time:

```
sage: from sage.doctest.util import Timer
sage: t = Timer()
sage: _ = t.start(); t.stop()    # no CPU time used between start/stop
{'cputime': 0.0, 'walltime': 7.605552673339844e-05}

sage: _ = t.start(); gap.eval('a:=List([0..10000],i->WordAlp("a",i));;
IsSortedList(a);'); t.stop()    # lots of CPU time used
'true'
{'cputime': 30.089999999999996, 'walltime': 36.292088747024536}
```
    
URL: sagemath#36226
Reported by: Michael Orlitzky
Reviewer(s): Gonzalo Tornaría, John H. Palmieri, Michael Orlitzky, Tobias Diez
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants