Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running tests in parallel on Windows quits too soon #95027

Closed
terryjreedy opened this issue Jul 19, 2022 · 34 comments
Closed

Running tests in parallel on Windows quits too soon #95027

terryjreedy opened this issue Jul 19, 2022 · 34 comments
Labels
3.11 only security fixes 3.12 bugs and security fixes OS-windows release-blocker tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@terryjreedy
Copy link
Member

On my Win10 the test suite completes when run serially. But with main and 3.11, but not 3.10, it quits too soon with -j0. This has occured with both repository debug builds and installed 3.11.0b4. Failure is currently deterministic with variable details. Presence of -ugui or -uall has no apparent effect.

What happens is that roughly about 100 tests before the end, a 'regrtest worker thread' fails ('warning') with UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position <variable>: invalid start byte. The 14 worker processes are stopped and the usual summary is given, but with an additional list of ' tests omitted <list of test names'. The test is called a 'SUCCESS'. This is followed by a traceback for SystemExit(0), followed by 1 or more tracebacks for PermissionError because a temporary test file is supposedly used by another process.

Attaching a file with output starting with the initial warning fails. Will paste separately.

If this is not limited to my system, I think it should be a release blocker.

@terryjreedy terryjreedy added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir 3.11 only security fixes 3.12 bugs and security fixes labels Jul 19, 2022
@terryjreedy
Copy link
Member Author

Sample failure.

  -- running: test_asyncio (1 min 20 sec), test_distutils (1 min 2 sec),
  test_concurrent_futures (1 min 9 sec), test_multiprocessing_spawn (32.3 sec)
Warning -- regrtest worker thread failed: Traceback (most recent call last):
Warning --   File "C:\Programs\Python311\Lib\test\libregrtest\runtest_mp.py", line 305, in run
Warning --     mp_result = self._runtest(test_name)
Warning --                 ^^^^^^^^^^^^^^^^^^^^^^^^
Warning --   File "C:\Programs\Python311\Lib\test\libregrtest\runtest_mp.py", line 272, in _runtest
Warning --     stdout = stdout_fh.read().strip()
Warning --              ^^^^^^^^^^^^^^^^
Warning --   File "C:\Programs\Python311\Lib\tempfile.py", line 483, in func_wrapper
Warning --     return func(*args, **kwargs)
Warning --            ^^^^^^^^^^^^^^^^^^^^^
Warning --   File "<frozen codecs>", line 322, in decode
Warning -- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 1332: invalid start byte
Kill <TestWorkerProcess #1 running test=test_ssl pid=15188 time=10.9 sec>
Kill <TestWorkerProcess #2 running test=test_tomllib pid=11616 time=782 ms>
Kill <TestWorkerProcess #3 running test=test_socket pid=17624 time=11.5 sec>
Kill <TestWorkerProcess #4 running test=test_asyncio pid=3644 time=1 min 21 sec>
Kill <TestWorkerProcess #5 running test=test_tools pid=8556 time=422 ms>
Kill <TestWorkerProcess #7 running test=test_concurrent_futures pid=17068 time=1 min 10 sec>
Kill <TestWorkerProcess #8 running test=test_tk pid=18116 time=1.8 sec>
Kill <TestWorkerProcess #9 running test=test_tarfile pid=444 time=7.3 sec>
Kill <TestWorkerProcess #10 running test=test_trace pid=9376 time=297 ms>
Kill <TestWorkerProcess #11 running test=test_threading pid=11716 time=5.0 sec>
Kill <TestWorkerProcess #12 running test=test_multiprocessing_spawn pid=2664 time=32.6 sec>
Kill <TestWorkerProcess #13 running test=test_subprocess pid=14384 time=8.9 sec>
Kill <TestWorkerProcess #14 running test=test_telnetlib pid=11752 time=6.9 sec>

== Tests result: SUCCESS ==

79 tests omitted:
    test_asyncio test_concurrent_futures test_distutils
    test_multiprocessing_spawn test_socket test_ssl test_subprocess
    test_tarfile test_telnetlib test_threading test_tk test_tomllib
    test_tools test_trace test_traceback test_tracemalloc
    test_ttk_guionly test_ttk_textonly test_tuple test_turtle
    test_type_annotations test_type_cache test_type_comments
    test_typechecks test_typing test_ucn test_unary test_unicode
    test_unicode_file test_unicode_file_functions
    test_unicode_identifiers test_unicodedata test_univnewlines
    test_unpack test_unpack_ex test_unparse test_urllib test_urllib2
    test_urllib2_localnet test_urllib2net test_urllib_response
    test_urllibnet test_urlparse test_userdict test_userlist
    test_userstring test_utf8_mode test_utf8source test_uu test_uuid
    test_venv test_wait3 test_wait4 test_warnings test_wave
    test_weakref test_weakset test_webbrowser test_winconsoleio
    test_winreg test_winsound test_with test_wsgiref test_xdrlib
    test_xml_dom_minicompat test_xml_etree test_xml_etree_c
    test_xmlrpc test_xmlrpc_net test_xxlimited test_xxtestfuzz
    test_yield_from test_zipapp test_zipfile test_zipfile64
    test_zipimport test_zipimport_support test_zlib test_zoneinfo

326 tests OK.

31 tests skipped:
    test_asdl_parser test_check_c_globals test_clinic test_curses
    test_dbm_gnu test_dbm_ndbm test_devpoll test_epoll test_fcntl
    test_fork1 test_gdb test_grp test_ioctl test_kqueue
    test_multiprocessing_fork test_multiprocessing_forkserver test_nis
    test_openpty test_ossaudiodev test_pipes test_poll test_posix
    test_pty test_pwd test_readline test_resource test_smtpnet
    test_socketserver test_spwd test_syslog test_threadsignals

Total duration: 1 min 22 sec
Tests result: SUCCESS
Traceback (most recent call last):
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 478, in temp_dir
    yield path
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 531, in temp_cwd
    yield cwd_dir
  File "C:\Programs\Python311\Lib\test\libregrtest\main.py", line 701, in main
    self._main(tests, kwargs)
  File "C:\Programs\Python311\Lib\test\libregrtest\main.py", line 758, in _main
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Programs\Python311\Lib\test\support\__init__.py", line 201, in _force_run
    return func(*args)
           ^^^^^^^^^^^
PermissionError: [WinError 32] The process cannot access the file because it is
 being used by another process:
 'C:\\Users\\Terry\\AppData\\Local\\Temp\\test_python_6764æ\\test_python_worker_17068æ'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Programs\Python311\Lib\test\__main__.py", line 2, in <module>
    main()
  File "C:\Programs\Python311\Lib\test\libregrtest\main.py", line 763, in main
    Regrtest().main(tests=tests, **kwargs)
  File "C:\Programs\Python311\Lib\test\libregrtest\main.py", line 695, in main
    with os_helper.temp_cwd(test_cwd, quiet=True):
  File "C:\Programs\Python311\Lib\contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 529, in temp_cwd
    with temp_dir(path=name, quiet=quiet) as temp_path:
  File "C:\Programs\Python311\Lib\contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 483, in temp_dir
    rmtree(path)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 440, in rmtree
    _rmtree(path)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 383, in _rmtree
    _waitfor(_rmtree_inner, path, waitall=True)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 328, in _waitfor
    func(pathname)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 380, in _rmtree_inner
    _force_run(fullname, os.rmdir, fullname)
  File "C:\Programs\Python311\Lib\test\support\__init__.py", line 212, in _force_run
    return func(*args)
           ^^^^^^^^^^^
PermissionError: [WinError 32] The process cannot access the file because it is
 being used by another process:
 'C:\\Users\\Terry\\AppData\\Local\\Temp\\test_python_6764æ\\test_python_worker_17068æ'

@neonene
Copy link
Contributor

neonene commented Jul 28, 2022

Probably cp1252 is used as system locale on your OS which emits an error message with non-ascii characters such as 0x91 (left single quotation). If so, switching to utf-8 locale is the easiest way to finish running tests after a few retries:

https://websiteforstudents.com/how-to-change-system-locale-in-windows-11/

Another workaround would be using 'wb+' mode in TestWorkerProcess._runtest() to account for non-utf8 text from subprocess:

def _runtest(self, test_name: str) -> MultiprocessResult:
# gh-94026: Write stdout+stderr to a tempfile as workaround for
# non-blocking pipes on Emscripten with NodeJS.
with tempfile.TemporaryFile(
'w+', encoding=sys.stdout.encoding
) as stdout_fh:

I'm not sure the root cause of the race condition when running test_asyncio and test_concurrent_futures.

@neonene
Copy link
Contributor

neonene commented Jul 29, 2022

Maybe related: gh-91227

@neonene
Copy link
Contributor

neonene commented Jul 29, 2022

See also: gh-91323 (specific to 3.11 and main)

@terryjreedy
Copy link
Member Author

terryjreedy commented Aug 18, 2022

Multiprocessing people: due to some regression in 3.11/2, parallel tests on ma updated by otherwise pretty stock American Win10 started failing 29 days ago. They still fail today with essentially the same traceback. I believe they ran not too many months before.

@pablogsal Today, sequential tests also fail by hanging for hours in test_winconsoleio, after taking 81 minutes to get that far
1:21:51 [415/435] test_winconsoleio. (The suite once ran in 18 minutes on my machine.) test_repl alone took 34 minutes.

EDIT: This is with plain python -m test (with -j0 added for parallel).
EDIT2: On rerun, sequential tests ran OK in 55 minutes with 40 skipped. Will rerun again to make sure.

@pablogsal
Copy link
Member

@pablogsal Today, sequential tests also fail by hanging for hours in test_winconsoleio, after taking 81 minutes to get that far
1:21:51 [415/435] test_winconsoleio. (The suite once ran in 18 minutes on my machine.) test_repl alone took 34 minutes.
Unfortunately I don't have a windows machine :( Could you bisect to find the commit that introduced the multiprocessing regression? I don't think that area changed a lot during 3.11/3.12

@terryjreedy
Copy link
Member Author

The second sequential run was again fine, so forget winconsoleio. (I have no idea why the first run could have gone so badly.) Do the tests run in parallel on your non-windows machine?
@zooba Can you try -j0 on your Windows machine to verify that this is not a local-only problem?

To me, the output pasted above reveals two bugs in the testing program.

  1. The premature shutdown is called a success rather than a failure.
  2. Exiting after (wrongly) reporting success results in more exceptions.

git bisect wants a command that returns a 0/non-0 exit code. Though it would not help here, due to the fake 'success', is there a way to run regrtest and suppress printing and get an exit code instead? I looked at the 'Special runs;' options can could not find anything.

I do not know git beyond the devguide chapter, so I would need some coaching even to find a good version for bisect (other than by manually downloading and re-installing earlier releases). What would be a good way to get an exit-code command/script?

@pablogsal
Copy link
Member

You can run git bisect manually and inspect every commit and then run git bisect bad/good depending if it works or not. Is slower but more reliable.

@zooba
Copy link
Member

zooba commented Aug 18, 2022

It's unlikely to be UTF-8 vs mbcs, but could well be that it's decoding with 'strict' instead of fsdecode (which uses 'surrogateescape' to avoid exactly this issue). Though if the change to write via a temp file instead of stdout directly is recent, I agree that's an interesting place to start looking.

The additional error appears to be in the rmtree helper in the test library, which is supposed to handle these issues and retry them:

  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 483, in temp_dir
    rmtree(path)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 440, in rmtree
    _rmtree(path)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 383, in _rmtree
    _waitfor(_rmtree_inner, path, waitall=True)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 328, in _waitfor
    func(pathname)
  File "C:\Programs\Python311\Lib\test\support\os_helper.py", line 380, in _rmtree_inner
    _force_run(fullname, os.rmdir, fullname)
  File "C:\Programs\Python311\Lib\test\support\__init__.py", line 212, in _force_run
    return func(*args)

@gvanrossum gvanrossum changed the title Running tests in parallel quits too soon Running tests in parallel on Windows quits too soon Sep 6, 2022
@terryjreedy
Copy link
Member Author

@zooba Did you or anyone else verify this problem on a machine other than mine? Windows or otherwise?

At this point, I do not think that this should be a release blocker. If we are really concerned about tests running on installations, we need a more concerted effort to test installers. I installed the rc1 on my Macbook Air, ran the test suite both serial and parallel, and it failed both times with a malloc error. I ran test_idle alone and it hangs on the second call to a IDLE test help function. It does the same in the current 3.10 and 3.9 (prior to anything released today). (There is something odd in the debug print behavior, so I still do not know exactly where the failure is.) These two failures are different from the one reported here.

@zooba
Copy link
Member

zooba commented Sep 7, 2022

Sorry for the delay, I do get the same result as you when testing locally, even when running tests from the build directory.

Haven't got any more details than my analysis above - I'm trying again with less strict encoding errors to see what happens.

zooba added a commit to zooba/cpython that referenced this issue Sep 7, 2022
zooba added a commit to zooba/cpython that referenced this issue Sep 7, 2022
@zooba
Copy link
Member

zooba commented Sep 7, 2022

There's one possible fix. There are likely other ways of doing it, so if someone has an approach they'd really rather see, feel free to send a different PR.

@terryjreedy
Copy link
Member Author

Steve's patch seems to fix the premature quit problem in repository main (see PR). There is a separate issue that test_io fails when run in a subprocess (python -m test -j0 test_io) but not when not (python -m test test_io). Also, test_distutils and test_peg_generator changed the execution environment on my machine, and test_wmi ran for 9 minutes, which is terrible, especially in parallel mode and everything else is done in 5.

@tim-one
Copy link
Member

tim-one commented Sep 8, 2022

Here's the proximate cause of the original failure report. It took weeks for me to figure this out in the background. Because, when all the tests are running, this failure shows up at an unpredictable time, and there's nothing in the Warning messages produced that says anything about which test caused it. Instead the whole test run just dies abruptly, and cascades of irrelevant errors are also produced because the runner is trying to shut down cleanly but keeps bashing into trying to delete files that are still open (a no-no on Windows) due to whatever other tests are running concurrently.

It's test_distutils, which has been failing for me on Windows for about a month, under any form of parallel running, even with just 1 process:

C:\Code\Python\PCbuild>rt -q -j1 test_distutils

C:\Code\Python\PCbuild>"C:\Code\Python\PCbuild\amd64\python.exe"  -u -Wd -E -bb -m test  -j1 test_distutils
0:00:00 Run tests in parallel using 1 child processes
0:00:30 load avg: 0.00 running: test_distutils (30.0 sec)
0:01:00 load avg: 0.00 running: test_distutils (1 min)
Warning -- regrtest worker thread failed: Traceback (most recent call last):
Warning --   File "C:\Code\Python\Lib\test\libregrtest\runtest_mp.py", line 334, in run
Warning --     mp_result = self._runtest(test_name)
Warning --                 ^^^^^^^^^^^^^^^^^^^^^^^^
Warning --   File "C:\Code\Python\Lib\test\libregrtest\runtest_mp.py", line 293, in _runtest
Warning --     stdout = stdout_fh.read().strip()
Warning --              ^^^^^^^^^^^^^^^^
Warning --   File "C:\Code\Python\Lib\tempfile.py", line 483, in func_wrapper
Warning --     return func(*args, **kwargs)
Warning --            ^^^^^^^^^^^^^^^^^^^^^
Warning --   File "<frozen codecs>", line 322, in decode
Warning -- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 2829: invalid start byte

== Tests result: NO TEST RUN ==

1 test omitted:
    test_distutils

Total duration: 1 min 6 sec
Tests result: NO TEST RUN

Changing the encoding on stdout_h (in runtest_mp.py) to latin-1 when it's created makes that go away, but no idea whether that's appropriate. There's no problem here if -j isn't used. If you want to use -j, add -x test_distutils and it all runs fine.

test_wmi also took a loooong time for me today.

@vstinner
Copy link
Member

vstinner commented Sep 8, 2022

Regression introduced recently by #94253 if I followed correctly.

@zooba
Copy link
Member

zooba commented Sep 8, 2022

It's test_distutils, which has been failing for me on Windows for about a month

Good news, we can stop running it now 😄

I'm not sure why it's still in there. I guess the PR to remove distutils stalled on something...

@zooba
Copy link
Member

zooba commented Sep 8, 2022

Regression introduced recently by #94253 if I followed correctly.

@vstinner I believe so - see my #96669 patch for one workaround, but feel free to propose something else if you prefer. Possibly UTF-8 mode is sufficient and can be set with xoption? But I think we want to override the errors as well.

@felixxm
Copy link
Contributor

felixxm commented Sep 27, 2022

I'm not sure if this is related but we have a similar regression when running Django tests in parallel on Windows and Python 3.11 🤔 :

 Traceback (most recent call last):
  File "D:\a\django\django\tests\runtests.py", line 773, in <module>
    failures = django_tests(
               ^^^^^^^^^^^^^
  File "D:\a\django\django\tests\runtests.py", line 432, in django_tests
    failures = test_runner.run_tests(test_labels)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\a\django\django\django\test\runner.py", line 1057, in run_tests
    self.teardown_databases(old_config)
  File "D:\a\django\django\django\test\runner.py", line 980, in teardown_databases
    _teardown_databases(
  File "D:\a\django\django\django\test\utils.py", line 378, in teardown_databases
    connection.creation.destroy_test_db(
  File "D:\a\django\django\django\db\backends\base\creation.py", line 307, in destroy_test_db
    self._destroy_test_db(test_database_name, verbosity)
  File "D:\a\django\django\django\db\backends\sqlite3\creation.py", line 111, in _destroy_test_db
    os.remove(test_database_name)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'other_1.sqlite3'
Error: Process completed with exit code 1.

See logs.

@vstinner
Copy link
Member

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'other_1.sqlite3'

This bug tracker is for Python issue. Your issue is unrelated. Please open an issue to: https://github.com/django/django

@felixxm
Copy link
Contributor

felixxm commented Sep 27, 2022

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'other_1.sqlite3'

This bug tracker is for Python issue. Your issue is unrelated. Please open an issue to: https://github.com/django/django

Hi Victor,

We're maintaining Django 😄 The same tests work fine on Python 3.8, 3.9, and 3.10, so I assumed (finding this issue) that it might be related 🤔 We will try to provide more details later. Sorry for the lack of a preliminary investigation on our side.

@carltongibson
Copy link
Contributor

carltongibson commented Sep 27, 2022

Hi Victor,

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process:

This is the exact error from the opening pair of comments. As @felixxm says, we're reporting an apparent regression in a pre-release version, as we've been asked to do. Sorry if that wasn't clear from the comments.

@zooba
Copy link
Member

zooba commented Sep 28, 2022

Thanks, but we've narrowed this one down to our own test suite. If you're reusing libregrtest in your tests, then you should get the fix automatically, but if you aren't (and I assume you're not, because pytest is miles better than libregrtest 😉) it'll be a different issue and should get a new bug.

@carltongibson
Copy link
Contributor

OK, thanks @zooba — I was hoping to get to it today, but it's on my list for tomorrow now to work out exactly which release introduced the change, and we'll open a new issue with at least reproduce steps for your consideration.

@vstinner
Copy link
Member

I assumed (finding this issue) that it might be related thinking

You report an issue about a PermissionError on a sqlite database, whereas this issue is about an UnicodeDecodeError. I don't see how they could be related. Moreover, Django uses its own test runner (django\test\runner.py), and Python uses its own test runner (libregrtest).

Since the error message is different, please open a separated issue. Yes, it's possible that it's a Python regression, but someone has to analyze the issue to make sure that it's a Python regression, and not something else.

@carltongibson
Copy link
Contributor

OK, it'll be with you in the morning. Thanks @vstinner. 🎁

@vstinner
Copy link
Member

The test is called a 'SUCCESS'. This is followed by a traceback for SystemExit(0), followed by 1 or more tracebacks for PermissionError because a temporary test file is supposedly used by another process.

I created issue #98219 about this annoying PermissionError.

@vstinner
Copy link
Member

I can reproduce this issue on Windows on the main branch with:

python -m test  -j1 test_distutils -m test_spawn 

Output:

vstinner@WIN C:\victor\python\main>python -m test  -j1 test_distutils -m test_spawn 
Running Debug|x64 interpreter...
0:00:00 Run tests in parallel using 1 child processes
Warning -- regrtest worker thread failed: Traceback (most recent call last):
Warning --   File "C:\victor\python\main\Lib\test\libregrtest\runtest_mp.py", line 334, in run
Warning --     mp_result = self._runtest(test_name)
Warning --                 ^^^^^^^^^^^^^^^^^^^^^^^^
Warning --   File "C:\victor\python\main\Lib\test\libregrtest\runtest_mp.py", line 293, in _runtest
Warning --     stdout = stdout_fh.read().strip()
Warning --              ^^^^^^^^^^^^^^^^
Warning --   File "C:\victor\python\main\Lib\tempfile.py", line 483, in func_wrapper
Warning --     return func(*args, **kwargs)
Warning --            ^^^^^^^^^^^^^^^^^^^^^
Warning --   File "C:\victor\python\main\Lib\codecs.py", line 322, in decode
Warning --     (result, consumed) = self._buffer_decode(data, self.errors, final)
Warning --                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Warning -- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 264: invalid start byte

== Tests result: NO TEST RUN ==

1 test omitted:
    test_distutils

Total duration: 954 ms
Tests result: NO TEST RUN

@vstinner
Copy link
Member

@vstinner:

Regression introduced recently by #94253 if I followed correctly.

I confirm:

  • before commit 199ba23 (parent of this commit), I cannot reproduce the issue
  • at commit 199ba23, I reproduce the issue

@vstinner
Copy link
Member

Encodings used by libregrtest on Windows.

Before commit 199ba23:

  • no -jN option: sys.stdout.encoding: UTF-8

  • -j1:

    • parent process: Popen.stdout (pipe) encoding: cp1252
    • child process: sys.stdout.encoding: cp1252

At commit 199ba23:

  • no -jN option: sys.stdout.encoding: UTF-8 (same)

  • -j1:

    • parent process: stdout temporary file: UTF-8 (was cp1252) <===== WHAT CHANGED
    • child process: sys.stdout.encoding: cp1252 (same)

libregrtest now uses sys.stdout.encoding for the stdout temporary file encoding:

  • On Linux, sys.stdout.encoding is the same if stdout is a terminal or not (if sys.stdout.isatty() is true or false).
  • On Windows, sys.stdout.encoding is UTF-8 if stdout is a terminal, or the ANSI code page (ex: cp1252) otherwise (ex: if stdout is redirected to a program or a file)

@vstinner
Copy link
Member

Another difference on Windows:

  • When stdout is a terminal (isatty() is true), WindowsConsoleIO is used.
  • When stdout is not a terminal (isatty() is false), FileIO is used.
C:\victor\python\main>python -c "import sys; out=sys.stdout; print(type(out.buffer.raw), out.isatty(), out.encoding)"     
<class '_io._WindowsConsoleIO'> True utf-8

C:\victor\python\main>python -c "import sys; out=sys.stdout; print(type(out.buffer.raw), out.isatty(), out.encoding)"|more
<class '_io.FileIO'> False cp1252

@vstinner
Copy link
Member

I'm working on a fix, but first I'm trying to add a test to test_regrtest which reproduces the issue ;-)

@vstinner
Copy link
Member

I wrote PR #98492 to fix the issue.

vstinner added a commit that referenced this issue Oct 21, 2022
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 21, 2022
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.
(cherry picked from commit ec1f6f5)

Co-authored-by: Victor Stinner <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 21, 2022
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.
(cherry picked from commit ec1f6f5)

Co-authored-by: Victor Stinner <[email protected]>
miss-islington added a commit that referenced this issue Oct 21, 2022
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.
(cherry picked from commit ec1f6f5)

Co-authored-by: Victor Stinner <[email protected]>
@vstinner
Copy link
Member

I would prefer a formal review of my PR, but I merged my PR just to unblock the 3.11.0 final release (scheduled next Monday). Maybe if something can be enhanced, it can be done later. IMO this fix is better than the current situation. In short, it just restores the old behavior: encodings used before 199ba23

pablogsal pushed a commit that referenced this issue Oct 24, 2022
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.
(cherry picked from commit ec1f6f5)

Co-authored-by: Victor Stinner <[email protected]>
pablogsal pushed a commit that referenced this issue Oct 24, 2022
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.
(cherry picked from commit ec1f6f5)

Co-authored-by: Victor Stinner <[email protected]>
vstinner added a commit to vstinner/cpython that referenced this issue Sep 2, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Sep 2, 2023
On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.

(cherry picked from commit ec1f6f5)
vstinner added a commit that referenced this issue Sep 3, 2023
…108820)

* Revert "[3.11] gh-101634: regrtest reports decoding error as failed test (#106169) (#106175)"

This reverts commit d5418e9.

* Revert "[3.11] bpo-46523: fix tests rerun when `setUp[Class|Module]` fails (GH-30895) (GH-103342)"

This reverts commit ecb09a8.

* Revert "gh-95027: Fix regrtest stdout encoding on Windows (GH-98492)"

This reverts commit b2aa28e.

* Revert "[3.11] gh-94026: Buffer regrtest worker stdout in temporary file (GH-94253) (GH-94408)"

This reverts commit 0122ab2.

* Revert "Run Tools/scripts/reindent.py (GH-94225)"

This reverts commit f0f3a42.

* Revert "gh-94052: Don't re-run failed tests with --python option (GH-94054)"

This reverts commit 1347607.

* Revert "[3.11] gh-84461: Fix Emscripten umask and permission issues (GH-94002) (GH-94006)"

This reverts commit 1073184.

* gh-93353: regrtest checks for leaked temporary files (#93776)

When running tests with -jN, create a temporary directory per process
and mark a test as "environment changed" if a test leaks a temporary
file or directory.

(cherry picked from commit e566ce5)

* gh-93353: Fix regrtest for -jN with N >= 2 (GH-93813)

(cherry picked from commit 36934a1)

* gh-93353: regrtest supports checking tmp files with -j2 (#93909)

regrtest now also implements checking for leaked temporary files and
directories when using -jN for N >= 2. Use tempfile.mkdtemp() to
create the temporary directory. Skip this check on WASI.

(cherry picked from commit 4f85cec)

* gh-84461: Fix Emscripten umask and permission issues (GH-94002)

- Emscripten's default umask is too strict, see
  emscripten-core/emscripten#17269
- getuid/getgid and geteuid/getegid are stubs that always return 0
  (root). Disable effective uid/gid syscalls and fix tests that use
  chmod() current user.
- Cannot drop X bit from directory.

(cherry picked from commit 2702e40)

* gh-94052: Don't re-run failed tests with --python option (#94054)

(cherry picked from commit 0ff7b99)

* Run Tools/scripts/reindent.py (#94225)

Reindent files which were not properly formatted (PEP 8: 4 spaces).

Remove also some trailing spaces.

(cherry picked from commit e87ada4)

* gh-94026: Buffer regrtest worker stdout in temporary file (GH-94253)

Co-authored-by: Victor Stinner <[email protected]>
(cherry picked from commit 199ba23)

* gh-96465: Clear fractions hash lru_cache under refleak testing (GH-96689)

Automerge-Triggered-By: GH:zware
(cherry picked from commit 9c8f379)

* gh-95027: Fix regrtest stdout encoding on Windows (#98492)

On Windows, when the Python test suite is run with the -jN option,
the ANSI code page is now used as the encoding for the stdout
temporary file, rather than using UTF-8 which can lead to decoding
errors.

(cherry picked from commit ec1f6f5)

* gh-98903: Test suite fails with exit code 4 if no tests ran (#98904)

The Python test suite now fails wit exit code 4 if no tests ran. It
should help detecting typos in test names and test methods.

* Add "EXITCODE_" constants to Lib/test/libregrtest/main.py.
* Fix a typo: "NO TEST RUN" becomes "NO TESTS RAN"

(cherry picked from commit c76db37)

* gh-100086: Add build info to test.libregrtest (#100093)

The Python test runner (libregrtest) now logs Python build information like
"debug" vs "release" build, or LTO and PGO optimizations.

(cherry picked from commit 3c89202)

* bpo-46523: fix tests rerun when `setUp[Class|Module]` fails (#30895)

Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: Łukasz Langa <[email protected]>
(cherry picked from commit 9953860)

* gh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (#103927)

This runs test_asyncio sub-tests in parallel using sharding from Cinder. This suite is typically the longest-pole in runs because it is a test package with a lot of further sub-tests otherwise run serially. By breaking out the sub-tests as independent modules we can run a lot more in parallel.

After porting we can see the direct impact on a multicore system.

Without this change:
  Running make test is 5 min 26 seconds
With this change:
  Running make test takes 3 min 39 seconds

That'll vary based on system and parallelism. On a `-j 4` run similar to what CI and buildbot systems often do, it reduced the overall test suite completion latency by 10%.

The drawbacks are that this implementation is hacky and due to the sorting of the tests it obscures when the asyncio tests occur and involves changing CPython test infrastructure but, the wall time saved it is worth it, especially in low-core count CI runs as it pulls a long tail. The win for productivity and reserved CI resource usage is significant.

Future tests that deserve to be refactored into split up suites to benefit from are test_concurrent_futures and the way the _test_multiprocessing suite gets run for all start methods. As exposed by passing the -o flag to python -m test to get a list of the 10 longest running tests.

---------

Co-authored-by: Carl Meyer <[email protected]>
Co-authored-by: Gregory P. Smith <[email protected]> [Google, LLC]
(cherry picked from commit 9e011e7)

* Display the sanitizer config in the regrtest header. (#105301)

Display the sanitizers present in libregrtest.

Having this in the CI output for tests with the relevant environment
variable displayed will help make it easier to do what we need to
create an equivalent local test run.

(cherry picked from commit 852348a)

* gh-101634: regrtest reports decoding error as failed test (#106169)

When running the Python test suite with -jN option, if a worker stdout
cannot be decoded from the locale encoding report a failed testn so the
exitcode is non-zero.

(cherry picked from commit 2ac3eec)

* gh-108223: test.pythoninfo and libregrtest log Py_NOGIL (#108238)

Enable with --disable-gil --without-pydebug:

    $ make pythoninfo|grep NOGIL
    sysconfig[Py_NOGIL]: 1

    $ ./python -m test
    ...
    == Python build: nogil debug
    ...

(cherry picked from commit 5afe0c1)

* gh-90791: test.pythoninfo logs ASAN_OPTIONS env var (#108289)

* Cleanup libregrtest code logging ASAN_OPTIONS.
* Fix a typo on "ASAN_OPTIONS" vs "MSAN_OPTIONS".

(cherry picked from commit 3a1ac87)

* gh-108388: regrtest splits test_asyncio package (#108393)

Currently, test_asyncio package is only splitted into sub-tests when
using command "./python -m test". With this change, it's also
splitted when passing it on the command line:
"./python -m test test_asyncio".

Remove the concept of "STDTESTS". Python is now mature enough to not
have to bother with that anymore. Removing STDTESTS simplify the
code.

(cherry picked from commit 174e9da)

* regrtest computes statistics (#108793)

test_netrc, test_pep646_syntax and test_xml_etree now return results
in the test_main() function.

Changes:

* Rewrite TestResult as a dataclass with a new State class.
* Add test.support.TestStats class and Regrtest.stats_dict attribute.
* libregrtest.runtest functions now modify a TestResult instance
  in-place.
* libregrtest summary lists the number of run tests and skipped
  tests, and denied resources.
* Add TestResult.has_meaningful_duration() method.
* Compute TestResult duration in the upper function.
* Use time.perf_counter() instead of time.monotonic().
* Regrtest: rename 'resource_denieds' attribute to 'resource_denied'.
* Rename CHILD_ERROR to MULTIPROCESSING_ERROR.
* Use match/case syntadx to have different code depending on the
  test state.

Co-authored-by: Alex Waygood <[email protected]>
(cherry picked from commit d4e534c)

* gh-108822: Add Changelog entry for regrtest statistics (#108821)

---------

Co-authored-by: Christian Heimes <[email protected]>
Co-authored-by: Zachary Ware <[email protected]>
Co-authored-by: Nikita Sobolev <[email protected]>
Co-authored-by: Joshua Herman <[email protected]>
Co-authored-by: Gregory P. Smith <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes OS-windows release-blocker tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

9 participants