Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tests failing with The process cannot access the file because it is being used by another process by force-killing child processes #101601

Open
arhadthedev opened this issue Feb 6, 2023 · 1 comment
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@arhadthedev
Copy link
Member

Bug report

GitHub Actions continue to sporadically fail with PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\cpython\\cpython\\build\\test_python_5740�\\test_python_worker_5604�'.

The failure happens when a test creates a child process, shares a test directory with it, and the child outlives its parent.

#99464 was the previous attempt. Now we should try to implement #98219 (comment).


Note: this problem was reported a few times before:

@arhadthedev arhadthedev added type-bug An unexpected behavior, bug, or error tests Tests in the Lib/test dir labels Feb 6, 2023
@eryksun
Copy link
Contributor

eryksun commented Feb 8, 2023

I'd prefer for job and process group support to be integrated into subprocess and multiprocessing.

In test.libregrtest.runtest_mp, the way I envision this in practice is as follows. In run_test_in_subprocess(), the subprocess.Popen arguments would be extended to create a job on Windows, which would be implemented to also create a new process group.

    if sys.platform == 'win32':
        kw['createjob'] = True

Killing the process in TestWorkerProcess._kill() would be extended for both POSIX and Windows. On POSIX, use the common combination of SIGTERM and SIGKILL. On Windows, send SIGBREAK to the process group, and then terminate the job.

        try:
            if USE_PROCESS_GROUP:
                try:
                    popen.send_signal_pg(signal.SIGTERM)
                    try:
                        popen.wait(5)
                    except subprocess.TimeoutExpired:
                        pass
                finally:
                    popen.send_signal_pg(signal.SIGKILL)
            elif sys.platform == 'win32':
                try:
                    popen.send_signal_pg(signal.SIGBREAK)
                    try:
                        popen.wait(5)
                    except subprocess.TimeoutExpired:
                        pass
                finally:
                    popen.terminate_job()
            else:
                popen.kill()
        except ProcessLookupError:
            pass
        except OSError as exc:
            print_warning(f"Failed to kill {what}: {exc!r}")

Regarding subprocess.Popen, the above depends on the addition of a new createjob=False parameter and a terminate_job() method, as well as a new cross-platform send_signal_pg() method.

Popen.__init__() can set up the job as follows:

        if createjob:
            self._job_handle = Handle(_winapi.CreateJobObject())
            info = {'BasicLimitInformation': {'LimitFlags':
                        _winapi.JOB_OBJECT_LIMIT_BREAKAWAY_OK}}
            _winapi.SetInformationJobObject(self._job_handle,
                _winapi.JobObjectExtendedLimitInformation, info)
            creationflags |= _winapi.CREATE_SUSPENDED | CREATE_NEW_PROCESS_GROUP
        elif _mswindows:
            self._job_handle = None

Configuring the job with the limit flag JOB_OBJECT_LIMIT_BREAKAWAY_OK allows child processes to be created outside of the job by specifying the process creation flag CREATE_BREAKAWAY_FROM_JOB.

The implementation of _winapi.SetInformationJobObject() only has to support setting JobObjectExtendedLimitInformation. The only limit flag that needs to be supported is JOB_OBJECT_LIMIT_BREAKAWAY_OK, but it can also easily support the flags JOB_OBJECT_LIMIT_SILENT_BREAKAWAY_OK, JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION, and JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE. For everything else it can raise NotImplementedError.

After spawning the process in Popen._execute_child(), add it to the job and resume the initial thread as follows:

            if creationflags & CREATE_NEW_PROCESS_GROUP:
                self.pgid = pid
            else:
                self.pgid = None
            if self._job_handle is not None:
                _winapi.AssignProcessToJobObject(self._job_handle, hp)
                _winapi.ResumeThread(ht)
            _winapi.CloseHandle(ht)

The new pgid attribute (process group ID) would be used by the new send_signal_pg() method. This sends SIGINT or SIGBREAK more reliably and clearly by ensuring that the child process was created with a new process group and that the current process and child process are in the same console session. Ensuring the latter avoids longstanding bugs in WinAPI GenerateConsoleCtrlEvent().

        def send_signal_pg(self, sig):
            """Send a signal to the process group."""
            if self.pgid is None:
                raise SubprocessError('no process group assigned')
            try:
                processes = _winapi.GetConsoleProcessList()
            except OSError as e:
                if e.winerror == _winapi.ERROR_INVALID_HANDLE:
                    raise SubprocessError('a console session is required')
                raise
            if self.pid not in processes:
                 raise SubprocessError('the process must be in the current '
                                       'console session')
            if sig == signal.SIGINT:
                ctrl_event = _winapi.CTRL_C_EVENT
            elif sig == signal.SIGBREAK:
                ctrl_event = _winapi.CTRL_BREAK_EVENT
            else:
                raise ValueError("Unsupported signal: {}".format(sig))
            _winapi.GenerateConsoleCtrlEvent(ctrl_event, self.pgid)

A method is needed to forcefully terminate all active processes in the job.

        def terminate_job(self):
            if self._job_handle is None:
                raise SubprocessError('no job assigned')
            _winapi.TerminateJobObject(self._job_handle, 1)

WinAPI TerminateJobObject() succeeds if the job has no active processes.

It would be nice to also implement a method that returns basic accounting information about the job, such as the number of active processes, how much user/kernel time the job has used, and info about the job's I/O operations. For example:

        def get_job_info(self):
            if self._job_handle is None:
                raise SubprocessError('no job assigned')
            return _winapi.QueryInformationJobObject(self._job_handle,
                    _winapi.JobObjectBasicAndIoAccountingInformation)

The above would return a dict containing BasicInfo and IoInfo dicts, based on JOBOBJECT_BASIC_AND_IO_ACCOUNTING_INFORMATION. For example:

>>> p.get_job_info()
{'BasicInfo': {'TotalUserTime': 1250000, 'TotalKernelTime': 625000, 'ThisPeriodTotalUserTime': 1250000, 'ThisPeriodTotalKernelTime': 625000, 'TotalPageFaultCount': 5163, 'TotalProcesses': 2, 'ActiveProcesses': 2, 'TotalTerminatedProcesses': 0}, 'IoInfo': {'ReadOperationCount': 64, 'WriteOperationCount': 2, 'OtherOperationCount': 995, 'ReadTransferCount': 768540, 'WriteTransferCount': 172, 'OtherTransferCount': 99152}}

The implementation of _winapi.QueryInformationJobObject() only has to support querying JobObjectBasicAndIoAccountingInformation. For everything else it can raise NotImplementedError.

On POSIX, pgid would be set after creating the child process, depending on the process_group and start_new_session arguments.

                    if process_group != -1:
                        self.pgid = process_group
                    elif start_new_session:
                        self.pgid = self.pid
                    else:
                        self.pgid = None

send_signal_pg() should be relatively simple on POSIX.

        def send_signal_pg(self, sig):
            """Send a signal to the process group."""
            if self.pgid is None:
                raise SubprocessError('no process group assigned')
            try:
                os.killpg(self.pgid, sig)
            except ProcessLookupError:
                pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants