Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change shutil.rmtree and os.walk to support very deep hierarchies #89727

Open
AlexanderPatrakov mannequin opened this issue Oct 21, 2021 · 59 comments
Open

Change shutil.rmtree and os.walk to support very deep hierarchies #89727

AlexanderPatrakov mannequin opened this issue Oct 21, 2021 · 59 comments
Labels
3.13 bugs and security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@AlexanderPatrakov
Copy link
Mannequin

AlexanderPatrakov mannequin commented Oct 21, 2021

BPO 45564

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-10-21.22:21:02.146>
labels = ['library', '3.9', 'type-crash']
title = 'shutil.rmtree and os.walk are implemented using recursion, fail on deep hierarchies'
updated_at = <Date 2021-10-21.22:21:02.146>
user = 'https://bugs.python.org/AlexanderPatrakov'

bugs.python.org fields:

activity = <Date 2021-10-21.22:21:02.146>
actor = 'Alexander.Patrakov'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2021-10-21.22:21:02.146>
creator = 'Alexander.Patrakov'
dependencies = []
files = []
hgrepos = []
issue_num = 45564
keywords = []
message_count = 1.0
messages = ['404687']
nosy_count = 1.0
nosy_names = ['Alexander.Patrakov']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue45564'
versions = ['Python 3.9']

Linked PRs

@AlexanderPatrakov
Copy link
Mannequin Author

AlexanderPatrakov mannequin commented Oct 21, 2021

It is possible to create deep directory hierarchies that cannot be removed via shutil.rmtree or walked via os.walk, because these functions exceed the interpreter recursion limit. This may have security implications for web services (e.g. various webdisks) that have to clean up user-created mess or walk through it.

[aep@aep-haswell ~]$ mkdir /tmp/badstuff
[aep@aep-haswell ~]$ cd /tmp/badstuff
[aep@aep-haswell badstuff]$ for x in `seq 2048` ; do mkdir $x ; cd $x ; done
[aep@aep-haswell 103]$ cd
[aep@aep-haswell ~]$ python
Python 3.9.7 (default, Oct 10 2021, 15:13:22) 
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.rmtree('/tmp/badstuff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/shutil.py", line 726, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib/python3.9/shutil.py", line 663, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  [Previous line repeated 992 more times]
  File "/usr/lib/python3.9/shutil.py", line 642, in _rmtree_safe_fd
    fullname = os.path.join(path, entry.name)
  File "/usr/lib/python3.9/posixpath.py", line 77, in join
    sep = _get_sep(a)
  File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>> import os
>>> list(os.walk('/tmp/badstuff'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  File "/usr/lib/python3.9/os.py", line 418, in _walk
    yield from _walk(new_path, topdown, onerror, followlinks)
  [Previous line repeated 993 more times]
  File "/usr/lib/python3.9/os.py", line 412, in _walk
    new_path = join(top, dirname)
  File "/usr/lib/python3.9/posixpath.py", line 77, in join
    sep = _get_sep(a)
  File "/usr/lib/python3.9/posixpath.py", line 42, in _get_sep
    if isinstance(path, bytes):
RecursionError: maximum recursion depth exceeded while calling a Python object
>>>

@AlexanderPatrakov AlexanderPatrakov mannequin added 3.9 only security fixes stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 21, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@AlexWaygood AlexWaygood added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 10, 2022
jonburdo added a commit to jonburdo/cpython that referenced this issue Nov 26, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
jonburdo added a commit to jonburdo/cpython that referenced this issue Nov 27, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
@barneygale
Copy link
Contributor

This also affects pathlib.Path.walk()

jonburdo added a commit to jonburdo/cpython that referenced this issue Nov 30, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
jonburdo added a commit to jonburdo/cpython that referenced this issue Dec 15, 2022
Use a stack to implement os.walk iteratively instead of recursively to
avoid hitting recursion limits on deeply nested trees.
zmievsa added a commit to zmievsa/cpython that referenced this issue Dec 16, 2022
zmievsa added a commit to zmievsa/cpython that referenced this issue Dec 16, 2022
…f github.com:ovsyanka83/cpython into pythongh-89727/fix-pathlib.Path.walk-recursion-depth
@merwok merwok added 3.12 bugs and security fixes and removed 3.9 only security fixes labels Dec 16, 2022
@merwok
Copy link
Member

merwok commented Dec 16, 2022

I am unsure on the bug vs feature classification here. On one side it seems like a clear problem and the fixes don’t change function signatures or the way they are used, but on the other side there is no guarantee that extremely deep hierachies are supported, and I wonder if there could be negative effects from the ocde changes (for example, can the stack list/deque grow extremely large during traversal?).

(Adding PR reviewers: @carljm @brettcannon @serhiy-storchaka)

@carljm
Copy link
Member

carljm commented Dec 16, 2022

I don't have strong feelings about calling it a bug vs a feature; I would tend to call it a bug because the function just fails in a subset of cases that aren't obviously outside its domain. I agree that it's a bug that we could just document as a known limitation, though I don't see any reason to do that when the fix is not that difficult. I guess the only real impact of how it's classified is whether the fix would be backported?

I wonder if there could be negative effects from the ocde changes (for example, can the stack list/deque grow extremely large during traversal?)

Sure, but in the current code the Python stack would instead grow very large in the same scenario. And the Python stack frames (plus everything they reference) are almost certainly larger than the stack elements tracked in the iterative version, so if anything I would expect the iterative version to also save memory in the case of a very deep traversal.

@merwok
Copy link
Member

merwok commented Dec 16, 2022

Yes exactly, the impact of my question is backporting or not.
Thanks for the reply about the memory concern!

I take it you feel ok about backporting the fix; let’s wait to see what a core dev thinks.

@brettcannon
Copy link
Member

I would classify this as a feature request. There are plenty of things in CPython for which you can you run out of stack space for and it isn't considered a bug in those instances (and that's just part of general limitations that CPython has).

@merwok merwok added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Dec 16, 2022
@merwok merwok changed the title shutil.rmtree and os.walk are implemented using recursion, fail on deep hierarchies Change shutil.rmtree and os.walk to support very deep hierarchies Dec 16, 2022
@carljm
Copy link
Member

carljm commented Dec 16, 2022

(To be clear, I don't think there's a strong case for backporting this, so reasoning backwards from the conclusion I'm quite happy to call it a feature :P )

@barneygale
Copy link
Contributor

Are glob.glob(recursive=True) and pathlib.Path.rglob() also affected?

@jonburdo
Copy link
Contributor

Are glob.glob(recursive=True) and pathlib.Path.rglob() also affected?

Yes, it looks like it. _rlistdir in Lib/os.py and _RecursiveWildcardSelector in Lib/pathlib.py could be changed to fix it

barneygale added a commit that referenced this issue May 30, 2024
…19638) (#119764)

GH-89727: Fix `os.fwalk()` recursion error on deep trees (GH-119638)

Implement `os.fwalk()` using a list as a stack to avoid emitting recursion
errors on deeply nested trees.
(cherry picked from commit 3c890b5)

Co-authored-by: Barney Gale <[email protected]>
barneygale added a commit that referenced this issue May 30, 2024
…19638) (#119765)

GH-89727: Fix `os.fwalk()` recursion error on deep trees (GH-119638)

Implement `os.fwalk()` using a list as a stack to avoid emitting recursion
errors on deeply nested trees.
(cherry picked from commit 3c890b5)

Co-authored-by: Barney Gale <[email protected]>
barneygale added a commit that referenced this issue May 30, 2024
Follow-up to 3c890b5. Ensure we `os.close()` open file descriptors when
the `os.fwalk()` generator is finalized.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 30, 2024
…ythonGH-119766)

Follow-up to 3c890b5. Ensure we `os.close()` open file descriptors when
the `os.fwalk()` generator is finalized.
(cherry picked from commit a5fef80)

Co-authored-by: Barney Gale <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 30, 2024
…ythonGH-119766)

Follow-up to 3c890b5. Ensure we `os.close()` open file descriptors when
the `os.fwalk()` generator is finalized.
(cherry picked from commit a5fef80)

Co-authored-by: Barney Gale <[email protected]>
barneygale added a commit that referenced this issue May 30, 2024
…H-119766) (#119768)

GH-89727: Fix FD leak on `os.fwalk()` generator finalization. (GH-119766)

Follow-up to 3c890b5. Ensure we `os.close()` open file descriptors when
the `os.fwalk()` generator is finalized.
(cherry picked from commit a5fef80)

Co-authored-by: Barney Gale <[email protected]>
barneygale added a commit that referenced this issue May 30, 2024
…H-119766) (#119767)

GH-89727: Fix FD leak on `os.fwalk()` generator finalization. (GH-119766)

Follow-up to 3c890b5. Ensure we `os.close()` open file descriptors when
the `os.fwalk()` generator is finalized.
(cherry picked from commit a5fef80)

Co-authored-by: Barney Gale <[email protected]>
@barneygale
Copy link
Contributor

One more for the pile: shutil.copytree() also calls itself recursively.

barneygale added a commit that referenced this issue Jun 1, 2024
Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 1, 2024
…ythonGH-119808)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.
(cherry picked from commit 53b1981)

Co-authored-by: Barney Gale <[email protected]>
barneygale added a commit to barneygale/cpython that referenced this issue Jun 1, 2024
…ython#119808)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.

(cherry picked from commit 53b1981)
barneygale added a commit to barneygale/cpython that referenced this issue Jun 1, 2024
…trees (pythonGH-119808)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679..
(cherry picked from commit 53b1981)

Co-authored-by: Barney Gale <[email protected]>
barneygale added a commit that referenced this issue Jun 1, 2024
…H-119808) (#119918)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.

(cherry picked from commit 53b1981)

Co-authored-by: Barney Gale <[email protected]>
barneygale added a commit that referenced this issue Jun 1, 2024
…H-119808) (#119919)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.

(cherry picked from commit 53b1981)
barneygale added a commit to barneygale/cpython that referenced this issue Jun 5, 2024
…ython#119808)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…ep trees (python#119634)

Make `shutil._rmtree_unsafe()` call `os.walk()`, which is implemented
without recursion.

`shutil._rmtree_safe_fd()` is not affected and can still raise a recursion
error.

Co-authored-by: Jelle Zijlstra <[email protected]>
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…n#119638)

Implement `os.fwalk()` using a list as a stack to avoid emitting recursion
errors on deeply nested trees.
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…ython#119766)

Follow-up to 3c890b5. Ensure we `os.close()` open file descriptors when
the `os.fwalk()` generator is finalized.
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…ython#119808)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…ep trees (python#119634)

Make `shutil._rmtree_unsafe()` call `os.walk()`, which is implemented
without recursion.

`shutil._rmtree_safe_fd()` is not affected and can still raise a recursion
error.

Co-authored-by: Jelle Zijlstra <[email protected]>
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…n#119638)

Implement `os.fwalk()` using a list as a stack to avoid emitting recursion
errors on deeply nested trees.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…ython#119766)

Follow-up to 3c890b5. Ensure we `os.close()` open file descriptors when
the `os.fwalk()` generator is finalized.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…ython#119808)

Implement `shutil._rmtree_safe_fd()` using a list as a stack to avoid emitting recursion errors on deeply nested trees.

`shutil._rmtree_unsafe()` was fixed in a150679.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests