os.walk() no longer supports late edits to dirnames #102932

Open
barneygale opened this issue Mar 22, 2023 · 1 comment
Labels
3.12 (bugs and security fixes), stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@barneygale
Contributor

Prior to the fix for #89727, it was possible to influence which subdirectories were visited by os.walk() even after the walk had descended into siblings of those subdirectories. Such "late" modifications no longer have any effect.

This was reported by @pochmann:

I think you changed os.walk's behavior. The recursive one supported late modifications of subdirs lists, due to being lazier. The iterative one doesn't (if I'm not mistaken... I can't test).

With topdown, you can modify a directory's subdirs, for example remove one and it won't be visited. Usually you'd do that while your walk is on that parent directory. But with the recursive one, it was possible to do it later. With the iterative one, I don't think that works anymore.

Here's a demo where I remove the second subdir after having walked onto the first:

import os

# Create demo dir with three subdirs
os.makedirs('demo/a')
os.makedirs('demo/b')
os.makedirs('demo/c')

# Walk onto "demo"
walk = os.walk('demo')
demo = next(walk)
print(demo)

# Walk onto first subdir
first = next(walk)
print(first)

# Remove the second subdir
del demo[1][1]

# Walk onto the remaining subdirs
for x in walk:
    print(x)

Output; note that the second subdir in the listing ('a' here) wasn't walked:

('demo', ['c', 'a', 'b'], [])
('demo/c', [], [])
('demo/b', [], [])
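
The demo above depends on the order in which the operating system lists the subdirs and leaves a `demo` directory behind. A minimal self-contained variant, offered only as a sketch, runs in a temporary directory and sorts `dirs` in place first so that the order is deterministic. The in-place sort happens while the walk is still on the parent, so it is an ordinary "on time" edit; only the later `del` is a late edit. The function name `late_edit_is_honoured` is made up for this illustration.

```python
import os
import tempfile


def late_edit_is_honoured():
    """Return True if removing a subdir late still prevents it from being walked."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("a", "b", "c"):
            os.makedirs(os.path.join(root, name))

        walk = os.walk(root)
        _, dirs, _ = next(walk)   # walk onto the root
        dirs.sort()               # on-time edit: fix the order as a, b, c
        next(walk)                # walk onto the first subdir, "a"
        del dirs[1]               # late edit: remove "b" after descending into "a"

        # Exhaust the walk inside the "with" block, before the directory is removed.
        visited = [os.path.basename(path) for path, _, _ in walk]
    return "b" not in visited


print(late_edit_is_honoured())
```

Going by the description in this issue, this should print True with the recursive implementation and False with the iterative one.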

Excerpt from the recursive one:

        for dirname in dirs:
            new_path = join(top, dirname)
            yield from _walk(new_path, topdown, onerror, followlinks)

The iteration of dirs and the walking of the subdirs are intertwined. After walking a subdir, the paused iteration of dirs resumes. That allows the late modifications of dirs to have an effect.
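
To make the interleaving concrete, here is a stripped-down sketch of the recursive shape. It walks a hypothetical in-memory `TREE` dict instead of the filesystem, and `lazy_walk` is an illustrative name, not the actual os.walk code.

```python
# Hypothetical in-memory tree standing in for the filesystem.
TREE = {"demo": ["a", "b", "c"], "demo/a": [], "demo/b": [], "demo/c": []}


def lazy_walk(top):
    dirs = list(TREE[top])
    yield top, dirs
    # The iteration over dirs is paused while each subtree is walked,
    # so the list is consulted again after every subdir finishes.
    for dirname in dirs:
        yield from lazy_walk(f"{top}/{dirname}")


walk = lazy_walk("demo")
root, dirs = next(walk)   # walk onto "demo"
first, _ = next(walk)     # walk onto "demo/a"
del dirs[1]               # late edit: remove "b"

visited = [first] + [path for path, _ in walk]
print(visited)  # ['demo/a', 'demo/c']
```

The list iterator resumes at index 1 after "demo/a" is done, finds "c" there because "b" was deleted, and so "demo/b" is never visited.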

Excerpt from the iterative one:

            for dirname in reversed(dirs):
                new_path = join(top, dirname)
                stack.append(new_path)

This eagerly puts all subdirs onto the stack before any of them is walked, and then dirs is never used again. So modifying dirs while the subdirs are being walked no longer has any effect.
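
The same kind of sketch for the stack-based shape shows the difference. Again, `TREE` and `eager_walk` are hypothetical stand-ins for illustration, not the real implementation.

```python
# Hypothetical in-memory tree standing in for the filesystem.
TREE = {"demo": ["a", "b", "c"], "demo/a": [], "demo/b": [], "demo/c": []}


def eager_walk(top):
    stack = [top]
    while stack:
        top = stack.pop()
        dirs = list(TREE[top])
        yield top, dirs
        # All subdirs are copied onto the stack as soon as the caller
        # resumes the generator; dirs is not looked at again after this.
        for dirname in reversed(dirs):
            stack.append(f"{top}/{dirname}")


walk = eager_walk("demo")
root, dirs = next(walk)   # walk onto "demo"
first, _ = next(walk)     # walk onto "demo/a"; "b" and "c" are already stacked
del dirs[1]               # late edit: remove "b" (too late to matter)

visited = [first] + [path for path, _ in walk]
print(visited)  # ['demo/a', 'demo/b', 'demo/c']
```

Edits made before the second `next(walk)` would still be honoured here, since the stack is only filled when the generator resumes; only edits made after that point are lost.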

@barneygale barneygale added type-bug An unexpected behavior, bug, or error 3.12 bugs and security fixes labels Mar 22, 2023
@jonburdo
Contributor

jonburdo commented Mar 24, 2023

If we do want to revert to the old behavior, this PR should do it (or at least shows how it could be done, including a unit test): #100703

Currently, the latest commit splits the implementation for topdown=True and topdown=False into two functions, but a previous commit also shows how it could be done without splitting:
https://github.com/python/cpython/pull/100703/files/1011e199529ccdbd778226574c379403708b2e5d
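
For readers who want a feel for how late edits can be supported without recursion, one possible approach is to keep a stack of iterators over the `dirs` lists rather than a stack of paths. The sketch below is an assumption about the general technique and is not taken from the linked PR; `TREE` and `lazy_iterative_walk` are hypothetical names, and the tree is in memory rather than on disk.

```python
# Hypothetical in-memory tree standing in for the filesystem.
TREE = {"demo": ["a", "b", "c"], "demo/a": [], "demo/b": [], "demo/c": []}


def lazy_iterative_walk(top):
    dirs = list(TREE[top])
    yield top, dirs
    # Each stack entry keeps a live iterator over a dirs list that the
    # caller may still be editing, instead of a snapshot of its contents.
    stack = [(top, iter(dirs))]
    while stack:
        parent, pending = stack[-1]
        dirname = next(pending, None)
        if dirname is None:       # this directory's subdirs are exhausted
            stack.pop()
            continue
        path = f"{parent}/{dirname}"
        dirs = list(TREE[path])
        yield path, dirs
        stack.append((path, iter(dirs)))


walk = lazy_iterative_walk("demo")
root, root_dirs = next(walk)   # walk onto "demo"
first, _ = next(walk)          # walk onto "demo/a"
del root_dirs[1]               # late edit: remove "b"

visited = [first] + [path for path, _ in walk]
print(visited)  # ['demo/a', 'demo/c']
```

This keeps the explicit stack, so deep trees do not hit the recursion limit, while each subdir name is only fetched from `dirs` at the moment it is about to be walked.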

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 26, 2023