gh-89727: Fix os.walk to handle late editing of dirs #100703
base: main
…tion over it has begun
to do:
|
The |
Lib/os.py
Outdated
# We may not have read permission for top, in which case we can't
# get a list of the files the directory contains.
# We suppress the exception here, rather than blow up for a
# minor reason when (say) a thousand readable directories are still
# left to visit.
This comment belongs with the `try: scandir` block, which is moved from here and now exists in three separate locations.
Good point. It can probably just go on the first instance of this block.
Actually in the latest commit where this block appears in separate functions, I decided to just remove this comment. The behavior is described in the docstring and I think the try-except block is pretty clear. Happy to add it or part of it back if deemed necessary
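For readers following along, the suppression that the removed comment described can be observed from the caller's side. This is a minimal sketch, assuming a path that does not exist (the `missing` name is illustrative only): by default `os.walk` swallows the `OSError` raised while listing the directory, and passing `onerror` surfaces it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical path used only for illustration; it is never created.
    missing = os.path.join(tmp, "missing")

    # Default behavior: the listing failure is suppressed and nothing is yielded.
    silent = list(os.walk(missing))

    # With onerror, the OSError instance is handed to the callback instead.
    errors = []
    reported = list(os.walk(missing, onerror=errors.append))
```

Both walks yield nothing; the only difference is that the second one records the error in `errors`.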
Lib/os.py
Outdated
@@ -408,27 +404,61 @@ def walk(top, topdown=True, onerror=None, followlinks=False):

if walk_into:
    walk_dirs.append(entry.path)
if cont:
    continue

if topdown:
Such a high percentage of code in this function is now under either `if topdown` or the reverse that I'm kind of curious what it would look like to just have separate (internal) functions for topdown vs bottom-up. But this would still increase code duplication significantly, and move the duplicated code/structure further apart, so I suspect it's still better this way.
I was actually thinking about this. I just gave it a shot to see. This seems to me like one of those cases where there are enough little differences in logic that it's much cleaner to separate the two. If we didn't support dir modification for topdown or didn't care about performance it might be simpler, but when you have a lot of the same logic interspersed by a lot of little differences it gets messy. In these kinds of situations I also tend to prefer some duplication between functions over one big function with conditions inside of loops.
I also find it easier to look at the two sets of logic separately, but either way works.
Yeah, I think the split version seems fine. Will wait a bit and see if some core devs have opinions.
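To make the trade-off concrete, here is a rough sketch of what a split into two internal generators might look like. It is not the code from this PR: `_scan`, `_walk_topdown`, and `_walk_bottomup` are hypothetical names, the helpers are recursive rather than iterative, and `onerror`/`followlinks` handling is left out. The top-down helper iterates the very list it yielded, so an edit made after iteration over it has begun still changes which subdirectories are visited.

```python
import os
import tempfile


def _scan(top):
    """Return sorted (dirs, files) names for top, or None if it is unreadable."""
    dirs, files = [], []
    try:
        with os.scandir(top) as entries:
            for entry in entries:
                # Simplification: symlinks to directories are listed as files.
                if entry.is_dir(follow_symlinks=False):
                    dirs.append(entry.name)
                else:
                    files.append(entry.name)
    except OSError:
        return None
    return sorted(dirs), sorted(files)


def _walk_topdown(top):
    """Hypothetical helper: yield first, then descend into whatever dirs holds."""
    scanned = _scan(top)
    if scanned is None:
        return
    dirs, files = scanned
    yield top, dirs, files
    # The loop reads the caller-visible list lazily, so late edits are honored.
    for name in dirs:
        yield from _walk_topdown(os.path.join(top, name))


def _walk_bottomup(top):
    """Hypothetical helper: descend first, so editing dirs cannot affect the walk."""
    scanned = _scan(top)
    if scanned is None:
        return
    dirs, files = scanned
    for name in dirs:
        yield from _walk_bottomup(os.path.join(top, name))
    yield top, dirs, files


with tempfile.TemporaryDirectory() as root:
    for name in ("a", "b", "c"):
        os.mkdir(os.path.join(root, name))

    visited = []
    root_dirs = None
    for path, dirs, files in _walk_topdown(root):
        visited.append(os.path.relpath(path, root))
        if path == root:
            root_dirs = dirs
        elif "c" in root_dirs:
            # Late edit: the walk is already inside "a" when "c" is dropped.
            root_dirs.remove("c")

    bottom = [os.path.relpath(p, root) for p, _, _ in _walk_bottomup(root)]
```

In this sketch the top-down walk skips `c` even though it was removed late, while the bottom-up walk visits every subdirectory before its parent. The duplicated scanning logic is factored into `_scan` here, which is one way to limit the duplication raised above.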
A few more notes for context: The first commit has more minimal changes similar to this suggestion. In that commit, because we're using an iterator, the implementation is a bit awkward and inefficient (i.e. only ever calling
The subsequent changes also separate the top down and bottom up logic further (even before splitting into separate functions). Note that even before this PR, there was a fair amount of code that only applied when we have
Do we really want to introduce this much complexity for a completely undocumented and possibly unused feature?
Maybe not. I made this PR to see what it would look like, but am not attached to it. If I were designing …

But I strongly suspect it is used out there (maybe even by accident), and consistent behavior is valuable. If we don't support this, it'd be nice to have some sort of warning about the behavior change, but I also don't see a reasonable way to do that. (Checking whether dirs was modified would be complex or expensive too, and kind of weird.) I'd suggest at least noting the change in behavior in the release notes if we don't support the old behavior. I also prefer separating the top-down and bottom-up logic as in this PR, although that could be a separate matter. Other considerations:
pathlib's version of Mutating
Allow modification of the dirs returned by an `os.walk` entry to affect which subdirectories are walked, even when the modification occurs after iteration over dirs has begun. This was the behavior for a long time before it was changed in #100703.
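The difference between the documented pruning and the late edit discussed here can be sketched as follows. The directory names `a`, `b`, and `c` are made up for illustration. The first loop edits `dirs` in place as soon as the entry is yielded, which the `os.walk` documentation supports for `topdown=True`. The second loop edits the top-level list only after the walk has moved into a subdirectory; whether that edit is honored depends on the Python version, so no particular outcome is claimed for it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for name in ("a", "b", "c"):
        os.mkdir(os.path.join(root, name))

    # Documented pruning: edit dirs in place right after the entry is yielded.
    pruned = []
    for path, dirs, files in os.walk(root):
        if path == root:
            dirs[:] = [d for d in dirs if d != "b"]
        pruned.append(os.path.relpath(path, root))

    # Late edit: keep a reference to the top-level list and trim it only
    # once the walk is already inside a subdirectory.
    late = []
    root_dirs = None
    for path, dirs, files in os.walk(root):
        late.append(os.path.relpath(path, root))
        if path == root:
            root_dirs = dirs
        else:
            del root_dirs[1:]  # drop every sibling except the first one listed
```

`pruned` never contains `b`. The length of `late` shows which behavior the running interpreter implements: a short list means the late edit was honored, a full list means the subdirectories were fixed when the walk resumed.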