pathlib.glob('**') returns only directories #70303

Closed
jitterman mannequin opened this issue Jan 14, 2016 · 6 comments
Labels
stdlib Python modules in the Lib dir topic-pathlib type-bug An unexpected behavior, bug, or error

Comments

@jitterman
Mannequin

jitterman mannequin commented Jan 14, 2016

BPO 26115
Nosy @pitrou

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2016-01-14.22:26:27.607>
created_at = <Date 2016-01-14.21:20:32.100>
labels = ['type-bug', 'invalid']
title = "pathlib.glob('**') returns only directories"
updated_at = <Date 2016-02-04.07:25:04.510>
user = 'https://bugs.python.org/jitterman'

bugs.python.org fields:

activity = <Date 2016-02-04.07:25:04.510>
actor = 'SilentGhost'
assignee = 'none'
closed = True
closed_date = <Date 2016-01-14.22:26:27.607>
closer = 'SilentGhost'
components = []
creation = <Date 2016-01-14.21:20:32.100>
creator = 'jitterman'
dependencies = []
files = []
hgrepos = []
issue_num = 26115
keywords = []
message_count = 3.0
messages = ['258223', '258237', '259537']
nosy_count = 3.0
nosy_names = ['pitrou', 'SilentGhost', 'jitterman']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue26115'
versions = ['Python 3.5']


@jitterman
Mannequin Author

jitterman mannequin commented Jan 14, 2016

The title says it all.

The shell versions of '*' and '**' return both directories and files.
Path('.').glob('*') returns both directories and files, but Path('.').glob('**') returns only directories. That seems wrong to me.
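The difference described above can be reproduced with a small sketch. The directory layout and the helper names (`build_tree`, `rel`) are assumptions made for illustration only. The result of `glob('**')` depends on the Python version (directories only before the change discussed in this issue), so it is collected here but not annotated with an expected value; `glob('**/*')` is shown as the spelling that returns both files and directories.

```python
import tempfile
from pathlib import Path


def build_tree(root):
    """Create a small, hypothetical tree: two directories and three files."""
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "top.txt").write_text("x")
    (root / "pkg" / "mod.py").write_text("x")
    (root / "pkg" / "sub" / "deep.py").write_text("x")


def rel(root, paths):
    """Return sorted POSIX-style paths relative to root, for easy comparison."""
    return sorted(p.relative_to(root).as_posix() for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root)

    # '*' matches files and directories directly under root.
    star = rel(root, root.glob("*"))

    # '**' alone: version dependent (see the discussion in this issue).
    double_star = rel(root, root.glob("**"))

    # '**/*' matches every file and directory below root.
    everything = rel(root, root.glob("**/*"))

print(star)
print(double_star)
print(everything)
```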

@jitterman jitterman mannequin added the type-bug An unexpected behavior, bug, or error label Jan 14, 2016
@SilentGhost
Mannequin

SilentGhost mannequin commented Jan 14, 2016

It is, however, exactly what documentation says it should do:

The “**” pattern means “this directory and all subdirectories, recursively”.

@SilentGhost SilentGhost mannequin closed this as completed Jan 14, 2016
@SilentGhost SilentGhost mannequin added the invalid label Jan 14, 2016
@jitterman
Mannequin Author

jitterman mannequin commented Feb 4, 2016

It may be what the documentation says it will do, but is not what it should do. I believe that because:

  1. Currently ** in pathlib matches only directories, but **.py matches files. That seems inconsistent.
  2. In bash and csh, ** matches files and directories. To get the same in pathlib one must use **/*, which is inconsistent with what we have used for many decades.
  3. With the traditional meaning of **, it is easy to constrain the match to directories by adding slash to the end of the glob (just use **/).
  4. There is considerable value in supporting the traditional meaning of glob strings. Globbing is a very powerful feature, and it is often offered to the end user in shell-like situations. For example, sftp offers globbing. When offering globbing to end users it is best to be consistent with the globbing they are already familiar with.
  5. There is no significant advantage to the difference between pathlib globbing and traditional globbing.

Globbing in pathlib is different from traditional globbing in another important way. pathlib does not distinguish between hidden files and directories and normal files and directories. There may be isolated cases where that is preferred, but generally that is not true. Indeed, the primary characteristic of being hidden is that it should not be included in globbing. One marks a file or directory as hidden specifically to mean 'do not include this one when selecting groups of files or directories'. Once the glob string has been expanded, it is possible to filter out the hidden files and directories, but it is very difficult to do if there are several levels of directories, because when weeding out the matches that should not be included you have to look for hidden items at all levels of the path.
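One possible way to do the post-filtering described above is to inspect every component of each match relative to the glob root. This is only a sketch: `is_hidden` and `glob_visible` are hypothetical helpers, not part of pathlib, and "hidden" is assumed to mean the Unix convention of a leading dot.

```python
import tempfile
from pathlib import Path


def is_hidden(path, root):
    """True if any component of path below root starts with a dot."""
    rel_parts = path.relative_to(root).parts
    return any(part.startswith(".") for part in rel_parts)


def glob_visible(root, pattern):
    """Expand pattern under root, dropping matches that sit at or below
    a hidden file or directory at any level."""
    return [p for p in root.glob(pattern) if not is_hidden(p, root)]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".git").mkdir()
    (root / ".git" / "config").write_text("")
    (root / "src" / ".cache").mkdir(parents=True)
    (root / "src" / ".cache" / "blob").write_text("")
    (root / "src" / "a.py").write_text("")

    visible = sorted(
        p.relative_to(root).as_posix() for p in glob_visible(root, "**/*")
    )

print(visible)  # ['src', 'src/a.py']
```

The filter still walks into hidden directories before discarding their contents, so it addresses correctness rather than cost.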

Globbing has been available and largely unchanged for almost 50 years. I encourage you to strongly consider making pathlib globbing more consistent with what we have all grown up with.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@barneygale
Contributor

barneygale commented May 29, 2023

Re-opening - this is a valid issue. As the previous poster notes, pathlib's ** is incompatible with every other glob implementation out there, including Python's own glob.glob():

If recursive is true, the pattern “**” will match any files and zero or more directories, subdirectories and symbolic links to directories. If the pattern is followed by an os.sep or os.altsep then files will not match.
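The quoted `glob.glob()` behaviour can be checked with a short sketch. The temporary tree and variable names are assumptions for illustration; the calls used are `glob.glob(..., recursive=True)` with and without a trailing separator on the pattern.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg", "sub"))
    for name in ("top.txt", os.path.join("pkg", "mod.py")):
        open(os.path.join(tmp, name), "w").close()

    # '**' with recursive=True: files and directories.
    any_kind = glob.glob(os.path.join(tmp, "**"), recursive=True)

    # '**' followed by os.sep: files no longer match.
    dirs_only = glob.glob(os.path.join(tmp, "**", ""), recursive=True)

    files_in_any = sorted(
        os.path.relpath(p, tmp) for p in any_kind if os.path.isfile(p)
    )
    files_in_dirs_only = [p for p in dirs_only if os.path.isfile(p)]
    dir_names = sorted(os.path.relpath(p, tmp) for p in dirs_only)

print(files_in_any)
print(files_in_dirs_only)
print(dir_names)
```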

It's easy enough to solve - just delete these lines:

cpython/Lib/pathlib.py

Lines 1046 to 1048 in 24af451

if pattern_parts[-1] == '**':
    # GH-70303: '**' only matches directories. Add trailing slash.
    pattern_parts.append('')

However we'll need to go through a deprecation period first. nope!

@barneygale barneygale reopened this May 29, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Jun 6, 2023
…h `**`.

In a future Python release, patterns with this ending will match both files
and directories. Users may add a trailing slash to remove the warning.
@terryjreedy
Member

terryjreedy commented Jun 17, 2023

The Path.glob doc says that patterns are the same as for fnmatch, except that ** is always recursive. The fnmatch doc says nothing about **. The glob doc says glob is essentially scandir+fnmatch, and that the behavior of ** depends on the options selected.

If recursive is true, the pattern ** will match any files and zero or more directories, subdirectories and symbolic links to directories. If the pattern is followed by an os.sep or os.altsep then files will not match.

If include_hidden is true, ** pattern will match hidden directories.

To me, the current Path.glob ** is a behavior that should be fixed.

barneygale added a commit that referenced this issue Aug 4, 2023
…GH-105413)

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 23, 2023
barneygale added a commit to barneygale/cpython that referenced this issue Jan 28, 2024
…directories

Return files and directories from `pathlib.Path.glob()` if the pattern ends
with `**`. This is more compatible with `PurePath.full_match()` and with
other glob implementations such as bash and `glob.glob()`. Users can add a
trailing slash to match only directories.

In my previous patch I added a `FutureWarning` with the intention of fixing
this in Python 3.15. Upon further reflection I think this was an
unnecessarily cautious remedy to a clear bug.
barneygale added a commit that referenced this issue Jan 30, 2024
…ories (#114684)

@barneygale
Contributor

Fixed in 3.13 / fda7445 / #114684
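For code that has to run on Python versions from both before and after this fix, one option is to avoid a bare trailing `**` and spell out the intent instead. The sketch below uses `**/` for the directory-only case mentioned in the commit message; the tree layout and variable names are assumptions for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "top.txt").write_text("x")
    (root / "pkg" / "mod.py").write_text("x")

    # A trailing slash asks for directories only.
    matches = list(root.glob("**/"))
    all_are_dirs = all(p.is_dir() for p in matches)
    names = {p.relative_to(root).as_posix() for p in matches}

print(all_are_dirs)
print(sorted(names))
```

When both files and directories are wanted, `**/*` is the spelling suggested earlier in the thread.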

aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…directories (python#114684)
