Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fsspec.open() doesn't work with square brackets in a filename #1476

Closed
anmyachev opened this issue Dec 20, 2023 · 10 comments · Fixed by #1536
Closed

fsspec.open() doesn't work with square brackets in a filename #1476

anmyachev opened this issue Dec 20, 2023 · 10 comments · Fixed by #1536

Comments

@anmyachev
Copy link

For example, fsspec.open("filename[test].csv").

Is this by design or a bug? (python builtin open function works in this case, at least for python 3.11)

@martindurant
Copy link
Member

fsspec.open uses the same code path as open_files, and allows glob patterns in the string. These are not generally used in filepaths.... So in this case, the [test] is interpreted as a range of possible characters, but only one.
Interestingly, the following does work:

fsspec.open("filename*.csv")

There is probably a way to escape the special characters, but we are laying glob on regex, so I'm not sure how. Also, passing a list containing just the one match (with open_files) doesn't work either, since fsspec also considers a list of glob-strings.

@anmyachev
Copy link
Author

@martindurant thanks for the quick answer!

fsspec.open uses the same code path as open_files, and allows glob patterns in the string. These are not generally used in filepaths...

Indeed, this is rare. Then I assume the current implementation will remain as is?

There is probably a way to escape the special characters, but we are laying glob on regex, so I'm not sure how.

Unfortunately me too.

@martindurant
Copy link
Member

I'd be happy to find a way through, especially since open() is supposed to work on single paths, never return a list.

@Skylion007
Copy link
Contributor

@martindurant Doesn't re.escape do everything you want here?

@martindurant
Copy link
Member

@Skylion007 , you can try, but we are converting from glob-strings to re, and [..] is valid wildcarding in both.

@Skylion007
Copy link
Contributor

@martindurant Do we actually want to support globbing in this code path, seems to odd to me to want to do that. Easy solution seems to be use glob.escape and re.escape if we want to match both filenames literally, right? Or at least we can add a flag to do so?

It seems to me like this codepath is designed to work the same as builtins.open so we should probably match call convention. Also, it's not listed anywhere in the doc that open is suppose to support globs https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.open

@martindurant
Copy link
Member

True, but at the moment, open() calls open_files(...)[0], and the latter does support globbing. It is not totally unreasonable to want to open("file*") and get back the first file that matches, although this is not explicitly documented.

@martindurant
Copy link
Member

Besides which, I am not sure that the escaped character would not be caught by the contains_magic checks anyway; and don't forget that they would need to be un-escaped again before passing to the actual backend.

@Skylion007
Copy link
Contributor

But it's also totally reasonable to want to get file[final].csv. Currently there is not a way to access this type of filename at all though... It also prevents fsspec open from otherwise being a nearly perfect replacement of builtins.open() so that you can just monkeypatch it in.

Skylion007 added a commit to Skylion007/filesystem_spec that referenced this issue Mar 6, 2024
Skylion007 added a commit to Skylion007/filesystem_spec that referenced this issue Mar 6, 2024
Skylion007 added a commit to Skylion007/filesystem_spec that referenced this issue Mar 6, 2024
Skylion007 added a commit to Skylion007/filesystem_spec that referenced this issue Mar 6, 2024
@Skylion007
Copy link
Contributor

@martindurant Figured it out! It was a bug in an arg not being respected in open_files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants