Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSError when try convert URI to pathlib.Path #91504

Closed
bw7715 opened this issue Apr 13, 2022 · 18 comments
Closed

OSError when try convert URI to pathlib.Path #91504

bw7715 opened this issue Apr 13, 2022 · 18 comments

Comments

@bw7715
Copy link

bw7715 commented Apr 13, 2022

When try to convert URI to pathlib.Path I get an error:

E:\Work\Venvs\p39\Scripts\python.exe "E:/uri_test/pathlib_test.py"
D:\Tests\samples\images\3.bmp exists=True
file:///D:/Tests/samples/images/3.bmp
Traceback (most recent call last):
  File "E:\uri_test\pathlib_test.py", line 9, in <module>
    from_uri_path = Path(uri_path).resolve()
  File "C:\pythons\Python39\lib\pathlib.py", line 1215, in resolve
    s = self._flavour.resolve(self, strict=strict)
  File "C:\pythons\Python39\lib\pathlib.py", line 215, in resolve
    s = self._ext_to_normal(_getfinalpathname(s))
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'file:\\D:\\Tests\\samples\\images\\3.bmp'

Code to reproduce:

from pathlib import Path

img_path = Path(r"D:\Tests\samples\images\3.bmp").resolve()
print(f"{img_path} exists={img_path.exists()}")

uri_path = img_path.as_uri()
print(uri_path)

from_uri_path = Path(uri_path).resolve()
print(f"{from_uri_path} exists={from_uri_path.exists()}")

Python version: 3.9

With 3.10 works.

@ericvsmith
Copy link
Member

Please show what the output in 3.10 is.

@bw7715
Copy link
Author

bw7715 commented Apr 13, 2022

Python 3.10 output:

E:\Work\Venvs\p310\Scripts\python.exe "E:/uri_test/pathlib_test.py"
D:\Tests\samples\images\3.bmp exists=True
file:///D:/Tests/samples/images/3.bmp
D:Tests\samples\images\3.bmp exists=True

Process finished with exit code 0

Now I see that one '\' is missing in path after drive letter:
D:Tests\samples\images\3.bmp exists=True

@eryksun
Copy link
Contributor

eryksun commented Apr 13, 2022

A pathlib.Path cannot be a URI or be constructed from a URI. For example:

>>> p = pathlib.Path('file:///C:/Temp/test.txt')
>>> p.drive
''
>>> p.root
''
>>> os.fspath(p)
'file:\\C:\\Temp\\test.txt

The path was parsed as an invalid relative path. Perhaps there's a case for adding a from_uri() constructor.

In 3.10+, resolve() 'succeeds' with an invalid path such as this because it defaults to calling realpath() in non-strict mode, which in this case is garbage-in garbage-out. For example:

>>> p.resolve()
WindowsPath('C:Temp/test.txt')
>>> os.path.realpath(p)
'C:Temp\\test.txt'

The above result is a drive-relative path, i.e. it has no root path and thus depends on the working directory on drive "C:".

@bw7715
Copy link
Author

bw7715 commented Apr 13, 2022

I think this method from_uri() is a good idea.

>>> p = pathlib.Path().from_uri("file:///D:/Tests/samples/images/3.bmp")
>>> p.resolve()
'D:\Tests\samples\images\3.bmp'

@bw7715
Copy link
Author

bw7715 commented Apr 13, 2022

There is a problem with URI -> pathlib.Path -> URI conversion:

>>>p = Path(r"file:///D:/Tests/samples/images/3.bmp").resolve()
>>>p
WindowsPath('D:Tests/samples/images/3.bmp')
>>>p.as_uri()
Traceback (most recent call last):
  File "C:\pythons\Python310\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "C:\pythons\Python310\lib\pathlib.py", line 649, in as_uri
    raise ValueError("relative path can't be expressed as a file URI")
ValueError: relative path can't be expressed as a file URI

@ericvsmith
Copy link
Member

As @eryksun said: "A pathlib.Path cannot be a URI or be constructed from a URI".

@barneygale
Copy link
Contributor

barneygale commented Apr 15, 2022

A solution that works today:

from urllib.request import url2pathname
from pathlib import Path

def path_from_uri(uri):
    return Path(url2pathname(uri.removeprefix('file:')))

I'd support the addition of a Path.from_uri() classmethod, on the condition that we make from_uri() and as_uri() call through to url2pathname and pathname2url in urllib.request, rather than adding another duplicate implementation.

@barneygale
Copy link
Contributor

barneygale commented Apr 15, 2022

I suspect that the rules for parsing/emitting URLs will vary depending on whether the path uses Windows or POSIX semantics. If that's the case, I'd like to propose the following:

  • A new os.path.fileuri() function, implemented in posixpath and ntpath.
    • nturl2path.pathname2uri() calls ntpath.fileuri()
    • urllib.request.pathname2uri() calls os.path.fileuri()
    • pathlib.PurePosixPath.as_uri() calls posixpath.fileuri()
    • pathlib.PureWindowsPath.as_uri() calls ntpath.fileuri()
  • A new os.path.parsefileuri() function (unsure of naming). Similarly:
    • nturl2path.uri2pathname() calls ntpath.parsefileuri()
    • urllib.request.uri2pathname() calls os.path.parsefileuri()
    • pathlib.PurePosixPath.from_uri() calls posixpath.parsefileuri()
    • pathlib.PureWindowsPath.from_uri() calls ntpath.parsefileuri().

From a pathlib perspective this makes sense to me, as I'm hoping to move towards wrapping os.path more directly - see #31691.

Thoughts?

@eryksun
Copy link
Contributor

eryksun commented May 3, 2022

I'm closing this issue because the pathlib.Path constructor isn't intended to support file URIs.

@barneygale, I think the suggested changes to os.path should be discussed in a forum, even if just to bike shed the function names.

@eryksun eryksun closed this as completed May 3, 2022
@indigo-jay
Copy link

If this is true, then the Path constructor should raise an exception when it's given a file URI, so that it doesn't give the appearance of supporting file URIs when in fact it doesn't.

@barneygale
Copy link
Contributor

It's a good idea, but file URIs are also valid POSIX paths! You can mkdir -p file:///foo/bar/baz from your terminal today.

@ZanderBrown
Copy link

Seems relatively unlikely someone would actually want that though?

@barneygale
Copy link
Contributor

As a user, I want to be able to throw any string at pathlib.Path and have it efficiently parse it as a path. It doesn't currently perform any validation beyond checking for bytes. If we attempt to detect and reject file: URIs, it will slow things down and open the door to validating other things, which will further complicate and slow pathlib.

@indigo-jay
Copy link

indigo-jay commented Apr 7, 2023

The current implementation is, IMO, the worst option: leave file: as the first element in the path. That is utterly useless (is_file() will always return False). As I see it, there are two relatively simple options that would make it at least somewhat useful:

  1. Remove the file: part and just create a path based on the rest. This would be the most useful since it would generate a potentially valid/useful path. Thinking about it, this is probably my personal preference and further supports the "throw any string" at Path instances and have it parse usable paths.
  2. Raise an exception - since a Path instance that includes file: as the first element is utterly useless, this would at least notify users that it's not going to generate a valid Path.

@barneygale
Copy link
Contributor

barneygale commented Apr 7, 2023

That is utterly useless (is_file() will always return False)

It works fine for me on Linux:

$ touch 'file:'
$ ./python 
>>> import pathlib
>>> pathlib.Path('file:').is_file()
True

@ericvsmith
Copy link
Member

I agree with @barneygale : There's no reason to prohibit filenames starting with "file:". They're perfectly valid, at least on POSIX systems.

@ericvsmith
Copy link
Member

Also, I don't think we'd want to implement any "strip of leading 'file:' characters" logic just on Windows.

@eryksun
Copy link
Contributor

eryksun commented Apr 7, 2023

Raise an exception - since a Path instance that includes file: as the first element is utterly useless

To clarify the case on Windows, the NTFS and ReFS filesystems support file streams of the form "filename:streamname:streamtype", "filename:streamname" ("$DATA" stream type), and "filename::streamtype" (anonymous or default stream name for the given stream type). But just "file:" without a stream name or stream type is invalid, and "file:///foo/bar/baz" is invalid because "///foo/bar/baz" is parsed as the stream name, and stream names reserve "/", ":", and NUL as invalid name characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants