Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No support for Windows network file://// URIs. (pip-1.2.1) #783

Closed
nlstay opened this issue Jan 28, 2013 · 22 comments
Closed

No support for Windows network file://// URIs. (pip-1.2.1) #783

nlstay opened this issue Jan 28, 2013 · 22 comments
Labels
auto-locked Outdated issues that have been locked by automation OS: windows Windows specific

Comments

@nlstay
Copy link

nlstay commented Jan 28, 2013

We have a shared network directory that we would like to point pip --index-url to in Windows. e.g. \network\share...

urllib2 accepts file:// URIs that look like: file:////network/share/.../file.tar.gz

However, pip does not properly process these. With 2 simple changes, I was able to get file://// type URIs working in pip, but it was hacky. Somebody who actually knows the code base could probably do a better job.

My first change was in index.py in _sort_locations()

for url in locations:
        if url.startswith('file:////'):
            netdir = url[len('file://'):]
            if os.path.isdir(netdir):  
                for item in os.listdir(netdir):
                    url = path_to_url2(os.path.join(netdir,item))
                    # path_to_url2() wants to normalize to 3 slashes, make sure
                    # it stays at 4 slashes for a Windows network path
                    url = 'file:////' + url[len('file:///'):]
                    files.append(url)
        elif url.startswith('file:'):
                ....

My second change was in download.py in url_to_path():

def url_to_path(url):
"""
Convert a file: URL to a path.     
"""
assert url.startswith('file:'), (
    "You can only turn file: urls into filenames (not %r)" % url)
path = url[len('file:'):].lstrip('/')
path = urllib.unquote(path)
if _url_drive_re.match(path):
    path = path[0] + ':' + path[2:]
else:
    if url.startswith('file:////'):
        # This was intended to be a Windows network path, make it start
        # with '//' again
        path = '//' + path
    else:
        # Otherwise, it was just a "normal" path
        path = '/' + path
return path

Again, this isn't the right way to do it.

@pfmoore
Copy link
Member

pfmoore commented Jan 28, 2013

I'll have a look at it. One thing that would be useful, would be if you could put together a test (that will fail currently, of course) which could be used to confirm that the bug gets fixed.

@qwcode
Copy link
Contributor

qwcode commented Jan 29, 2013

btw, I just confirmed what does currently work on windows (in the pip develop branch)
(neither of these is the non-drive network file://// url you're speaking of though)

pip install --find-links=file:///c:/users/me/my_archives/ <some project>
pip install --find-links=c:\users\me\my_archives <some project>

@pfmoore
Copy link
Member

pfmoore commented Jan 29, 2013

Actually, I'm curious as to why you say this isn't the "right way" to do this. Can you clarify?

Certainly, in an ideal world, the Python stdlib would have something like a url_to_path function that coped with the various platform variations involved in file: URL manipulation. But as far as I know, it doesn't, and that's why pip has its own code for this.

So I'd be reasonably happy to include something like the code you suggest.

@pfmoore
Copy link
Member

pfmoore commented Jan 29, 2013

OK, I've looked into this a bit more. Even patching the finder, there are still issues. For example, pip.index.HTMLPage doesn't handle network share URLs properly, so a test based on test_install_from_file_index_hash_link but using a network share still fails.

Fundamentally, there are serious inconsistencies in Python's handling of file URLs, so that without completely avoiding the stdlib we could still get issues. See the following:

>>> from urllib.request import url2pathname
>>> from urllib.parse import urlparse
>>> url2pathname(urlparse('file://C:/a/file')[2])
'\\a\\file'
>>> url2pathname(urlparse('file://D:/a/file')[2])
'\\a\\file'
>>> url2pathname(urlparse('file:////server/share/a/file')[2])
'\\\\server\\share\\a\\file'
>>> urlparse('file://D:/a/file')
ParseResult(scheme='file', netloc='D:', path='/a/file', params='', query='', fragment='')
>>> urlparse('file:////server/share/a/file')
ParseResult(scheme='file', netloc='', path='//server/share/a/file', params='', query='', fragment='')

Looking at the definition of the file URI scheme from Wikipedia, at http://en.wikipedia.org/wiki/File_URI_scheme, there are a number of possible forms for Windows files. But http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx (referenced in that article) gives what looks to me like a reasonable set of rules.

Unfortunately, if the Python standard library doesn't apply these rules, pip is always going to have issues with corner cases (such as network shares).

Some suggestions:

  1. I believe that using filesystem paths rather than file: URLs works everywhere (if it doesn't, please raise a separate bug and I'll look at it!) so I'd recommend using those instead.
  2. I don't believe that pip should try to work around all the oddities in the stdlib implementation. Raising bugs against the stdlib would be a reasonable thing to do, though.
  3. Maybe pip could write better wrappers based on the Windows PathCreateFromUrl and UrlCreateFromPath APIs, using ctypes, but it would be a lot of work for relatively little benefit.

@qwcode - what is your view? Applying the download.py patch is easy enough, and fixes some cases (the index.py patch doesn't seem needed any more as _sort_locations has been refactored and now uses url_to_path). Is it worth applying a partial fix like this anyway?

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

  1. I have not had issues if I map a network drive, and then use file:///x:/..., but there are still times when we need to go directly to a network share like file:////network/share. I'm using file://// since this is what urllib2 seems to be ok with. There may be other corner cases, but urllib2 seems to work just fine with file:////network/share/to/file.tar.gz. It is pip that seems to deconstruct and rebuild file://// URIs that don't work with urllib2.
  2. pip tries to determine file locations on it's own before using the stdlib, and this is not working correctly (but as you say, maybe this is fixed on the dev branch).

As far as a test that fails, do you mean one that fails for anybody, or just for me? It's not easy to write a general test that points to a network share since the share can only be accessed by me.

@pfmoore
Copy link
Member

pfmoore commented Jan 29, 2013

I was after a general test that can be added to the pip test suite. But don't worry, I discovered that it's possible to use \localhost\C$\a\file as an alias for a local file, so the various local file tests can be reused to check network shares.

You're right that pip (needs to) split and reconstruct filenames. It's the file splitting/reconstructing functions in the stdlib that are causing the issues.

"using file://// since this is what urllib2 seems to be ok with" is probably not the best reason. After all, as you see it has issues for pip. Just user //server/share and you should be fine. Or is there a reason you have to use the same string in both pip and urllib?

(Please be aware, I'm not against fixing this - far from it. But there seems to be a high cost to "doing it right" and I want to understand what the cost is to you if (1) we do nothing, or (2) we just apply your download.py patch as is. By the way, what version of pip are you looking at? _sort_locations is very different in the latest code...)

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

Yes, I see now that the stdlib is causing problems.

I am using pip-1.2.1, but we could switch to a development version.

I think you're right that (1) is overkill if this isn't bothering anyone else. I think we could just apply the simple download.py patch, and that would make us happy.

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

Oops, I meant to say that fixing it right is probably overkill. If we do nothing (1), we'll probably continue to patch it over here, but it would be nice to see it in the main line.

@pfmoore
Copy link
Member

pfmoore commented Jan 29, 2013

Thanks. I'm working on the development version, and on that the _sort_locations fix is no longer needed (and indeed, it may be that using pathnames rather than file URLs is only in the dev version - my apologies for not checking that).

I see no reason then not to apply the download.py fix in trunk, assuming @qwcode doesn't have any objections. Adding a test to the test suite may be a bit too fragile to be worth it, though, so I doubt I'll bother with that (that won't affect you, it's more of an internal process point).

@qwcode - what's the process from here? I can create a pull request for this.

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

Thanks for all of your help and responses.

Yes, I am only able to use file: URLs, not pathnames in the 1.2.1 released version.

Assuming this fix is made, what would be the targeted release?

Thanks.

@qwcode
Copy link
Contributor

qwcode commented Jan 29, 2013

@pfmoore sure, create a pull request so can see all tests pass in travis before merging. I could also confirm it with my windows jenkins jobs. I'm hesitant to merge this in prior to 1.3 release (since betas are so close) until you're confident we've backfilled any tests that cover the different forms for find-links.

  • local absolute path
  • local relative path
  • normal local file url (with drive segment on windows)
  • network file url form, simulating in our test with the alias you speak of: '\localhost\C$\a\file'

(btw, It came up recently to support relative file: urls, but that's not included at this pt)

@qwcode
Copy link
Contributor

qwcode commented Jan 29, 2013

also, since we support file:// index urls, that should be confirmed to be work as well with the network and non-network forms.

@pfmoore
Copy link
Member

pfmoore commented Jan 29, 2013

@qwcode the only problem I have with this is that I definitely haven't covered the testing side of it. I can put in a test (probably based on test_sort_locations_file_[not_]find_link, which exercise the package finder) but it's certainly not exercising anything like all of the places that will be affected by the change (I don't even know all those places).

If @nlstay can propose tests that demonstrate the failure he's seeing, I'm more than happy to validate the fix and if necessary extend it to make those tests work. But I appreciate that that's no easier for him than it is for me. I'm also OK with adding the tests I've suggested and firing the pull request at Travis and the Jenkins Windows jobs (although it might need the tests to be skipped except on Windows - what's the canonical way to do that?) At the moment, the tests don't run clean on my PC (not sure why :-() so I can't easily check that all is OK before submitting to the bots, though...

It certainly sounds like it's too risky for 1.3...

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

I am now working with the latest dev code. I notice that _sort_locations( ) does not automatically take file:// URLs that point to directories and populate the files list with everything in that directory. I have a directory structure that looks like:

  • /index
    • /pkg1
    • /pkg2
    • ...

I want to point either the -i or the -f at /index and let pip "discover" all of the subdirectories, but _sort_locations( ) doesn't add pkg1, pkg2, ... to the files list. It seems the intent of the old 1.2.1 code was to do this. Is this no longer the intent?

When I point to a local directory like --index-url=file:///c:/some/dir, pip no longer works for me the way it used to.

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

I guess a test would be to install a package with pip using a --index-url=file:////localhost/c$/some/package/index where /index contains subdirectories of projects:

  • /index
    • /pkg
      • /pkg-0.0.1.tar.gz

The pip command: pip install --index-url=file:////localhost/c$/some/package/index pkg should find and install pkg-0.0.1.tar.gz.

@pfmoore
Copy link
Member

pfmoore commented Jan 29, 2013

I had a horrible feeling you'd say that (because that seems to be the path that needs Python's stdlib code to work better :-()

I'll have a look into whether it's possible to get something working for that.

Regarding --index-url=file:///c:/some/dir, I've never used that form, so I don't know what the expected behaviour is. Shouldn't you have an index.html file? (I've only ever used --find-links for local caches, as that just requires a directory full of sdists, which is easier to set up than a proper index...)

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

I suppose I'm using --index-url in a bit of an odd way. I was trying to figure out how to arrange my packages in a static directory structure (since I can't really host a web server). Then I looked at the pip source code, and sure it enough, the directory structure I showed above was supported.

However, I could just dump all of the tarballs into one directory and use --no-index -f file:////network/share/index

So I guess the --index-url functionality I described above is not a must have if there's a workaround with -f, it just doesn't work the way it used to work. I'm probably the only oddball who tried to do this. I'll leave it up to you guys what functionality you want to support.

@qwcode
Copy link
Contributor

qwcode commented Jan 29, 2013

I want to point either the -i or the -f at /index and let pip "discover" all of the subdirectories
[...] It seems the intent of the old 1.2.1 code was to do this. Is this no longer the intent?

I don't think we want to consider changing local -f, --find-links to be recursive. --find-links is a carryover from setuptools/distribute, and it doesn't do that, right?. that would be a big change. I'm not seeing that 1.2.1 used to be recursive.

Regarding --index-url=file:///c:/some/dir, I've never used that form

we have 2 or 3 tests that use a local file:// index to simulate certain index scenarios

since we support file:// index urls, that should be confirmed to be work as well

I guess we could just make a change to what --find-links supports (and not --index-url), but this would be the opportunity more explicitly document the forms supported in each case in the docs. (in "Internal Details/Finding Packages" maybe)

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

I'm ok either way. I can put all of my sdists into one directory instead of having subdirectories if that's the direction that pip is heading.

@pfmoore
Copy link
Member

pfmoore commented Jan 29, 2013

Sounds like everyone is pretty much OK with whatever ends up being agreed, but it would be good to clarify it and document it explicitly. I'll look at putting together a proper proposal, and we can take it from there. Given that it's not going to be in 1.3, I'm happy to take the time and do it right.

I'm still OK with attempting to get the basic patch into 1.3 if it helps, but from the sound of it, it's not going to make a huge difference, so maybe it's better to leave that.

Regardless, thanks @nlstay for raising the question in the first place!

@nlstay
Copy link
Author

nlstay commented Jan 29, 2013

Ok. I can continue to patch 1.2.1 on my side until this is fixed right. Note that the download.py patch alone is not enough because pip (along with the stdlib) is "correcting" -f file:////network/share or -f \\network\share paths down to file:///network/share URL. Therefore, download.py cannot detect when it was supposed to be a \\network\share style path originally.

@dstufft
Copy link
Member

dstufft commented Mar 22, 2017

Closing this, the latest referenced issue seems to indicate that this is working now. Since this issue was opened the HTTP handling code has been rewritten so it's possible it was fixed then. If this is still an issue, feel free to reopen.

@dstufft dstufft closed this as completed Mar 22, 2017
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 3, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation OS: windows Windows specific
Projects
None yet
Development

No branches or pull requests

4 participants