IOError: File name too long #1634
It turns out it's because I'm using eCryptfs, which has a limit of 143 characters: http://stackoverflow.com/questions/6571435/limit-on-file-name-length-in-bash The only solution I can think of is to shorten the file names. Maybe don't store the entire url in the filename? Or add exception handling for this error and just skip caching in this case. Let me know which you prefer, and I will write a patch for either.
I thought of a better solution. Instead of storing the full url as the filename, hash it. I'm thinking of doing a structure similar to how git stores its object files:
The hash is on the file's full url. I forget why git does separate directories for the beginning of the SHAs, but I'm sure there is a good reason. Finding matching files for a particular host would be easy because then you can just do:
This also has the advantage that you don't have to encode the url. For backwards compatibility, you can just check for files in the old format and convert them to the new format without having to redownload anything. I'd be happy to build this, but I would like some approval of the design from a maintainer before starting. It's terrible to spend a ton of time writing and testing a patch just to have it ignored or rejected. Let me know.
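The code snippets from this comment didn't survive extraction; a minimal sketch of the proposal as described (git-style hashing, with the host kept in the path so per-host lookups stay easy) might look like this. All names here are illustrative, not from the original comment:

```python
import hashlib
import os
from urllib.parse import urlparse

def cache_path(cache_root, url):
    # Hypothetical layout: host directory, then a git-style
    # two-character prefix directory, then the rest of the SHA-1.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    host = urlparse(url).netloc
    return os.path.join(cache_root, host, digest[:2], digest[2:])

# Per-host lookup is then just a directory listing, e.g.:
#   ls ~/.pip/cache/pypi.python.org
```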
Because of?
The reason for the prefix is to limit the number of directories/files in a single directory. Some bikeshedding here: I'd prefer the path to be more broken up, something like...
That is using sha224 (these urls might be coming from different locations, so a collision attack may be plausible; sha224 should make that harder). It also uses a five-directory-deep prefix instead of a single one. I don't have a particular reason for that except that I prefer it. It also includes the full hash in the final path component instead of just the remainder. Once you locate the filename, it should verify that the url file associated with it matches the url we are looking for, and treat a failure as a cache miss. This should also solve the same issue for #1287
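As a sketch of the scheme described above (this mirrors the description in the comment, not any actual pip code, and the function name is made up):

```python
import hashlib
import os

def cache_path(cache_root, url):
    # sha224 of the full url; the first five hex characters become
    # five single-character directory levels, and the final path
    # component is the full hash, not just the remainder.
    digest = hashlib.sha224(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, *(list(digest[:5]) + [digest]))
```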
Because the url is kept in a file of its own, hashing a given url creates the following directories and files:
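The concrete example was lost in extraction; under the scheme sketched above, the layout for a single cached url could look roughly like this (the file names `url` and `body` are my guesses at the idea, not the original text):

```
cache/
  5/
    a/
      e/
        c/
          f/
            5aecf.../       # full sha224 hex digest of the url
              url           # the original url, checked on lookup
              body          # the cached download
```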
@dstufft I see no reason to use a stronger hash function, as there are no known practical attacks on sha1 (only a theoretical, unproven attack), and even the attack on MD5 requires a large piece of data to be effective. URLs are short compared to the size requirements for an MD5 attack. Also, the target controls the URLs, so the attack space isn't very large. I think git made the right choice in balancing the number of directories against hash length; I would prefer to defer to their expertise.
Sorry, but you need to argue a reason why using a weaker hash is more appropriate. The default in any software I'm willing to accept should be the strongest available. In this case sha1, sha224, sha256, sha384, and sha512 have 40, 56, 64, 96, and 128 character hex digests respectively. There are two filesystems where the difference will matter, FATX and MINIX V3 FS, and MINIX V3 FS will function perfectly fine with sha224 too. So there's no technical reason afaict to prefer the weaker hash other than cargo-culting what git has done. As far as a two-letter prefix vs my scheme, that's just a style thing; I find the multiple nested directories and a full hash at the end nicer to work with in general.
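The digest lengths quoted above are easy to check with hashlib:

```python
import hashlib

# Hex digest length (in characters) for each algorithm.
for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    print(name, len(hashlib.new(name).hexdigest()))
# sha1 40, sha224 56, sha256 64, sha384 96, sha512 128
```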
sha1 is Python-native. I shouldn't need more reason than that and the reasons I said earlier. You're not building a crypto library here. Keep your requirements simple.
Please don't insult me.
The sha-2 family is native to Python as well, via the hashlib module. There's literally no more complexity in using something from the sha-2 family over using sha1.
Ah, I thought hashlib wasn't in the standard library. Fine, sha224 it is. |
Also your reasons only address why using sha1 isn't inherently broken, they don't provide any reasoning as to why this should use sha1 over something in the sha-2 family. |
Ah cool :) That explains it then :) |
I wonder if we ought to use a hardcoded name for the filename inside of the package directory too... as of right now someone could have a filename longer than 143 (there's no limit in PyPI etc). Although I've never seen anyone ever have a problem with that so it probably doesn't matter. |
Oh, in case it wasn't obvious: bikeshedding aside, I would absolutely accept this, and I doubt any of the other maintainers would object.
I thought about that, but decided to leave the full file name to make it easier to search and navigate. I've never seen a 143 character library name, and that would be perverse and silly. |
Yea I'm happy punting on that. |
As for putting the full sha as the subdirectory, that's just duplicating information. Once the hash is generated there is only one place it could possibly be, and the Python code for it is:
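The snippet this comment pointed at was lost; the point (that the path is fully derivable from the hash, so repeating the whole hash stores nothing new) presumably reduced to something like the following. The function name is mine, not from the original:

```python
import os

def path_from_hash(cache_root, digest):
    # The hash alone determines the location, so the final
    # component only needs the part not already used as a prefix.
    return os.path.join(cache_root, digest[:2], digest[2:])
```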
If you want a compromise between path length and hash family, imo
MD5 would work fine too; I'm really not worried about collisions or security here, as they're not within the scope of this use case. That being said, it's taboo to crop a hashed result, so I will just use the full hash. 64 characters should be fine for all known systems.
Would love to know who says so. |
I'll leave that as an exercise for you to learn more about cryptographic hashing. |
@joehillen as a matter of fact, that's absolute bollocks. It coincidentally stops length-extension attacks on plain Merkle-Damgård constructions like sha1 and sha256, and truncation is exactly how sha224 and sha384 are computed.
k |
I recently hit this on OS X, which has a 255-character filename limit. I'm also using an internal PyPI proxy, which results in pip adding a long '?remote=...target_url....' string to the filename. Hashing the cache directory structure sounds like an ideal fix.
Yeah, I want to work on this, but I've been far too busy the last month. I'm hoping I will get some downtime soon to work on this. |
What is the status of this issue? I want to add that if you install Ubuntu or another Linux distribution that encrypts your home directory, this filename restriction comes up.
@mariocesar see dstufft's PR #1748 above; it should solve this once it's pulled.
Got this error while using pip caching; I had to turn off caching in order for the install to finish successfully. This is on pip 1.5.4, on Ubuntu 12.04 64-bit.
What's weird is that the string it's failing on is only 166 characters.