Cargo packages duplicate files on case-insensitive file systems #13722
Comments
In the same vein, if there's a
@rustbot label Command-package
While that sounds lovely, in what locale? For the languages I speak it is relatively straightforward, but my understanding is that case handling is lossy in some languages, such as German (ß is SS in upper case I think?) and Turkish (I believe they have the letter "i" both with and without a dot, and the upper/lower case there isn't straightforward, but don't ask me how exactly). As a Swedish/English speaker this is all hearsay though, and I don't know how e.g. Windows or macOS handle these, though I think I heard that NTFS stores a case normalisation table at file system creation time based on the locale set at that point?
On Windows, the NTFS upcase table is initialized when the drive is first formatted, so it depends on the Windows version that did the formatting. It is, however, language neutral and only acts on the Basic Multilingual Plane. Also, depending on the configuration, NTFS can be case sensitive. In Windows this can even be set differently for each directory.
Hm, maybe I'm thinking of FAT and Windows 9x then? Pretty sure things differed depending on code pages and such there. Not sure how modern OSes interacting with FAT32/exFAT deal with that. Hopefully it is somewhat sane on any Windows version Rust still supports.
Ah yes, FAT32 is indeed a mess. But then I'm also not sure how well Cargo and rustc support it, as it lacks a lot of filesystem features that may be expected. Probably it does at least work if it's only read from (e.g. the target directory is on another drive).
It is a messy problem, but fortunately the detection algorithm doesn't need to produce user-facing text, so it doesn't need to be perfect from a linguistic perspective. It only needs to detect potential collisions between file names. Crates that work with only a specific combination of Windows locale and NTFS vintage are not generally useful, so the detection can also err on the side of over-normalizing (e.g. normalize all dotless but for a start, even a simple
Problem
It seems that Cargo excludes already-packaged files using exact name comparison, which doesn't always match how the file system sees name equality.
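To make the mismatch concrete, here is a minimal sketch, not Cargo's actual code. The two helper names are hypothetical and exist only for illustration. One compares names exactly, the other approximates what a case-insensitive file system might consider equal by lowercasing both sides with the standard library's locale-independent `to_lowercase`.

```rust
/// Hypothetical helper: exact, byte-for-byte name comparison,
/// which is what the issue suspects is being used.
fn is_same_exact(a: &str, b: &str) -> bool {
    a == b
}

/// Hypothetical helper: a rough approximation of name equality on a
/// case-insensitive file system. `str::to_lowercase` is locale-independent
/// and does not reproduce any particular file system's own rules.
fn is_same_ignoring_case(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}
```

With these, `is_same_exact("Cargo.toml", "cargo.toml")` is `false` while `is_same_ignoring_case("Cargo.toml", "cargo.toml")` is `true`, which is the kind of disagreement that could let both names end up in one package.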
Example crate:
https://docs.rs/crate/rosu/0.6.0/source/
Steps
The same applies to license-file, cargo.lock.
Possible Solution(s)
Theoretically there could be other gotchas of this kind, e.g. the HFS+ file system on macOS forces file names to use the NFD Unicode form, while most text is in NFC form, so codepoint-by-codepoint comparisons of the same name come out unequal. However, HFS+ is on its way out, so perhaps a simple case-insensitive comparison will suffice.
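A simple case-insensitive check along those lines could look roughly like the sketch below. It is an assumption about how such detection might be structured, not Cargo's implementation, and `collision_key` and `find_collisions` are hypothetical names. It groups paths by a lowercased key and reports every group with more than one member. It deliberately ignores NFC/NFD differences, since Rust's standard library has no Unicode normalization.

```rust
use std::collections::BTreeMap;

/// Hypothetical normalization step: lowercase the whole path.
/// This only folds case; it performs no Unicode (NFC/NFD) normalization.
fn collision_key(path: &str) -> String {
    path.to_lowercase()
}

/// Hypothetical detector: returns groups of paths that share the same
/// normalized key, i.e. names that may collide on a case-insensitive
/// file system. Paths within a group keep their input order.
fn find_collisions(paths: &[&str]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for path in paths {
        groups
            .entry(collision_key(path))
            .or_default()
            .push((*path).to_string());
    }
    // Keep only keys that more than one original path maps to.
    groups.into_values().filter(|g| g.len() > 1).collect()
}
```

Called with `&["Cargo.toml", "cargo.toml", "src/lib.rs"]`, this returns a single group containing `Cargo.toml` and `cargo.toml`. Because the goal is only to flag potential collisions, a key function that over-normalizes would be an acceptable swap here.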
Notes
No response
Version