MacOs and Windows lowercases filenames #3318
Comments
So, it turns out that by default, Windows and macOS do not support case-sensitive file names. It seems that this is a filesystem setting, so one should be able to configure case-sensitive support.
We could probably warn the user about this behavior, though the detection of such use cases could be resource consuming. |
When such a situation might happen:
|
Some research shows that we are not the only ones affected by this.
If there is no good way to solve it, we should at least document it. Probably a separate article about But need to look into the git first, @pared the link that you've shared seems to note that it now works fine in git. Might be missing something. |
Yes, it seems so, though the situation described there is about moving the names, not about coexisting names that are the same from a case-insensitive perspective. I will check how git behaves when we check out a repository with "same" filenames. |
Ok, so after a discussion with @efiop I wrote down a summary of what happened, to better pinpoint the problem.
The problem was that the number of data files did not match (for MacOs with default setting We could improve the detection of such cases if we would include filenames in the Exception that is thrown upon linking. In this case, the warning could hint us but was too generic, we should probably prepare our own error including both filenames, and print it upon |
Btw, in case of |
@efiop, but in case of a case-insensitive system, you cannot create |
@pared Did you close it intentionally? |
I don't recall closing this, don't know what happened, sorry. |
Ok, so it seems that git is handling this problem by warning the user about collision.
I guess it would be reasonable for us to do the same, though we should think about how to do that without slowing down every checkout, because simply comparing all filenames against each other on checkout is O(n^2) |
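For what it's worth, the name check itself does not have to be quadratic. A minimal sketch, assuming that `str.casefold()` is a good enough approximation of how the filesystem compares names (real filesystems such as APFS and NTFS have their own folding and Unicode normalization rules, so this may not match exactly). The function name `find_case_collisions` is hypothetical and is not part of dvc or git:

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case.

    Runs in a single pass over `paths`, so the cost is linear in the
    number of paths rather than quadratic.
    Assumption: casefold() approximates the filesystem's comparison.
    """
    groups = defaultdict(list)
    for path in paths:
        # Paths that fold to the same key would overwrite each other
        # on a case-insensitive filesystem.
        groups[path.casefold()].append(path)
    # Keep only the groups with more than one spelling.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    print(find_case_collisions(["data/file", "data/File", "data/other"]))
```

This only looks at names, so it can run before anything is written to disk. It assumes the input paths are unique.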
@pared Amazing research! Thank you so much! 🙏 As to a reasonable solution, I suppose that git detects that collision while performing the checkout, right? From the wording, it seems like the ordering is completely random, so git doesn't care about it too much after all. We could do the same thing in dvc, for example, by playing around that defensive |
I suppose it depends on the system. As of current state of research, default settings of Windows and MacOs make them "Case insensitive, but case aware", so |
Some further research:
passes on my Windows computer. [EDIT] |
So, it turns out that git checks whether files collide by looking at the inode of a given file. A collision is detected when the inode has already been taken by another file. EDIT: EDIT 2: EDIT 3: |
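The inode idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, not git's actual code. The helper name `find_inode_collisions` is hypothetical, and the sketch assumes that `os.stat` reports a meaningful `st_ino` on the platform in question:

```python
import os


def find_inode_collisions(paths):
    """Return (earlier, later) pairs of paths that resolve to the same file.

    Assumption: two paths pointing at the same (device, inode) pair are
    the same file on disk, which is what happens when "File" and "file"
    are written on a case-insensitive filesystem.
    """
    seen = {}
    collisions = []
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Another path already occupies this inode.
            collisions.append((seen[key], path))
        else:
            seen[key] = path
    return collisions
```

Note that hard links also share an inode, so a check like this would flag them too. It needs one `stat` call per file, and it can only run after the files have been written.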
I think we could proceed with this one by implementing our own EDIT: My proposed solution on how to improve detection of collision when linking: @efiop Do you think I missed something? |
Dunno, you are the one researching 🙂
So current |
One more summary to grasp the current state of this issue:
Glossary:
case sensitive system: system that is able to:
case insensitive system:
In some cases, macOS is also considered case insensitive, because usually its filesystem is configured to be
Problem
As of today (
This situation happens when we create our data folder on a case-sensitive system and then try to work with the project on a case-insensitive system.
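Whether a given directory sits on a case-insensitive filesystem can be probed at runtime. A minimal sketch, assuming we are allowed to create a temporary file in that directory; `is_case_insensitive` is a hypothetical helper name, not an existing dvc function:

```python
import os
import tempfile


def is_case_insensitive(directory):
    """Return True if `directory` appears to be on a case-insensitive filesystem."""
    # Create a probe file whose name starts with a lowercase prefix...
    fd, path = tempfile.mkstemp(prefix="caseprobe_", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        # ...and check whether the upper-cased name resolves to it as well.
        return os.path.exists(os.path.join(head, tail.upper()))
    finally:
        # Always clean up the probe file.
        os.remove(path)
```

The result describes the filesystem that holds `directory`, not the operating system as a whole, since the setting is per volume.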
What do we want?
Make the user aware that the situation with overlapping paths occurred. But wait, we already check in
|
@pared thank you for such a deep investigation! This is an important problem and it would be nice if we could fix it, but I'd like to understand the priority of the issue. When we discuss the pros and cons of this solution, we should keep in mind another option: keep this responsibility on the users' shoulders. In many cases, this problem has an easy workaround for users (rename some files). The same happens in many multi-platform programming frameworks (even plain C): you need to follow some conventions to make a framework work on different platforms (and file naming is one of those conventions). For now, the priority does not look like p0. Can we make it p1 or p2? |
@dmpetrov p0 was the research, as we really needed to fully understand what is going on. The solution is lower in priority. |
Closing as stale. |
Context: https://discordapp.com/channels/485586884165107732/485596304961962003/677178126916124702
The issue:
A user created a data dir containing files with the same names, some in uppercase and some in lowercase.
Another user cloned the repository on macOS and could not get the images whose names contained uppercase letters
when those names overlapped with other images having the same name in lowercase.
Pulling on Linux was fine.
Reproduction script:
After pull, we should have both "file" and "File" in the target repo, which does not seem to happen on macOS.