-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix duplicated 8+3 filenames for long file name entries #71
base: master
Are you sure you want to change the base?
Conversation
To be clear, you basically are addressing this todo ? |
Exactly. The issue at hand is that if I write a raspberry pi boot folder then there are many dtbs with similar prefixes (bmc27...-). This causes those to be written out with the same short name and The wiki page has a few nice pointers on how to solve this. The The basic hashmap of all entries should be less controversal and easy to add, I might extract the collision avoiding SFN search from the rest of the method body, that should make it well contained. |
I didn't understand what you wrote here. It looks (from the code) like you do the following:
To what do you refer? |
I am referring to the fact that
The hashing of NT and onwards is unknown, and the wikipedia page does not list an avoidance algorithm, but there must be one as collisions starts to become likely for a 16 bit number if you go beyond a few hundred files. |
What I think? Pretty simple, actually.
In all seriousness, this would take us from a guaranteed collision with >1 file, to a guaranteed collision with >9 file (an order of magnitude improvement) to a possible collision with >hundreds of files (a 2+ order of magnitude improvement). It isn't perfect? So what. Now that I understand it, I like most everything in here. I have some suggestions on the coding style, but will put those inline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some stylistic questions, but this really is quite good.
@@ -413,11 +416,47 @@ func convertLfnSfn(name string) (string, string, bool, bool) { | |||
isLFN = true | |||
} | |||
|
|||
truncate := func(name string, n int) string { | |||
return name[0:6] + "~" + strconv.Itoa(n) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not madly in love with inline anonymous functions here, as they will get declared every time we call convertLfnSfn
, and they are harder to understand and test.
Can we pull these three out into named functions, and give them more representative names?
|
||
random := func() string { | ||
return fmt.Sprintf("%06X", (rand.Int63()%0x1000000)) + "~" + strconv.Itoa(1+rand.Intn(8)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random
is a bit too generic of a name, and probably will confuse. We should pull this out too, but make it more specifically-named.
for i < 9 && cache[truncate(shortName, i)+"."+extension] { | ||
i++ | ||
} | ||
shortName = truncate((shortName), i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This calls truncate()
duplicatively several times. Can we tighten it up a bit?
Also, why is this:
i := 1
for i < 9 && cache[truncate(shortName, i)+"."+extension] {
i++
}
and not the simpler and more idiomatic:
var found string
for i := i; i++; i < 9 {
found = truncate(shortName, i)+"."+extension
if _, ok := cache[found]; ok {
break
}
}
shortName = found
or similar
We need two pieces to generate unique 8+3 names from arbitrary long file names: a fast lookup map of existing names to find collisions and a way to generate names in the case of a collision.
The lookup map is lazy generated on first entry addition. This means the normal loading logic does not degrade in performance.
Several existing truncation approaches are listed on https://en.wikipedia.org/wiki/8.3_filename#VFAT_and_Computer-generated_8.3_filenames:
~1
..~9
~10
..~99
An example of a problematic folder would be the Big Buck Bunny PNG download (see https://media.xiph.org/BBB/BBB-360-png/). All files have the same prefix and numbering those has quadratic cost.
The hashing is usually used as a runtime complexity limiter.
This PR implements a variation of
1.+3.+random
, where random is a simple random number based file name generation.WIP: I am still not truly happy with the collision avoidance method, I would like to get some feedback on better / easier avoidance method.
After applying this patch and the other open PRs
fsck.vfat
finds only one minor issue on a complex filled disk: the go-diskfs fat32 implementation does not track the number of free clusters. Everything else looks good from a on-disk layout perspective.