md5 hash collision in annual climatology files #65
I don't know why this problem only affects annual files. As I understand the cause, there's no reason these collisions shouldn't also happen between seasonal and monthly files. Some climatology files from HadGEM2-CC and HadGEM2-ES also have md5 hash collisions when the model is the only parameter that differs between them. I haven't looked into this yet.
Just FYI:
This bug has been accidentally resolved, hooray! For unrelated reasons, I recalculated some of the affected climatologies as part of updating the climdex variables that represent minimums and maximums (pacificclimate/climate-explorer-data-prep#81). With the most recent versions of
Some annual-resolution climdex climatology files that share the same model, variable, period, and run but have different emissions scenarios hash to the same md5sum when only the first MB of each file is hashed. This seems to be caused by reserved space in the netCDF header that exceeds 1 MB. The file structure appears to be:
Hashing the first MB of the file means that only dimension declarations, variable declarations, and empty padding go into the hash, and these can be identical across files that differ only in emissions scenario.
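The collision mechanism can be reproduced with a small sketch. The two files below are synthetic stand-ins, not real netCDF files: each has an identical "header" followed by more than 1 MB of null padding, and they differ only in the bytes after that. The helper names (`md5_prefix`, `md5_full`, `make_fake_file`) are hypothetical and are not the indexer's actual functions; the sketch also assumes "1 MB" means 2**20 bytes.

```python
import hashlib
import os
import tempfile

ONE_MB = 1024 * 1024  # assumption: "1 MB" = 2**20 bytes


def md5_prefix(path, nbytes=ONE_MB):
    """md5 of only the first `nbytes` bytes of the file (partial hashing)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(nbytes)).hexdigest()


def md5_full(path, chunk_size=ONE_MB):
    """md5 of the whole file, read in chunks so memory use stays bounded."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_fake_file(path, payload):
    # Hypothetical stand-in for a netCDF file: identical declarations,
    # then more than 1 MB of reserved (null) header space, then the data.
    header = b"dimensions: time, lat, lon; variables: tasmax"
    padding = b"\x00" * (ONE_MB + 4096)
    with open(path, "wb") as f:
        f.write(header + padding + payload)


with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "rcp45.nc")
    b = os.path.join(d, "rcp85.nc")
    make_fake_file(a, b"data for rcp45")
    make_fake_file(b, b"data for rcp85")
    prefix_collide = md5_prefix(a) == md5_prefix(b)
    full_collide = md5_full(a) == md5_full(b)

print(prefix_collide, full_collide)  # True False
```

Nothing in the sketch depends on temporal resolution: any two files whose first MB is byte-for-byte identical will collide under prefix hashing, whether they are annual, seasonal, or monthly.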
Attempting to index a file whose md5 hash collides with a file already in the database yields this error message:
As a short-term fix, we've removed the reserved space from the un-indexable files needed for a current project (72 of the more than 500 affected files) and were able to index them. However, this problem will recur and needs a more permanent solution, possibly either