fix: The hash_cache fails with filenames including surrogates (#492)
Cases encountered in practice show that filenames can include unicode surrogates. This changes the hash_cache functionality to preserve them by using the "surrogatepass" handler for them. Another issue encountered is that S3 metadata only allows ASCII, but the code was placing unicode paths in it.

* Modify the hash_cache sqlite schema to use a BLOB instead of TEXT for the file_path, and then explicitly encode/decode it using the 'utf-8' encoding and 'surrogatepass' error handler.
* In the canonical json code, use the 'surrogatepass' error handler in the canonical file path sorting. Add the new test case data to the test manifest file.
* In the error logging of the exception that revealed this bug, include the exception context to improve debugging of future similar issues.
* Add a test case for the hash_cache based on an example that was failing in practice.
* Add the "asset-root-json" metadata containing the path encoded to ASCII via a JSON string. If the file path is not ASCII, it adds this metadata instead.

Signed-off-by: Mark Wiebe <[email protected]>
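
The encode/decode round trip and the ASCII-only metadata described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the table name `hashes`, its columns, the helper names `encode_path`/`decode_path`, and the sample file name are all assumptions made for the example.

```python
import json
import sqlite3


def encode_path(path: str) -> bytes:
    # 'surrogatepass' lets lone surrogates (e.g. "\ude0a") through the UTF-8 encoder.
    return path.encode("utf-8", errors="surrogatepass")


def decode_path(raw: bytes) -> str:
    # The same handler is needed on the way back to restore the original string.
    return raw.decode("utf-8", errors="surrogatepass")


# Hypothetical file name containing a lone low surrogate.
path = "render_\ude0a.exr"

# A strict encode rejects the lone surrogate.
try:
    path.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

# Hypothetical schema: the path column is a BLOB, so sqlite stores raw bytes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (file_path BLOB PRIMARY KEY, file_hash TEXT)")
conn.execute("INSERT INTO hashes VALUES (?, ?)", (encode_path(path), "abc123"))
row = conn.execute(
    "SELECT file_path, file_hash FROM hashes WHERE file_path = ?",
    (encode_path(path),),
).fetchone()
round_tripped = decode_path(row[0])

# json.dumps escapes non-ASCII by default, so the result is safe for
# ASCII-only metadata and can be reversed with json.loads.
ascii_json = json.dumps(path)
restored = json.loads(ascii_json)
```

Storing bytes rather than text keeps the database layer from ever having to encode the string itself, which is where a strict UTF-8 conversion would otherwise fail.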
Showing 9 changed files with 56 additions and 14 deletions.
2 changes: 1 addition & 1 deletion
test/unit/deadline_job_attachments/data/manifest_v2023_03_03.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
-{"hashAlg":"xxh128","manifestVersion":"2023-03-03","paths":[{"hash":"CarriageReturn","mtime":1679079744833848,"path":"\r","size":1},{"hash":"One","mtime":1679079344833868,"path":"1","size":1},{"hash":"c","mtime":1675079344833848,"path":"another_test_file","size":1},{"hash":"b","mtime":1479079344833848,"path":"test_dir/test_file","size":1},{"hash":"a","mtime":167907934333848,"path":"test_file","size":1},{"hash":"Control","mtime":1679079344833348,"path":"\u0080","size":1},{"hash":"LatinSmallLetterOWithDiaeresis","mtime":1679079344833848,"path":"\u00f6","size":1},{"hash":"EuroSign","mtime":1679079344836848,"path":"\u20ac","size":1},{"hash":"EmojiGrinningFace","mtime":1679579344833848,"path":"\ud83d\ude00","size":1},{"hash":"HebrewLetterDaletWithDagesh","mtime":1679039344833848,"path":"\ufb33","size":1}],"totalSize":10} | ||
+{"hashAlg":"xxh128","manifestVersion":"2023-03-03","paths":[{"hash":"CarriageReturn","mtime":1679079744833848,"path":"\r","size":1},{"hash":"One","mtime":1679079344833868,"path":"1","size":1},{"hash":"c","mtime":1675079344833848,"path":"another_test_file","size":1},{"hash":"b","mtime":1479079344833848,"path":"test_dir/test_file","size":1},{"hash":"a","mtime":167907934333848,"path":"test_file","size":1},{"hash":"Control","mtime":1679079344833348,"path":"\u0080","size":1},{"hash":"UserTestCase","mtime":1679579344833848,"path":"\u00c3\u00b1","size":1},{"hash":"LatinSmallLetterOWithDiaeresis","mtime":1679079344833848,"path":"\u00f6","size":1},{"hash":"EuroSign","mtime":1679079344836848,"path":"\u20ac","size":1},{"hash":"EmojiGrinningFace","mtime":1679579344833848,"path":"\ud83d\ude00","size":1},{"hash":"EmojiTestCase","mtime":1679579344833848,"path":"\ude0a","size":1},{"hash":"HebrewLetterDaletWithDagesh","mtime":1679039344833848,"path":"\ufb33","size":1}],"totalSize":10} |
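
The new `EmojiTestCase` entry sits between the emoji surrogate pair and `\ufb33`, which is consistent with sorting paths by their UTF-16 code units. The sketch below reproduces that ordering for a few of the paths in the manifest. The sort key (UTF-16 big-endian bytes with `surrogatepass`) is an assumption inferred from the order in the test data, not taken from the commit's code, and the name `sort_key` is hypothetical.

```python
import json

# A few path entries from the test manifest, deliberately out of order.
manifest_fragment = (
    r'[{"path":"\ufb33"},{"path":"\ude0a"},{"path":"\ud83d\ude00"},'
    r'{"path":"\u20ac"},{"path":"test_file"}]'
)
entries = json.loads(manifest_fragment)


def sort_key(entry: dict) -> bytes:
    # Assumed key: UTF-16 big-endian bytes. 'surrogatepass' is required
    # because "\ude0a" is a lone surrogate and a strict encode would raise.
    return entry["path"].encode("utf-16-be", errors="surrogatepass")


entries.sort(key=sort_key)
ordered = [entry["path"] for entry in entries]

# Without the handler, the lone surrogate cannot be encoded at all.
try:
    "\ude0a".encode("utf-16-be")
    strict_sort_key_failed = False
except UnicodeEncodeError:
    strict_sort_key_failed = True
```

Under this key the grinning-face emoji (code units `d83d de00`) sorts before the lone surrogate `de0a`, which in turn sorts before `fb33`, matching the order in the added manifest line.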