-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[2.x.x] simplify txhashset zip creation and extraction #2908
Conversation
Sample output receiving a zip. We only look for these exact files in the zip and we expect the paths in the zip to match exactly. No attempt will be made to extract anything not matching any of these exact paths.
|
cleanup create_zip (was compress) use explicit list of files when creating zip archive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't test at the moment, but we'll want to make sure this gets tested in Windows too before merging. It seems every change we make to this code breaks windows due to the file system differences (path separators, allowed filenames, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// These are the *only* files we will attempt to extract from the zip file. | ||
// If any of these are missing we will attempt to continue as some are potentially optional. | ||
zip::extract_files(txhashset_data, &txhashset_path, files)?; | ||
Ok(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember we discussed it before, still, why not to remove ?
and the last line?:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just personal preference.
{
one()?;
two()?;
three()?;
Ok(())
}
reads better to me than -
{
one()?;
two()?;
three()
}
And if you need to reorder those lines or add one at the end you don't need to go reintroducing ?
(or forgetting to).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been experimenting with this though recently -
Ok(())
.and_then(one())
.and_then(two())
.and_then(three())?
let res = thread::spawn(move || { | ||
let mut archive = zip_rs::ZipArchive::new(from_archive).expect("archive file exists"); | ||
for x in files { | ||
if let Ok(file) = archive.by_name(x.to_str().expect("valid path")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: valid path
and all below may look confusing in the log file, looks good in the source code though
Going to merge this into the 2.x.x branch now. |
This PR aims to simplify and improve a couple of implementation details around our txhashset zip handling.
Primary motivation here was to introduce some flexibility in the set of acceptable/expected files in the
txhashset.zip
archive.We don't need the kernel hash file (we can rebuild it from the kernel data file) and we can save approx 45MB by excluding it.
Currently it is hard to exclude it when building the zip file without introducing a lot of code.
This PR makes the list of files more explicit so we could modify this list for protocol version 2 for example.
We currently do the following -
decompress
logic inpanic::catch_unwind
and register a handler viapanic::set_hook
to handle unexpectedpanic
scenarios.check_and_remove_files
via a regex pattern to clean up unwanted files.decompressing
we filter files again viacheck_and_remove_files
./
and\\
path separators.Proposed Approach
txhashset.zip
. The only variable part of this is the<hash>
prefix on the "rewound" leaf set files for the output and rangeproof MMR.We do not need to craft a regex to support these. We can simply define a list of file paths. These are the only files that will be included in the zip when creating it. These are the only files that will be extracted from the zip file when receiving it.
panic
when extracting files from the zip by wrapping the extraction logic in a separate thread and simply checking the join handle result viajoin()
. https://doc.rust-lang.org/std/thread/Additional improvements -
zip-rs
to latest0.5.2
/
or\\
as path separator in the names of files included in the zip file. For portability we can limit this and only use '/' for both Windows and Unix. We do not need to be permissive in terms of handling a variety of path separators. We simply assume '/' and fail if the paths in the zip file do not meet these assumptions.start_file_from_path
when creating the zip to ensure paths are handled safely.BufReader
andBufWriter
for IO operations involving reading/writing zip files.TODO -