[2.x.x] simplify txhashset zip creation and extraction #2908

Merged: 5 commits, Jul 12, 2019


@antiochp (Member) commented Jun 21, 2019

This PR aims to simplify and improve a couple of implementation details around our txhashset zip handling.

Primary motivation here was to introduce some flexibility in the set of acceptable/expected files in the txhashset.zip archive.

We don't need the kernel hash file (we can rebuild it from the kernel data file) and we can save approx 45MB by excluding it.

Currently it is hard to exclude it when building the zip file without introducing a lot of code.

This PR makes the list of files more explicit so we could modify this list for protocol version 2 for example.


We currently do the following -

  • Wrap the decompress logic in panic::catch_unwind and register a handler via panic::set_hook to handle unexpected panic scenarios.
  • Walk the src dir and zip everything up, being very permissive about what we allow in the zip. This is recursive and handles arbitrary file paths.
  • Prior to this we call check_and_remove_files, which uses a regex pattern, to clean up unwanted files.
  • When decompressing we filter files again via check_and_remove_files.
  • We attempt to handle both / and \ path separators.

Proposed Approach

  1. We know exactly which files to include in the zip file when creating txhashset.zip. The only variable part of this is the <hash> prefix on the "rewound" leaf set files for the output and rangeproof MMR.
kernel/pmmr_data.bin
kernel/pmmr_hash.bin
output/pmmr_data.bin
output/pmmr_hash.bin
output/pmmr_leaf.bin.<hash>
output/pmmr_prun.bin
rangeproof/pmmr_data.bin
rangeproof/pmmr_hash.bin
rangeproof/pmmr_leaf.bin.<hash>
rangeproof/pmmr_prun.bin

We do not need to craft a regex to support these. We can simply define a list of file paths. These are the only files that will be included in the zip when creating it. These are the only files that will be extracted from the zip file when receiving it.

  2. Handle a potential panic when extracting files from the zip by running the extraction logic in a separate thread and checking the result of join() on its handle. From https://doc.rust-lang.org/std/thread/:

Fatal logic errors in Rust cause thread panic, during which a thread will unwind the stack, running destructors and freeing owned resources. While not meant as a 'try/catch' mechanism, panics in Rust can nonetheless be caught (unless compiling with panic=abort) with catch_unwind and recovered from, or alternatively be resumed with resume_unwind. If the panic is not caught the thread will exit, but the panic may optionally be detected from a different thread with join. If the main thread panics without the panic being caught, the application will exit with a non-zero exit code.
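A minimal sketch of the thread-based approach from item 2, using only the standard library. The helper name `run_isolated` and the error string are hypothetical and are not taken from this PR; the actual extraction code would go inside the closure. As the quoted documentation notes, this only works when not compiling with panic=abort.

```rust
use std::thread;

/// Run `work` on its own thread and report a panic as an `Err`
/// instead of letting it unwind through the caller.
/// `run_isolated` is a hypothetical helper, not a function from this PR.
fn run_isolated<F, T>(work: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // `join()` returns `Err` if the spawned thread panicked.
    thread::spawn(work)
        .join()
        .map_err(|_| "extraction thread panicked".to_string())
}
```

Calling `run_isolated(|| extract(...))` with some extraction function would then yield an `Err` rather than taking the node down if the zip handling panics. No panic hook needs to be registered for this.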

Additional improvements -

  • Bump zip-rs to latest 0.5.2
  • The zip file "spec" permits the use of either / or \ as path separator in the names of files included in the zip file. For portability we can limit this and only use '/' for both Windows and Unix. We do not need to be permissive in terms of handling a variety of path separators. We simply assume '/' and fail if the paths in the zip file do not meet these assumptions.
  • Use start_file_from_path when creating the zip to ensure paths are handled safely.
  • Use BufReader and BufWriter for IO operations involving reading/writing zip files.
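Two of the points above can be sketched with the standard library alone. Both helpers below (`is_safe_entry_name` and `buffered_copy`) are hypothetical names for illustration and are not the functions in grin_util::zip. The first rejects any entry name that uses `\` or that is not a plain relative path; the second wraps a reader and writer in BufReader and BufWriter before copying.

```rust
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::{Component, Path};

/// Accept only relative entry names that use '/' as the separator.
/// Hypothetical check, shown here as one way to "assume '/' and fail".
fn is_safe_entry_name(name: &str) -> bool {
    if name.is_empty() || name.contains('\\') {
        return false;
    }
    // Reject absolute paths, "." and ".." components, and drive prefixes.
    Path::new(name)
        .components()
        .all(|c| matches!(c, Component::Normal(_)))
}

/// Copy `src` to `dst` through buffered wrappers and return the byte count.
fn buffered_copy<R: Read, W: Write>(src: R, dst: W) -> io::Result<u64> {
    let mut reader = BufReader::new(src);
    let mut writer = BufWriter::new(dst);
    let copied = io::copy(&mut reader, &mut writer)?;
    // Flush explicitly so write errors are reported instead of lost on drop.
    writer.flush()?;
    Ok(copied)
}
```

In real use `src` would be a zip entry and `dst` a newly created file (or the reverse when building the archive); any `Read` and `Write` pair works.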

TODO -

  • Verify this works on Windows (both reading and writing the zip file)

@antiochp antiochp added this to the 2.x.x milestone Jun 21, 2019
@antiochp antiochp self-assigned this Jun 21, 2019
@antiochp (Member Author) commented

Sample output receiving a zip. We only look for these exact files in the zip and we expect the paths in the zip to match exactly. No attempt will be made to extract anything not matching any of these exact paths.

20190621 13:28:39.278 DEBUG grin_chain::txhashset::txhashset - zip_write on path: "/antiochp/grin/node_mainnet/tmp"
20190621 13:28:39.307 INFO grin_util::zip - extract_files: "kernel/pmmr_data.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/kernel/pmmr_data.bin"
20190621 13:28:39.497 INFO grin_util::zip - extract_files: "kernel/pmmr_hash.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/kernel/pmmr_hash.bin"
20190621 13:28:39.516 INFO grin_util::zip - extract_files: "output/pmmr_data.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_data.bin"
20190621 13:28:39.607 INFO grin_util::zip - extract_files: "output/pmmr_hash.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_hash.bin"
20190621 13:28:39.608 INFO grin_util::zip - extract_files: "output/pmmr_prun.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_prun.bin"
20190621 13:28:40.074 INFO grin_util::zip - extract_files: "rangeproof/pmmr_data.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_data.bin"
20190621 13:28:40.144 INFO grin_util::zip - extract_files: "rangeproof/pmmr_hash.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_hash.bin"
20190621 13:28:40.145 INFO grin_util::zip - extract_files: "rangeproof/pmmr_prun.bin" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_prun.bin"
20190621 13:28:40.145 INFO grin_util::zip - extract_files: "output/pmmr_leaf.bin.000003b18778" -> "/antiochp/grin/node_mainnet/tmp/txhashset/output/pmmr_leaf.bin.000003b18778"
20190621 13:28:40.147 INFO grin_util::zip - extract_files: "rangeproof/pmmr_leaf.bin.000003b18778" -> "/antiochp/grin/node_mainnet/tmp/txhashset/rangeproof/pmmr_leaf.bin.000003b18778"
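The explicit file list behind this output could be built roughly as follows. This is a sketch only: `expected_txhashset_files` is a hypothetical name, and the `leaf_hash` argument stands for the `<hash>` suffix on the rewound leaf set files (for example `000003b18778` in the log above). The order mirrors the log, with the two leaf set files last.

```rust
use std::path::PathBuf;

/// Build the exact list of relative paths expected in txhashset.zip.
/// Hypothetical helper for illustration; not the function from this PR.
fn expected_txhashset_files(leaf_hash: &str) -> Vec<PathBuf> {
    // Fixed file names, identical for every archive.
    let fixed = [
        "kernel/pmmr_data.bin",
        "kernel/pmmr_hash.bin",
        "output/pmmr_data.bin",
        "output/pmmr_hash.bin",
        "output/pmmr_prun.bin",
        "rangeproof/pmmr_data.bin",
        "rangeproof/pmmr_hash.bin",
        "rangeproof/pmmr_prun.bin",
    ];
    let mut files: Vec<PathBuf> = fixed.iter().map(|p| PathBuf::from(*p)).collect();

    // The only variable part: the <hash> suffix on the leaf set files.
    files.push(PathBuf::from(format!("output/pmmr_leaf.bin.{}", leaf_hash)));
    files.push(PathBuf::from(format!("rangeproof/pmmr_leaf.bin.{}", leaf_hash)));
    files
}
```

Excluding kernel/pmmr_hash.bin for a later protocol version would then be a one-line change to this list rather than a change to a regex.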

@antiochp antiochp changed the base branch from master to milestone/2.x.x July 9, 2019 15:59
@DavidBurkett (Contributor) left a comment

I can't test at the moment, but we'll want to make sure this gets tested on Windows too before merging. It seems every change we make to this code breaks Windows due to the file system differences (path separators, allowed filenames, etc).

@hashmap (Contributor) left a comment

LGTM

// These are the *only* files we will attempt to extract from the zip file.
// If any of these are missing we will attempt to continue as some are potentially optional.
zip::extract_files(txhashset_data, &txhashset_path, files)?;
Ok(())
Contributor

I remember we discussed this before, but still, why not remove the ? and the last line? :)

Member Author

Just personal preference.

{
    one()?;
    two()?;
    three()?;
    Ok(())
}

reads better to me than -

{
    one()?;
    two()?;
    three()
}

And if you need to reorder those lines or add one at the end you don't need to go reintroducing ? (or forgetting to).

Member Author

I have been experimenting with this though recently -

Ok(())
    .and_then(|_| one())
    .and_then(|_| two())
    .and_then(|_| three())

let res = thread::spawn(move || {
    let mut archive = zip_rs::ZipArchive::new(from_archive).expect("archive file exists");
    for x in files {
        if let Ok(file) = archive.by_name(x.to_str().expect("valid path")) {
Contributor

Nitpick: "valid path" and the expect messages below may look confusing in the log file, though they look good in the source code.

@antiochp antiochp changed the title simplify txhashset zip creation and extraction [2.x.x] simplify txhashset zip creation and extraction Jul 11, 2019
@antiochp (Member Author) commented

Going to merge this into the 2.x.x branch now.
We can test on that branch for Windows compatibility.

@antiochp antiochp merged commit 1395074 into mimblewimble:milestone/2.x.x Jul 12, 2019