Implement Erasure Coding #606

Merged
merged 2 commits into from
Sep 25, 2020

gilbertchen
Owner

A new option -erasure-coding is added to the init and add commands to provide the ability to recover from corrupted chunks.

Example use:

duplicacy init -erasure-coding 5:2 repository_id storage_url
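
As an illustration of what the 5:2 argument might encode, here is a minimal sketch that splits it into two shard counts. It assumes the format is <data shards>:<parity shards>, which is consistent with the seven-character shard maps in the restore log of this description but is not spelled out in this PR. The helper name parseErasureCoding and its validation rules are hypothetical and are not taken from the Duplicacy source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseErasureCoding splits an argument such as "5:2" into two counts.
// Assumption: the first number is the data shard count and the second
// is the parity shard count. This helper is hypothetical.
func parseErasureCoding(arg string) (dataShards, parityShards int, err error) {
	parts := strings.Split(arg, ":")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("expected <data>:<parity>, got %q", arg)
	}
	dataShards, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid data shard count: %w", err)
	}
	parityShards, err = strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid parity shard count: %w", err)
	}
	if dataShards < 1 || parityShards < 1 {
		return 0, 0, fmt.Errorf("shard counts must be positive, got %d:%d", dataShards, parityShards)
	}
	return dataShards, parityShards, nil
}

func main() {
	data, parity, err := parseErasureCoding("5:2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Storage overhead relative to the unprotected chunk, ignoring any header.
	overhead := float64(data+parity) / float64(data)
	fmt.Printf("%d data + %d parity shards, about %.1fx storage\n", data, parity, overhead)
}
```

Under that reading, 5:2 stores seven shards for every five shards' worth of data, so the protected storage is roughly 1.4 times the size of the unprotected one.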

When restoring from a storage with erasure coding enabled, the following log messages show corrupted chunks are being repaired:

  Restoring /private/tmp/duplicacy_test/repository to revision 1
  Recovering a 1824550 byte chunk from 364910 byte shards: ***--**
  Downloaded chunk 1 size 1817347, 1.73MB/s 00:00:11 9.0%
  Recovering a 6617382 byte chunk from 1323477 byte shards: **--***
  Downloaded chunk 2 size 6591322, 8.02MB/s 00:00:02 42.0%
  Recovering a 5136934 byte chunk from 1027387 byte shards: --*****
  Downloaded chunk 3 size 5116593, 12.90MB/s 00:00:01 67.6%
  Recovering a 2515494 byte chunk from 503099 byte shards: -*****-
  Downloaded chunk 4 size 2505558, 15.29MB/s 00:00:01 80.1%
  Recovering a 3984934 byte chunk from 796987 byte shards: --*****
  Downloaded chunk 5 size 3969180, 19.07MB/s 00:00:01 100.0%
  Downloaded file1 (20000000)
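
The numbers in this log can be checked by hand: each shard size equals the chunk size divided by 5, rounded up (for example, 1824550 / 5 = 364910). The sketch below reproduces that arithmetic and tests whether a shard map is still recoverable. It assumes that '*' marks an intact shard and '-' a corrupted one, and that the code is Reed-Solomon style, so any 5 of the 7 shards are enough. Neither assumption is stated in this PR, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// shardSize returns the per-shard size when a chunk is split across
// dataShards data shards, rounding up so the last shard can be padded.
func shardSize(chunkSize, dataShards int) int {
	return (chunkSize + dataShards - 1) / dataShards
}

// recoverable reports whether a shard map such as "***--**" still has at
// least dataShards intact shards.
// Assumption: '*' marks a good shard and '-' a bad one.
func recoverable(shardMap string, dataShards int) bool {
	return strings.Count(shardMap, "*") >= dataShards
}

func main() {
	fmt.Println(shardSize(1824550, 5))     // 364910, as in the first log line
	fmt.Println(shardSize(6617382, 5))     // 1323477, as in the second log line
	fmt.Println(recoverable("***--**", 5)) // true: 5 of 7 shards intact
	fmt.Println(recoverable("*---***", 5)) // false: only 4 of 7 shards intact
}
```

Every shard map in the log has exactly two '-' marks, which under these assumptions is the largest loss a 5:2 configuration could absorb.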

@gilbertchen
Owner Author

This pull request has been mentioned on Duplicacy Forum. There might be relevant details there:

https://forum.duplicacy.com/t/new-feature-erasure-coding/4168/1

@@ -1403,6 +1421,11 @@ func main() {
        Usage:    "the RSA public key to encrypt file chunks",
        Argument: "<public key>",
    },
    cli.StringFlag{
        Name:  "erasure-coding",
        Usage: "enable erasure coding to protect against storage corruption",
Contributor

About naming:

From https://forum.duplicacy.com/t/new-feature-erasure-coding/4168/2?u=thebestpessimist I understand that this only protects each individual chunk from corruption, not the storage as a whole (i.e., the way RAID 5/6 does).

I think the Usage should be reworded to something like "enable erasure coding to protect against intra-chunk corruption", as this is "chunk-level protection", not "storage-level protection".

What would happen if, in the future, you also added storage-level protection, something similar to RAID 5/6 that provides parity for any chunk?


// RSA encrypted chunks start with "duplicacy\002"
var ENCRYPTION_VERSION_RSA byte = 2

var ERASURE_CODING_BANNER = "duplicacy\003"
Contributor

I could be misunderstanding something, but:
// RSA encrypted chunks start with "duplicacy\002"
ERASURE_CODING_BANNER = "duplicacy\003"
Does this mean that we cannot have RSA encryption and erasure coding at the same time?

Contributor

If all of those can be combined, then ENCRYPTION_BANNER might be better renamed to FEATURE_LEVEL.
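
To make the banner question concrete: the two values quoted in this thread differ only in their final byte, so a prefix check can tell them apart. The sketch below is hypothetical and not taken from the Duplicacy source. The helper classify and its return strings are invented for illustration, and it inspects only the outermost bytes of a chunk. Whether RSA encryption and erasure coding can be combined depends on whether the erasure-coded container wraps an already encrypted payload, which this thread does not settle.

```go
package main

import (
	"bytes"
	"fmt"
)

// Banner values quoted in this thread; they differ only in the last byte.
const (
	rsaBanner           = "duplicacy\002"
	erasureCodingBanner = "duplicacy\003"
)

// classify is a hypothetical helper that reports which banner, if any,
// a chunk starts with. It looks only at the leading bytes and says
// nothing about what the rest of the chunk contains.
func classify(chunk []byte) string {
	switch {
	case bytes.HasPrefix(chunk, []byte(erasureCodingBanner)):
		return "erasure-coded"
	case bytes.HasPrefix(chunk, []byte(rsaBanner)):
		return "rsa-encrypted"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(classify([]byte("duplicacy\003payload")))
	fmt.Println(classify([]byte("duplicacy\002payload")))
	fmt.Println(classify([]byte("something else")))
}
```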

@gilbertchen gilbertchen merged commit 3fd3f6b into master Sep 25, 2020