Make pinset sharding deterministic #3640

Merged 1 commit on Feb 12, 2017

whyrusleeping
Member

Making this deterministic keeps us from creating an exponential number
of objects as the number of pins in the set increases.

closes #3621 for the most part
License: MIT
Signed-off-by: Jeromy [email protected]
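
The effect of a deterministic seed can be illustrated with a minimal sketch. It assumes each pin is assigned to a bucket by an FNV-1a hash of a seed followed by the key, taken modulo a fanout of 256. The names `bucketFor`, `layout`, `sameLayout`, `sampleKeys`, and the `pin-N` keys are hypothetical and are not taken from the go-ipfs code. With a fixed seed (here the depth in the tree), writing the same set twice produces the same bucket layout, so previously written shard objects can be reused. With a fresh seed on every write, the layout will generally differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// fanout is an assumed number of buckets per shard node (illustrative only).
const fanout = 256

// bucketFor mixes a seed into an FNV-1a hash of key and returns a bucket index.
func bucketFor(seed uint32, key string) int {
	var buf [4]byte
	binary.LittleEndian.PutUint32(buf[:], seed)
	h := fnv.New32a()
	h.Write(buf[:])
	h.Write([]byte(key))
	return int(h.Sum32() % fanout)
}

// sampleKeys returns n hypothetical pin identifiers.
func sampleKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("pin-%d", i)
	}
	return keys
}

// layout records which bucket each key lands in for a given seed.
func layout(seed uint32, keys []string) map[string]int {
	out := make(map[string]int, len(keys))
	for _, k := range keys {
		out[k] = bucketFor(seed, k)
	}
	return out
}

// sameLayout reports whether two layouts place every key in the same bucket.
func sameLayout(a, b map[string]int) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if b[k] != v {
			return false
		}
	}
	return true
}

func main() {
	keys := sampleKeys(1000)
	const depth = 0 // deterministic seed: the node's depth in the tree

	// Two writes with the same seed always agree.
	fmt.Println("same seed, identical layout:",
		sameLayout(layout(depth, keys), layout(depth, keys)))

	// Two writes with different seeds (standing in for random seeds) are compared.
	fmt.Println("seeds 1 and 2, identical layout:",
		sameLayout(layout(1, keys), layout(2, keys)))
}
```

The first line always reports `true`. The second depends on the hash values and is printed rather than assumed.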

@whyrusleeping whyrusleeping added the status/in-progress In progress label Jan 28, 2017
@whyrusleeping whyrusleeping added this to the ipfs 0.4.6 milestone Jan 29, 2017
@rht
Contributor

rht commented Jan 31, 2017

Confirmed there is no longer an exponential explosion, but pinning still reverses the benefit of block-level dedup.

@Kubuxu
Member

Kubuxu commented Jan 31, 2017

> but pinning still reverses the benefit of block-level dedup.

What do you mean by that?

@Kubuxu
Member

Kubuxu commented Jan 31, 2017

From that graph, it seems to me that we should start sharding the pinset a bit earlier.

@rht
Contributor

rht commented Jan 31, 2017

I meant: the storage-saving effect of block dedup is overshadowed by the storage requirement of the pin set. In short, files in an ipFS repo take more space than they did in unixFS.

@Kubuxu
Member

Kubuxu commented Feb 1, 2017

IDK if we should go with a fully static seed. It allows the same attacks that languages protect their hash maps against (precomputing which items go to which buckets, then issuing requests that put many items in one bucket), but in our case it would cause the tree to have one deep branch.

@whyrusleeping
Member Author

@Kubuxu I don't think this is an issue. You would have to find 257 items that all share a hash prefix of length n, where the hash function changes at each byte index, and then convince another node to pin each of them individually.
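
The point about the hash function changing at each level can be illustrated with a rough sketch. It assumes the bucket index is an FNV-1a hash of the depth followed by the key, taken modulo a fanout of 256. `bucketAt`, `collidingKeys`, `distinctBuckets`, and the `item-N` keys are hypothetical names used for illustration, not the go-ipfs API. The program searches for keys that share one bucket at depth 0 and then counts how many distinct buckets those same keys occupy at depth 1. Forcing a deep branch would require collisions that persist across every level.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// fanout is an assumed number of buckets per shard node (illustrative only).
const fanout = 256

// bucketAt hashes key with the tree depth as the seed, so each level
// of the tree effectively uses a different hash function.
func bucketAt(depth uint32, key string) int {
	var buf [4]byte
	binary.LittleEndian.PutUint32(buf[:], depth)
	h := fnv.New32a()
	h.Write(buf[:])
	h.Write([]byte(key))
	return int(h.Sum32() % fanout)
}

// collidingKeys scans the hypothetical keys "item-0" .. "item-(limit-1)"
// and returns up to n of them that land in bucket target at the given depth.
func collidingKeys(depth uint32, target, n, limit int) []string {
	found := make([]string, 0, n)
	for i := 0; i < limit && len(found) < n; i++ {
		k := fmt.Sprintf("item-%d", i)
		if bucketAt(depth, k) == target {
			found = append(found, k)
		}
	}
	return found
}

// distinctBuckets counts how many different buckets the keys occupy at depth.
func distinctBuckets(depth uint32, keys []string) int {
	seen := make(map[int]struct{})
	for _, k := range keys {
		seen[bucketAt(depth, k)] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Use the bucket of a known key so at least one match is guaranteed.
	target := bucketAt(0, "item-0")
	keys := collidingKeys(0, target, 32, 1000000)

	fmt.Printf("%d keys share bucket %d at depth 0\n", len(keys), target)
	fmt.Printf("at depth 1 they occupy %d distinct buckets\n", distinctBuckets(1, keys))
}
```

An attacker would have to repeat this search at every level with the surviving keys only, which shrinks the candidate set at each step.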

@Kubuxu
Member

Kubuxu commented Feb 1, 2017

Yeah, the split into sub-buckets helps in comparison to hash maps.

@whyrusleeping
Member Author

@Kubuxu 👍 here?

Member

@Kubuxu Kubuxu left a comment

It is quite an inefficient way of doing it, but it will have to do. Let's switch to a HAMT as soon as it is viable.

@whyrusleeping whyrusleeping merged commit 4028e89 into master Feb 12, 2017
@whyrusleeping whyrusleeping deleted the fix/pinset-obj-explosion branch February 12, 2017 20:05
@whyrusleeping whyrusleeping removed the status/in-progress In progress label Feb 12, 2017
Successfully merging this pull request may close these issues.

0.4.5: repo does not scale beyond ~ 8000 files
3 participants