
Clarify maximum block size #3104

Closed
robcat opened this issue Aug 20, 2016 · 10 comments
Labels
help wanted (Seeking public contribution on this issue) · kind/bug (A bug in existing code, including security flaws) · need/community-input (Needs input from the wider community) · status/ready (Ready to be worked)

Comments

@robcat

robcat commented Aug 20, 2016

The current implementation specifies a fixed maximum block size of 1MiB.

This is a pretty fundamental limitation; it should be explained better why this is the case (IPLD does not seem to be limited in the same way).

Motivation

When ipfs/specs#130 lands, it will be straightforward to automatically convert many hashes already used for integrity checks.

For example, Canonical could construct an "ubuntu releases" object (containing links to the historical live CD ISOs) and publish it in IPNS. It would just need to convert the existing hashes into the CIDv1 format, using the "raw data" codec.

This object would have a perfectly legitimate purpose (i.e. proving the integrity of the live CDs), but ipfs would not be able to handle its leaves.

@jbenet
Member

jbenet commented Aug 21, 2016

Reasons for limitation:

  • security - easy to DOS nodes without forcing small chunks
  • deduplication - small chunks can dedup. big ones effectively don't.
  • latency - can externalize small pieces already (think a stream)
  • bandwidth - optimize the use of bandwidth across many peers
  • performance - much better perf to hold small pieces in memory. hash along the dag to verify the whole thing.

the big DOS problem with huge leaves is that malicious nodes can serve bogus stuff for a long time before a node can detect the problem (imagine having to download 4GB before you can check whether any of it is valid). this was super harmful for bittorrent (when people started choosing huge piece sizes): attackers would routinely do this very cheaply, just serving bogus random data. smaller chunks are very important here.
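
A minimal sketch of that incremental-verification point (the 1 MiB limit comes from this thread; the package name, `VerifyStream`, and `wantHashes` are hypothetical, not an ipfs API): with small blocks the receiver can hash and check each piece as it arrives, and drop a bad peer after at most one block instead of buffering gigabytes first.

```go
package chunkverify

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"io"
)

// BlockSizeLimit mirrors the 1 MiB maximum block size discussed above.
const BlockSizeLimit = 1 << 20

// VerifyStream reads one block at a time and checks each against its
// expected digest before accepting it, so a peer serving bogus data is
// detected after at most BlockSizeLimit bytes rather than after gigabytes.
func VerifyStream(r io.Reader, wantHashes [][32]byte) error {
	buf := make([]byte, BlockSizeLimit)
	for i, want := range wantHashes {
		n, err := io.ReadFull(r, buf)
		if err != nil && err != io.ErrUnexpectedEOF {
			return fmt.Errorf("reading block %d: %w", i, err)
		}
		got := sha256.Sum256(buf[:n])
		if !bytes.Equal(got[:], want[:]) {
			return fmt.Errorf("block %d failed verification, aborting transfer", i)
		}
	}
	return nil
}
```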

the way i would approach what you describe (which is pretty cool) is that canonical should both:

  • provide the leaf hash with CIDv1. ipfs could address it but not easily transfer it (there may be exceptions here, where it could be a tunable parameter).
  • provide a new graph (chunked, dedup, etc), whose output matches the other leaf.
  • that way ipfs could pull the chunked one, and verify the whole one afterwards.

you raise a good point though that we need to address how to look at objects and know whether to pull them when they may be too big.
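
A rough sketch of that two-representation idea for the Canonical example (illustrative only; `Manifest` and `BuildManifest` are made-up names, and the real unixfs chunked graph is a DAG rather than a flat list): publish the whole-ISO digest as the raw-leaf CID for addressing, plus a chunked representation whose reassembled bytes hash back to the same digest, so a fetcher can pull small verifiable chunks and confirm the original ISO hash at the end.

```go
package isograph

import (
	"crypto/sha256"
	"io"
)

const chunkSize = 256 * 1024 // illustrative chunk size, well under the block limit

// Manifest pairs the whole-file digest (what Canonical already publishes
// for an ISO) with the digests of its chunks (the transferable graph).
type Manifest struct {
	Whole  [32]byte   // digest of the complete ISO; becomes the raw-leaf CID
	Chunks [][32]byte // digest of each chunk; these small blocks are what gets transferred
}

// BuildManifest computes both digests in a single pass. A client downloads
// and verifies the chunks one by one, then checks the reassembled bytes
// against Whole.
func BuildManifest(r io.Reader) (Manifest, error) {
	var m Manifest
	whole := sha256.New()
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			whole.Write(buf[:n])
			m.Chunks = append(m.Chunks, sha256.Sum256(buf[:n]))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			return Manifest{}, err
		}
	}
	copy(m.Whole[:], whole.Sum(nil))
	return m, nil
}
```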

@whyrusleeping
Member

Could this issue be moved to ipfs/notes please?

@em-ly em-ly added the need/community-input Needs input from the wider community label Aug 25, 2016
@robcat
Author

robcat commented Aug 31, 2016

Just one tangential issue: when trying to add a big block, ipfs stops in the middle of the "add", like this:

```
$ ipfs add -s size-1048577 /path/to/big/file
779.62 KB / 2.42 MB [=========>-----------------] 31.49 % 0
19:03:08.355 ERROR commands/h: object size limit exceeded client.go:247
```

Maybe ipfs should refuse altogether to use a chunking size greater than BlockSizeLimit and/or give a better error message?
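
A sketch of the up-front check being suggested here (BlockSizeLimit is a real go-ipfs constant; `checkChunkerSize` and where it would be called from are assumptions): reject an oversized splitter size before any data is read, instead of failing partway through.

```go
package addcheck

import "fmt"

// BlockSizeLimit mirrors go-ipfs's 1 MiB maximum block size.
const BlockSizeLimit = 1048576

// checkChunkerSize would run before the add starts, so that
// `ipfs add -s size-1048577 ...` fails immediately with a clear message
// rather than erroring in the middle of the upload.
func checkChunkerSize(size int64) error {
	if size > BlockSizeLimit {
		return fmt.Errorf("chunker size %d exceeds the %d-byte maximum block size", size, BlockSizeLimit)
	}
	return nil
}
```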

@ghost

ghost commented Aug 31, 2016

Maybe ipfs should refuse altogether to use a chunking size greater than BlockSizeLimit and/or give a better error message?

Yes agreed

@whyrusleeping whyrusleeping changed the title Clarify maximum block size Give better errors around blocksize limitations Sep 14, 2016
@whyrusleeping whyrusleeping changed the title Give better errors around blocksize limitations Clarify maximum block size Sep 14, 2016
@whyrusleeping whyrusleeping added the help wanted Seeking public contribution on this issue label Sep 14, 2016
@whyrusleeping whyrusleeping added kind/bug A bug in existing code (including security flaws) status/ready Ready to be worked labels Nov 28, 2016
@ethernomad

Is the blocksize limit enforced in the network layer, or just in the chunker?

@mateon1
Contributor

mateon1 commented May 30, 2017

@ethernomad Both. As seen here, the chunker limits blocks to 1MB, but libp2p limits blocks to 2MB. The second limit would be hit by directories containing large numbers of files, before directory sharding was implemented.
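
A toy sketch of those two layers (the constant values are from this comment; the names and the exact placement of the checks in go-ipfs/go-libp2p are assumptions): the chunker refuses to emit blocks over 1 MB, and the network layer independently rejects any incoming block over 2 MB.

```go
package limits

import "fmt"

const (
	ChunkerBlockLimit = 1 << 20 // ~1 MB: largest block the chunker will emit
	NetworkMsgLimit   = 2 << 20 // ~2 MB: largest block the network layer accepts
)

// AcceptIncomingBlock models the second, network-layer check: a block that
// slipped past a remote chunker (e.g. a huge unsharded directory node) is
// still rejected here if it exceeds the transport limit.
func AcceptIncomingBlock(data []byte) error {
	if len(data) > NetworkMsgLimit {
		return fmt.Errorf("refusing %d-byte block: larger than the %d-byte network limit", len(data), NetworkMsgLimit)
	}
	return nil
}
```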

@alphaCTzo7G

I am guessing that as network bandwidth, processor speeds, and RAM sizes increase, the optimum block size should also scale, since it becomes easier to do all the things mentioned in #3104 (comment). Further, the volume of data and the size of individual files that people want to share with ipfs will also grow.

Is there a way to increase/tune the block size in the future? If not, won't the maximum block size become a bottleneck?

@Kubuxu
Member

Kubuxu commented Aug 7, 2017

No, the maximum block size won't become a bottleneck: IPFS is not a PoW blockchain, so there is no limit on how many blocks can be sent per second.

As @jbenet said, a smaller block size allows for more deduplication and better security, but we are working on raising it anyway to match the characteristics of a few version-addressed ecosystems (git being one example); it has to be done without any security trade-off.

@Stebalien
Member

@alphaCTzo7G if you want to better understand this problem, take a look at this discussion on the IPFS discourse.

@whyrusleeping
Member

Closing this issue for now, please move further discussion to a new ipfs/notes issue, or discourse.
