Replication on IPFS -- Or, the Backing-Up Content Model #47
So right now I could discover a random IPFS node, enumerate its hashes, and download the content? This means if I want to back up my own secret files I need to either
For the second case, is communication between IPFS nodes encrypted on the wire, or should I encrypt my files anyway to avoid eavesdroppers? I get that IPFS is mostly designed for public content. Just want to understand what precautions to take for secret content with IPFS as it is today. It sounds like "don't tell people your hashes" is not enough. :)
You can't quite enumerate the hashes, but you can maybe find provider records for the hashes a node is willing to serve.
That's right. Though we'll have encryption built in soon.
Encrypted, but not yet audited, so beware. We'll upgrade our security claims as we test and audit the pieces; it's better to claim less for now. Though I consider it already safer than most HTTP (and even some HTTPS) traffic, given that plain HTTP traffic is neither encrypted nor integrity-checked.
It isn't designed just for public content; it's designed for private content too. We just don't have encryption in yet. But yes, definitely pre-encrypt anything personal.
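As a concrete precaution, anything sensitive can be encrypted before it is added. A minimal sketch, assuming `openssl` is available; the passphrase and filenames are illustrative, and the `ipfs add` step is shown commented out since it needs a running daemon:

```shell
# Create an example secret file.
echo "my secret notes" > secret.txt

# Symmetric encryption with a passphrase; use real key management in practice.
openssl enc -aes-256-cbc -pbkdf2 -salt -in secret.txt -out secret.txt.enc -pass pass:correct-horse

# Only the ciphertext is added to IPFS (requires a running daemon):
# ipfs add secret.txt.enc

# After fetching the ciphertext back, recover the plaintext:
openssl enc -d -aes-256-cbc -pbkdf2 -in secret.txt.enc -out recovered.txt -pass pass:correct-horse
```

Anyone who discovers the hash of `secret.txt.enc` only ever sees ciphertext; the passphrase never touches the network.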
Certainly not!
Does this mean that unless someone else explicitly downloads a hash of my file, I will be the only one who has a complete copy of the file on IPFS, and all the other nodes may only have a few blocks of my file?
Yes. Nodes won't fetch anything unless told to.
I'm wondering if there's an enhanced BitTorrent-like mode planned for IPFS. For example, if I add a publicly accessible and legal big blob of data, the IPFS network would automatically ensure that at least 2 other nodes have my content as well. This could be called "redundancy guarantee mode" or something, and it wouldn't be the default mode for your node; you'd have to go out of your way to activate it.

One of the uses I would have for IPFS is exactly this: redundancy and cooperation. Say we host our own Dropbox-like directories, fully encrypted (or DB backups, or fully cached smaller websites, or legally owned and DRM'd movies/music, etc.). For this to work, part of the network's volunteers would donate disk space and machine uptime. I know that I certainly will volunteer if IPFS gains this capability. (Examples: Storj, and maybe MaidSafe as well.)

I am well aware this isn't the original goal of IPFS. I am simply wondering if anybody among the designers or implementors ever had this idea.
@dimitarvp See https://github.com/ipfs/ipfs-cluster/ , particularly the user-story issues. Feel welcome to add your own user-story or to contribute to an existing one, as these will shape ipfs-cluster development.
@hsanjuan Thank you.
This issue was moved to https://discuss.ipfs.io/t/replication-on-ipfs-or-the-backing-up-content-model/372
@lgierth when you say "download a hash of my file", does that mean a node just visited an ipfs website? Or do you have to pin a file to download it?
Visiting will download it (well, at least some of it). |
@Stebalien in order to be a node, do I have to run a daemon on the command line?
Either that or use ipfs-desktop. |
Some of the most frequently asked questions about IPFS are around "how does IPFS guarantee content sticks around?" and "how do you ensure I do not download bad things?". The short answer is: IPFS by itself doesn't download things you don't ask it to. Thus, backing up content must be done at a layer on top of IPFS, with ipfs-cluster, Filecoin, or similar protocols.
Important Design Goals for Content Distribution:
Questions and Answers:
Q: When I add content to IPFS, where is it stored?
A: It is stored on your local node and made available to other nodes in your network by advertising it on the routing system (i.e. the IPFS DHT). The content is not sent to other nodes until they explicitly request it, though of course some of the content may already exist in the system thanks to content-addressing.
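The content-addressing point can be illustrated without a daemon: identical bytes always hash to the same value, which is why a block added twice is stored (and advertised) only once. A sketch using plain sha256 via `openssl` as an illustrative stand-in for IPFS's multihash:

```shell
# Two files with identical contents.
echo "hello ipfs" > a.txt
echo "hello ipfs" > b.txt

# Hash both; the digest is the last field of openssl's output.
hash_a=$(openssl dgst -sha256 a.txt | awk '{print $NF}')
hash_b=$(openssl dgst -sha256 b.txt | awk '{print $NF}')

# Same content, same address -- deduplication falls out of the addressing scheme.
[ "$hash_a" = "$hash_b" ] && echo "identical content, identical address"
```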
Q: Can other nodes see what content my node is storing?
A: In some modes yes, in others no. Peers who request content a node is advertising can retrieve it, and thus see that the node indeed had that content. These advertisements will be configurable through policies in the future, to give users better control over what is published to whom. Obscuring content altogether is addressed a layer above raw ipfs, through the use of (a) encryption and capabilities, (b) transport and routing systems with stronger privacy guarantees, and (c) peer authentication and trust models.
Q: Will my node automatically download content from the rest of the network?
A: No. By default, IPFS will not download anything your node doesn't explicitly ask for. This is a strict design constraint. To build group archiving and faster distribution, protocols are layered on top that may download content for the network, but these are optional and built on top of basic IPFS. Examples include bitswap agents, ipfs-cluster, and Filecoin.
Q: Is an automatic replication mode planned?
A: Yes. This is an extension of bitswap that is not implemented yet. It will be either opt-in or easy to opt out of, and it will follow the denylists (to avoid downloading bad bits).
Q: How can I make sure my content stays backed up on IPFS?
A: You can do this by keeping one or several ipfs nodes online pinning the content you're interested in backing up; the more ipfs nodes pinning content, the better the redundancy. Tools such as ipfs-persistence-consortium, pincoop, and ipfs-cluster on top of ipfs allow you to share the costs of bandwidth with other people or organizations. Then, protocols like Filecoin will allow you to simply pay the network to do it for you (i.e. similar to how people pay "the cloud companies", but here you're paying the network itself). (Filecoin is not live yet.)
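For the ipfs-cluster route, replication is driven by configuration. A hedged sketch of the relevant fragment of ipfs-cluster's `service.json` (field names and values here are illustrative; check the ipfs-cluster documentation for the current schema):

```json
{
  "cluster": {
    "replication_factor_min": 2,
    "replication_factor_max": 3
  }
}
```

With a minimum of 2 and a maximum of 3, the cluster would try to keep every pinned item on two to three peers, re-pinning elsewhere if a peer drops out.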
Work in Progress