
Project Gutenberg #29

Open

davidar opened this issue Sep 29, 2015 · 15 comments
davidar (Collaborator) commented Sep 29, 2015

The first thing I mirrored to IPFS was a small subset of Project Gutenberg, so I'm definitely interested in getting the whole thing into IPFS, as both @rht (#14) and @simonv3 (https://github.com/simonv3/ipfs-gutenberg) have suggested.

Making an issue to coordinate this.

rht commented Sep 29, 2015

This is just an rsync away, really.
Currently running it on Pollux.

davidar (Collaborator) commented Sep 29, 2015

@rht is there enough free disk space on Pollux?

rht commented Sep 29, 2015

(didn't check)

rht commented Sep 29, 2015

https://www.gutenberg.org/wiki/Gutenberg:Mirroring_How-To says the full mirror is at least 650 GB (and it may have doubled since then).
Pollux has 13 GB left.

But anyway, the mirroring is a one-liner.
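
For reference, a sketch of that one-liner plus the IPFS import step (the aleph.gutenberg.org::gutenberg module name is taken from the Mirroring How-To linked above; the paths are placeholders):

```sh
# Mirror the full collection (needs 650+ GB free), then import it into IPFS.
rsync -av --del aleph.gutenberg.org::gutenberg ./gutenberg/
ipfs add -r -q ./gutenberg/ | tail -n 1   # the last hash printed is the root
```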

simonv3 commented Sep 29, 2015

@rht Yeah, what makes this difficult is the amount of disk space - I don't think many people have that much space lying around for this.

Some people have suggested sharding the collection and having the people hosting each shard keep their part in sync independently. There's also been talk about this tool: ipfs/notes#58

simonv3 commented Sep 29, 2015

We could also each pitch in some amount for an Amazon instance (or some other host) with that much storage, and just pay for that?

Or I could see if I can figure out my Raspberry Pi, and attach a 1 TB drive to it.

rht commented Sep 30, 2015

Hmm, rsync doesn't support seeking, so at least the first 'download -> hash' pass needs ~1 TB of storage to hold the whole collection.

Either:

  1. https://aws.amazon.com/s3/reduced-redundancy/ at ~$24/month, or
  2. http://www.amazon.com/Green-1TB-Desktop-Hard-Drive/dp/B006GDVREI at ~$50 (can be repurposed for other archives once the PG hash has been sharded).

For now, for a partial backup, ipfs object get can be used on each of the node links that make up the root hash.
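
A sketch of that partial-backup idea, with QmRootHash and QmChildHash as stand-ins for real hashes (the actual root hash hasn't been published yet):

```sh
ROOT=QmRootHash              # placeholder: root hash of the PG mirror
ipfs object links "$ROOT"    # enumerate the direct child links of the root
ipfs pin add QmChildHash     # fetch and pin just one subtree, not the whole thing
```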

rht commented Sep 30, 2015

(and both storage options come from Amazon)

rht commented Sep 30, 2015

ipfs check-redundancy $hash would be useful.
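
No such subcommand exists yet, but a rough approximation with the existing CLI, using DHT provider counts as a (noisy) proxy for redundancy, might look like:

```sh
hash=QmRootHash   # placeholder root hash
# Count how many peers advertise each block under $hash; least-redundant first.
ipfs refs -r "$hash" | sort -u | while read -r blk; do
  printf '%4d %s\n' "$(ipfs dht findprovs "$blk" | wc -l)" "$blk"
done | sort -n
```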

davidar (Collaborator) commented Sep 30, 2015

@jbenet @lgierth SEND MORE DISKS...

Also see ipfs/infra#89

davidar (Collaborator) commented Sep 30, 2015

> ipfs check-redundancy $hash would be useful.

@rht Yeah, what I really want is a "click to pin" button on the archive homepage: people select how much storage they want to donate, and the tool randomly selects an appropriately sized subset of the least-redundant blocks and pins it to their local daemon.

CC: @whyrusleeping

Edit: see ipfs/notes#54

whyrusleeping (Contributor) commented

That would be cool. We could have our service enumerate providers for each block under a given archive root, then assign the blocks with the fewest providers to the next person who requests.
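
A sketch of that assignment step, reusing the provider-count approximation above (ROOT and N are placeholders, and provider counts remain only a rough redundancy signal):

```sh
ROOT=QmRootHash   # placeholder archive root
N=100             # number of blocks to hand to the next volunteer
ipfs refs -r "$ROOT" | sort -u | while read -r blk; do
  printf '%s %s\n' "$(ipfs dht findprovs "$blk" | wc -l)" "$blk"
done | sort -n | head -n "$N" | cut -d' ' -f2 \
  | xargs -n 1 ipfs pin add   # the volunteer pins the least-provided blocks
```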

rht commented Sep 30, 2015

Provider counts should be normalized against the blocks' demand curve.

jbenet (Contributor) commented Sep 30, 2015

We can get more storage nodes, if necessary.
