Rescue Mission for Sci-Hub and Open Science #643

Closed
Xuanwo opened this issue May 28, 2021 · 1 comment

Xuanwo commented May 28, 2021

We need to do something for Open Access.

Background

Sci-Hub is a shadow library website that provides free access to millions of research papers and books, without regard to copyright, by bypassing publishers' paywalls in various ways. Sci-Hub was founded by Alexandra Elbakyan in 2011 in Kazakhstan in response to the high cost of research papers behind paywalls.

from Wikipedia

On May 7th, Sci-Hub's Alexandra Elbakyan revealed that the FBI has been wiretapping her accounts for over 2 years. This news comes after Twitter silenced the official Sci_Hub Twitter account because Indian academics were organizing on it against Elsevier.

Sci-Hub itself is currently frozen and has not downloaded any new articles since December 2020. This rescue mission is focused on seeding the article collection in order to prepare for a potential Sci-Hub shutdown.

from reddit

Sci-Hub currently holds 85,483,812 papers, about 77 TB in total. The Rescue Mission from Reddit uses BitTorrent to distribute them, split into 850 Sci-Hub torrents of roughly 100 GB each. That is a good start, but it is not enough:

  • For storage providers: committing 100 GB or 1 TB is a lot, and the node has to stay online.
  • For end users: they still depend on a centralized service to get a paper.
  • For the global network: data that already exists elsewhere can't be reused.

Motivation

We can store the PDFs / papers on IPFS to keep them from being taken down.

IPFS is a P2P hypermedia protocol:

  • IPFS addresses files/content by their content hash, so a corrupted file can be detected by re-hashing it.
  • IPFS transfers data peer to peer instead of through a centralized node.
  • IPFS can deduplicate data via the content hash.

So IPFS is a good fit for us.
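
To make the content-hash points concrete, here is a minimal sketch of a content-addressed store. It is only an illustration: `ContentStore` is a hypothetical name, and a plain SHA-256 hex digest stands in for a real IPFS CID (IPFS itself chunks files and encodes hashes differently).

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: a blob's key is the hash of its bytes.

    Hypothetical illustration only, not the IPFS API.
    """

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read: a blob that no longer matches its key is corrupt.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its hash")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
first = store.put(b"%PDF-1.4 example paper")
second = store.put(b"%PDF-1.4 example paper")  # same bytes from another source
```

Both `put` calls return the same key and the store holds a single copy, which is the deduplication property; `get` refuses to return bytes that no longer match their key, which is the integrity property.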

Option: IPFS cluster

We can set up an IPFS cluster holding the whole dataset and allow users to set up their own.

This method:

  • Requires the user to run an IPFS cluster storing 77 TB of data.
  • Allows the user to build an API on top of the data.
  • Allows the user to fetch a single paper by its hash.

Option: IPFS Index

We only maintain the index of papers:

  • DOI -> Paper Hash
  • Title -> Paper Hash
  • ... -> Paper Hash

We can also provide APIs, including:

  • Insert new papers
  • Query paper via DOI / Titles / ...

Unlike the IPFS cluster option, here we maintain only the index/database of papers, not the papers themselves.
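
As a rough sketch of what such an index could look like, the snippet below keeps two in-memory maps (DOI -> paper hash, title -> paper hash) with an insert and two query functions. Everything here is an assumption for illustration: `PaperIndex`, its method names, the placeholder DOI and the placeholder hash are hypothetical, and a real service would need persistent, distributed storage.

```python
from typing import Dict, Optional


class PaperIndex:
    """Hypothetical in-memory index mapping paper metadata to a content hash."""

    def __init__(self) -> None:
        self._by_doi: Dict[str, str] = {}
        self._by_title: Dict[str, str] = {}

    @staticmethod
    def _norm(text: str) -> str:
        # Lowercase and collapse whitespace so lookups tolerate small variations.
        return " ".join(text.lower().split())

    def insert(self, doi: str, title: str, paper_hash: str) -> None:
        """Register a paper under both its DOI and its title."""
        self._by_doi[self._norm(doi)] = paper_hash
        self._by_title[self._norm(title)] = paper_hash

    def query_by_doi(self, doi: str) -> Optional[str]:
        return self._by_doi.get(self._norm(doi))

    def query_by_title(self, title: str) -> Optional[str]:
        return self._by_title.get(self._norm(title))


index = PaperIndex()
# Placeholder DOI, title and hash, not real records.
index.insert("10.1000/example.1", "An Example Paper", "example-paper-hash")
found = index.query_by_doi("10.1000/EXAMPLE.1")
```

The index stores only small key/value pairs, so a node can serve lookups without holding any of the 77 TB itself; the returned hash is what a client would then use to fetch the paper over IPFS.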

Going further, we could build a distributed DB on top of IPFS (maybe with OrbitDB).

Related projects

Xuanwo transferred this issue from beyondstorage/specs Jul 9, 2021

Xuanwo commented Sep 1, 2021

Moved to https://forum.beyondstorage.io/t/rescue-mission-for-sci-hub-and-open-science/198
