This research spike entails exploring alternative ways to store and distribute unpacked ledger metadata (txmeta).
Currently, we've uploaded unpacked txmeta for pubnet through July 2022 to S3. Here are some stats:
~7TB of storage space
~42M separate objects
one directory
With that in mind, there are two avenues to this spike:
Is there a better way to structure these files? For example, we could have a folder for each checkpoint; we could combine all ledgers in a checkpoint into a single file; etc. The task is to come up with some strategies and analyse the pros/cons of each (things like storage/bandwidth costs, etc.)
Is there a better way to distribute these files? A back-of-the-envelope calculation tells us that the majority of the cost of distributing these files comes from egress bandwidth. We also want to let people build & store these files themselves, yet minimize the risk of people using rogue/corrupt/malformed txmeta. The task is to come up with alternative transport layers and analyse their tradeoffs. For example, BitTorrent gives us bandwidth decentralization and integrity, but it's harder to do incremental updates. We can batch torrents by some ledger range, but is that too hard? What happens when we upgrade the meta format? What about IPFS? Others?
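To make the structuring options concrete, here is a rough sketch of how a ledger sequence could map to an object key under each layout. It assumes checkpoints fall every 64 ledgers, on ledgers where `(seq + 1) % 64 == 0`; the key prefixes, the `.bin` suffix, and the function names are hypothetical and not what is currently in the bucket.

```python
CHECKPOINT_FREQUENCY = 64  # assumption: one checkpoint every 64 ledgers


def checkpoint_for(ledger_seq: int) -> int:
    """Return the checkpoint ledger that closes the group containing ledger_seq."""
    if ledger_seq < 1:
        raise ValueError("ledger sequences start at 1")
    group = ledger_seq // CHECKPOINT_FREQUENCY
    return group * CHECKPOINT_FREQUENCY + CHECKPOINT_FREQUENCY - 1


def object_key(ledger_seq: int, layout: str) -> str:
    """Build a (hypothetical) object key for one ledger under a given layout."""
    checkpoint = checkpoint_for(ledger_seq)
    if layout == "flat":
        # Roughly today's situation: every ledger in one directory.
        return f"ledgers/{ledger_seq}"
    if layout == "folder_per_checkpoint":
        # One folder per checkpoint, one object per ledger.
        return f"checkpoints/{checkpoint}/{ledger_seq}"
    if layout == "file_per_checkpoint":
        # All ledgers of a checkpoint combined into a single object.
        return f"checkpoints/{checkpoint}.bin"
    raise ValueError(f"unknown layout: {layout}")
```

Under that assumption, the single-file-per-checkpoint layout would cut the ~42M objects to roughly 42M / 64, or about 660K, at the cost of always fetching a whole checkpoint to read one ledger.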
Some initial notes I made regarding using BitTorrent:
There are a few benefits:
lower cost, since we wouldn't need to pay S3 storage and egress costs
though we still have to host/seed the torrent data somewhere
decentralization in that everyone shares the bandwidth to keep it up
trust, in that everyone is using one single source of unpacked meta, so you don't run the risk of some 3rd party organization uploading sketchy unpacked ledgers, since everyone uses the same torrent (this problem already exists today: for example, some other public Horizon instance could go rogue one day and start giving people bad info)
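On the trust point, the same kind of check can be sketched independently of the transport: compare a downloaded file against a digest published by a source you trust. This is only an illustration; the function names are hypothetical, SHA-256 is an arbitrary choice, and no such digest list exists yet.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large txmeta files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```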
There are some downsides:
a "one time reliance" on SDF as a source, in the sense that people have to trust us about the unpacked meta, but they can also verify it themselves since all of the tools are public (the only barrier is cost).
this downside is bigger: you can't update it in real-time, but we could have a model where every X amount of time, we publish a new torrent that contains some new range of unpacked metas
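The "new torrent every X amount of time" model could be sketched like this: only ledger ranges that are completely filled get published, and the trailing partial range waits for the next round. The function name and the batch size are hypothetical.

```python
def complete_batches(first_ledger: int, latest_ledger: int, batch_size: int):
    """Return inclusive (start, end) ledger ranges that are fully available."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batches = []
    start = first_ledger
    # Stop as soon as the next batch would run past the latest known ledger.
    while start + batch_size - 1 <= latest_ledger:
        batches.append((start, start + batch_size - 1))
        start += batch_size
    return batches


# Illustrative values only: with ledgers 1..250 available and batches of 100,
# ledgers 201..250 are held back until ledger 300 exists.
print(complete_batches(1, 250, 100))  # [(1, 100), (101, 200)]
```

Since already-published ranges never change, existing torrents would stay valid as new ones are added; a meta format upgrade would still mean re-publishing every range.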
Thinking about it more, there is one more downside. S3 (or any other cloud file store with an optional CDN) can give everyone very quick access (on the order of 100 milliseconds or less) to any ledger meta. That probably won't be possible with BitTorrent. BitTorrent will, however, allow easier replication for orgs/people who would like to host fast access archives.
That's very true @bartekn: there's an initial startup time to connecting to the swarm before you can download. If you want more than a handful of ledgers, though, that startup time should be amortized and hardly impact the overall time.
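The amortization argument can be written down as a one-liner. The timings below are made-up placeholders to show the shape of the curve, not measurements.

```python
def mean_seconds_per_ledger(startup_s: float, per_ledger_s: float, n_ledgers: int) -> float:
    """Average time per ledger when a fixed startup cost is spread over n ledgers."""
    if n_ledgers < 1:
        raise ValueError("need at least one ledger")
    return (startup_s + per_ledger_s * n_ledgers) / n_ledgers


# Hypothetical numbers: 30 s to join the swarm, 0.05 s of transfer per ledger.
for n in (1, 100, 10_000):
    print(n, mean_seconds_per_ledger(30.0, 0.05, n))
```

For a single ledger the startup cost dominates completely, which is @bartekn's point; for large ranges the average approaches the per-ledger transfer time.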