Remove shards #5066
So what's the replacement method to reliably and trustlessly share a full history node database?
@MarkusTeufelberger tarball, rsync, all the most popular methods for copying data. The sharding feature was never completed. What is there to replace?
There's a comment mentioning shards in |
I also see mention of |
We have to ignore the "failing" check from codecov, the one that is showing "36% of diff hit (target 71%)". The alternative is to add more code in unit tests, which this PR should not do.
Good change.
A good change, which will simplify a lot of the remaining code. I left a few suggestions and questions.
However, I have to ask, sort of as a follow-up to @MarkusTeufelberger's question: Why not finish the feature? It wasn't bad code, and it had potential for making it a lot easier to manage space, especially as pertains to online_delete. I don't know what else was planned for it, either.
        case protocol::mtGET_PEER_SHARD_INFO_V2:
            success = detail::invoke<protocol::TMGetPeerShardInfoV2>(
                *header, buffers, handler);
            break;
        case protocol::mtPEER_SHARD_INFO_V2:
            success = detail::invoke<protocol::TMPeerShardInfoV2>(
                *header, buffers, handler);
            break;
        default:
            handler.onMessageUnknown(header->message_type);
I'm a little worried about removing these message handlers without changing the protocol version as an indication that this peer no longer supports shards. However, I checked and onMessageUnknown doesn't do anything, so it looks like the peer won't be punished or booted, so maybe no harm no foul?
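To make the concern concrete, here is a minimal sketch of a dispatcher whose default branch only notes the unrecognized type. It is not rippled's code: the `Handler` struct, the `dispatch` function, the numeric type codes, and the counter (added only so the behavior can be observed) are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wire codes; the numeric values are illustrative only.
constexpr std::uint16_t kPing = 3;
constexpr std::uint16_t kGetPeerShardInfoV2 = 61;  // handler removed

struct Handler
{
    int pingCount = 0;
    int unknownCount = 0;

    void onPing() { ++pingCount; }

    // Mild by design: the sender is neither charged nor disconnected.
    // The counter exists only to make the sketch observable.
    void onMessageUnknown(std::uint16_t /*type*/) { ++unknownCount; }
};

// Returns true when a specific handler processed the message.
bool dispatch(std::uint16_t type, Handler& handler)
{
    switch (type)
    {
        case kPing:
            handler.onPing();
            return true;
        default:
            // A retired message type lands here once its case is removed.
            handler.onMessageUnknown(type);
            return false;
    }
}

// Usage: an old peer sends a shard query to a node without shard handlers.
void example()
{
    Handler handler;
    assert(dispatch(kPing, handler));
    assert(!dispatch(kGetPeerShardInfoV2, handler));
    assert(handler.unknownCount == 1);
}
```

Under these assumptions, a peer that still sends the retired messages gets no reply, but nothing in this path penalizes it.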
I have no strong feelings about it. I'm fine incrementing the protocol version if someone can teach me how.
FWIW @ximinez, exactly that ("the peer won't be punished or booted, so no harm no foul") was my thinking when I reviewed this.
Here's an example of how one would add a protocol version: ximinez@f978447. It's not actually useful here, though, because older clients won't speak this protocol version, and newer clients won't support shards anyway, so there's no need to test for it.
I updated the PeerImp::supportsFeature function, which for a "real" new protocol version would be called to check the feature before doing the new function or sending the new message or whatever.
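For readers unfamiliar with the pattern, version-gated feature checks can be sketched roughly as follows. This is not the code in the linked commit or in PeerImp: the `Peer` class, the `ProtocolFeature` values, and the version numbers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// (major, minor); std::pair compares lexicographically.
using ProtocolVersion = std::pair<std::uint16_t, std::uint16_t>;

// Hypothetical feature names for this sketch.
enum class ProtocolFeature { BaseMessages, NewMessage };

class Peer
{
public:
    explicit Peer(ProtocolVersion negotiated) : protocol_(negotiated) {}

    // Each feature is available from some minimum negotiated version.
    // The thresholds below are made up for illustration.
    bool supportsFeature(ProtocolFeature feature) const
    {
        switch (feature)
        {
            case ProtocolFeature::BaseMessages:
                return protocol_ >= ProtocolVersion{2, 0};
            case ProtocolFeature::NewMessage:
                return protocol_ >= ProtocolVersion{2, 3};
        }
        return false;
    }

private:
    ProtocolVersion protocol_;
};

// Usage: check the feature before sending a message an older peer lacks.
void example()
{
    Peer older{ProtocolVersion{2, 2}};
    Peer newer{ProtocolVersion{2, 3}};
    assert(older.supportsFeature(ProtocolFeature::BaseMessages));
    assert(!older.supportsFeature(ProtocolFeature::NewMessage));
    assert(newer.supportsFeature(ProtocolFeature::NewMessage));
}
```

A gate like this helps when a version adds behavior. When a version only removes a message that nothing new will send, as here, there is nothing to gate.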
    // The second argument of fetch is ignored when not using shards
    if (auto obj = context.app.getNodeFamily().db().fetchNodeObject(
            locator.getNodestoreHash(), locator.getLedgerSequence()))
"The second argument of fetch is ignored when not using shards"
So does it make sense to remove the second argument to fetchNodeObject everywhere?
It does. I started going down that route, but it is all over the place, and I started having second thoughts. We might want to keep this information available for the future. It seems to be a lesson learned the hard way that we often want both the ledger digest and the ledger sequence together. In fact, I added a type in my FLR2 branch just for representing the pair, which I call LedgerIdentifier.
That ledger sequence has a default value of 0. Do you know if there are any places that are using the default, or passing 0 in explicitly? If there aren't any, then I'm fine with leaving this as-is.
Yeah, that's what I figured.
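The two ideas in this thread, a fetch whose sequence argument is accepted but unused and a type that keeps digest and sequence together, can be sketched as below. Everything here is an assumption for illustration: the `NodeStore` class, the `Hash256` alias, and the fields of `LedgerIdentifier` are not taken from rippled or from the FLR2 branch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

using Hash256 = std::array<std::uint8_t, 32>;

// Hypothetical pairing of a ledger's sequence number with its digest.
struct LedgerIdentifier
{
    std::uint32_t sequence;
    Hash256 digest;
};

// Hypothetical store keyed by object hash only.
class NodeStore
{
public:
    void store(const Hash256& hash, std::string blob)
    {
        objects_[hash] = std::move(blob);
    }

    // The ledger sequence stays in the signature for callers that have
    // it, but without shards it plays no part in the lookup.
    std::optional<std::string>
    fetchNodeObject(const Hash256& hash, std::uint32_t ledgerSeq = 0) const
    {
        (void)ledgerSeq;  // intentionally unused
        auto it = objects_.find(hash);
        if (it == objects_.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::map<Hash256, std::string> objects_;
};

// Usage: the result does not depend on the sequence passed in.
void example()
{
    NodeStore db;
    Hash256 hash{};
    hash[0] = 1;
    db.store(hash, "node");
    assert(db.fetchNodeObject(hash) == db.fetchNodeObject(hash, 42));
}
```

Keeping the parameter costs little and leaves room for a future backend that does partition by ledger sequence.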
@ximinez asked...
In part because the code is particularly stale. Mickey gave me a core dump on the state of shards in November of 2021. No one has done any development (at all) on shards since then. That's a good 2.5 years' worth of stale. Mickey was hoping that I would pick up development of shards. But I didn't. Ripple had other plans for what I needed to work on. Also, Ripple ended up throwing its weight into clio, which was described as an alternative to shards as a solution for the full history problem. If you believe that clio solves the full history problem, then shards are a lot less useful. Another reason not to finish shards is that Mickey was just discovering some additional difficult parts of shards. The code is not bad code, but the feature is not actually done. There's no one with a vision for how the feature should be completed. I hope that history helps.
Based on my understanding of the old design, I have reservations. I imagine that activity among ledger objects follows a power law distribution. Most objects are rarely touched, and a few objects represent most of the touches in any given block. If we pack new objects into a shard until it reaches a size limit, and then seal that shard and distribute it, then most objects in the shard are in the small set of frequently touched objects, but a few objects in the shard will be from the infrequently touched set. How does this affect the performance of serving particular queries I might have?
I think what we really want is consistent hashing of ledger objects, not bundles of ledger objects that we call shards. But we need consistent hashing in the context of a trustless, decentralized peer-to-peer network with high turnover. There may be good solutions in this problem space. I haven't seen them, but I haven't done any research. Then, there are my engineering sentiments. I don't like the proliferation of branches that shards introduced. I'd rather just see a clean replacement of the nodestore abstraction.
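For reference, the textbook form of consistent hashing is a hash ring with virtual nodes, sketched below. It is a generic illustration, not a proposal from this thread, and it does not address the trustless, high-turnover setting. The `HashRing` class and node names are hypothetical, and `std::hash` stands in for a real hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>

class HashRing
{
public:
    explicit HashRing(unsigned replicas = 16) : replicas_(replicas) {}

    void addNode(const std::string& node)
    {
        // emplace never overwrites a point already owned by another node.
        for (unsigned i = 0; i < replicas_; ++i)
            ring_.emplace(point(node, i), node);
    }

    void removeNode(const std::string& node)
    {
        for (unsigned i = 0; i < replicas_; ++i)
        {
            auto it = ring_.find(point(node, i));
            if (it != ring_.end() && it->second == node)
                ring_.erase(it);
        }
    }

    // Owner is the first ring point at or after the key's hash,
    // wrapping around to the start of the ring.
    std::optional<std::string> nodeFor(const std::string& key) const
    {
        if (ring_.empty())
            return std::nullopt;
        auto it = ring_.lower_bound(std::hash<std::string>{}(key));
        if (it == ring_.end())
            it = ring_.begin();
        return it->second;
    }

private:
    static std::size_t point(const std::string& node, unsigned i)
    {
        return std::hash<std::string>{}(node + "#" + std::to_string(i));
    }

    unsigned replicas_;
    std::map<std::size_t, std::string> ring_;
};

// Usage: a key owned by a surviving node keeps its owner after a removal.
void example()
{
    HashRing ring;
    ring.addNode("A");
    ring.addNode("B");
    auto before = ring.nodeFor("object-1");
    ring.removeNode(*before == "A" ? "B" : "A");
    assert(ring.nodeFor("object-1") == before);
}
```

The property that matters for turnover is that removing a node only reassigns the keys that node owned. Every other key keeps its owner.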
Looks good.
Thanks for the detailed explanations and historical info @scottschurr and @thejohnfreeman. It sets my mind at ease, and hopefully will be helpful for anyone else asking the same question.
This change removes all text related to shards, including code, tests, documentation, and configuration settings.
99% of the change is removed lines and all remaining tests pass. Only 3 tests are removed:
nodestore/DatabaseShard_test.cpp
rpc/NodeToShardRPC_test.cpp
rpc/ShardArchiveHandler_test.cpp
Only 2 tests are modified:
nodestore/Database_test.cpp: Remove tests of Database::ledgersPerShard().
rpc/RPCCall_test.cpp: Remove tests of download_shard and node_to_shard.