feat: delete orphan nodes by traversing trees #641
Conversation
@yihuang, this PR looks like it conflicts with my work, and your approach doesn't work on the current iavl structure. I think it is only possible under the assumption of always keeping a contiguous range of versions, right?
it does work on the current iavl structure; I think it's agnostic to the node key format. I was able to do a round-trip test on our testnet production db using the python implementation.
for example, we are keeping versions 3, 5, 7, 9.
yes, by diffing 5 and 7, we delete the orphaned nodes whose version is > 3.
I think it doesn't work properly. For example, say we have a version-2 node in version 5's tree, and this node is removed in version 7, so it becomes orphaned. When will we remove this node (version 2)?
yeah, you have a point, if version 2 is already deleted. To make it work, we need to check whether the version of each orphaned node still exists, which will further slow down the process. But it's not an issue in the default mode, which loads all the versions into memory on startup.
This is why the current design keeps orphan records.
please refer to cosmos/cosmos-sdk#12989
What do you think about deleting the orphaned nodes whose version has been removed? Maybe it's not so bad if the versions are cached in memory.
I'm not sure; we already had the same issues when loading all versions in an archive node, so I am suggesting keeping only the first and last version instead of the versions map. For pruning, it would only allow deleting the first version, so we could delete the oldest versions one by one.
anyhow, I am preparing a separate PR to remove orphans; I am worried we are working on the same topic.
deleting the first version is a special case in this PR: the previous version will be 0, and it'll simply delete all orphaned nodes.
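For concreteness, here is a minimal sketch of the deletion rule discussed in this thread. The names `prevRetained` and `shouldDelete` are illustrative, and the sorted slice of retained versions is an assumption, not part of the PR:

// prevRetained returns the closest retained version strictly below v,
// assuming `retained` is sorted ascending (e.g. {3, 5, 7, 9}).
// It returns 0 when v is the first retained version, which is the
// special case above where every orphaned node gets deleted.
func prevRetained(retained []int64, v int64) int64 {
    prev := int64(0)
    for _, r := range retained {
        if r >= v {
            break
        }
        prev = r
    }
    return prev
}

// shouldDelete reports whether a node orphaned by diffing v against its
// retained successor can be deleted: only nodes created after the
// previous retained version are unreachable from any remaining tree.
func shouldDelete(nodeVersion, prev int64) bool {
    return nodeVersion > prev
}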
tbh, the lazy mode is unclear to me. When will the lazy load be triggered?
Whenever you need the root information of a particular version, it is taken from the cache or loaded from the db.
After multiple failed attempts to reproduce the issue, I realized the issue doesn't exist; the reasoning is as follows:
if l.height <= 0 {
    panic("already at leaf layer")
}
nodes := make([]*Node, 0, len(l.nodes)*2+len(l.pendingNodes))
not sure if we need this allocation, since it's assigned with nodes = ... later
do you mean nodes = append(nodes, ...)?
yea, seems optional; just not sure why we'd keep the capacity when assigning
the capacity is there to reduce the number of reallocations during the appends.
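A tiny self-contained illustration of that point (not from the PR itself):

package main

import "fmt"

func main() {
    // With the capacity pre-sized, none of the appends below reallocate
    // the backing array; without it, the slice would grow several times.
    xs := make([]int, 0, 8)
    for i := 0; i < 8; i++ {
        xs = append(xs, i)
    }
    fmt.Println(len(xs), cap(xs)) // 8 8: no growth beyond the initial allocation
}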
Sorry, I didn't know about that; I thought you were working on solutions for the new node key format.
Never mind, I'd like to refactor the diff algorithm.

// traverseTree traverses the subtree rooted at the node with the given hash.
// fn returns (stop, error); returning stop == true skips the node's children.
func (ndb *nodeDB) traverseTree(hash []byte, fn func(node *Node) (bool, error)) error {
    if len(hash) == 0 {
        return nil
    }
    node, err := ndb.GetNode(hash)
    if err != nil {
        return err
    }
    stop, err := fn(node)
    if err != nil || stop {
        return err
    }
    if node.leftHash != nil {
        if err := ndb.traverseTree(node.leftHash, fn); err != nil {
            return err
        }
    }
    if node.rightHash != nil {
        if err := ndb.traverseTree(node.rightHash, fn); err != nil {
            return err
        }
    }
    return nil
}

// deleteOrphans deletes the nodes that belong to the tree at `version`
// but not to the tree at `version+1`, returning the deleted hashes.
func (ndb *nodeDB) deleteOrphans(version int64) ([][]byte, error) {
    nRoot, err := ndb.getRoot(version + 1)
    if err != nil {
        return nil, err
    }
    // Collect the nodes in the next version's tree that predate `version`;
    // they are shared with the current version's tree and must be kept.
    originalNodes := make([]*Node, 0)
    if err := ndb.traverseTree(nRoot, func(node *Node) (bool, error) {
        if node.version > version {
            return false, nil
        }
        originalNodes = append(originalNodes, node)
        return true, nil // shared subtree, no need to descend further
    }); err != nil {
        return nil, err
    }
    cRoot, err := ndb.getRoot(version)
    if err != nil {
        return nil, err
    }
    // Traverse the current version's tree in the same order, skipping the
    // shared subtrees and deleting everything else.
    orphans := make([][]byte, 0)
    index := 0
    if err := ndb.traverseTree(cRoot, func(node *Node) (bool, error) {
        if index < len(originalNodes) && bytes.Equal(node.hash, originalNodes[index].hash) {
            index++
            return true, nil // shared subtree, skip it
        }
        if err := ndb.batch.Delete(ndb.nodeKey(node.hash)); err != nil {
            return true, err
        }
        orphans = append(orphans, node.hash)
        return false, nil
    }); err != nil {
        return nil, err
    }
    return orphans, nil
}

Loading all the orphans into memory is too expensive, and your implementation also looks complicated.
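For illustration, a hypothetical call site for the sketch above; `pruneVersion`, the log line, and the final batch flush are assumptions, not part of the original code:

// pruneVersion is a hypothetical wrapper around the deleteOrphans sketch above.
func pruneVersion(ndb *nodeDB, version int64) error {
    orphans, err := ndb.deleteOrphans(version)
    if err != nil {
        return err
    }
    log.Printf("pruned version %d: deleted %d orphaned nodes", version, len(orphans))
    // assumption: the batched deletes still need to be flushed by the caller
    return ndb.batch.Write()
}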
FYI, removing orphans will impact the node key refactoring; this is why I am interested in your PR.
My algorithm mainly tries to skip the common subtrees early on, so we only need to traverse the branches that contain actual differences.
- removed the orphan bookkeeping
To help with understanding the algorithm, I created some visualizations (images omitted here). The first is the example iavl tree with versions 4 and 5, with labeled nodes. The pruning graph demonstrates the result of deleting version 4: it contains all the nodes loaded by the algorithm, with the deleted nodes drawn in dotted lines; you can notice there are two nodes coming from versions 2 and 3 which are not deleted. Another one was done on our production testnet db.
curious to hear if there is a performance improvement with this over what exists?
The pruning operation is definitely slower than the current one, which just iterates the maintained orphan records; the new method needs to load some nodes to partially traverse the tree.
can you provide benchmarks on the slowdown? I think some margin of slowdown could be accepted as a trade-off, and with the fast node system it shouldn't make a difference
sure, I'll do them next week. The basic intuition is that it depends a lot on the node cache: if we are pruning a recent version whose orphaned nodes are still hot in the node cache, then it'll be pretty fast.
@tac0turtle @cool-develope I'm closing this PR for the following reasons:
Consequences

Positives
- No need for the orphan bookkeeping.

Negatives
- Pruning is still O(N), where N is the number of orphaned nodes, but the constant factor is heavier: the new approach needs to load the nodes, while the old approach just iterates the orphan records and deletes nodes by their hashes.

Alternative
If it's considered too controversial to degrade the performance of online pruning, an alternative is to just provide an option to not store the orphan records for archive nodes, which can always do pruning offline using the algorithm described here.
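A minimal sketch of what that option could look like, assuming a hypothetical skipOrphans flag on nodeDB (not an existing iavl field) and simplifying the real saveOrphans signature:

// saveOrphans is sketched with an assumed skipOrphans flag; archive nodes
// would enable it to skip the bookkeeping entirely and prune offline with
// the tree-diff algorithm above when needed.
func (ndb *nodeDB) saveOrphans(version int64, orphans map[string]int64) {
    if ndb.skipOrphans {
        return // no orphan records written; offline pruning recomputes them by diffing trees
    }
    // ... existing orphan-record writes ...
}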