-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refactor the export traversal order as pre-order #662
Conversation
That's why I don't like the choice of using path as nonce, I don't see the benefits yet, but I see the troubles 😂 |
I can see some significant advantages of the |
can you elaborate on this?
I think that'd some insignificant save if there's any at all:
|
OK, for example when commit the branch, we are trying to assign the path for new nodes with iterating the tree. we can do this iterating and assigning parallelly, there would be more use cases
it's not true, now we have
|
again, iteration and assigning nonce is a trivial part during committing branch, especially with sequential nonce assignment, computing hash is the heavy one, if it can parallelize hash computation, that'd be very useful.
node key is just a tuple
I assume we do variable length integer encoding here, so larger integers take more space on average. I can also point out some advantages of sequential nonce compared with path:
|
right, I think we can parallelize hash calc with assigning the path (sequence id is impossible) regarding (version, nonce), I think there would be a problem in encoding/decoding of nodes if we use empty methods for version anyhow, even using the sequence id as a nonce, post-order would be a problem in export/import. and I think it is not a good place for this topic. |
I am just thinking pre-order is better than post-order at least in our iavl, I can't find the reason why use post-order |
my biggest concern is actually consensus breaking, without using path as nonce, the new node key format is not a consensus breaking change, that'd make it much more easier to rollout to the network node to node asynchronously. |
in post-order, we just assign nonce in post order, technically the nonce only need to be kept unique within the version, right? |
what is your plan to migrate to new version? my idea is we can restrict the storage using export/import, there is a no way of soft landing |
I'm striving for a non-consensus breaking version that just do storage optimization, the first step is versiondb running alongside with existing iavl tree, the second step should be optimize iavl tree itself, I think there are lots of potential already before introducing breaking stuff. |
I think parallel hash computation is a very interesting topic, the difficult part is most of the time we create branches rather than full sub-trees. A naïve implementation could easily end up slower than a sequential one, because the potential for parallel is low. But I don't see why we can't do that with sequential nonce, we can do sequential iteration while distributing the task of hash. |
@yihuang , I just remembered why the |
May I know what's the plan to rebuild the path even with pre-order export? the path need to be the path in the version node created in, not the exported version, right? |
I plan to build the current node path based on the parent one even it is inherited from the different version. |
Isn't that breaks assumption of path design? |
yeah, it could not be exactly the same, but it doesn't affect any logic of adr. |
@yihuang , how about adding a flag to denote if this export is post-order or pre-order? then it would not be a consensus breaking, right? |
there's format version field in state sync snapshot, will the new node key format support both format? if that's the case, then it don't breaks anything. |
I have no exact idea how to interact within cosmos-sdk (with snapshot format), but that's true, the iavl will provide both post-order, pre-order, how about this @tac0turtle ? |
@yihuang , done please review again. |
We will update store, when this version is released in alpha/beta. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of this pr is the new node key refactor works with both post and pre order import, if so do we need both? I could be lacking the understanding the need for both if the node key refactor will be merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM left one request for a godoc, after that lets merge this
@kocubinski @yihuang |
dismissing the approval as the implementation landed
immutable_tree.go
Outdated
@@ -155,8 +155,8 @@ func (t *ImmutableTree) Hash() ([]byte, error) { | |||
|
|||
// Export returns an iterator that exports tree nodes as ExportNodes. These nodes can be | |||
// imported with MutableTree.Import() to recreate an identical tree. | |||
func (t *ImmutableTree) Export() (*Exporter, error) { | |||
return newExporter(t) | |||
func (t *ImmutableTree) Export(traverseOrder OrderType) (*Exporter, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need an API breaking change here? Maybe adding a new method ExportPreOrder
would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds great
import.go
Outdated
@@ -63,7 +102,7 @@ func (i *Importer) Close() { | |||
} | |||
|
|||
// Add adds an ExportNode to the import. ExportNodes must be added in the order returned by | |||
// Exporter, i.e. depth-first post-order (LRN). Nodes are periodically flushed to the database, | |||
// Exporter, i.e. depth-first pre-order (NLR). Nodes are periodically flushed to the database, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is a bit misleading to me. We could have post- or pre-ordered nodes. Let's be explicit that the caller must choose to import in the same order as was exported.
node.leftHash = node.leftNode.hash | ||
node.rightNode = i.stack[stackSize-1] | ||
node.rightHash = node.rightNode.hash | ||
case stackSize >= 1 && i.stack[stackSize-1].subtreeHeight < node.subtreeHeight: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this branch go? If post-order export didn't change why are we now handling import differently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is related to #656, the inner nodes always have two children
@@ -83,82 +122,57 @@ func (i *Importer) Add(exportNode *ExportNode) error { | |||
version: exportNode.Version, | |||
subtreeHeight: exportNode.Height, | |||
} | |||
|
|||
if node.subtreeHeight == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if node.subtreeHeight == 0 { | |
// set leaf nodes subtree size = 1 | |
if node.subtreeHeight == 0 { |
@@ -321,7 +321,7 @@ func (node *Node) validate() error { | |||
if node.value != nil { | |||
return errors.New("value must be nil for non-leaf node") | |||
} | |||
if node.leftHash == nil && node.rightHash == nil { | |||
if node.leftHash == nil || node.rightHash == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this change, it's more restrictive than previous, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type OrderType int | ||
|
||
// OrderTraverse is the type of traversal order to use when exporting and importing. | ||
// PreOrder is needed for the new node-key refactoring. The default is PostOrder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought new node-key refactoring works with both orderings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, the pre-order is needed for node-key refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slightly confused now. This pr adds support for both pre and post order, but node key refactor require pre-order. If a node exports using post-order then we cant import it into the new version correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's why provides both orders
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even it is the original version, we can request a pre-order snapshot, then import it into the new version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason post-order is chosen in the first place is the node hash is updated in a post-order way, you need to update the children first to update the parent node, will pre-order import need more temporary memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, literally post-order more makes sense, both way requires stack to keep the current path, you are right pre-order will require 2 times memory, but the stack length is at most the height of the tree, it is so trivial
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tac0turtle , the old version can use any order, the new version should use pre-order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the stack length is at most the height of the tree, it is so trivial
yeah, that should be trivial then, if this is indeed necessary, I think the chains can do a coordinated upgrade in advance to switch to pre-order snapshots, before doing node key format migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that should be trivial then, if this is indeed necessary, I think the chains can do a coordinated upgrade in advance to switch to pre-order snapshots, before doing node key format migration.
I think this is what we should discuss, tbh I have no clear idea which way is more efficient
In this case, we can at least use a global increasing unique nonce, similar to how you have to use the path in current version (instead of node creation version). |
It might lead to several problems in the implementation, |
A continuously increasing number is at least smaller than the path which is also a unique integer between the nodes, right? |
But in practice, there are only millions or tens of millions of versions, a continuous array of 10millions of |
That makes sense, it will be at most hundreds MG, I will implement version + local nonce, let's do some benchmarks |
closing this as we dont need pre-order reconstruction. We discussed this yesterday on the storage working group call |
ref: #608, #656
We are using post-order traversal in the
Export
, but it violates adr-001, not being able to make a path fromExportNode
.It refactored
Export/Import
to provide both traversal orderspre-order
andpost-order