feat: refactor the export traversal order as pre-order #662

cool-develope · 2023-01-17T20:54:26Z

Export currently uses post-order traversal, but this violates adr-001 because a node's path cannot be derived from an ExportNode.
This PR refactors Export/Import to provide both traversal orders, pre-order and post-order.
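
For readers unfamiliar with the two orders, here is a minimal sketch of the difference on a toy binary tree. The `node` type and helper names are hypothetical and are not the iavl types; the point is only that pre-order emits a parent before its children, while post-order emits it after them.

```go
package main

import "fmt"

// node is a toy binary tree node; it is not the iavl Node type.
type node struct {
	key         string
	left, right *node
}

// preOrder visits the node first, then its left and right subtrees (NLR).
func preOrder(n *node, visit func(*node)) {
	if n == nil {
		return
	}
	visit(n)
	preOrder(n.left, visit)
	preOrder(n.right, visit)
}

// postOrder visits both subtrees before the node itself (LRN).
func postOrder(n *node, visit func(*node)) {
	if n == nil {
		return
	}
	postOrder(n.left, visit)
	postOrder(n.right, visit)
	visit(n)
}

// keys collects the visited keys for one traversal function.
func keys(root *node, walk func(*node, func(*node))) []string {
	var out []string
	walk(root, func(n *node) { out = append(out, n.key) })
	return out
}

// sampleTree builds root(l(a, b), c) for illustration.
func sampleTree() *node {
	return &node{
		key:   "root",
		left:  &node{key: "l", left: &node{key: "a"}, right: &node{key: "b"}},
		right: &node{key: "c"},
	}
}

func main() {
	t := sampleTree()
	fmt.Println(keys(t, preOrder))  // [root l a b c]
	fmt.Println(keys(t, postOrder)) // [a b l c root]
}
```

In the pre-order stream every ancestor of a node has already been seen when the node arrives, which is what makes a root-to-node path available during import.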

cool-develope · 2023-01-17T20:57:29Z

@yihuang, sorry, I referenced your PR (#656) here because there is no way for me to contribute to it directly.
Please review it.

yihuang · 2023-01-18T00:30:38Z

That's why I don't like the choice of using path as nonce, I don't see the benefits yet, but I see the troubles 😂

cool-develope · 2023-01-18T17:32:51Z

That's why I don't like the choice of using path as nonce, I don't see the benefits yet, but I see the troubles 😂

I can see some significant advantages of the path. One is parallel restructuring of the tree, because assigning paths in the left subtree and in the right subtree would be independent.
The other is that we can skip child node keys for the same version, which I believe will reduce the overall storage.
Those are the advantages of pre-order iteration.

yihuang · 2023-01-18T17:53:08Z

That's why I don't like the choice of using path as nonce, I don't see the benefits yet, but I see the troubles 😂

I can see some significant advantages of the path, one thing is parallel restricting of the tree because assigning path of left child tree and right child tree would be independent.

can you elaborate on this?

The other thing is we can skip child node keys for the same version. I believe it will reduce the entire storage. Those are the advantages of pre-order iterating.

I think that'd be an insignificant saving, if there's any at all:

the version field can always be omitted if you like, no matter which nonce strategy is used.
using the path as nonce makes the nonce bigger than a simple sequential one, which means more bytes under variable-length integer encoding.
for most nodes, the saving applies to only one of the children's keys.
the saved bytes (I'd say at most 2 bytes on average, if any at all, based on the above reasoning) are insignificant compared to the hash field (32 bytes) and the key field (usually dozens of bytes).
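
The size argument can be checked with a small sketch using Go's unsigned varint encoding. The `pathNonce` encoding below (start at 1, append one bit per step) is an assumption made for illustration only; the thread does not specify how a path would be serialized.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen returns how many bytes v occupies under Go's unsigned varint encoding.
func varintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

// pathNonce encodes a root-to-node path as an integer: start from 1 and
// append one bit per step (0 = left, 1 = right). Hypothetical encoding.
func pathNonce(steps []bool) uint64 {
	p := uint64(1)
	for _, right := range steps {
		p <<= 1
		if right {
			p |= 1
		}
	}
	return p
}

func main() {
	// A node 20 levels below the root, always taking the right child.
	steps := make([]bool, 20)
	for i := range steps {
		steps[i] = true
	}
	fmt.Println(varintLen(100))              // sequential nonce 100: 1 byte
	fmt.Println(varintLen(pathNonce(steps))) // 21-bit path: 3 bytes
}
```

Under this assumed encoding, the path grows by one bit per tree level regardless of how many nodes a version creates, whereas a sequential nonce grows only with the number of new nodes in that version.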

cool-develope · 2023-01-18T18:03:35Z

can you elaborate on this?

OK, for example, when committing a branch we assign paths to the new nodes while iterating the tree. We can do this iteration and assignment in parallel, and there would be more use cases.

the version field can always be saved if you like, no matter what nonce strategy used.

That's not true. We now have leftNodeKey and rightNodeKey instead of leftHash and rightHash, so we would store the whole child node key.

using path as nonce make the nonce bigger than a simply sequential one, which means more bytes under variable-length integer encoding.

version is a uint64 and requires 8 bytes. I think 8 bytes is enough for the path: 8 bytes = 64 bits (a tree height of 64).

yihuang · 2023-01-18T18:08:19Z

can you elaborate on this?

OK, for example when commit the branch, we are trying to assign the path for new nodes with iterating the tree. we can do this iterating and assigning parallelly, there would be more use cases

Again, iterating and assigning nonces is a trivial part of committing a branch, especially with sequential nonce assignment. Computing hashes is the heavy part; if hash computation can be parallelized, that'd be very useful.

the version field can always be saved if you like, no matter what nonce strategy used.

it's not true, now we have leftNodeKey and rightNodeKey instead of leftHash and rightHash, we will save the whole child node key

The node key is just a tuple (version, nonce), right? I mean, compared with other nonce assignment strategies: no matter which nonce strategy we choose, the version field can always be omitted by decoding a special empty value as the parent node's version number.
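
A minimal sketch of that idea, assuming real versions start at 1 so that 0 can serve as the "same version as the parent" marker. The `nodeKey` type and both functions are hypothetical and do not reflect the actual iavl encoding.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// nodeKey is the (version, nonce) tuple discussed in the thread.
type nodeKey struct {
	version int64
	nonce   int32
}

// encodeChildKey writes a child key relative to its parent. A version of 0
// stands for "same version as the parent" (assumes real versions start at 1).
func encodeChildKey(parentVersion int64, child nodeKey) []byte {
	v := uint64(child.version)
	if child.version == parentVersion {
		v = 0
	}
	buf := make([]byte, 2*binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, v)
	n += binary.PutUvarint(buf[n:], uint64(child.nonce))
	return buf[:n]
}

// decodeChildKey reverses encodeChildKey, restoring the parent's version
// when the special value 0 is read.
func decodeChildKey(parentVersion int64, data []byte) (nodeKey, error) {
	v, n := binary.Uvarint(data)
	if n <= 0 {
		return nodeKey{}, errors.New("invalid version varint")
	}
	nonce, m := binary.Uvarint(data[n:])
	if m <= 0 {
		return nodeKey{}, errors.New("invalid nonce varint")
	}
	key := nodeKey{version: int64(v), nonce: int32(nonce)}
	if v == 0 {
		key.version = parentVersion
	}
	return key, nil
}

func main() {
	same := encodeChildKey(1000000, nodeKey{version: 1000000, nonce: 7})
	older := encodeChildKey(1000000, nodeKey{version: 999999, nonce: 7})
	fmt.Println(len(same), len(older)) // 2 4
}
```

The saving here is independent of how the nonce itself is chosen, which is the point being made above.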

using path as nonce make the nonce bigger than a simply sequential one, which means more bytes under variable-length integer encoding.

version is uint64 and it requires 8bytes, I think 8bytes is enough for path, 8bytes = 64bit (64 height in tree)

I assume we do variable length integer encoding here, so larger integers take more space on average.

I can also point out some advantages of a sequential nonce compared with the path:

the nodes of a version can be stored in a contiguous array: version -> [node, node, ...], with the nonce used as the array index.
it works with any traversal order, because the exact nonce assignment is only a local decision, not a matter of network consensus.
it's just so much simpler.
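
The first two points can be sketched as follows. The `store` type is hypothetical, and for simplicity the sketch treats every node of the tree as newly created in the committed version; it happens to walk in post-order, but any order would yield valid, unique nonces.

```go
package main

import "fmt"

// node is a toy tree node carrying the nonce assigned at commit time.
type node struct {
	key         string
	left, right *node
	nonce       int32
}

// store keeps the nodes of each version in a contiguous slice;
// the nonce doubles as the slice index.
type store map[int64][]*node

// commit assigns sequential nonces to all nodes under root.
// Simplification: every node is assumed to be new in this version.
func (s store) commit(version int64, root *node) {
	var assign func(n *node)
	assign = func(n *node) {
		if n == nil {
			return
		}
		assign(n.left)
		assign(n.right)
		n.nonce = int32(len(s[version]))
		s[version] = append(s[version], n)
	}
	assign(root)
}

// get looks a node up by its (version, nonce) key.
func (s store) get(version int64, nonce int32) (*node, bool) {
	nodes := s[version]
	if nonce < 0 || int(nonce) >= len(nodes) {
		return nil, false
	}
	return nodes[nonce], true
}

func main() {
	root := &node{key: "root", left: &node{key: "a"}, right: &node{key: "b"}}
	s := store{}
	s.commit(1, root)
	n, _ := s.get(1, root.nonce)
	fmt.Println(root.nonce, n.key) // 2 root
}
```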

cool-develope · 2023-01-18T18:18:25Z

again, iteration and assigning nonce is a trivial part during committing branch, computing hash is the heavy one, if it can parallelize hash computation, that'd be very useful.

Right, I think we can parallelize the hash calculation together with assigning the path (which is impossible with a sequence id).

Regarding (version, nonce), I think there would be a problem in encoding/decoding nodes if we use an empty value for the version.

Anyhow, even with the sequence id as the nonce, post-order would be a problem in export/import, and I think this PR is not a good place for this topic.

cool-develope · 2023-01-18T18:24:24Z

I just think pre-order is better than post-order, at least in our iavl; I can't find a reason to use post-order.

yihuang · 2023-01-18T18:43:32Z

I am just thinking pre-order is better than post-order at least in our iavl, I can't find the reason why use post-order

My biggest concern is actually consensus breaking. Without using the path as nonce, the new node key format is not a consensus-breaking change, which would make it much easier to roll out across the network asynchronously, node by node.

yihuang · 2023-01-18T18:45:07Z

anyhow, even using the sequence id as a nonce, post-order would be a problem in export/import. and I think it is not a good place for this topic.

In post-order, we just assign the nonce in post-order. Technically, the nonce only needs to be unique within the version, right?

cool-develope · 2023-01-18T18:47:30Z

I am just thinking pre-order is better than post-order at least in our iavl, I can't find the reason why use post-order

my biggest concern is actually consensus breaking, without using path as nonce, the new node key format is not a consensus breaking change, that'd make it much more easier to rollout to the network node to node asynchronously.

What is your plan to migrate to the new version?

My idea is that we can restructure the storage using export/import; there is no way of making a soft landing.

yihuang · 2023-01-19T00:19:04Z

I am just thinking pre-order is better than post-order at least in our iavl, I can't find the reason why use post-order

my biggest concern is actually consensus breaking, without using path as nonce, the new node key format is not a consensus breaking change, that'd make it much more easier to rollout to the network node to node asynchronously.

what is your plan to migrate to new version?

my idea is we can restrict the storage using export/import, there is a no way of soft landing

I'm striving for a non-consensus-breaking version that only does storage optimization. The first step is versiondb running alongside the existing iavl tree; the second step should be optimizing the iavl tree itself. I think there is a lot of potential already before introducing breaking changes.

yihuang · 2023-01-19T00:35:50Z

right, I think we can parallelize hash calc with assigning the path (sequence id is impossible)

I think parallel hash computation is a very interesting topic. The difficult part is that most of the time we create branches rather than full sub-trees, so a naïve implementation could easily end up slower than a sequential one because the potential for parallelism is low. But I don't see why we can't do that with a sequential nonce: we can iterate sequentially while distributing the hashing tasks.
I'll research parallel hash computation more; it'll be useful in our change set verification step.
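
As a rough illustration of distributing the hashing work, the sketch below hashes the two subtrees of a node concurrently down to a configurable depth. The hash layout (key, then left hash, then right hash) is a toy and is not the IAVL node hash; nonce assignment is left out entirely, since it can remain a separate sequential pass.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// node is a toy tree node; the hash layout below is NOT the IAVL one.
type node struct {
	key         []byte
	left, right *node
	hash        [32]byte
}

// hashNode computes hashes bottom-up. While parallelDepth > 0, the left
// subtree is hashed in its own goroutine and the right one in the caller.
func hashNode(n *node, parallelDepth int) [32]byte {
	if n == nil {
		return [32]byte{}
	}
	var l, r [32]byte
	if parallelDepth > 0 && n.left != nil && n.right != nil {
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			l = hashNode(n.left, parallelDepth-1)
		}()
		r = hashNode(n.right, parallelDepth-1)
		wg.Wait()
	} else {
		l = hashNode(n.left, 0)
		r = hashNode(n.right, 0)
	}
	h := sha256.New()
	h.Write(n.key)
	h.Write(l[:])
	h.Write(r[:])
	copy(n.hash[:], h.Sum(nil))
	return n.hash
}

// build creates a full binary tree of the given depth with distinct keys.
func build(depth int, label string) *node {
	n := &node{key: []byte(label)}
	if depth > 0 {
		n.left = build(depth-1, label+"0")
		n.right = build(depth-1, label+"1")
	}
	return n
}

func main() {
	sequential := hashNode(build(10, "k"), 0)
	parallel := hashNode(build(10, "k"), 3)
	fmt.Println(sequential == parallel) // true
}
```

On a full tree this splits evenly; on a single new branch, as noted above, most nodes have only one new child, so there is little to run in parallel.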

cool-develope · 2023-01-19T13:10:56Z

@yihuang
OK, I will close this PR after reflecting the idea in the node-key refactoring branch. Let's discuss this further in the next storage meeting.
BTW, could you review #646? It's not consensus-breaking.

cool-develope · 2023-01-23T17:55:56Z

@yihuang, I just remembered why version + nonce does not work in the Import.
There is no way to get the current sequence number of a given version. You can't keep this state in memory (like map[int64]int32) because there could be a massive number of versions (int64).

yihuang · 2023-01-23T18:03:24Z

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

May I know the plan for rebuilding the path even with pre-order export? The path needs to be the path in the version the node was created in, not in the exported version, right?

cool-develope · 2023-01-23T18:31:00Z

May I know what's the plan to rebuild the path even with pre-order export? the path need to be the path in the version node created in, not the exported version, right?

I plan to build the current node's path based on its parent's, even if the node is inherited from a different version.
Of course, we could use a globally unique nonce using big.Int, but it would be expensive.

yihuang · 2023-01-24T05:19:17Z

May I know what's the plan to rebuild the path even with pre-order export? the path need to be the path in the version node created in, not the exported version, right?

I plan to build the current node path based on the parent one even it is inherited from the different version.

Doesn't that break the assumption of the path design?

cool-develope · 2023-01-24T13:31:23Z

Isn't that breaks assumption of path design?

Yeah, it might not be exactly the same, but it doesn't affect any logic of the ADR.

cool-develope · 2023-01-26T13:32:56Z

@yihuang, how about adding a flag to denote whether the export is post-order or pre-order? Then it would not be consensus-breaking, right?

yihuang · 2023-01-26T14:36:25Z

@yihuang , how about adding a flag to denote if this export is post-order or pre-order? then it would not be a consensus breaking, right?

There's a format version field in the state sync snapshot. Will the new node key format support both formats? If so, it doesn't break anything.

cool-develope · 2023-01-26T14:53:04Z

@yihuang , how about adding a flag to denote if this export is post-order or pre-order? then it would not be a consensus breaking, right?

there's format version field in state sync snapshot, will the new node key format support both format? if that's the case, then it don't breaks anything.

I have no exact idea how this interacts with cosmos-sdk (with the snapshot format), but that's true: iavl will provide both post-order and pre-order. How about this, @tac0turtle?

cool-develope · 2023-01-26T20:11:08Z

@yihuang , done please review again.
@tac0turtle , do we need to update the cosmos/store?

tac0turtle · 2023-01-26T20:55:24Z

We will update store, when this version is released in alpha/beta.

tac0turtle

My understanding of this PR is that the new node key refactor works with both post- and pre-order import. If so, do we need both? I may be missing why both are needed if the node key refactor will be merged.

export.go

tac0turtle

LGTM, left one request for a godoc; after that, let's merge this.

cool-develope · 2023-01-30T13:20:36Z

@kocubinski @yihuang
I updated it to provide both pre-order and post-order, please review it!

dismissing the approval as the implementation landed

kocubinski · 2023-01-31T15:04:21Z

immutable_tree.go

@@ -155,8 +155,8 @@ func (t *ImmutableTree) Hash() ([]byte, error) {

 // Export returns an iterator that exports tree nodes as ExportNodes. These nodes can be
 // imported with MutableTree.Import() to recreate an identical tree.
-func (t *ImmutableTree) Export() (*Exporter, error) {
-	return newExporter(t)
+func (t *ImmutableTree) Export(traverseOrder OrderType) (*Exporter, error) {


Do we really need an API breaking change here? Maybe adding a new method ExportPreOrder would be better.

that sounds great

mutable_tree.go

kocubinski · 2023-01-31T15:23:35Z

import.go

@@ -63,7 +102,7 @@ func (i *Importer) Close() {
 }

 // Add adds an ExportNode to the import. ExportNodes must be added in the order returned by
-// Exporter, i.e. depth-first post-order (LRN). Nodes are periodically flushed to the database,
+// Exporter, i.e. depth-first pre-order (NLR). Nodes are periodically flushed to the database,


This comment is a bit misleading to me. We could have post- or pre-ordered nodes. Let's be explicit that the caller must choose to import in the same order as was exported.

kocubinski · 2023-01-31T15:46:38Z

import.go

-		node.leftHash = node.leftNode.hash
-		node.rightNode = i.stack[stackSize-1]
-		node.rightHash = node.rightNode.hash
-	case stackSize >= 1 && i.stack[stackSize-1].subtreeHeight < node.subtreeHeight:


Where did this branch go? If post-order export didn't change why are we now handling import differently?

it is related to #656, the inner nodes always have two children

kocubinski · 2023-01-31T16:40:25Z

import.go

@@ -83,82 +122,57 @@ func (i *Importer) Add(exportNode *ExportNode) error {
 		version:       exportNode.Version,
 		subtreeHeight: exportNode.Height,
 	}
-
+	if node.subtreeHeight == 0 {


Suggested change

-	if node.subtreeHeight == 0 {
+	// set leaf nodes subtree size = 1
+	if node.subtreeHeight == 0 {

kocubinski · 2023-01-31T16:52:52Z

node.go

@@ -321,7 +321,7 @@ func (node *Node) validate() error {
 		if node.value != nil {
 			return errors.New("value must be nil for non-leaf node")
 		}
-		if node.leftHash == nil && node.rightHash == nil {
+		if node.leftHash == nil || node.rightHash == nil {


Could you explain this change? It's more restrictive than before, right?

yihuang · 2023-01-31T17:17:18Z

export.go

+type OrderType int
+
+// OrderTraverse is the type of traversal order to use when exporting and importing.
+// PreOrder is needed for the new node-key refactoring. The default is PostOrder.


I thought new node-key refactoring works with both orderings?

no, the pre-order is needed for node-key refactoring

Slightly confused now. This PR adds support for both pre- and post-order, but the node key refactor requires pre-order. If a node exports using post-order, then we can't import it into the new version, correct?

Yes, that's why it provides both orders.

Even on the original version, we can request a pre-order snapshot and then import it into the new version.

I think the reason post-order was chosen in the first place is that node hashes are updated in post-order: you need to update the children before the parent node. Will pre-order import need more temporary memory?

Yeah, post-order literally makes more sense. Both ways require a stack to keep the current path. You are right that pre-order will require twice the memory, but the stack length is at most the height of the tree, so it is trivial.
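
To make the stack argument concrete, here is a sketch of rebuilding a tree from either stream. It assumes, as mentioned in the review, that inner nodes always have two children and that height 0 marks a leaf. The `exportNode` and `node` types are simplified stand-ins, not the iavl ExportNode or Importer.

```go
package main

import (
	"errors"
	"fmt"
)

type exportNode struct {
	key    string
	height int8
}

type node struct {
	key         string
	height      int8
	left, right *node
}

// importPostOrder pops the two most recent subtrees whenever an inner node arrives.
func importPostOrder(items []exportNode) (*node, error) {
	var stack []*node
	for _, e := range items {
		n := &node{key: e.key, height: e.height}
		if e.height > 0 {
			if len(stack) < 2 {
				return nil, errors.New("inner node without two children on the stack")
			}
			n.left, n.right = stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
		}
		stack = append(stack, n)
	}
	if len(stack) != 1 {
		return nil, errors.New("stream does not form a single tree")
	}
	return stack[0], nil
}

// importPreOrder keeps inner nodes that still wait for children on the stack.
func importPreOrder(items []exportNode) (*node, error) {
	var root *node
	var stack []*node
	for _, e := range items {
		n := &node{key: e.key, height: e.height}
		switch {
		case root == nil:
			root = n
		case len(stack) == 0:
			return nil, errors.New("node arrived after the tree was complete")
		default:
			parent := stack[len(stack)-1]
			if parent.left == nil {
				parent.left = n
			} else {
				parent.right = n
				stack = stack[:len(stack)-1] // parent is complete
			}
		}
		if e.height > 0 {
			stack = append(stack, n)
		}
	}
	if root == nil || len(stack) != 0 {
		return nil, errors.New("stream ended with incomplete inner nodes")
	}
	return root, nil
}

// shape renders the tree as key(left,right) for comparison.
func shape(n *node) string {
	if n.height == 0 {
		return n.key
	}
	return n.key + "(" + shape(n.left) + "," + shape(n.right) + ")"
}

func main() {
	post := []exportNode{{"a", 0}, {"b", 0}, {"l", 1}, {"c", 0}, {"root", 2}}
	pre := []exportNode{{"root", 2}, {"l", 1}, {"a", 0}, {"b", 0}, {"c", 0}}
	t1, _ := importPostOrder(post)
	t2, _ := importPreOrder(pre)
	fmt.Println(shape(t1), shape(t2)) // root(l(a,b),c) root(l(a,b),c)
}
```

Feeding a stream to the importer for the other order fails in this sketch, which is why the caller has to import in the same order that was used for export.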

@tac0turtle , the old version can use any order, the new version should use pre-order.

but the stack length is at most the height of the tree, it is so trivial

yeah, that should be trivial then, if this is indeed necessary, I think the chains can do a coordinated upgrade in advance to switch to pre-order snapshots, before doing node key format migration.

yeah, that should be trivial then, if this is indeed necessary, I think the chains can do a coordinated upgrade in advance to switch to pre-order snapshots, before doing node key format migration.

I think this is what we should discuss, tbh I have no clear idea which way is more efficient

cool-develope · 2023-02-01T16:32:40Z

@yihuang #662 (comment)

yihuang · 2023-02-01T16:49:24Z

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

In this case, we can at least use a globally increasing unique nonce, similar to how you have to use the path in the current version (instead of the node creation version).

cool-develope · 2023-02-01T16:53:51Z

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

In this case, we can at least use a global increasing unique nonce, similar to how you have to use the path in current version (instead of node creation version).

It might lead to several problems in the implementation: int32 is enough for tree operations, but it would have to be int64 or big.Int in export/import.

yihuang · 2023-02-01T16:56:00Z

@yihuang , I just remembered why the version + nonce is not working in the Import. There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

In this case, we can at least use a global increasing unique nonce, similar to how you have to use the path in current version (instead of node creation version).

It might lead to several problems in the implementation, int32 is enough in the tree operation, but it will be int64 or big.Int in export/import

A continuously increasing number is at least smaller than the path, which is also an integer unique among the nodes, right?

yihuang · 2023-02-01T17:03:15Z

There is no way to get the current sequence integer of the given version, you can't keep this status in memory (like map[int64]int32) because there could be massive versions (int64)

But in practice there are only millions or tens of millions of versions. A contiguous array of 10 million int32s is only (10000000 * 4) / 1024 / 1024 = 38.15 megabytes, which should be practical, I think.
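
The estimate can be reproduced with a short sketch of such a per-version counter. The `nonceCounters` type is hypothetical; it simply uses the version as the slice index and hands out the next local nonce.

```go
package main

import "fmt"

// nonceCounters tracks the next sequential nonce per version.
// The slice index is the version number (hypothetical layout).
type nonceCounters []int32

// next returns the next unused nonce for the version and advances the counter.
func (c nonceCounters) next(version int64) int32 {
	n := c[version]
	c[version]++
	return n
}

// counterMiB estimates the memory for one int32 counter per version, in MiB.
func counterMiB(versions int) float64 {
	const bytesPerCounter = 4 // size of an int32
	return float64(versions*bytesPerCounter) / 1024 / 1024
}

func main() {
	fmt.Printf("%.2f MiB\n", counterMiB(10000000)) // 38.15 MiB

	c := make(nonceCounters, 8)
	fmt.Println(c.next(5), c.next(5), c.next(6)) // 0 1 0
}
```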

cool-develope · 2023-02-01T17:27:00Z

But in practice, there are only millions or tens of millions of versions, a continuous array of 10millions of int32s is only: (10000000 * 4) / 1024/1024 = 38.15 megabytes, should be practical I think.

That makes sense; it will be at most a few hundred MB. I will implement version + local nonce, and let's do some benchmarks.

tac0turtle · 2023-02-16T01:30:46Z

Closing this as we don't need pre-order reconstruction. We discussed this yesterday on the storage working group call.

refactor the export traversal order as pre-order

be26364

cool-develope requested a review from a team as a code owner January 17, 2023 20:54

changelog

58fab7e

kocubinski self-assigned this Jan 19, 2023

cool-develope mentioned this pull request Jan 25, 2023

feat: refactor the node key as version + path #650

Closed

cool-develope and others added 2 commits January 26, 2023 15:08

add flag

18f4491

Merge branch 'master' into 592/export_preorder

9005aa0

cool-develope and others added 2 commits January 26, 2023 15:12

Update CHANGELOG.md

7ac76aa

small fix

58bbcb0

tac0turtle reviewed Jan 26, 2023

tac0turtle previously approved these changes Jan 27, 2023

kocubinski reviewed Jan 31, 2023

godoc

0a4235e

cool-develope requested review from kocubinski, yihuang and tac0turtle and removed request for yihuang January 31, 2023 15:36

Merge branch 'master' into 592/export_preorder

ec49c25

kocubinski reviewed Jan 31, 2023

yihuang reviewed Jan 31, 2023

cool-develope and others added 2 commits January 31, 2023 15:45

comments

82e1ae1

Merge branch 'master' into 592/export_preorder

ee7fd11

tac0turtle closed this Feb 16, 2023
