Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat : Insert storage diff to DB #5

Merged
merged 9 commits into from
Nov 22, 2023

Conversation

dvcpull
Copy link
Collaborator

@dvcpull dvcpull commented Nov 13, 2023

Insert storage changes and child storage changes to DB, these values are then used to return the storage diff for blocks from storage_diff endpoint.

@dvcpull dvcpull changed the title WIP : Insert storage diff to DB feat : Insert storage diff to DB Nov 16, 2023
@dvcpull dvcpull marked this pull request as ready for review November 16, 2023 10:24
@dvcpull dvcpull requested a review from mezrin November 16, 2023 10:24
@dvcpull
Copy link
Collaborator Author

dvcpull commented Nov 16, 2023

Some test results for benchmark:

Query code:

import requests
import json

url = "http://localhost:9944/"
headers = {"Content-Type": "application/json"}

data = {
    "id": 1,
    "jsonrpc": "2.0",
    "method": "state_getStorageDiff",
    "params": ["0x89955cc93a51534104d97a6c409f31ce7a1ef345b232d111cbf1c1c0e0c76fb0", ["26aa394eea5630e07c"]]
}

# Convert the data dictionary to a JSON string
json_data = json.dumps(data)

# Make the API request
response = requests.post(url, data=json_data, headers=headers)

print("block hash", data['params'])
print("time take for reponse", response.elapsed.total_seconds())

# Check if the request was successful (status code 200)
if response.status_code == 200:
    print("modified keys", len(response.json()['result'][0]))
    print("modified child keys", len(response.json()['result'][1]))

    for item in response.json()['result'][0]:
        #print(item[0])
        print("".join(format(x, '02x') for x in item[0]))
        if item[1] != None:
            print("".join(format(x, '02x') for x in item[1]))
        else:
            print("None")
else:
    # Print an error message if the request was not successful
    print(f"Error: {response.status_code}")
    print(response.text)

With local node (without prefixes)

block hash ['0x89955cc93a51534104d97a6c409f31ce7a1ef345b232d111cbf1c1c0e0c76fb0']
time take for reponse 0.000945
modified keys 18
modified child keys 0
26aa394eea5630e07c48ae0c9558cef702a5c1b19ab7a04f536c519aca4983ac
06000000
26aa394eea5630e07c48ae0c9558cef70a98fdbe9ce6c55837576c60c7af3850
01000000
26aa394eea5630e07c48ae0c9558cef734abf5cb34d6244378cddbf18e849d96
000000000270d8c55517
26aa394eea5630e07c48ae0c9558cef780d41e5e16056765bc8461851072c9d7
04000000000000004248ed435517020000
26aa394eea5630e07c48ae0c9558cef78a42f33323cb5ced3b44dd825fda9fcc
c4e944230d977e77334a39a55a7b95be67a881aea7752b8ec11324ed2d0cfa75
26aa394eea5630e07c48ae0c9558cef799e7f93fc6a98f0874fd057f111c4d2d
040661757261206cb4e31000000000
26aa394eea5630e07c48ae0c9558cef7a44704b568d21667356a5a050c11874639b9d2792f8bd4c305000000
c4e944230d977e77334a39a55a7b95be67a881aea7752b8ec11324ed2d0cfa75
26aa394eea5630e07c48ae0c9558cef7a86da5a932684f199539836fcb8c886f
None
26aa394eea5630e07c48ae0c9558cef7bdc0bd303e9855813aa8a30d4efc5112
None
26aa394eea5630e07c48ae0c9558cef7df1daeb8986837f21cc5d17596bb78d1b4def25cfda6ef3a00000000
None
26aa394eea5630e07c48ae0c9558cef7ff553b5a9862a516939d82b3d3d8661a
None
3a65787472696e7369635f696e646578
None
3a696e747261626c6f636b5f656e74726f7079
None
3a7472616e73616374696f6e5f6c6576656c3a
None
3f1467a096bcd71a5b6a0c8155e208103f2edf3bdf381debe331ab7446addfdc
000064a7b3b6e00d0000000000000000
57f8dc2f5ab09467896f47300f04243806155b3cd9a8c9e5e9a23fd5dc13a5ed
6cb4e31000000000
f0c365c3cf59d671eb72da0e7a4113c49f1f0515f462cdcf84e0f1d6045dfcbb
44a3d4d88b010000
f0c365c3cf59d671eb72da0e7a4113c4bbd108c4899964f707fdaffb82636065
None

With local node (with prefixes)

block hash ['0x89955cc93a51534104d97a6c409f31ce7a1ef345b232d111cbf1c1c0e0c76fb0', ['26aa394eea5630e07c']]
time take for reponse 0.001004
modified keys 11
modified child keys 0
26aa394eea5630e07c48ae0c9558cef702a5c1b19ab7a04f536c519aca4983ac
06000000
26aa394eea5630e07c48ae0c9558cef70a98fdbe9ce6c55837576c60c7af3850
01000000
26aa394eea5630e07c48ae0c9558cef734abf5cb34d6244378cddbf18e849d96
000000000270d8c55517
26aa394eea5630e07c48ae0c9558cef780d41e5e16056765bc8461851072c9d7
04000000000000004248ed435517020000
26aa394eea5630e07c48ae0c9558cef78a42f33323cb5ced3b44dd825fda9fcc
c4e944230d977e77334a39a55a7b95be67a881aea7752b8ec11324ed2d0cfa75
26aa394eea5630e07c48ae0c9558cef799e7f93fc6a98f0874fd057f111c4d2d
040661757261206cb4e31000000000
26aa394eea5630e07c48ae0c9558cef7a44704b568d21667356a5a050c11874639b9d2792f8bd4c305000000
c4e944230d977e77334a39a55a7b95be67a881aea7752b8ec11324ed2d0cfa75
26aa394eea5630e07c48ae0c9558cef7a86da5a932684f199539836fcb8c886f
None
26aa394eea5630e07c48ae0c9558cef7bdc0bd303e9855813aa8a30d4efc5112
None
26aa394eea5630e07c48ae0c9558cef7df1daeb8986837f21cc5d17596bb78d1b4def25cfda6ef3a00000000
None
26aa394eea5630e07c48ae0c9558cef7ff553b5a9862a516939d82b3d3d8661a
None

With polkadot (block 971264)

block hash ['0x61f3804be06dcbadbcc79d9d8292d2ddc7c45cf6ad17b4ecb482e29b018a0153']
time take for reponse 0.002858
modified keys 35
modified child keys 0
0b76934f4cc08dee01012d059e1b83eebbd108c4899964f707fdaffb82636065
00
0b76934f4cc08dee01012d059e1b83eefdad6eef5c4b1c68eaa71ea17a02d9de
00
1cb6f36e027abb2091cfb5110ab5087f0323475657e0890fbdbf66fb24b4649e
None
.......

Copy link

@mezrin mezrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.
Made several requests for changes.

substrate/client/rpc/src/state/state_full.rs Outdated Show resolved Hide resolved
substrate/client/rpc/src/state/state_full.rs Outdated Show resolved Hide resolved
substrate/client/rpc/src/state/state_full.rs Outdated Show resolved Hide resolved
substrate/client/db/src/lib.rs Outdated Show resolved Hide resolved
@mezrin
Copy link

mezrin commented Nov 16, 2023

At the moment need to make simple manual tests - make a transfer of native tokens. Then get storage diff via RPC and see that returned data contain the changed storage elements.
Opaque storage keys make it an easy task.

later we will be able to make a test in indexer:

  • download the full state for block Z
  • download the full state for block (Z + 1000)
  • download all diffs for blocks (Z; Z + 1000]
  • rolling diffs on the first state. Compare the result with a later full state.
  • If it doesn't match, it means there's a bug somewhere.

@dvcpull
Copy link
Collaborator Author

dvcpull commented Nov 17, 2023

At the moment need to make simple manual tests - make a transfer of native tokens. Then get storage diff via RPC and see that returned data contain the changed storage elements. Opaque storage keys make it an easy task.

later we will be able to make a test in indexer:

  • download the full state for block Z
  • download the full state for block (Z + 1000)
  • download all diffs for blocks (Z; Z + 1000]
  • rolling diffs on the first state. Compare the result with a later full state.
  • If it doesn't match, it means there's a bug somewhere.

Ok, confirmed with test, below is the python code used

import requests
import json
from substrateinterface import SubstrateInterface, Keypair
from substrateinterface.exceptions import SubstrateRequestException

url = "http://localhost:9944/"
headers = {"Content-Type": "application/json"}

# init substrate interface
substrate = SubstrateInterface(
    url="ws://127.0.0.1:9944",
    ss58_format=42,
    type_registry_preset='substrate-node-template',
)

def query_storage_diff(block_hash):
    data = {
        "id": 1,
        "jsonrpc": "2.0",
        "method": "state_getStorageDiff",
        "params": [block_hash]
    }

    # Convert the data dictionary to a JSON string
    json_data = json.dumps(data)

    # Make the API request
    response = requests.post(url, data=json_data, headers=headers)

    print("block hash", data['params'])
    print("time take for reponse", response.elapsed.total_seconds())

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        print("modified keys", len(response.json()['result'][0]))
        print("modified child keys", len(response.json()['result'][1]))

        expected_keys = [
            substrate.create_storage_key(
                "System", "Account", ["5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY"] #Alice
            ).to_hex(),
            substrate.create_storage_key(
                "System", "Account", ["5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty"] #Bob
            ).to_hex(),
        ]
        print("Expected keys", expected_keys)

        for item in response.json()['result'][0]:
            #print(item[0])
            #print("0x" + "".join(format(x, '02x') for x in item[0]))
            hex_key = "0x" + "".join(format(x, '02x') for x in item[0])
            if hex_key in expected_keys:
                hex_value = "0x" + "".join(format(x, '02x') for x in item[1])
                print("found expected key, value is ", hex_value)
    else:
        # Print an error message if the request was not successful
        print(f"Error: {response.status_code}")
        print(response.text)


def do_transfer():
    keypair = Keypair.create_from_uri('//Alice')

    # send balance from alice to bob
    call = substrate.compose_call(
        call_module='Balances',
        call_function='transfer_keep_alive',
        call_params={
            'dest': '5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty',
            'value': 1 * 10**15
        }
    )

    extrinsic = substrate.create_signed_extrinsic(
        call=call,
        keypair=keypair,
        era={'period': 64}
    )

    try:
        receipt = substrate.submit_extrinsic(extrinsic, wait_for_inclusion=True)

        print('Extrinsic "{}" included in block "{}"'.format(
            receipt.extrinsic_hash, receipt.block_hash
        ))

        if receipt.is_success:

            print('✅ Success, triggered events:')
            for event in receipt.triggered_events:
                print(f'* {event.value}')

            return receipt.block_hash

        else:
            print('⚠️ Extrinsic Failed: ', receipt.error_message)


    except SubstrateRequestException as e:
        print("Failed to send: {}".format(e))


## lets do a transfer onchain
block_hash = do_transfer()
query_storage_diff(block_hash)

@mezrin
Copy link

mezrin commented Nov 17, 2023

At the moment need to make simple manual tests - make a transfer of native tokens. Then get storage diff via RPC and see that returned data contain the changed storage elements. Opaque storage keys make it an easy task.
later we will be able to make a test in indexer:

  • download the full state for block Z
  • download the full state for block (Z + 1000)
  • download all diffs for blocks (Z; Z + 1000]
  • rolling diffs on the first state. Compare the result with a later full state.
  • If it doesn't match, it means there's a bug somewhere.

Ok, confirmed with test, below is the python code used

Great news!!

@mezrin
Copy link

mezrin commented Nov 17, 2023

I googled now about child tries in Substrate
And found this

https://docs.substrate.io/learn/state-transitions-and-storage/
See the section "Child trie"

https://substrate.stackexchange.com/questions/1771/how-to-create-an-iterator-for-a-child-storage-trie
https://github.com/paritytech/polkadot/blob/master/runtime/common/src/crowdloan/mod.rs
Search in the page for the word "child"

https://www.shawntabrizi.com/assets/presentations/substrate-storage-deep-dive.pdf
See the page 27

I guess this is what we have in "modified child keys".
Could you please check in code.
If so - let's do the following:

  • don't filter "modified child keys" by "include_prefixes" / "exclude_prefixes"
  • add one more param to the RPC endpoint "include_modified_child_tries". Default "false"

@mezrin
Copy link

mezrin commented Nov 17, 2023

One more point.

The node has the console launch params: --state-pruning and --blocks-pruning.
And there is code to delete the old data from the database.

Does this mechanic affect the stored state diffs?

Ideally we need a separate param --state-diff-pruning.
If it requires too much code - the param --state-pruning should regulate the pruning of state diffs.

@dvcpull
Copy link
Collaborator Author

dvcpull commented Nov 17, 2023

One more point.

The node has the console launch params: --state-pruning and --blocks-pruning. And there is code to delete the old data from the database.

Does this mechanic affect the stored state diffs?

Ideally we need a separate param --state-diff-pruning. If it requires too much code - the param --state-pruning should regulate the pruning of state diffs.

Yes, our logic will only work when the node is started with --pruning archive aka running a full node. Is that ok?

@mezrin
Copy link

mezrin commented Nov 17, 2023

One more point.
The node has the console launch params: --state-pruning and --blocks-pruning. And there is code to delete the old data from the database.
Does this mechanic affect the stored state diffs?
Ideally we need a separate param --state-diff-pruning. If it requires too much code - the param --state-pruning should regulate the pruning of state diffs.

Yes, our logic will only work when the node is started with --pruning archive aka running a full node. Is that ok?

The last version of node has two params.
Extras from the app' help text:

--state-pruning <PRUNING_MODE>
Specify the state pruning mode. This mode specifies when the block's state (ie, storage) should be pruned (ie, removed) from the database. This setting can only be set on the first creation of the database. Every subsequent run will load the pruning mode from the database and will error if the stored mode doesn't match this CLI value. It is fine to drop this CLI flag for subsequent runs. Possible values: - archive: Keep the state of all blocks. - 'archive-canonical' Keep only the state of finalized blocks. - number Keep the state of the last number of finalized blocks. [default: 256]

--blocks-pruning <PRUNING_MODE>
Specify the blocks pruning mode. This mode specifies when the block's body (including justifications) should be pruned (ie, removed) from the database. Possible values: - 'archive' Keep all blocks. - 'archive-canonical' Keep only finalized blocks. - number Keep the last number of finalized blocks
[default: archive-canonical]

I.e. --pruning was split to two params.

For the indexer' nodes we will use --state-pruning 1024 --blocks-pruning archive-canonical.
Indexer will provide via RESTAPI only the state of last ~1000 blocks - so we don't need the full history of state diffs.
Also we can't use --state-pruning archive-canonical because it will significantly increase the size of the node' database.

Ideally state diffs should follow the same param as the state itself - --state-pruning.
I.e. deleted for blocks older than 1000.
Can we do this?

@dvcpull
Copy link
Collaborator Author

dvcpull commented Nov 20, 2023

One more point.
The node has the console launch params: --state-pruning and --blocks-pruning. And there is code to delete the old data from the database.
Does this mechanic affect the stored state diffs?
Ideally we need a separate param --state-diff-pruning. If it requires too much code - the param --state-pruning should regulate the pruning of state diffs.

Yes, our logic will only work when the node is started with --pruning archive aka running a full node. Is that ok?

The last version of node has two params. Extras from the app' help text:

--state-pruning <PRUNING_MODE>
Specify the state pruning mode. This mode specifies when the block's state (ie, storage) should be pruned (ie, removed) from the database. This setting can only be set on the first creation of the database. Every subsequent run will load the pruning mode from the database and will error if the stored mode doesn't match this CLI value. It is fine to drop this CLI flag for subsequent runs. Possible values: - archive: Keep the state of all blocks. - 'archive-canonical' Keep only the state of finalized blocks. - number Keep the state of the last number of finalized blocks. [default: 256]

--blocks-pruning <PRUNING_MODE>
Specify the blocks pruning mode. This mode specifies when the block's body (including justifications) should be pruned (ie, removed) from the database. Possible values: - 'archive' Keep all blocks. - 'archive-canonical' Keep only finalized blocks. - number Keep the last number of finalized blocks
[default: archive-canonical]

I.e. --pruning was split to two params.

For the indexer' nodes we will use --state-pruning 1024 --blocks-pruning archive-canonical. Indexer will provide via RESTAPI only the state of last ~1000 blocks - so we don't need the full history of state diffs. Also we can't use --state-pruning archive-canonical because it will significantly increase the size of the node' database.

Ideally state diffs should follow the same param as the state itself - --state-pruning. I.e. deleted for blocks older than 1000. Can we do this?

Can confirm the node works as expected when using the flags --state-pruning 1024 --blocks-pruning archive-canonical, querying a block older than 1024 returns {"code":-32000,"message":"Client error: UnknownBlock: State already discarded for 0x891a56cbdfec4a24306cf2ed5a3fdbc2dfcd627d8b46a43ac3c3bab03fa6a9c0"},"id":1}

@dvcpull
Copy link
Collaborator Author

dvcpull commented Nov 20, 2023

I googled now about child tries in Substrate And found this

https://docs.substrate.io/learn/state-transitions-and-storage/ See the section "Child trie"

https://substrate.stackexchange.com/questions/1771/how-to-create-an-iterator-for-a-child-storage-trie https://github.com/paritytech/polkadot/blob/master/runtime/common/src/crowdloan/mod.rs Search in the page for the word "child"

https://www.shawntabrizi.com/assets/presentations/substrate-storage-deep-dive.pdf See the page 27

I guess this is what we have in "modified child keys". Could you please check in code. If so - let's do the following:

  • don't filter "modified child keys" by "include_prefixes" / "exclude_prefixes"
  • add one more param to the RPC endpoint "include_modified_child_tries". Default "false"

After digging through code and docs I think modified child keys are only relevant if they are used by pallets, aka they do not affect how regular pallet storage works, and I could find only pallet crowdloan that uses this child tries. This video is very helpful (https://www.youtube.com/watch?v=9S8rmW8LD5o) child tries start from ~31min

@dvcpull dvcpull requested a review from mezrin November 20, 2023 22:49
Copy link

@mezrin mezrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is good!!!
One last optimisation and work is done 🎉🎉🎉

And let's update the ticket paritytech/json-rpc-interface-spec#108

substrate/client/rpc/src/state/state_full.rs Outdated Show resolved Hide resolved
@dvcpull dvcpull requested a review from mezrin November 22, 2023 06:48
@dvcpull dvcpull merged commit ff837ef into release-polkadot-v1.3.0 Nov 22, 2023
4 of 7 checks passed
@dvcpull dvcpull deleted the trait/add-storage-diff-db branch November 22, 2023 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants