Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: parse chain-id from big genesis file could be slow #18204

Merged
merged 28 commits into from
Oct 26, 2023
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
192261b
Problem: parse chain-id from big genesis file could be slow
yihuang Oct 23, 2023
7c59903
Update CHANGELOG.md
yihuang Oct 23, 2023
ee9b00a
fix unit test
yihuang Oct 23, 2023
567ee2d
patch the library to support early abort
yihuang Oct 23, 2023
55be3ec
add benchmarking
yihuang Oct 23, 2023
9c220f5
mention encoding/json/v2
yihuang Oct 23, 2023
c74ba3b
fix error handling
yihuang Oct 23, 2023
7680cdd
fix shadow var
yihuang Oct 23, 2023
1035e32
use simpler solution with different trade off
yihuang Oct 23, 2023
4677221
Merge branch 'main' into parse-chain-id
yihuang Oct 24, 2023
be3a3f7
cleanup
yihuang Oct 24, 2023
7cd0787
skip value more efficiently
yihuang Oct 24, 2023
c84599c
Merge branch 'main' into parse-chain-id
yihuang Oct 24, 2023
df3e2a3
trim space
yihuang Oct 25, 2023
3d7ef1a
trim space
yihuang Oct 25, 2023
b09130e
review suggestions
yihuang Oct 25, 2023
d52f3b6
assert error messages
yihuang Oct 25, 2023
1ad9a8f
Update types/chain_id.go
yihuang Oct 25, 2023
63b604c
use cronos mainnet genesis for benchmark
yihuang Oct 25, 2023
a4cc48a
fix conflicts
yihuang Oct 25, 2023
7a58af1
Merge branch 'main' into parse-chain-id
yihuang Oct 25, 2023
5147d1c
fix cyclic import
yihuang Oct 25, 2023
5d99b23
Merge branch 'main' into parse-chain-id
yihuang Oct 25, 2023
ba767d9
fix lint
yihuang Oct 25, 2023
2d3ba07
move directory
yihuang Oct 26, 2023
310b580
Merge branch 'main' into parse-chain-id
yihuang Oct 26, 2023
5145bac
Update x/genutil/types/chain_id.go
yihuang Oct 26, 2023
1bb078f
Merge branch 'main' into parse-chain-id
julienrbrt Oct 26, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ Ref: https://keepachangelog.com/en/1.0.0/
* (rpc) [#17470](https://github.com/cosmos/cosmos-sdk/pull/17470) Avoid open 0.0.0.0 to public by default and add `listen-ip-address` argument for `testnet init-files` cmd.
* (types) [#17670](https://github.com/cosmos/cosmos-sdk/pull/17670) Use `ctx.CometInfo` in place of `ctx.VoteInfos`
* [#17733](https://github.com/cosmos/cosmos-sdk/pull/17733) Ensure `buf export` exports all proto dependencies
* [#18204](https://github.com/cosmos/cosmos-sdk/pull/18204) Use streaming json parser to parse chain-id from genesis file.

### Bug Fixes

Expand Down
9 changes: 6 additions & 3 deletions server/util.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,6 @@ import (
sdk "github.com/cosmos/cosmos-sdk/types"
"github.com/cosmos/cosmos-sdk/types/mempool"
"github.com/cosmos/cosmos-sdk/version"
genutiltypes "github.com/cosmos/cosmos-sdk/x/genutil/types"
)

// ServerContextKey defines the context key used to retrieve a server.Context from
Expand Down Expand Up @@ -485,12 +484,16 @@ func DefaultBaseappOptions(appOpts types.AppOptions) []func(*baseapp.BaseApp) {
chainID := cast.ToString(appOpts.Get(flags.FlagChainID))
if chainID == "" {
// fallback to genesis chain-id
appGenesis, err := genutiltypes.AppGenesisFromFile(filepath.Join(homeDir, "config", "genesis.json"))
reader, err := os.Open(filepath.Join(homeDir, "config", "genesis.json"))
Dismissed Show dismissed Hide dismissed
if err != nil {
panic(err)
}
defer reader.Close()

chainID = appGenesis.ChainID
chainID, err = sdk.ParseChainIDFromGenesis(reader)
if err != nil {
panic(fmt.Errorf("failed to parse chain-id from genesis file: %w", err))
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
}
alexanderbez marked this conversation as resolved.
Show resolved Hide resolved
}

snapshotStore, err := GetSnapshotStore(appOpts)
Expand Down
70 changes: 70 additions & 0 deletions types/chain_id.go
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe move this to genutil? Imho this makes more sense to have it there as its genesis related than in types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
package types

import (
"encoding/json"
"errors"
"fmt"
"io"

"github.com/cometbft/cometbft/types"
)

const ChainIDFieldName = "chain_id"

// ParseChainIDFromGenesis parses the `chain_id` from the genesis json file and abort early,
// it still parses the values before the `chain_id` field, particularly if the `app_state` field is
// before the `chain_id` field, it will parse the `app_state` value, user must make sure the `chain_id`
// is put before `app_state` or other big entries to enjoy the efficiency.
yihuang marked this conversation as resolved.
Show resolved Hide resolved
func ParseChainIDFromGenesis(r io.Reader) (string, error) {
dec := json.NewDecoder(r)

var t json.Token
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this declaration yet down below we do: t, err := dec.Token()

t, err := dec.Token()
if err != nil {
return "", err
}
if t != json.Delim('{') {
return "", fmt.Errorf("expected {, got %s", t)
}

for dec.More() {
t, err = dec.Token()
if err != nil {
return "", err
}
key, ok := t.(string)
if !ok {
return "", fmt.Errorf("expected string, got %s", t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected string for the key type, got %s

}

if key == ChainIDFieldName {
var chainId string
if err := dec.Decode(&chainId); err != nil {
return "", err
}
if err := validateChainID(chainId); err != nil {
return "", err
}
return chainId, nil
}

// skip the value
var value json.RawMessage
if err := dec.Decode(&value); err != nil {
return "", err
}
}

return "", errors.New("missing chain-id in genesis file")
}

func validateChainID(chainID string) error {
if chainID == "" {
return errors.New("genesis doc must include non-empty chain_id")
}
if len(chainID) > types.MaxChainIDLen {
return fmt.Errorf("chain_id in genesis doc is too long (max: %d)", types.MaxChainIDLen)
}

return nil
}
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validateChainID function checks if the chain_id is not empty and if its length does not exceed types.MaxChainIDLen. However, it would be better to also check if chain_id is not a string of whitespace characters because the current implementation considers a string of spaces as a valid chain_id.

func validateChainID(chainID string) error {
	if chainID == "" {
		return errors.New("genesis doc must include non-empty chain_id")
	}
+	if strings.TrimSpace(chainID) == "" {
+		return errors.New("genesis doc must include non-whitespace chain_id")
+	}
	if len(chainID) > types.MaxChainIDLen {
		return fmt.Errorf("chain_id in genesis doc is too long (max: %d)", types.MaxChainIDLen)
	}

	return nil
}
Committable suggestion (Beta)
Suggested change
func validateChainID(chainID string) error {
if chainID == "" {
return errors.New("genesis doc must include non-empty chain_id")
}
if len(chainID) > types.MaxChainIDLen {
return fmt.Errorf("chain_id in genesis doc is too long (max: %d)", types.MaxChainIDLen)
}
return nil
}
func validateChainID(chainID string) error {
if chainID == "" {
return errors.New("genesis doc must include non-empty chain_id")
}
if strings.TrimSpace(chainID) == "" {
return errors.New("genesis doc must include non-whitespace chain_id")
}
if len(chainID) > types.MaxChainIDLen {
return fmt.Errorf("chain_id in genesis doc is too long (max: %d)", types.MaxChainIDLen)
}
return nil
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yihuang, this is a great suggestion from the rabbit! We can do it one shot though by making it the first check so

if strings.TrimSpace(chainID) == "" {

and that covers both cases where it was an empty string and where it only contains whitespace

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, done.

113 changes: 113 additions & 0 deletions types/chain_id_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
package types
yihuang marked this conversation as resolved.
Show resolved Hide resolved

import (
_ "embed"
"strings"
"testing"

genutiltypes "github.com/cosmos/cosmos-sdk/x/genutil/types"
"github.com/stretchr/testify/require"
)

//go:embed testdata/parse_chain_id.json
var BenchmarkGenesis string

func TestParseChainIDFromGenesis(t *testing.T) {
testCases := []struct {
name string
json string
expChainID string
}{
{
"success",
`{
"state": {
"accounts": {
"a": {}
}
},
"chain_id": "test-chain-id"
}`,
"test-chain-id",
},
{
"nested",
`{
"state": {
"accounts": {
"a": {}
},
"chain_id": "test-chain-id"
}
}`,
"",
},
{
"not exist",
`{
"state": {
"accounts": {
"a": {}
}
},
"chain-id": "test-chain-id"
}`,
"",
},
{
"invalid type",
`{
"chain-id":1,
}`,
"",
},
{
"invalid json",
`[ " ': }`,
"",
},
{
"empty chain_id",
`{"chain_id": ""}`,
"",
},
{
"chain_id too long",
`{"chain_id": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}`,
"",
},
}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
chain_id, err := ParseChainIDFromGenesis(strings.NewReader(tc.json))
if tc.expChainID == "" {
require.Error(t, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanna rigorously tighten the checks, you can do a substring match for the expected error e.g. require.Contains(t, err.Error(), "not a string")

Copy link
Collaborator Author

@yihuang yihuang Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, assert the error string.

} else {
require.NoError(t, err)
require.Equal(t, tc.expChainID, chain_id)
}
})
}
}

func BenchmarkParseChainID(b *testing.B) {
b.ReportAllocs()
b.Run("new", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
chainId, err := ParseChainIDFromGenesis(strings.NewReader(BenchmarkGenesis))
require.NoError(b, err)
require.Equal(b, "test_777-1", chainId)
}
})

b.Run("old", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
doc, err := genutiltypes.AppGenesisFromReader(strings.NewReader(BenchmarkGenesis))
require.NoError(b, err)
require.Equal(b, "test_777-1", doc.ChainID)
}
})
}
Loading
Loading