Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: upgrade non-determinism #10188

Closed
4 tasks
tomtau opened this issue Sep 17, 2021 · 7 comments · Fixed by #10189
Closed
4 tasks

fix: upgrade non-determinism #10188

tomtau opened this issue Sep 17, 2021 · 7 comments · Fixed by #10189
Assignees
Labels

Comments

@tomtau
Copy link
Contributor

tomtau commented Sep 17, 2021

Summary of Bug

post-upgrade block may have a different result app hash.

Version

0.44.0

Steps to Reproduce

I did not find a way to reliably reproduce it so far :(


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@tomtau tomtau changed the title upgrade non-determinism fix: upgrade non-determinism Sep 17, 2021
@alexanderbez
Copy link
Contributor

Hey @tomtau, any chance you could provide a bit more info here? This issue really isn't saying much.

@tomtau
Copy link
Contributor Author

tomtau commented Sep 21, 2021

@alexanderbez sure, reposting the original quick info from Discord:

12:44AM ERR prevote step: ProposalBlock is invalid err="wrong Block.Header.AppHash.  Expected EF34ACE75EA7F21119E26E059F8AE85E6D9CB8505B1C8D4DF8D4359370CB004C, got EBE5701410CEFCC7461FB4314205F02334571CD9CB7098EEDE95FDB3A2A49603" height=250777 module=consensus round=6

-- and the network came to halt (given it's 40-60)

@alexanderbez
Copy link
Contributor

alexanderbez commented Sep 21, 2021

So where does the non-determinism come from in the state machine? There isn't anything we can achieve with this issue atm.

If there is any non-determinism in the SDK, the simulations would've caught it.

@tomtau
Copy link
Contributor Author

tomtau commented Sep 23, 2021

@alexanderbez do any simulations have cross-SDK version upgrades (0.42->0.44)? One potential source of non-determinism may be in upgrade migrations: #10189 (comment)

@alexanderbez
Copy link
Contributor

Simulations don't simulate/test cross-version live upgrades -- they simulate the state machine only. If there is non-determinism across upgrades, then that sounds like an issue in your upgrade handler?

@tomtau
Copy link
Contributor Author

tomtau commented Sep 24, 2021

@alexanderbez the upgrade handler doesn't have anything special besides the Cosmos SDK's 0.42->0.44 migrations: https://github.com/crypto-org-chain/chain-main/blob/v3.1.1/app/app.go#L464

@alexanderbez
Copy link
Contributor

Yes @AmauryM explained it to me -- iteration over modules was non-deterministic because we were using a map.

odeke-em added a commit to cosmos/gosec that referenced this issue Nov 10, 2021
… value

This pass exists to curtail non-determinism in the cosmos-sdk
which stemmed from iterating over maps during upgrades and that
caused a chaotic debug for weeks. With this change, we'll now
enforce and report failed iterations, with the rule being that
a map in a range should involve ONLY one of these 2 operations:
* for k := range m { delete(m, k) } for fast map clearing
* for k := range m { keys = append(keys, k) } to retrieve keys & sort

thus we shall get this report:
```shell
[gosec] 2021/11/09 03:18:57 Rule error: *sdk.mapRanging => the value in the range statement should be nil: want: for key := range m (main.go:19)
[gosec] 2021/11/09 03:18:57 Rule error: *sdk.mapRanging => the value in the range statement should be nil: want: for key := range m (main.go:27)
```

from the code below:

```go
package main

func main() {
	m := map[string]int{
		"a": 0,
		"b": 1,
		"c": 2,
		"d": 3,
	}

	makeMap := func() map[string]string { return nil }

	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}

	values := make([]int, 0, len(m))
	for _, value := range m {
		values = append(values, value)
	}

	type kv struct {
		k, v interface{}
	}
	kvL := make([]*kv, 0, len(m))
	for k, v := range m {
		kvL = append(kvL, &kv{k, v})
	}

	for k := range m {
		delete(m, k)
	}

	for k := range makeMap() {
		delete(m, k)
	}

	for k := range do() {
		delete(m, k)
	}
}

func do() map[string]string { return nil }
```

Updates cosmos/cosmos-sdk#10189
Updates cosmos/cosmos-sdk#10188
Updates cosmos/cosmos-sdk#10190
odeke-em added a commit to cosmos/gosec that referenced this issue Nov 15, 2021
… value

This pass exists to curtail non-determinism in the cosmos-sdk
which stemmed from iterating over maps during upgrades and that
caused a chaotic debug for weeks. With this change, we'll now
enforce and report failed iterations, with the rule being that
a map in a range should involve ONLY one of these 2 operations:
* for k := range m { delete(m, k) } for fast map clearing
* for k := range m { keys = append(keys, k) } to retrieve keys & sort

thus we shall get this report:
```shell
[gosec] 2021/11/09 03:18:57 Rule error: *sdk.mapRanging => the value in the range statement should be nil: want: for key := range m (main.go:19)
[gosec] 2021/11/09 03:18:57 Rule error: *sdk.mapRanging => the value in the range statement should be nil: want: for key := range m (main.go:27)
```

from the code below:

```go
package main

func main() {
	m := map[string]int{
		"a": 0,
		"b": 1,
		"c": 2,
		"d": 3,
	}

	makeMap := func() map[string]string { return nil }

	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}

	values := make([]int, 0, len(m))
	for _, value := range m {
		values = append(values, value)
	}

	type kv struct {
		k, v interface{}
	}
	kvL := make([]*kv, 0, len(m))
	for k, v := range m {
		kvL = append(kvL, &kv{k, v})
	}

	for k := range m {
		delete(m, k)
	}

	for k := range makeMap() {
		delete(m, k)
	}

	for k := range do() {
		delete(m, k)
	}
}

func do() map[string]string { return nil }
```

Updates cosmos/cosmos-sdk#10189
Updates cosmos/cosmos-sdk#10188
Updates cosmos/cosmos-sdk#10190
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants