Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/capability: GetCapability deletes capabilities unexpectedly #7805

Closed
4 tasks
ebuchman opened this issue Nov 4, 2020 · 4 comments
Closed
4 tasks

x/capability: GetCapability deletes capabilities unexpectedly #7805

ebuchman opened this issue Nov 4, 2020 · 4 comments
Assignees
Labels

Comments

@ebuchman
Copy link
Member

ebuchman commented Nov 4, 2020

Surfaced from Informal Systems IBC Audit of cosmos-sdk hash 82f15f3.

If ScopedKeeper.GetCapability is called with a name that is not in its store, or by a module that doesn't own the cap it's getting, it will delete the capability with the 0th index.

GetCapability contains the following code:

	key := types.RevCapabilityKey(sk.module, name)
	indexBytes := memStore.Get(key)
	index := sdk.BigEndianToUint64(indexBytes)

	if len(indexBytes) == 0 {
		// If a tx failed and NewCapability got reverted, it is possible
		// to still have the capability in the go map since changes to
		// go map do not automatically get reverted on tx failure,
		// so we delete here to remove unnecessary values in map
		delete(sk.capMap, index)
		return nil, false
	}

The intention here seems to be that if the key is not in the store, make sure we remove the associated index from our capMap. However, the capability is referred by name here, and we have to look it up in the store to get the index. But if the lookup returns empty, we don't actually know what the index is, and the empty return bytes are interpreted as 0, so we end up deleting the 0th capability! This is a double whammy: it fails to delete the phantom capability in the capMap, and it succeeds in deleting the 0th indexed capability from the capMap, which will likely cause problems later when something tries to access it.

We don't have a test yet for the case where there was a roll back, but the following two tests (to be run from x/capability/keeper) demonstrate ways this could be triggered. The first involves a module calling GetCapability with a name that doesn't exist, and the second involves a second module called GetCapability on a name owned by another module:

func TestNewGetCap1(t *testing.T) {
	cdc := simapp.MakeTestEncodingConfig().Marshaler
	db := dbm.NewMemDB()
	cms := store.NewCommitMultiStore(db)
	key := sdk.NewKVStoreKey("hello")
	memkey := storetypes.NewMemoryStoreKey("memhello")

	// create new keeper so we can define custom scoping before init and seal
	keeper := keeper.NewKeeper(cdc, key, memkey)

	// scope to module
	sk1 := keeper.ScopeToModule(banktypes.ModuleName)

	// mount stores, initialize and seal
	cms.MountStoreWithDB(key, sdk.StoreTypeIAVL, db)
	cms.MountStoreWithDB(memkey, sdk.StoreTypeMemory, nil)
	require.NoError(t, cms.LoadLatestVersion())
	ctx := sdk.NewContext(cms, tmproto.Header{Height: 1}, false, log.NewNopLogger())
	keeper.InitializeAndSeal(ctx)

	// create a new capability
	cap, err := sk1.NewCapability(ctx, "transfer")
	require.NoError(t, err)
	require.NotNil(t, cap)

	// try to get it, should pass.
	got, ok := sk1.GetCapability(ctx, "transfer")
	require.True(t, ok)
	require.NotNil(t, got)
	require.Equal(t, cap, got, "expected memory addresses to be equal")

	// try to get a made up capability, should fail.
	got, ok = sk1.GetCapability(ctx, "transfer-1")
	require.False(t, ok)
	require.Nil(t, got)

	// try to get the original again. should pass. but fails !
	got, ok = sk1.GetCapability(ctx, "transfer")
	require.True(t, ok)
	require.NotNil(t, got)
	require.Equal(t, cap, got, "expected memory addresses to be equal")
}

func TestNewGetCap2(t *testing.T) {
	cdc := simapp.MakeTestEncodingConfig().Marshaler
	db := dbm.NewMemDB()
	cms := store.NewCommitMultiStore(db)
	key := sdk.NewKVStoreKey("hello")
	memkey := storetypes.NewMemoryStoreKey("memhello")

	// create new keeper so we can define custom scoping before init and seal
	keeper := keeper.NewKeeper(cdc, key, memkey)

	// scope to two modules
	sk1 := keeper.ScopeToModule(banktypes.ModuleName)
	sk2 := keeper.ScopeToModule(stakingtypes.ModuleName)

	// mount stores, initialize and seal
	cms.MountStoreWithDB(key, sdk.StoreTypeIAVL, db)
	cms.MountStoreWithDB(memkey, sdk.StoreTypeMemory, nil)
	require.NoError(t, cms.LoadLatestVersion())
	ctx := sdk.NewContext(cms, tmproto.Header{Height: 1}, false, log.NewNopLogger())
	keeper.InitializeAndSeal(ctx)

	// module 1 creates a new capability
	cap, err := sk1.NewCapability(ctx, "transfer")
	require.NoError(t, err)
	require.NotNil(t, cap)

	// module 1 tries to get it. should pass.
	got, ok := sk1.GetCapability(ctx, "transfer")
	require.True(t, ok)
	require.NotNil(t, got)
	require.Equal(t, cap, got, "expected memory addresses to be equal")

	// module 2 tries to get it. should fail
	got, ok = sk2.GetCapability(ctx, "transfer")
	require.False(t, ok)
	require.Nil(t, got)

	// module 1 tries to get it. should pass. but fails !
	got, ok = sk1.GetCapability(ctx, "transfer")
	require.True(t, ok)
	require.NotNil(t, got)
	require.Equal(t, cap, got, "expected memory addresses to be equal")
}

While both tests should pass, they fail. In the first test, if the call to sk1.GetCapability(ctx, "transfer-1") is removed, the test passes. In the second, if the call to sk2.GetCapability is removed, the test passes. So it seems these calls are causing the 0th index capability to be wiped due to the empty bytes returned from the store being interpreted as the 0th index. Notably, this bug only applies to the capability with index 0 - if the index is already greater than 0, the test sequences above would pass as expected.

Fixes

One approach may be to simply start the capability index at 1, which would solve the deletion issues. However it would not solve the problem of failing to remove phantom capabilities from reverted txs, since the problem here is we don't know their index. Possibly the simplest solution there is to maintain a reverse lookup map from name to index (just like the fwd/reverse maps are kept in the store, we'd keep one in the local map too).

Also, should the capMap really be shared across all ScopedKeepers?

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@ebuchman ebuchman added this to the IBC 1.0 Implementation Audit milestone Nov 4, 2020
@ebuchman ebuchman added the T:Bug label Nov 4, 2020
@alexanderbez
Copy link
Contributor

I like the idea of a reverse lookup map. @AdityaSripal was there a reason we went with zero-based indexing?

@AdityaSripal
Copy link
Member

AdityaSripal commented Nov 6, 2020

Sorry it took me a while to reply @alexanderbez @ebuchman, I was pretty sure I initialized with 1 but needed to reread the code to make sure as it has been a while. Read through your issue, thanks for surfacing this!!

So we actually do start with one-based indexing with the capability module which is why there isn't a bug with regards to deleting a capability.

The global starting capability index is inititialized in InitChain by reading from the module's GenesisState. The default GenesisState will intialize the capability index to 1. The reason this is done through genesis and not statically in code, is that a chain that is importing state from a previous blockchain will need to initialize the capability index to the index that was exported by the previous chain (ie some index > 1). Note that if a chain tried to Initialize the capability index with 0, this would indeed cause issues which is why we error on ValidateGenesis if GenesisState.Index == 0.

Here is the PR that fixed this issue: #6047

Specific files on master to look at:
https://github.com/cosmos/cosmos-sdk/blob/master/x/capability/types/genesis.go
https://github.com/cosmos/cosmos-sdk/blob/master/x/capability/genesis.go

So while this error is possible to recreate in keeper tests, it cannot happen in a real chain since capability index will never start at 0 and no capability will ever be written to the 0th index. Let me know if there is a way for this to be documented better, or if you believe there is a better solution than the one implemented.

However, your point about us not deleting the capability from the map in case of a tx rollback is absolutely correct and should be fixed. A reverse lookup map should work.

One issue I found is that SetIndex I believe should still check it hasn't been previously initialized. This was implemented in #6047 but removed in #6057. I believe that check should be added back.
f82bc19#diff-6f9e7102c865d8072bd3eceea20a24354290dfafe38c7cf59135b31ead9ec3f4R133
While still allowing initialization with "non-1" indices.

@ebuchman
Copy link
Member Author

ebuchman commented Nov 6, 2020

Aha, thanks for the details, I should have seen that! I had the comment on SetIndex marked as incorrect, but went after the "its set to 1" issue and demonstrated it artificially. Clearly the stores need to be implemented from genesis states - in trying to cut out the simapp as a testing dependency, I neglected that.

I think adding more constraints to SetIndex is a good idea - both checking if its already set and if the index is greater than 0. I've submitted a PR which does so and fixes some other comments/tests here: #7845

@colin-axner
Copy link
Contributor

looks like this issue has been addressed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants