Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending a proto tx message without the public key causes error in RPC and tx gets stuck in cache #7585

Closed
4 tasks
andynog opened this issue Oct 16, 2020 · 7 comments · Fixed by #7594
Closed
4 tasks
Assignees
Labels

Comments

@andynog
Copy link
Contributor

andynog commented Oct 16, 2020

Summary of Bug

While testing to send a MsgConnectionOpenInit transaction to a Cosmos SDK chain (v0.40.0-rc0), I've encountered an Internal Error with the RPC HTTP Handler and potentially might cause additional errors.

Version

Relayer (Stargate-4)
Tendermint v0.34.0-rc4
Cosmos SDK v0.40.0-rc0

Steps to Reproduce

  • Spin up two chains using the Go Relayer using its dev-env script
  • I've built a transaction (in my case MsgConnectionOpenInit) using our Rust Relayer. The tx signing code is not in the repo yet. But I believe the error would happen for any transaction and any client libs/relayer since it seems it's related to protobuf.
  • In my Rust code, while testing I've set the public_key property of a SignerInfo to None (which I believe might set it to an empty string when decoded in Cosmos).
     let signer_info = SignerInfo {
            public_key: None,
            mode_info: mode,
            sequence: 0,
        };

The protobuf file where SignerInfo is says the public_key is optional so that's why I set it to None. Other frameworks might set it to a null or empty value.

message SignerInfo {
  // public_key is the public key of the signer. It is optional for accounts
  // that already exist in state. If unset, the verifier can use the required \
  // signer address for this position and lookup the public key.
  google.protobuf.Any public_key = 1;
  // mode_info describes the signing mode of the signer and is a nested
  // structure to support nested multisig pubkey's
  ModeInfo mode_info = 2;
  // sequence is the sequence of the account, which describes the
  // number of committed transactions signed by a given address. It is used to
  // prevent replay attacks.
  uint64 sequence = 3;
}
  • Sent a broadcast_tx_sync to the chain. Probably you can reproduce the same using this command below (the tx hash is for a MsgConnectionOpenInit)
curl localhost:26657/broadcast_tx_sync?tx=0x0a9f010a9c010a2d2f6962632e636f72652e636f6e6e656374696f6e2e76312e4d7367436f6e6e656374696f6e4f70656e496e6974126b0a0d6962637a65726f636c69656e74120b6962637a65726f636f6e6e1a1c0a0c6962636f6e65636c69656e74120a6962636f6e65636f6e6e1a002205312e302e302a283235454635364341373935313335453430393336384536444238313130463232413442453035433212080a0612040a0208011a40ac3342993e25da936cddc7be3d8f603ca6e9661518d536d0c482e18a0154aa096e438a6b9bcadfcfc2f0d689dccaf55b96399d67a8361b70f5da13091e2f929b
  • In the broadcast response there was an Internal Error message.
Internal error: runtime error: invalid memory address or nil pointer dereference (code: -32603)
  • Checked the log for the chain found this message:
E[2020-10-16|16:57:54.538] Panic in RPC HTTP handler                    module=rpc-server err="runtime error: invalid memory address or nil pointer dereference" stack="goroutine 1086398 [running]:\nruntime/debug.Stack(0x1890fc0, 0x1880780, 0x2941c40)\n\truntime/debug/stack.go:24 +0x9f\ngithub.com/tendermint/tendermint/rpc/jsonrpc/server.RecoverAndLogHandler.func1.2(0xc006312a20, 0x1ebd880, 0xc0028dee80, 0xbfdaa034a00d6792, 0x12b18964a9e4, 0x29e0ae0, 0xc004b94100)\n\tgithub.com/tendermint/[email protected]/rpc/jsonrpc/server/http_server.go:185 +0x175\npanic(0x1880780, 0x2941c40)\n\truntime/panic.go:969 +0x175\ngithub.com/cosmos/cosmos-sdk/codec/types.(*interfaceRegistry).UnpackAny(0xc0000c2200, 0x0, 0x17b65c0, 0xc006727340, 0xc0028e0f60, 0x4113e5)\n\tgithub.com/cosmos/[email protected]/codec/types/interface_registry.go:149 +0x3a\ngithub.com/cosmos/cosmos-sdk/types/tx.(*SignerInfo).UnpackInterfaces(...)\n\tgithub.com/cosmos/[email protected]/types/tx/types.go:163\ngithub.com/cosmos/cosmos-sdk/types/tx.(*AuthInfo).UnpackInterfaces(0xc006312b40, 0x1e822e0, 0xc0000c2200, 0x7f08901a2a38, 0xc006312b40)\n\tgithub.com/cosmos/[email protected]/types/tx/types.go:153 +0xa9\ngithub.com/cosmos/cosmos-sdk/codec/types.UnpackInterfaces(...)\n\tgithub.com/cosmos/[email protected]/codec/types/interface_registry.go:205\ngithub.com/cosmos/cosmos-sdk/codec.(*ProtoCodec).UnmarshalBinaryBare(0xc0000c2210, 0xc006fe3230, 0x8, 0x8, 0x1ecb720, 0xc006312b40, 0x0, 0x0)\n\tgithub.com/cosmos/[email protected]/codec/proto_codec.go:70 +0x114\ngithub.com/cosmos/cosmos-sdk/x/auth/tx.DefaultTxDecoder.func1(0xc000278b40, 0xee, 0x1e0, 0x2d2846dfb9e8e46, 0x78a5636f748f82ee, 0x8cc7020884c87814, 0xbad52375509a6be6)\n\tgithub.com/cosmos/[email protected]/x/auth/tx/decoder.go:48 +0x274\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).CheckTx(0xc000583500, 0xc000278b40, 0xee, 0x1e0, 0xc000000000, 0x0, 0x0, 0x0, 0x0, 0x0, ...)\n\tgithub.com/cosmos/[email protected]/baseapp/abci.go:216 +0x197\ngithub.com/tendermint/tendermint/abci/client.(*localClient).CheckTxAsync(0xc0010391a0, 0xc000278b40, 0xee, 0x1e0, 0xc000000000, 0x0)\n\tgithub.com/tendermint/[email protected]/abci/client/local_client.go:98 +0xdc\ngithub.com/tendermint/tendermint/proxy.(*appConnMempool).CheckTxAsync(0xc0005deaf0, 0xc000278b40, 0xee, 0x1e0, 0x0, 0x4133b0)\n\tgithub.com/tendermint/[email protected]/proxy/app_conn.go:126 +0x59\ngithub.com/tendermint/tendermint/mempool.(*CListMempool).CheckTx(0xc0010549c0, 0xc000278b40, 0xee, 0x1e0, 0xc006727310, 0x17d0000, 0x0, 0x0, 0x0, 0x0)\n\tgithub.com/tendermint/[email protected]/mempool/clist_mempool.go:288 +0x477\ngithub.com/tendermint/tendermint/rpc/core.BroadcastTxSync(0xc006312a60, 0xc000278b40, 0xee, 0x1e0, 0x0, 0x0, 0x0)\n\tgithub.com/tendermint/[email protected]/rpc/core/mempool.go:36 +0xd3\nreflect.Value.call(0x185cb40, 0x1cd7498, 0x13, 0x1a8a19b, 0x4, 0xc002e18f00, 0x2, 0x2, 0xc002e18f18, 0xc006312aa0, ...)\n\treflect/value.go:475 +0x8c7\nreflect.Value.Call(0x185cb40, 0x1cd7498, 0x13, 0xc002e18f00, 0x2, 0x2, 0x1, 0x2, 0x1cd9500)\n\treflect/value.go:336 +0xb9\ngithub.com/tendermint/tendermint/rpc/jsonrpc/server.makeHTTPHandler.func2(0x1eb6ac0, 0xc006312a20, 0xc004b94100)\n\tgithub.com/tendermint/[email protected]/rpc/jsonrpc/server/http_uri_handler.go:56 +0x439\nnet/http.HandlerFunc.ServeHTTP(0xc002deacf0, 0x1eb6ac0, 0xc006312a20, 0xc004b94100)\n\tnet/http/server.go:2042 +0x44\nnet/http.(*ServeMux).ServeHTTP(0xc000698d40, 0x1eb6ac0, 0xc006312a20, 0xc004b94100)\n\tnet/http/server.go:2417 +0x1ad\ngithub.com/tendermint/tendermint/rpc/jsonrpc/server.maxBytesHandler.ServeHTTP(0x1e842a0, 0xc000698d40, 0xf4240, 0x1eb6ac0, 0xc006312a20, 0xc004b94100)\n\tgithub.com/tendermint/[email protected]/rpc/jsonrpc/server/http_server.go:234 +0xd4\ngithub.com/tendermint/tendermint/rpc/jsonrpc/server.RecoverAndLogHandler.func1(0x1eb7680, 0xc0006ae000, 0xc004b94100)\n\tgithub.com/tendermint/[email protected]/rpc/jsonrpc/server/http_server.go:207 +0x39a\nnet/http.HandlerFunc.ServeHTTP(0xc002788cc0, 0x1eb7680, 0xc0006ae000, 0xc004b94100)\n\tnet/http/server.go:2042 +0x44\nnet/http.serverHandler.ServeHTTP(0xc00015f500, 0x1eb7680, 0xc0006ae000, 0xc004b94100)\n\tnet/http/server.go:2843 +0xa3\nnet/http.(*conn).serve(0xc003a7d5e0, 0x1ebd340, 0xc004b19f80)\n\tnet/http/server.go:1925 +0x8ad\ncreated by net/http.(*Server).Serve\n\tnet/http/server.go:2969 +0x36c\n"
if any.TypeUrl == "" {
		// if TypeUrl is empty return nil because without it we can't actually unpack anything
		return nil
	}
  • Also, event though the chain keeps running (doesn't crash), when I tried to send the same transaction again, I've got a message that the tx already exists in cache. So not sure if this could cause additional problems trying to replay it many times.
{
  "jsonrpc": "2.0",
  "id": -1,
  "error": {
    "code": -32603,
    "message": "Internal error",
    "data": "tx already exists in cache"
  }
}

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@ebuchman
Copy link
Member

ebuchman commented Oct 16, 2020

My impression is that sending this through the RPC it triggers a panic which is caught by Tendermint's RPC panic handler and so doesn't cause any real problems. However if a malicious proposer included this tx directly in a block I think it would crash all the nodes. I would suspect there's similar such bugs to this lurking (effectively empty proto fields deserializing to nil and not being checked before access) so we should probably try to suss them all out.

@ebuchman
Copy link
Member

Actually the SDK has a panic handler doesnt it that would catch a panic like this and just cause the tx to be invalid, but wouldn't crash anything ?

@alexanderbez
Copy link
Contributor

Yes, the SDK has a recover handler during tx/message execution.

@aaronc
Copy link
Member

aaronc commented Oct 18, 2020

This is happening in TxDecoder. Maybe that doesn't have a panic handler??

So two things to do:

  • UnpackAny should properly handle the case where *Any is nil gracefully
  • Make sure all of CheckTx and DeliverTx have panic handlers including TC decoding

We can also address the call to UnpackAny with nil for public key, but the above change to UnpackAny should allow that to be okay generally.

/cc @amaurymartiny @blushi

@alexanderbez
Copy link
Contributor

alexanderbez commented Oct 18, 2020

CheckTx and DeliverTx decode the tx before calling BaseApp#runTx -- I don't think that makes the most sense and the function signature of BaseApp#runTx is weird.

So I propose we move the tx decoding to BaseApp#runTx (after recover block) and we get panic recovery for free.

@blushi
Copy link
Contributor

blushi commented Oct 19, 2020

CheckTx and DeliverTx decode the tx before calling BaseApp#runTx -- I don't think that makes the most sense and the function signature of BaseApp#runTx is weird.

So I propose we move the tx decoding to BaseApp#runTx (after recover block) and we get panic recovery for free.

@alexanderbez The function signature of BaseApp#runTx is indeed weird but it seems like there's a reason for it. In fact, it's used in BaseApp#Check and BaseApp#Deliver methods with decoded tx only.
So if we remove tx sdk.Tx from runTx's arguments, that means that we would need to re-encode it so it then can be decoded again...
That being said, @amaurymartiny noticed BaseApp#Check and BaseApp#Deliver are just used in tests and simulation for now, so maybe that's actually not really an issue.

@alexanderbez
Copy link
Contributor

That being said, @amaurymartiny noticed BaseApp#Check and BaseApp#Deliver are just used in tests and simulation for now, so maybe that's actually not really an issue.

Yes, these are solely used for tests. I don't know how much of a lift it would be, but I think it's the better solution.

@mergify mergify bot closed this as completed in #7594 Oct 19, 2020
@aaronc aaronc removed the backlog label Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants