The wire decoder (in-house transaction and event decoding) #2158

haltman-at · 2019-07-05T22:34:58Z

So, first off, @davidmurdoch, @adrianmcli: this PR features yet more breaking changes to the contract decoder. Sorry. Don't worry, they'll be grouped with the others; I'll write a document later describing all the breaking changes at once. If you want to know about the breaking changes in this PR specifically, just read the section about the interface (it runs from "So, let's start by talking about the interface" to "that's it for the interface changes!").

Background: The contract decoder has long included functionality for decoding transactions and events. However, this functionality was delegated to web3, and thereby to ethers. Since we now have a new decoder output format, and since the ethers decoder has various problems, we decided to bring transaction and event decoding in-house. The contract decoder's transaction and event decoding is now done internally by Truffle, and the result uses our new output format. It also expands on the capabilities of ethers in several ways.

In addition to these upgrades to the contract decoder, this PR also adds what I'm calling the "wire decoder". Whereas the contract decoder works only for a specific contract, the wire decoder handles any transaction or event in the context of the project. (The contract decoder will throw an exception if you ask it to decode an event whose address does not match the contract's address, a message call whose "to" address does not match the contract's address, or a contract creation that did not create that contract.)

Note that the wire decoder, while it works, is IMO not quite ready for prime time. The reason is that it currently works like the contract decoder: it's based on definitions. Since it's only doing ABI decoding, ideally it (as well as the contract decoder's corresponding functions) ought to be able to work with just ABI information rather than definitions. Definitions have a particular problem: besides possibly being missing, they can cause serious trouble if the project was not compiled all at once. (The contract decoder already has this problem, so it's not new there.) I intend to address this in a later PR. For now, you can continue using the contract decoder where you already use it, but I wouldn't recommend using the wire decoder quite yet.

Also note that there's currently a bunch of duplicated code between the wire decoder and the contract decoder. I've decided to save factoring it out for later, sorry.

So, let's start by talking about the interface.

First off, there's no longer any need to call decoder.init(). This is now handled in forContract() (or forProject(), for the wire decoder). Correspondingly, though, forContract() is now async, so you'll have to await it. (I've also made forProject() async for consistency, even though it doesn't actually need to be.)

Next, let's look at the relevant functions -- decodeTransaction, decodeLog, and events. (There's also decodeLogs, but of course that works similarly to decodeLog.) An important interface change -- all of these are now async, so be sure to account for that!
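Folding init() into an async forContract() follows the usual async-factory pattern. Here is a minimal sketch of that pattern, assuming a hypothetical SketchDecoder class and a stand-in loadInfo step (neither is the real truffle-decoder API), showing why both the factory and the decoding methods have to be awaited:

```typescript
// Hypothetical stand-in for the setup work that init() used to do.
async function loadInfo(contractName: string): Promise<string[]> {
  return [contractName + ":allocations"];
}

// Hypothetical decoder class illustrating the async-factory pattern.
class SketchDecoder {
  // Private constructor: callers can't obtain an uninitialized decoder.
  private constructor(private readonly info: string[]) {}

  // Async factory: initialization happens here, so callers must await it.
  static async forContract(contractName: string): Promise<SketchDecoder> {
    const info = await loadInfo(contractName);
    return new SketchDecoder(info);
  }

  // Decoding methods are async as well, so their results must be awaited too.
  async decodeTransaction(data: string): Promise<string> {
    return this.info[0] + " decoded " + data;
  }
}

async function main(): Promise<void> {
  const decoder = await SketchDecoder.forContract("MyContract");
  console.log(await decoder.decodeTransaction("0x1234")); // MyContract:allocations decoded 0x1234
}

main();
```

The design point is that a decoder object can no longer exist in a half-initialized state; the cost is that every caller has to handle a Promise.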

The decodeTransaction function, as before, accepts a web3 Transaction object. It returns a DecodedTransaction object, which has the same contents but with an added decoding field. The decoding field holds an object that comes in one of four forms; a kind field always distinguishes the cases.

If the transaction is a message call to a contract calling one of its external functions, kind will be "function". There will also be the fields class (a ContractType object corresponding to the contract class that was called), name (the name of the function called), selector (the selector of the function called), and, most importantly, arguments. The arguments field is an array of objects; each object has a value field, holding a decoder Result for that argument, and optionally a name field for the name of that argument. (The name may be omitted because it is legal for arguments to be nameless.)
If instead it's a constructor call, kind will be "constructor". There is still the class field, but there is no name, and in place of selector there is bytecode, which is the constructor bytecode excluding the arguments. Finally, of course, there is arguments as before. In the future I'm hoping to also include information about which libraries the constructor has been linked to, but that isn't included at present.
If it's a message call that doesn't match the ABI, kind will be "fallback". In this case we can't really decode; class will still be present, but there will be no arguments. Instead there will be data, which is just another copy of the binary data sent (this is kind of unnecessary, but I thought I'd include it).
Finally, if we can't even tell what class is being called (or created), kind will be "unknown", and no other information will be included. This is basically the decoding failure case.
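To make the four cases concrete, here is a minimal sketch of the decoding field as a TypeScript discriminated union. The shapes are simplified assumptions based on the description above: class is a plain string rather than a ContractType, Result is a placeholder, describeDecoding is a hypothetical helper, and the "fallback" kind uses the name given here (it was later renamed "message").

```typescript
// Placeholder for a decoder Result; the real type is much richer.
type Result = { kind: "value" | "error"; value?: unknown };

interface AbiArgument {
  name?: string; // omitted for nameless arguments
  value: Result;
}

interface FunctionDecoding {
  kind: "function";
  class: string; // stands in for the ContractType object
  name: string;
  selector: string;
  arguments: AbiArgument[];
}

interface ConstructorDecoding {
  kind: "constructor";
  class: string;
  bytecode: string; // constructor bytecode, excluding the arguments
  arguments: AbiArgument[];
}

interface FallbackDecoding {
  kind: "fallback";
  class: string;
  data: string; // another copy of the raw data sent
}

interface UnknownDecoding {
  kind: "unknown";
}

type CalldataDecoding =
  | FunctionDecoding
  | ConstructorDecoding
  | FallbackDecoding
  | UnknownDecoding;

// Switching on kind narrows the union, so each branch sees only its own fields.
function describeDecoding(decoding: CalldataDecoding): string {
  switch (decoding.kind) {
    case "function":
      return decoding.class + "." + decoding.name + "(" + decoding.arguments.length + " args)";
    case "constructor":
      return "new " + decoding.class + "(" + decoding.arguments.length + " args)";
    case "fallback":
      return decoding.class + " fallback with data " + decoding.data;
    default:
      return "unknown"; // decoding failure: no other information available
  }
}
```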

(By the way, yes, you can decode library creation transactions, although seeing as these have no arguments, there's not much to decode. You cannot decode other transactions sent to libraries, as libraries do not expose an ABI other than for events; if you try, the result will register as "fallback". I don't consider this a problem, because you shouldn't be sending transactions to libraries, and if the library was compiled with Solidity 0.4.20 or later, it will just reject your transaction anyway. In any case, contract creations are the only transactions that the decoder will handle without a corresponding ABI entry, and those will work even for libraries.)

OK, what about decodeLog? This works similarly: it takes a web3 Log object as input and returns a DecodedLog, which has the same information but with a decodings field added. Note the plural: a log may have multiple decodings! The decodings field is an array of decoding objects. (So there is no "unknown" case for decoding failure; rather, decodings will simply be empty.)

Each decoding object can have one of two forms; as before, the kind field distinguishes them. The two cases are:

The decoding is for a non-anonymous event; in this case, kind is "event". As before, there are the fields class, arguments, name, and selector. The argument objects differ slightly from the transaction case in that they also have an indexed field.
The decoding is for an anonymous event; in this case, kind is equal to "anonymous". The only difference between this case and the previous one is the lack of a selector field.
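Here is a similar sketch for logs, again with simplified, assumed shapes rather than the real types (summarizeLog is a hypothetical helper). It shows the two points above: a decoding lacks selector exactly when it is anonymous, and failure shows up as an empty decodings array rather than an "unknown" case.

```typescript
// Placeholder for a decoder Result.
type Result = { kind: "value" | "error"; value?: unknown };

interface EventArgument {
  name?: string;
  indexed: boolean; // present on event arguments, unlike calldata arguments
  value: Result;
}

interface EventDecoding {
  kind: "event";
  class: string;
  name: string;
  selector: string;
  arguments: EventArgument[];
}

// Same as EventDecoding, minus the selector.
interface AnonymousDecoding {
  kind: "anonymous";
  class: string;
  name: string;
  arguments: EventArgument[];
}

type LogDecoding = EventDecoding | AnonymousDecoding;

interface DecodedLog {
  address: string;
  topics: string[];
  data: string;
  decodings: LogDecoding[]; // empty when nothing decodes
}

function summarizeLog(log: DecodedLog): string {
  if (log.decodings.length === 0) {
    return "undecodable";
  }
  return log.decodings
    .map(d => (d.kind === "event" ? d.name + " [" + d.selector + "]" : d.name + " (anonymous)"))
    .join(" | ");
}
```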

Before we move on to the events function, it's worth making a few additional notes here about the output format of event decoding:

Indexed reference parameters cannot be decoded, since for those only the keccak256 hash of the encoded value is stored in the topic. If there's an indexed reference parameter, the Result for it will be an ErrorResult. Sorry. The ErrorResult will at least contain the raw information.
Event decoding is done in "strict mode", meaning that a potential decoding for an event will not be included unless it has been encoded strictly correctly; any errors will cause that potential decoding to be skipped. This means that, other than the above exception for indexed reference parameters, all decoded event arguments should hold a Value, not an ErrorResult (and this applies to any Results contained within these values as well).
You may be wondering whether, in case of multiple decodings, these multiple decodings occur in any particular order. The answer is yes (to some extent). If there is a decoding corresponding to a non-anonymous event originating from the contract itself (there can be at most one of these), it will always be first. Following that will be any decodings corresponding to non-anonymous events originating from libraries. Following that will be decodings corresponding to anonymous events originating from the contract itself, and following that will be anonymous events originating from libraries.
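As a hedged illustration of that ordering, the sketch below ranks hypothetical candidate decodings and sorts them. The decoder itself gets this order by trying candidates in sequence rather than by sorting; the Candidate type, rank, and orderDecodings are made up for this example.

```typescript
interface Candidate {
  name: string;
  anonymous: boolean;
  fromLibrary: boolean; // false means the event comes from the contract itself
}

// 0: contract non-anonymous, 1: library non-anonymous,
// 2: contract anonymous,     3: library anonymous.
function rank(c: Candidate): number {
  return (c.anonymous ? 2 : 0) + (c.fromLibrary ? 1 : 0);
}

function orderDecodings(candidates: Candidate[]): Candidate[] {
  return candidates
    .map((c, i) => ({ c, i }))
    // Original index as a tiebreak keeps the sort stable on any JS engine.
    .sort((a, b) => rank(a.c) - rank(b.c) || a.i - b.i)
    .map(x => x.c);
}
```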

OK, finally let's describe how to use the events function. This one has an additional breaking change: instead of taking positional arguments, it now takes a single options argument. (If you were calling it without arguments, though, it will continue to work the same.) The three options that can be passed are name (only return events with the specified name; leave this out to not filter by name), fromBlock, and toBlock (note that these latter two default to "latest", so you'll have to set them if you want older events).

As you might expect, it returns an array of DecodedLogs, in the order those events occurred. The thing to note here is that if the name option is passed, this has two effects. First, each decodings array will be filtered by name; only decodings matching the specified name will be included. Secondly, the array of decoded logs itself will be filtered; any decoded log with no decodings matching the specified name will be excluded entirely. (Note that if name is not passed, the return value can contain logs with an empty decodings array.)
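The two effects of the name option can be sketched as follows, using simplified, hypothetical DecodedLog and LogDecoding shapes (filterByName is not a real function in the codebase):

```typescript
interface LogDecoding {
  kind: "event" | "anonymous";
  name: string;
}

interface DecodedLog {
  blockNumber: number;
  decodings: LogDecoding[];
}

function filterByName(logs: DecodedLog[], name?: string): DecodedLog[] {
  if (name === undefined) {
    return logs; // no filtering: logs with empty decodings are kept
  }
  return logs
    // Effect 1: drop decodings whose name doesn't match.
    .map(log => ({ ...log, decodings: log.decodings.filter(d => d.name === name) }))
    // Effect 2: drop logs left with no matching decodings.
    .filter(log => log.decodings.length > 0);
}
```

With the actual events function, fetching older Transfer events would mean passing the block range explicitly, something like await decoder.events({ name: "Transfer", fromBlock: 0 }), since both block options default to "latest".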

OK, that's it for the interface changes! If you don't want to know about the internals, you can stop reading here.

So -- how does all this work?

Well, first we need to discuss a few renamings. Because it now handles encoding as well (spoilers!), truffle-decoder-core is now truffle-codec, and truffle-decode-utils is now truffle-codec-utils. Also, because we now handle multiple ABI-encoded locations, everything that previously referred to "calldata" in its name (or filename) now refers to "abi". So e.g. decodeCalldata is now decodeAbi, getCalldataAllocations is now getAbiAllocations, etc. Be careful not to get confused by this.

Anyway, let's start with allocation! In addition to storage, memory, and ABI allocations, there are now calldata allocations (that term means something different now!) and event allocations. These latter two are handled in the same file as ABI allocations, though, since they're still ABI-related.

Calldata allocations are indexed first by contract ID and then by selector (with a separate field for the constructor allocation). They contain allocation information for the arguments, but they also include an overall offset indicating how far into the data the arguments start. (This is always 4 for functions, because of the 4-byte selector, but for constructors it's the length of the bytecode preceding the arguments, which obviously varies.) This offset is necessary for decoding any dynamic arguments, as those are given as pointers relative to it.
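To see why the offset matters, here is a minimal sketch of reading a single dynamic bytes argument: the pointer stored in the argument's head slot is relative to the start of the arguments, so the offset must be added before following it. readWord and decodeDynamicBytes are hypothetical helpers, not the codec's functions, and readWord assumes each word fits in a JavaScript number.

```typescript
// Read a 32-byte big-endian word as a number (assumes the value is below 2^53).
function readWord(data: Uint8Array, position: number): number {
  let result = 0;
  for (let i = 0; i < 32; i++) {
    result = result * 256 + data[position + i];
  }
  return result;
}

// Decode a dynamic bytes argument whose head slot sits at headPosition,
// measured from offset (where the arguments begin). The pointer in that
// head slot is ALSO relative to offset, not to the start of the data.
function decodeDynamicBytes(data: Uint8Array, offset: number, headPosition: number): Uint8Array {
  const pointer = readWord(data, offset + headPosition);
  const start = offset + pointer; // absolute position of the length word
  const length = readWord(data, start);
  return data.slice(start + 32, start + 32 + length);
}
```

For a function call offset would be 4; for a constructor it would be the length of the creation bytecode in front of the arguments, which is why the allocation has to carry it.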

Event allocations are a bit complicated. They're indexed first by number of topics; we then split into anonymous and non-anonymous cases. So non-anonymous events are indexed overall first by number of topics, then selector, then contract kind, then contract ID. Anonymous events, by contrast, are indexed overall first by number of topics, then contract kind, then contract ID, and then, if there are multiple, they just go in a big array. Event allocations are similar to calldata allocations, but there's no offset (since it's always zero), and there's a contractId field telling us which contract the event came from (since that information can't be found in the definition).
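The nesting described above might be typed roughly like this. The field names bySelector and anonymous, the ContractKind values, and the candidatesFor helper are all assumptions for illustration, not the actual truffle-codec types; the helper also looks at just one contract rather than the contract plus all libraries.

```typescript
type ContractKind = "contract" | "library";

interface EventAllocation {
  contractId: number; // which contract the event came from
  name: string;
}

type ById<T> = { [contractId: number]: T };
type ByKind<T> = { [K in ContractKind]?: ById<T> };

interface EventAllocationsForTopicCount {
  // Non-anonymous: selector -> contract kind -> contract ID -> allocation
  bySelector: { [selector: string]: ByKind<EventAllocation> };
  // Anonymous: contract kind -> contract ID -> all anonymous events of that contract
  anonymous: ByKind<EventAllocation[]>;
}

type EventAllocations = { [topicCount: number]: EventAllocationsForTopicCount };

function candidatesFor(
  allocations: EventAllocations,
  topics: string[],
  kind: ContractKind,
  contractId: number
): EventAllocation[] {
  const byTopics = allocations[topics.length];
  if (!byTopics) return [];
  const result: EventAllocation[] = [];
  // Non-anonymous events: the first topic is the selector.
  if (topics.length > 0) {
    const bySelector = byTopics.bySelector[topics[0]];
    const byKind = bySelector ? bySelector[kind] : undefined;
    const allocation = byKind ? byKind[contractId] : undefined;
    if (allocation) result.push(allocation);
  }
  // Anonymous events have no selector, so all with this topic count are candidates.
  const anonByKind = byTopics.anonymous[kind];
  const anon = anonByKind ? anonByKind[contractId] : undefined;
  if (anon) result.push(...anon);
  return result;
}
```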

(Note: I'm hoping soon to remove all these definitions from all the allocations, but I'm going to save that for a future PR. For now allocations continue to contain definitions.)

This is probably also the time to mention that there's been a change to the pointer format so that they play better with TypeScript. Pointers now all have a location field indicating what location they point to. (Pointers returned by the ABI allocator, which has to work for multiple locations, have the generic location of "abi", but these pointers can't be passed to the decoder.) The rest of the pointer format has been appropriately adjusted (although storage pointers still rely on Range objects; I chose not to mess with that for now, even though that's probably a bit of unnecessary complication now).
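Here is a hedged sketch of location-tagged pointers as a discriminated union. The field names are assumptions rather than the real truffle-codec pointer types. The point is that switching on location lets TypeScript narrow the pointer type, and that a generic "abi" pointer has to become a concrete one (via the hypothetical specialize function) before it can be decoded.

```typescript
interface MemoryPointer { location: "memory"; start: number; length: number; }
interface CalldataPointer { location: "calldata"; start: number; length: number; }
interface EventTopicPointer { location: "eventtopic"; topic: number; }
// What the ABI allocator produces: location-agnostic, not decodable as-is.
interface GenericAbiPointer { location: "abi"; start: number; length: number; }

type DataPointer = MemoryPointer | CalldataPointer | EventTopicPointer;

// Turn a generic ABI pointer into a concrete one for a given location and base.
function specialize(pointer: GenericAbiPointer, location: "memory" | "calldata", base: number): DataPointer {
  const start = base + pointer.start;
  return location === "memory"
    ? { location: "memory", start, length: pointer.length }
    : { location: "calldata", start, length: pointer.length };
}

function describePointer(pointer: DataPointer): string {
  switch (pointer.location) {
    case "memory":
    case "calldata":
      return pointer.location + "[" + pointer.start + ".." + (pointer.start + pointer.length) + ")";
    default:
      // Narrowed to EventTopicPointer here.
      return "topic " + pointer.topic;
  }
}
```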

I'll skip describing the actual process of allocation, as it's mostly nothing new, but one thing worth noting is that the calldata and event allocators start with the ABI entry for the function or event being allocated, then search through contracts to find the corresponding definition, which they then use for allocation. For functions and events, the allocator must search not only the contract itself but also its parent contracts (in order, obviously, so as to find the right one in case of multiple inheritance). For constructors, of course, the allocator does not search parent contracts. However, constructors have another little complication: default constructors don't appear in the ABI at all, so the allocator starts by giving the constructor a default allocation and later overwrites it if it finds a constructor entry in the ABI.

(By the way, also note that we now compute selectors ourselves rather than relying on web3; the versioning had simply gotten to be too much of a hassle.)

Anyway, that's allocation, how does decoding work?

Calldata decoding is pretty straightforward. The decodeCalldata function (not the old function by that name, but the new one, which lives alongside decodeVariable and decodeEvent in lib/interface/decoding.ts) takes just an EvmInfo argument; the calldata to decode is (of course) put in the calldata section of the state. Note, however, that the EvmInfo should contain a currentContext giving the context for the contract being called or created, and the caller is responsible for passing this in! If it's not included, an unknown decoding will be returned.

(Why is this left up to the caller, when decodeEvent doesn't make the caller do this work and instead just takes an address argument? Basically, because of the possibility that the context is a constructor, which doesn't fit very well into how the decoder core does things at the moment. I am thinking of changing this in the future, though. In any case, at the moment both the contract decoder and the wire decoder do this work to determine the context before calling decodeCalldata, so it all works.)

Anyway, decodeCalldata determines whether it's looking at a constructor call or function call based on currentContext, looks up the constructor allocation for the context in the former case and the function allocation (by selector) in the latter case, and then uses that allocation to decode the arguments. Pretty straightforward.

(By the way, note that decodeCalldata and decodeEvent are generators, like decodeVariable. However, the functions that call them presently don't bother handling the storage request case, since that should never occur.)

Event decoding is where it gets messy. The decodeEvent function takes three arguments: info (which in particular includes the data to decode in the eventdata and eventtopics parts of the state), address (the address the event came from, or appeared to come from), and, optionally, targetName (if included, this skips any attempt to produce decodings that wouldn't yield the target name; including this parameter in the low-level interface obviously wasn't strictly necessary, but I figured speeding up decoding is a good thing, right?).

So, to summarize, decodeEvent first gathers the allocation objects for all possible events this could correspond to, whether from the contract at the given address or from a library, whether non-anonymous or anonymous. It does this by looking at the number of topics, the first topic (if present), and, as mentioned, what address it appears to have come from. Once it's gathered all that up, it's time to try them one by one, and see which ones produce valid decodings! (Note that it tries them in the order I described in the interface section above.)

How do we tell if a given possible allocation works? Well, we attempt to decode, and see if anything goes wrong. To do this we run the decoder in "strict mode", a new option. In strict mode, if the decoder encounters a problem, it won't return an ErrorResult; instead, it'll throw a StopDecodingError, telling us to skip over that possibility. (The one exception is the new error for indexed reference parameters; those do not stop the decoder for obvious reasons.) Also, strict mode turns on a check for overlong arrays or strings (or bytestrings). Note that I've written this check pretty crudely; it's not intended to actually catch all overlong arrays or strings, just ridiculously long ones so that trying an incorrect allocation doesn't DOS the decoder. But don't worry, other overlong arrays and strings will still be caught by another means!

(Let me pause here to talk about decoder options more generally -- now, for the core decode functions, any arguments other than the standard dataType, pointer, info are passed as a single options argument. The possible options are permissivePadding (which already existed), strictAbiMode (just described), abiPointerBase (the old base argument, now as an option), and memoryVisited (not actually used currently but will be in the future for circularity detection).)
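A minimal sketch of that single options argument. The option names are the ones listed above; their types, the defaults, and the resolveOptions helper are assumptions for illustration only.

```typescript
// Option names from the description; types and defaults are assumptions.
interface DecoderOptions {
  permissivePadding?: boolean;
  strictAbiMode?: boolean; // throw instead of returning error results
  abiPointerBase?: number; // formerly the positional "base" argument
  memoryVisited?: number[]; // reserved for future circularity detection
}

// Hypothetical helper: fill in assumed defaults so a core decode function
// can read every option without checking for undefined.
function resolveOptions(options: DecoderOptions = {}): Required<DecoderOptions> {
  return {
    permissivePadding: options.permissivePadding === true,
    strictAbiMode: options.strictAbiMode === true,
    abiPointerBase: options.abiPointerBase !== undefined ? options.abiPointerBase : 0,
    memoryVisited: options.memoryVisited !== undefined ? options.memoryVisited : [],
  };
}
```

Bundling the extras into one object means adding a new option later doesn't change the positional signature of every core decode function.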

Anyway, as I was saying, strict mode still isn't enough to tell us whether something is encoded strictly correctly according to a given allocation. So if an allocation has passed the trials of strict mode, and the decoder has produced a result for it, it must then face... the encoder! Having successfully decoded (or so we think), the resulting decoding (for the non-indexed parameters -- for indexed parameters, strict mode, together with the topic count check at the beginning, should be sufficient to catch any problems) is then fed back through the encoder to check whether it matches the original data. Only if we have a match do we push a new entry onto our array of decodings!
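The strict-mode-plus-re-encoding check can be sketched as a loop over candidates. Here each hypothetical Candidate carries a decode function standing in for running the decoder with strictAbiMode on, and an encode function standing in for the encoder; StopDecodingError mirrors the error described above, but this class, Candidate, and tryCandidates are a sketch, not the real code.

```typescript
// Thrown (instead of returning an error result) when strict mode hits a problem.
// Deliberately not a subclass of Error, so instanceof works even when compiled to ES5.
class StopDecodingError {
  constructor(public readonly reason: string) {}
}

interface Candidate<T> {
  name: string;
  decode: (data: Uint8Array) => T; // may throw StopDecodingError
  encode: (value: T) => Uint8Array | undefined; // undefined if not encodable
}

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

function tryCandidates<T>(data: Uint8Array, candidates: Candidate<T>[]): { name: string; value: T }[] {
  const decodings: { name: string; value: T }[] = [];
  for (const candidate of candidates) {
    let value: T;
    try {
      value = candidate.decode(data);
    } catch (error) {
      if (error instanceof StopDecodingError) continue; // skip this possibility
      throw error; // genuine bugs still propagate
    }
    // Round-trip check: keep the decoding only if it re-encodes to the exact input.
    const reencoded = candidate.encode(value);
    if (reencoded !== undefined && bytesEqual(reencoded, data)) {
      decodings.push({ name: candidate.name, value });
    }
  }
  return decodings;
}
```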

That's right, we have an encoder now! Thankfully it's a lot simpler than the decoder. Note that it doesn't really have an interface yet; that will be added in the future, so that our encoder can replace our use of the ethers encoder, but for now it's purely used for checking event decodings. Right now it only accepts Results as input.

Unfortunately, due to TypeScript not being as good at type inference as I thought, the encoder contains a bunch of type coercions. Oh well. But anyway it's still pretty simple. It has two main functions -- encodeAbi, which takes a single Result and encodes it; and encodeTupleAbi, which takes an array of Results and encodes them as a tuple. (These obviously both call each other; encodeTupleAbi needs to know how the individual components are encoded, and encodeAbi needs to call encodeTupleAbi in order to encode arrays and structs.) Also note that these functions use the ABI allocations (which they take as an argument) in order to simplify the process of encoding.

Also note that these functions can return either a Uint8Array or undefined (don't worry, this is accounted for). Why can they return undefined? You'll get undefined if you pass in either a value that can't be ABI-encoded (such as a mapping or internal function) or an ErrorResult. Errors can't be encoded!
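To illustrate the encodeAbi/encodeTupleAbi split and the undefined-on-error behavior, here is a heavily simplified sketch covering only uint and bytes values. It has no arrays or structs (so no mutual recursion) and doesn't use ABI allocations; the Result type is a stand-in, and uint values are assumed to be safe JavaScript integers.

```typescript
// A tiny stand-in for the decoder's Result type.
type Result =
  | { kind: "uint"; value: number }
  | { kind: "bytes"; value: number[] }
  | { kind: "error" };

function isDynamic(r: Result): boolean {
  return r.kind === "bytes";
}

// Encode a non-negative safe integer as a 32-byte big-endian word.
function uintWord(n: number): Uint8Array {
  const word = new Uint8Array(32);
  for (let i = 31; i >= 0 && n > 0; i--) {
    word[i] = n % 256;
    n = Math.floor(n / 256);
  }
  return word;
}

function concat(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((sum, p) => sum + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

function encodeAbi(r: Result): Uint8Array | undefined {
  switch (r.kind) {
    case "uint":
      return uintWord(r.value);
    case "bytes": {
      // Length word, then the bytes right-padded to a multiple of 32.
      const padded = new Uint8Array(Math.ceil(r.value.length / 32) * 32);
      padded.set(r.value);
      return concat([uintWord(r.value.length), padded]);
    }
    default:
      return undefined; // errors can't be encoded
  }
}

// Head/tail encoding: static values go inline; dynamic ones get an offset into the tail.
function encodeTupleAbi(results: Result[]): Uint8Array | undefined {
  const encodings: Uint8Array[] = [];
  for (const r of results) {
    const e = encodeAbi(r);
    if (e === undefined) return undefined; // one bad component spoils the tuple
    encodings.push(e);
  }
  const heads: Uint8Array[] = [];
  const tails: Uint8Array[] = [];
  let tailOffset = 32 * results.length; // every head slot is one word in this subset
  results.forEach((r, i) => {
    if (isDynamic(r)) {
      heads.push(uintWord(tailOffset));
      tails.push(encodings[i]);
      tailOffset += encodings[i].length;
    } else {
      heads.push(encodings[i]);
    }
  });
  return concat(heads.concat(tails));
}
```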

Anyway, it's possible I've forgotten something, this is a big PR, but I think that basically sums this up! Hopefully soon both the ABI encoder and wire decoder will have more complete and robust interfaces, and we'll be able to ditch all reliance on the ethers encoder and decoder!


Redo output format with interfaces instead of classes (also some bugfixes)

haltman-at · 2019-07-16T00:35:14Z

Note: This PR has had #2180 merged into it, so make sure to also read the description and comments there.


Factor out wire decoder from contract decoder

haltman-at · 2019-07-25T02:38:26Z

Note: This PR has now had PR #2232 merged into it, so be sure to see the comments there, too.

haltman-at · 2019-07-25T04:50:55Z

OK, three more changes:

In the contract decoder, the balance and nonce fields in the state have been renamed balanceAsBN and nonceAsBN (note: this is another breaking change).
For bools and enums there is no longer a padding error separate from the out of range error; these are now all just treated as out of range errors.
In ABI decodings, the name field has been replaced with an abi field containing the ABI entry. Note that for default constructors and fallback functions, this is generated rather than read from the ABI (since it's not in the ABI). Also, to make this work, calldata and event allocations now include the ABI entry as well as the definition. As I've said before, I'm going to get rid of the definitions later, but for now they're still there.

gnidan

well I pretty much read every line. most of this I'm just going to have to take your word for. well, much of it.

packages/truffle-codec-utils/package.json

packages/truffle-codec-utils/src/abi.ts

packages/truffle-codec-utils/src/definition.ts

packages/truffle-codec/lib/types/evm.ts

packages/truffle-codec/lib/types/pointer.ts

packages/truffle-core/lib/debug/printer.js

packages/truffle-debugger/lib/session/index.js

packages/truffle-decoder/test/contracts/WireTest.sol

haltman-at · 2019-08-16T03:08:53Z

OK I've renamed the "fallback" decoding case to "message". I should probably change how the ABI returned in the decoding works in that case, but, well, I think I'm going to save that for a separate PR.

gnidan

mergetown!

haltman-at added 30 commits June 3, 2019 22:43

Add event and transaction decoding (initial commit, don't use)

07d663d

Merge branch 'separate-contract-decoder' into wire-decoder

1540474

Account for further merge changes

995fade

Add "unknown" type to decoded transactions & events

dd51ee1

Account for possibility of anonymous calldata parameters

e54d8a9

Merge branch 'separate-contract-decoder' into wire-decoder

cd0e9fc

Update errors for new type system

9d01456

Merge branch 'separate-contract-decoder' into wire-decoder

3992f40

Add class field to decoding; have contract decoder check address

d50b0b1

Merge branch 'separate-contract-decoder' into wire-decoder

51e8ad7

Update types

a637570

Merge branch 'separate-contract-decoder' into wire-decoder

769ef1c

Make use of functionKind shim (+ correction to name check)

d2845d3

Merge branch 'separate-contract-decoder' into wire-decoder

cda1056

Fix default constructor allocation; regularize calldata allocation format

d039eef

Handle the possibility of library events

e586eaa

Store event allocations by selector only; update contract.ts

ba1b5af

Merge branch 'separate-contract-decoder' into wire-decoder

c797ff1

Merge branch 'separate-contract-decoder' into wire-decoder

98075b0

Pass along compiler info in wire decoder

06e5ad6

Merge branch 'separate-contract-decoder' into wire-decoder

fc908ec

Remove redundancy in calldata interface

65afedd

Fix lots of compile errors!

c686278

Merge branch 'separate-contract-decoder' into wire-decoder

6185c2a

Add a bit that went missing in a merge somewhere

65a4ebd

Redo calldata/event allocation interface

c95ad3f

Add an abi encoder!

ac1a2a5

Merge branch 'separate-contract-decoder' into wire-decoder

a267db7

Update yarn.lock

b6115b8

Rename truffle-decoder-core to truffle-codec

cd630d7

haltman-at and others added 2 commits July 15, 2019 20:03

Add missing return; add infinite loop comments

63d732f

Merge pull request #2180 from trufflesuite/declassify

93cb78b

Redo output format with interfaces instead of classes (also some bugfixes)

haltman-at and others added 6 commits July 17, 2019 17:35

Make type hints and type IDs just strings

a49a63a

Fix error in previous commit (sorry!)

d246e1d

Factor out wire decoder from contract decoder

0afcbdf

Avoid including a duplicate of contract in contracts passed to wire decoder

9503dac

Add extra context so self-pointers can be recognized even w/o bytecode

f3d74b5

Merge pull request #2232 from trufflesuite/factor-decoders

bbf6c39

Factor out wire decoder from contract decoder

haltman-at added 3 commits July 24, 2019 23:18

Rename balance, nonce to balanceAsBN, nonceAsBN

4c7b96b

Get rid of separate padding errors for bool, enum

784cd78

Replace name in decodings with abi

63ada12

haltman-at added 2 commits July 25, 2019 16:37

Fix out of date package name!

781b86b

Fix error with watch expression display

bb6438f

gnidan reviewed Jul 28, 2019


haltman-at added 7 commits July 30, 2019 16:35

Merge branch 'next' into wire-decoder

ee56bc7

Remove no-longer-used lodash.isequal

9b78765

Clean out-of-range bools used as mapping keys

2751d6d

Merge branch 'next' into wire-decoder

2181dc0

Merge branch 'next' into wire-decoder

210f9e6

Address PR comments

3fb9352

Rename fallback decoding case to message

d5c833c

Add test of debugger's boolean-key-cleaning

1d329e9

haltman-at requested a review from gnidan August 16, 2019 17:32

gnidan approved these changes Aug 21, 2019


haltman-at merged commit 9666431 into next Aug 21, 2019

gnidan deleted the wire-decoder branch November 23, 2020 00:09
