This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

The wire decoder (in-house transaction and event decoding) #2158

Merged on Aug 21, 2019 (114 commits)



@haltman-at haltman-at commented Jul 5, 2019

So, first off, @davidmurdoch, @adrianmcli, this PR features yet more breaking changes to the contract decoder. Sorry. Don't worry, they'll be grouped with the others; I'll write a document later describing all the breaking changes at once. Anyway, if you want to know about the breaking changes in this PR specifically, just read the section where I talk about the interface (I've bolded its beginning and end).

Background: The contract decoder has long included functionality for decoding transactions and events. However, this functionality was delegated to web3, and thereby to ethers. Since we now have a new decoder output format, however, and the ethers decoder has various problems, it was decided to bring transaction and event decoding in-house. Now, the contract decoder's transaction and event decoding is done internally by Truffle, and the result uses our new output format. It also expands on the capabilities of ethers in several ways.

In addition to these upgrades to the contract decoder, also added is what I'm calling the "wire decoder". Whereas the contract decoder works only for a specific contract, the wire decoder will handle any transaction or event in the context of the project. (The contract decoder will throw an exception if you ask it to decode an event whose address does not match the contract's address, a message call whose to does not match the contract's address, or a contract creation that did not create that contract.)

Note that the wire decoder, while it works, is, IMO, not quite ready for prime-time, as they say. The reason is that currently it works like the contract decoder: Based on definitions. Since it's just doing ABI decoding, ideally it -- as well as the contract decoder's corresponding functions -- ought to be able to work with just ABI information, rather than definitions. Definitions have the particular problem that, in addition to possibly being missing, they may run into serious problems if the project was not compiled all at once. (The contract decoder already has this problem, so it's not new there.) I'm intending to address this in a later PR. But for now, while you can continue using the contract decoder where you already use it, I'm not sure I would recommend using the wire decoder quite yet.

Also note that currently there's a bunch of duplicated code between the wire decoder and the contract decoder. I've decided to just save factoring that out for later, sorry.

So, let's start by talking about the interface.

First off, there's no more need to call decoder.init(). This is now handled in forContract() (or forProject(), for the wire decoder). However, correspondingly, forContract() is now async. So you'll have to await that. (I've also made forProject() async for consistency, even though it doesn't actually need to be.)

Next, let's look at the relevant functions -- decodeTransaction, decodeLog, and events. (There's also decodeLogs, but of course that works similarly to decodeLog.) An important interface change -- all of these are now async, so be sure to account for that!

The decodeTransaction function, as before, accepts a web3 Transaction object. It returns a DecodedTransaction object, which consists of the same contents but now with a decoding field. The decoding field holds an object, which comes in one of four forms. There is always a kind field to distinguish the cases.

  1. If the transaction is a message call to a contract calling one of its external functions, kind will be equal to "function". There will also be the fields class (a ContractType object corresponding to the contract class that was called), name (the name of the function called), selector (the selector of the function called), and, most importantly, arguments. The arguments field is an array of objects; each object has a field value, which holds a decoder Result for that argument, and optionally name, for the name of that argument. (The reason name may be omitted is that it is legal for arguments to be nameless.)

  2. If instead it's a constructor call, kind will be equal to "constructor". There is still the class field, but there is no name, and in place of selector there is bytecode, which is the bytecode of the constructor excluding the arguments. Finally of course there is arguments as before. In the future I'm hoping also to include information about what libraries the constructor has been linked to, but that isn't included at present.

  3. If it's a message call but doesn't match the ABI, kind will be equal to "fallback". In this case we can't really decode; class will still be present, but there will be no arguments. There will be data, which is just another copy of the binary data sent (this is kind of unnecessary, but I thought I'd include it).

  4. Finally, if we can't even tell what class is being called (or created), then kind will be "unknown", and there will be no other information included. This is basically the decoding failure case.
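To make the four cases concrete, here is a rough sketch of how a caller might model and consume the decoding field as a discriminated union on kind. The type names (CalldataDecodingSketch, ArgumentSketch) and the typeName field on the class object are assumptions made up for illustration; the real types in the decoder may well look different.

```typescript
// Hypothetical shapes inferred from the description above; Result and
// ContractType are stubbed here and are NOT the decoder's real definitions.
type ResultStub = unknown;

interface ContractTypeStub {
  typeName: string; // assumed field, for illustration only
}

interface ArgumentSketch {
  name?: string; // omitted for nameless arguments
  value: ResultStub;
}

type CalldataDecodingSketch =
  | { kind: "function"; class: ContractTypeStub; name: string; selector: string; arguments: ArgumentSketch[] }
  | { kind: "constructor"; class: ContractTypeStub; bytecode: string; arguments: ArgumentSketch[] }
  | { kind: "fallback"; class: ContractTypeStub; data: string }
  | { kind: "unknown" };

// Switching on `kind` lets TypeScript narrow to the fields each case has.
function describeDecoding(decoding: CalldataDecodingSketch): string {
  switch (decoding.kind) {
    case "function":
      return `${decoding.class.typeName}.${decoding.name}(${decoding.arguments.length} args)`;
    case "constructor":
      return `new ${decoding.class.typeName}(${decoding.arguments.length} args)`;
    case "fallback":
      return `${decoding.class.typeName} fallback, data ${decoding.data}`;
    case "unknown":
      return "unknown";
  }
}
```

Checking kind before touching name, selector, or arguments is the important part, since only some cases carry those fields.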

(By the way, yes you can decode library creation transactions, although seeing as these have no arguments, there's not much to decode. You cannot decode other transactions sent to libraries, as libraries do not expose an ABI other than for events; if you try the result will register as "fallback". I don't consider this a problem because you shouldn't be sending transactions to libraries, and if the library was compiled with Solidity 0.4.20 or later, it will just reject your transaction anyway. Anyway, contract creations are the only transactions that the decoder will handle without having a corresponding ABI entry, and those will work even for libraries.)

OK, what about decodeLog? This works similarly in that it takes a web3 Log object as input, and returns a DecodedLog, which has the same information but with a decodings field added. Note that plural -- a log may have multiple decodings! The decodings field is an array of decoding objects. (So, in case of decoding failure, there is not an "unknown" case; rather decodings will simply be empty.)

Each decoding object can have one of two forms; as before, the kind field distinguishes. The two cases are:

  1. The decoding is for a non-anonymous event; in this case, kind is equal to "event". There are, as before, the fields class, arguments, name, and selector. The argument objects are slightly different from in the transaction case in that there will also be an indexed field.

  2. The decoding is for an anonymous event; in this case, kind is equal to "anonymous". The only difference between this case and the previous one is the lack of a selector field.

Before we move on to the events function, it's worth making a few additional notes here about the output format of event decoding:

  1. Indexed reference parameters cannot be decoded. If there's an indexed reference parameter, the Result for it will be an ErrorResult. Sorry. The ErrorResult will contain the raw information, at least.

  2. Event decoding is done in "strict mode", meaning that a potential decoding for an event will not be included unless it has been encoded strictly correctly; any errors will cause that potential decoding to be skipped. This means that, other than the above exception for indexed reference parameters, all decoded event arguments should hold a Value, not an ErrorResult (and this applies to any Results contained within these values as well).

  3. You may be wondering whether, in case of multiple decodings, these multiple decodings occur in any particular order. The answer is yes (to some extent). If there is a decoding corresponding to a non-anonymous event originating from the contract itself (there can be at most one of these), it will always be first. Following that will be any decodings corresponding to non-anonymous events originating from libraries. Following that will be decodings corresponding to anonymous events originating from the contract itself, and following that will be anonymous events originating from libraries.
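The ordering in the last point can be expressed as a simple ranking. The sketch below is only an illustration: the decoder presumably produces decodings in this order by construction rather than by sorting, and the contractKind field on the class object is an assumption made here so the example has something to rank on.

```typescript
// Hypothetical, simplified decoding shape; `contractKind` is an assumed field.
interface EventDecodingSketch {
  kind: "event" | "anonymous";
  class: { typeName: string; contractKind: "contract" | "library" };
  name: string;
}

// Rank 0: non-anonymous, contract. Rank 1: non-anonymous, library.
// Rank 2: anonymous, contract.     Rank 3: anonymous, library.
function rank(decoding: EventDecodingSketch): number {
  const anonymousPart = decoding.kind === "anonymous" ? 2 : 0;
  const libraryPart = decoding.class.contractKind === "library" ? 1 : 0;
  return anonymousPart + libraryPart;
}

// Returns a sorted copy; the input array is left untouched.
function inDocumentedOrder(decodings: EventDecodingSketch[]): EventDecodingSketch[] {
  return decodings.slice().sort((a, b) => rank(a) - rank(b));
}
```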

OK, finally let's describe how to use the events function. This one has an additional breaking change, in that now, instead of taking positional arguments, it takes just a single options argument. (If you were calling it without arguments, at least, it will continue to work the same.) The three options that can be passed are name (meaning you only want events with a specified name; leave this out to not filter by name), fromBlock, and toBlock (note that these latter two default to "latest", so you'll have to change these if you want older events.)

As you might expect, it returns an array of DecodedLogs, in the order those events occurred. The thing to note here is that if the name option is passed, this has two effects. First, each decodings array will be filtered by name; only decodings matching the specified name will be included. Secondly, the array of decoded logs itself will be filtered; any decoded log with no decodings matching the specified name will be excluded entirely. (Note that if name is not passed, the return value can contain logs with an empty decodings array.)
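The two-level effect of the name option is easy to get wrong, so here is a minimal sketch of the filtering semantics just described. The shapes and the filterByName helper are hypothetical stand-ins; this is not the decoder's code, only a model of the behavior.

```typescript
// Simplified stand-ins for a decoding and a DecodedLog.
interface LogDecodingSketch {
  kind: "event" | "anonymous";
  name: string;
}

interface DecodedLogSketch {
  logIndex: number;
  decodings: LogDecodingSketch[];
}

function filterByName(logs: DecodedLogSketch[], name?: string): DecodedLogSketch[] {
  if (name === undefined) {
    // No name given: nothing is filtered, so logs with empty decodings remain.
    return logs;
  }
  return logs
    // First effect: each decodings array keeps only matching decodings.
    .map(log => ({ ...log, decodings: log.decodings.filter(d => d.name === name) }))
    // Second effect: logs left with no decodings are dropped entirely.
    .filter(log => log.decodings.length > 0);
}
```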

OK, that's it for the interface changes! If you don't want to know about the internals, you can stop reading here.

So -- how does all this work?

Well, first we need to discuss a few renamings. Because it now handles encoding as well (spoilers!), truffle-decoder-core is now truffle-codec, and truffle-decode-utils is now truffle-codec-utils. Also, because we now handle multiple ABI-encoded locations, all the stuff that previously referred to "calldata" in its name (or filename) now refers to "abi". So e.g. decodeCalldata is now decodeAbi, getCalldataAllocations is now getAbiAllocations, etc. Be careful not to be confused by this.

Anyway, let's start with allocation! In addition to storage, memory, and ABI allocations, there's now calldata allocation (that means something different now!) and event allocations. These latter two are handled in the same file as ABI allocations, though, since they're still ABI-related.

Calldata allocations are indexed first by contract ID and then by selector (with a separate field for the constructor allocation). They contain allocation information for the arguments, but they also include an overall offset indicating how far into the data the arguments start. (This is always 4 for functions, but it can obviously vary for constructors.) This is necessary for decoding any dynamic arguments, as those will be given as pointers relative to this offset.
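To illustrate why the offset matters, here is a small self-contained sketch of reading a dynamic bytes argument out of function calldata. The helper names are made up for this example, and it deliberately ignores validation; the point is only that the pointer stored in the argument's head slot is relative to where the arguments start (byte 4 for a function call), not to the start of the data.

```typescript
// Read a 32-byte big-endian word as a number (fine for the small values used here).
function readWord(data: Uint8Array, position: number): number {
  let value = 0;
  for (let i = 0; i < 32; i++) {
    value = value * 256 + data[position + i];
  }
  return value;
}

// argumentsOffset: where the arguments start within the data (4 for functions).
// headPosition: position of this argument's head slot, relative to argumentsOffset.
function readDynamicBytes(
  data: Uint8Array,
  argumentsOffset: number,
  headPosition: number
): Uint8Array {
  // The head slot holds a pointer relative to the start of the arguments.
  const relativePointer = readWord(data, argumentsOffset + headPosition);
  const start = argumentsOffset + relativePointer;
  // At the pointed-to position: a length word, then the content.
  const length = readWord(data, start);
  return data.slice(start + 32, start + 32 + length);
}
```

For a constructor the same logic would apply with argumentsOffset set to wherever the arguments begin after the bytecode.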

Event allocations are a bit complicated. They're indexed first by number of topics; we then split into anonymous and non-anonymous cases. So non-anonymous events are indexed overall first by number of topics, then selector, then contract kind, then contract ID. Whereas anonymous events are indexed overall first by number of topics, then contract kind, then contract ID, and then if there are multiple they just go in a big array. Event allocations are similar to calldata allocations, but there's no offset (since it's always zero), and there's a contractId field telling us which contract it came from (since that information can't be found in the definition).

(Note: I'm hoping soon to remove all these definitions from all the allocations, but I'm going to save that for a future PR. For now allocations continue to contain definitions.)

This is probably also the time to mention that there's been a change to the pointer format so that they play better with TypeScript. Pointers now all have a location field indicating what location they point to. (Pointers returned by the ABI allocator, which has to work for multiple locations, have the generic location of "abi", but these pointers can't be passed to the decoder.) The rest of the pointer format has been appropriately adjusted (although storage pointers still rely on Range objects; I chose not to mess with that for now, even though that's probably a bit of unnecessary complication now).

I'll skip describing the actual process of allocation, as it's mostly nothing new, but one thing worth noting is that the calldata and event allocators, in order to perform allocation, start with the ABI entry for the function or event being allocated, and then search through contracts to find the corresponding definition, which they then use for the allocation process. Note that for functions and events, the allocator must search not only the contract itself, but also its parent contracts (in order, obviously, to find the right one in case of multiple inheritance). For constructors, of course, the allocator does not search parent contracts. However for constructors there's another little complication, in that default constructors don't appear in the ABI at all, so the allocator starts by giving the constructor a default allocation, and then later overwrites it if it finds a constructor entry in the ABI.
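The default-constructor handling can be sketched roughly as follows. Everything here is simplified and hypothetical (the AbiEntrySketch and ConstructorAllocationSketch shapes, and the assumption that the offset equals the constructor bytecode length); it only shows the "start with a default, overwrite if the ABI has a constructor entry" pattern.

```typescript
// Simplified stand-in for an ABI entry.
interface AbiEntrySketch {
  type: "function" | "constructor" | "event" | "fallback";
  name?: string;
  inputs?: { name: string; type: string }[];
}

// Simplified stand-in for a constructor allocation.
interface ConstructorAllocationSketch {
  offset: number; // where the arguments start (assumed: right after the bytecode)
  argumentTypes: string[];
}

function allocateConstructor(
  abi: AbiEntrySketch[],
  constructorBytecodeLength: number
): ConstructorAllocationSketch {
  // Default constructors do not appear in the ABI, so begin with an
  // allocation that has no arguments.
  let allocation: ConstructorAllocationSketch = {
    offset: constructorBytecodeLength,
    argumentTypes: []
  };
  for (const entry of abi) {
    if (entry.type === "constructor") {
      // An explicit constructor entry overwrites the default allocation.
      allocation = {
        offset: constructorBytecodeLength,
        argumentTypes: (entry.inputs || []).map(input => input.type)
      };
    }
  }
  return allocation;
}
```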

(By the way, also note that we now do computation of selectors ourselves, rather than relying on web3; the versioning had simply gotten to be too much of a hassle.)

Anyway, that's allocation, how does decoding work?

Calldata decoding is pretty straightforward. The decodeCalldata function (not the old function by that name, but the new function, which lives alongside decodeVariable and decodeEvent in lib/interface/decoding.ts) takes just an EvmInfo argument; the calldata itself to decode is (of course) put in the calldata section of the state. Note however that that EvmInfo should contain a currentContext giving a context argument for the contract being called or created -- the caller is responsible for passing this in! If it's not included, an unknown decoding will be returned.

(Why is this left up to the caller, when decodeEvent doesn't make the caller do this work, and instead just has an address argument? Basically, because of the possibility that the context is a constructor, which doesn't fit very well into how the decoder core does things at the moment. I am thinking of changing this in the future, though. In any case, at the moment both the contract decoder and the wire decoder do this work to determine the context before calling decodeCalldata, so it all works.)

Anyway, decodeCalldata determines whether it's looking at a constructor call or function call based on currentContext, looks up the constructor allocation for the context in the former case and the function allocation (by selector) in the latter case, and then uses that allocation to decode the arguments. Pretty straightforward.

(By the way, note of course that decodeCalldata and decodeEvent are generators, like decodeVariable. However, the functions that call them at present don't bother to handle the storage request case, since that should never occur.)

Event decoding is where it gets messy. The decodeEvent function takes three arguments: info (which in particular includes the data to decode in the eventdata and eventtopics part of the state), address (the address the event came from, or appeared to come from), and, optionally, targetName (if included this will skip any attempts to produce decodings that wouldn't yield the target name; obviously including this parameter in the low-level interface wasn't strictly necessary, but I figured speeding up decoding is a good thing, right?).

So, to summarize, decodeEvent first gathers the allocation objects for all possible events this could correspond to, whether from the contract at the given address or from a library, whether non-anonymous or anonymous. It does this by looking at the number of topics, the first topic (if present), and, as mentioned, what address it appears to have come from. Once it's gathered all that up, it's time to try them one by one, and see which ones produce valid decodings! (Note that it tries them in the order I described in the interface section above.)
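As a rough model of the gathering step, the sketch below walks a nested index shaped like the event allocations described earlier (topic count, then selector for non-anonymous events, then contract kind, then contract ID) and collects candidates in the documented order. All type and function names are hypothetical, and the structure is simplified; it is a sketch of the idea, not the real allocation format.

```typescript
// Simplified stand-in for an event allocation.
interface EventAllocationSketch {
  contractId: number;
  eventName: string;
}

type ById<T> = { [contractId: number]: T };
type ByKind<T> = { contract?: ById<T>; library?: ById<T> };

interface AllocationsForTopicCount {
  bySelector: { [selector: string]: ByKind<EventAllocationSketch> };
  anonymous: ByKind<EventAllocationSketch[]>;
}

type EventAllocations = { [topicCount: number]: AllocationsForTopicCount };

function orEmpty<T>(table: ById<T> | undefined): ById<T> {
  if (table === undefined) {
    return {};
  }
  return table;
}

function valuesOf<T>(table: ById<T>): T[] {
  return Object.keys(table).map(id => table[Number(id)]);
}

function gatherCandidates(
  allocations: EventAllocations,
  topics: string[],
  contractId: number
): EventAllocationSketch[] {
  const forCount: AllocationsForTopicCount | undefined = allocations[topics.length];
  if (forCount === undefined) {
    return [];
  }
  // Non-anonymous events are looked up by the first topic (the selector).
  const bySelector: ByKind<EventAllocationSketch> | undefined =
    topics.length > 0 ? forCount.bySelector[topics[0]] : undefined;
  const named: ByKind<EventAllocationSketch> = bySelector === undefined ? {} : bySelector;

  const candidates: EventAllocationSketch[] = [];
  // 1. Non-anonymous event from the contract itself (at most one).
  const own = orEmpty(named.contract)[contractId];
  if (own !== undefined) {
    candidates.push(own);
  }
  // 2. Non-anonymous events from libraries.
  candidates.push(...valuesOf(orEmpty(named.library)));
  // 3. Anonymous events from the contract itself.
  const ownAnonymous = orEmpty(forCount.anonymous.contract)[contractId];
  if (ownAnonymous !== undefined) {
    candidates.push(...ownAnonymous);
  }
  // 4. Anonymous events from libraries.
  for (const list of valuesOf(orEmpty(forCount.anonymous.library))) {
    candidates.push(...list);
  }
  return candidates;
}
```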

How do we tell if a given possible allocation works? Well, we attempt to decode, and see if anything goes wrong. To do this we run the decoder in "strict mode", a new option. In strict mode, if the decoder encounters a problem, it won't return an ErrorResult; instead, it'll throw a StopDecodingError, telling us to skip over that possibility. (The one exception is the new error for indexed reference parameters; those do not stop the decoder for obvious reasons.) Also, strict mode turns on a check for overlong arrays or strings (or bytestrings). Note that I've written this check pretty crudely; it's not intended to actually catch all overlong arrays or strings, just ridiculously long ones so that trying an incorrect allocation doesn't DOS the decoder. But don't worry, other overlong arrays and strings will still be caught by another means!
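Here is a toy sketch of the strict-mode idea: the same decode function either returns an error result (normal mode) or throws to abandon the candidate (strict mode), and the caller skips candidates that throw. The StopDecodingError class below is a simplified stand-in (it does not extend Error, to keep the example portable), and decodeUint8 is a made-up example decoder, not anything from the codebase.

```typescript
// Simplified stand-in for the real StopDecodingError.
class StopDecodingError {
  readonly reason: string;
  constructor(reason: string) {
    this.reason = reason;
  }
}

type ResultSketch =
  | { kind: "value"; value: number }
  | { kind: "error"; error: string };

// Decode a uint8 from a 32-byte word; the top 31 bytes must be zero.
function decodeUint8(word: Uint8Array, strict: boolean): ResultSketch {
  const badPadding = word.slice(0, 31).some(byte => byte !== 0);
  if (badPadding) {
    if (strict) {
      // Strict mode: tell the caller to abandon this candidate.
      throw new StopDecodingError("bad padding");
    }
    // Normal mode: report the problem inside the result.
    return { kind: "error", error: "bad padding" };
  }
  return { kind: "value", value: word[31] };
}

// Try each candidate decoding; keep the ones that strict mode accepts.
function tryCandidates<T>(candidates: Array<() => T>): T[] {
  const decodings: T[] = [];
  for (const attempt of candidates) {
    try {
      decodings.push(attempt());
    } catch (error) {
      if (!(error instanceof StopDecodingError)) {
        throw error; // a genuine bug, not a rejected candidate
      }
    }
  }
  return decodings;
}
```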

(Let me pause here to talk about decoder options more generally -- now, for the core decode functions, any arguments other than the standard dataType, pointer, info are passed as a single options argument. The possible options are permissivePadding (which already existed), strictAbiMode (just described), abiPointerBase (the old base argument, now as an option), and memoryVisited (not actually used currently but will be in the future for circularity detection).)

Anyway, as I was saying, strict mode still isn't enough to tell us whether something is encoded strictly correctly according to a given allocation. So if an allocation has passed the trials of strict mode, and the decoder has produced a result for it, it must then face... the encoder! Having successfully decoded (or so we think), the resulting decoding (for the non-indexed parameters -- for indexed parameters, strict mode, together with the topic count check at the beginning, should be sufficient to catch any problems) is then fed back through the encoder to check whether it matches the original data. Only if we have a match do we push a new entry onto our array of decodings!
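To show why the re-encoding check catches things strict mode might not, here is a toy round trip for a single dynamic bytes value. All helpers are made up for this sketch and cover only this one type. A non-canonical encoding (here, one with a gap before the content) decodes fine, but re-encoding it produces different bytes, so it would be rejected.

```typescript
// Encode a small non-negative integer as a 32-byte big-endian word.
function toWord(value: number): Uint8Array {
  const word = new Uint8Array(32);
  let remaining = value;
  for (let i = 31; i >= 0 && remaining > 0; i--) {
    word[i] = remaining % 256;
    remaining = Math.floor(remaining / 256);
  }
  return word;
}

// Read a 32-byte big-endian word as a number (small values only).
function fromWord(data: Uint8Array, position: number): number {
  let value = 0;
  for (let i = 0; i < 32; i++) {
    value = value * 256 + data[position + i];
  }
  return value;
}

// Decode a lone `bytes` value: pointer word, then length word, then content.
function decodeBytes(data: Uint8Array): Uint8Array {
  const pointer = fromWord(data, 0);
  const length = fromWord(data, pointer);
  return data.slice(pointer + 32, pointer + 32 + length);
}

// Canonical encoding: pointer of 32, length, content padded to 32 bytes.
function encodeBytes(content: Uint8Array): Uint8Array {
  const paddedLength = Math.ceil(content.length / 32) * 32;
  const out = new Uint8Array(64 + paddedLength);
  out.set(toWord(32), 0);
  out.set(toWord(content.length), 32);
  out.set(content, 64);
  return out;
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) {
    return false;
  }
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) {
      return false;
    }
  }
  return true;
}

// A decoding is accepted only if re-encoding reproduces the original data.
function survivesReencoding(data: Uint8Array): boolean {
  return sameBytes(encodeBytes(decodeBytes(data)), data);
}
```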

That's right, we have an encoder now! Thankfully it's a lot simpler than the decoder. Note that it doesn't really have an interface yet; that will be added in the future, so that our encoder can replace our use of the ethers encoder, but for now it's purely used for checking event decodings. Right now it only accepts Results as input.

Unfortunately, due to TypeScript not being as good at type inference as I thought, the encoder contains a bunch of type coercions. Oh well. But anyway it's still pretty simple. It has two main functions -- encodeAbi, which takes a single Result and encodes it; and encodeTupleAbi, which takes an array of Results and encodes them as a tuple. (These obviously both call each other; encodeTupleAbi needs to know how the individual components are encoded, and encodeAbi needs to call encodeTupleAbi in order to encode arrays and structs.) Also note that these functions use the ABI allocations (which they take as an argument) in order to simplify the process of encoding.

Also note that these functions can return either a Uint8Array or undefined (don't worry, this is accounted for). Why can they return undefined? This is what you'll get if you pass in either a value that can't be ABI-encoded (such as a mapping or internal function), or if you pass in an ErrorResult. Errors can't be encoded!
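The mutual recursion and the undefined return can be sketched like this. The sketch is heavily simplified: it handles only static types, the ResultSketch union is invented for illustration, and it leaves out the ABI allocations argument that the real functions take. The names encodeAbiSketch and encodeTupleAbiSketch are deliberately not the real function names.

```typescript
// Invented, simplified stand-in for decoder Results.
type ResultSketch =
  | { kind: "value"; type: "uint"; value: number }
  | { kind: "value"; type: "bool"; value: boolean }
  | { kind: "value"; type: "tuple"; members: ResultSketch[] }
  | { kind: "value"; type: "mapping" }
  | { kind: "error"; error: string };

// Encode a small non-negative integer as a 32-byte big-endian word.
function toWord(value: number): Uint8Array {
  const word = new Uint8Array(32);
  let remaining = value;
  for (let i = 31; i >= 0 && remaining > 0; i--) {
    word[i] = remaining % 256;
    remaining = Math.floor(remaining / 256);
  }
  return word;
}

function encodeAbiSketch(input: ResultSketch): Uint8Array | undefined {
  if (input.kind === "error") {
    return undefined; // errors can't be encoded
  }
  switch (input.type) {
    case "uint":
      return toWord(input.value);
    case "bool":
      return toWord(input.value ? 1 : 0);
    case "tuple":
      return encodeTupleAbiSketch(input.members);
    case "mapping":
      return undefined; // not ABI-encodable
  }
}

function encodeTupleAbiSketch(members: ResultSketch[]): Uint8Array | undefined {
  const parts: Uint8Array[] = [];
  let totalLength = 0;
  for (const member of members) {
    const encoded = encodeAbiSketch(member);
    if (encoded === undefined) {
      return undefined; // one unencodable member makes the whole tuple unencodable
    }
    parts.push(encoded);
    totalLength += encoded.length;
  }
  // Static members only: the tuple is just the concatenation of its members.
  const out = new Uint8Array(totalLength);
  let position = 0;
  for (const part of parts) {
    out.set(part, position);
    position += part.length;
  }
  return out;
}
```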

Anyway, it's possible I've forgotten something, this is a big PR, but I think that basically sums this up! Hopefully soon both the ABI encoder and wire decoder will have more complete and robust interfaces, and we'll be able to ditch all reliance on the ethers encoder and decoder!

haltman-at added 30 commits June 3, 2019 22:43
haltman-at and others added 2 commits July 15, 2019 20:03
Redo output format with interfaces instead of classes (also some bugfixes)
@haltman-at

Note: This PR has had #2180 merged into it, so make sure to also read the description and comments there.

@haltman-at

Note: This PR has now had PR #2232 merged into it, so be sure to see the comments there, too.

@haltman-at

haltman-at commented Jul 25, 2019

OK, three more changes:

  1. In the contract decoder, the balance and nonce fields in the state have been renamed balanceAsBN and nonceAsBN (note: this is another breaking change).

  2. For bools and enums there is no longer a padding error separate from the out of range error; these are now all just treated as out of range errors.

  3. In ABI decodings, the name field has been replaced with an abi field containing the ABI entry. Note that for default constructors and fallback functions, this is generated rather than read from the ABI (since it's not in the ABI). Also, to make this work, calldata and event allocations now include the ABI entry as well as the definition. As I've said before, I'm going to get rid of the definitions later, but for now they're still there.


@gnidan gnidan left a comment


well I pretty much read every line. most of this I'm just going to have to take your word for. well, much of it.

packages/truffle-codec-utils/package.json
packages/truffle-codec-utils/src/abi.ts
packages/truffle-codec-utils/src/definition.ts
packages/truffle-codec/lib/types/evm.ts
packages/truffle-codec/lib/types/pointer.ts
packages/truffle-core/lib/debug/printer.js
@haltman-at

OK I've renamed the "fallback" decoding case to "message". I should probably change how the ABI returned in the decoding works in that case, but, well, I think I'm going to save that for a separate PR.

@haltman-at haltman-at requested a review from gnidan August 16, 2019 17:32

@gnidan gnidan left a comment


mergetown!

@haltman-at haltman-at merged commit 9666431 into next Aug 21, 2019
@gnidan gnidan deleted the wire-decoder branch November 23, 2020 00:09