The wire decoder (in-house transaction and event decoding) #2158
Redo output format with interfaces instead of classes (also some bugfixes)
Note: This PR has had #2180 merged into it, so make sure to also read the description and comments there.
Factor out wire decoder from contract decoder
Note: This PR has now had PR #2232 merged into it, so be sure to see the comments there, too.
OK, three more changes:
well I pretty much read every line. most of this I'm just going to have to take your word for. well, much of it.
OK I've renamed the |
mergetown!
So, first off, @davidmurdoch, @adrianmcli, this PR features yet more breaking changes to the contract decoder. Sorry. Don't worry, they'll be grouped with the others; I'll write a document later describing all the breaking changes at once. Anyway, if you want to know about the breaking changes in this PR specifically, just read the section where I talk about the interface (I've bolded its beginning and end).
Background: The contract decoder has long included functionality for decoding transactions and events. However, this functionality was delegated to `web3`, and thereby to `ethers`. Since we now have a new decoder output format, however, and the `ethers` decoder has various problems, it was decided to bring transaction and event decoding in-house. Now, the contract decoder's transaction and event decoding is done internally by Truffle, and the result uses our new output format. It also expands on the capabilities of `ethers` in several ways.
In addition to these upgrades to the contract decoder, also added is what I'm calling the "wire decoder". Whereas the contract decoder works only for a specific contract, the wire decoder will handle any transaction or event in the context of the project. (The contract decoder will throw an exception if you ask it to decode an event whose address does not match the contract's address, a message call whose `to` does not match the contract's address, or a contract creation that did not create that contract.)
Note that the wire decoder, while it works, is, IMO, not quite ready for prime-time, as they say. The reason is that currently it works like the contract decoder: Based on definitions. Since it's just doing ABI decoding, ideally it -- as well as the contract decoder's corresponding functions -- ought to be able to work with just ABI information, rather than definitions. Definitions have the particular problem that, in addition to possibly being missing, they may run into serious problems if the project was not compiled all at once. (The contract decoder already has this problem, so it's not new there.) I'm intending to address this in a later PR. But for now, while you can continue using the contract decoder where you already use it, I'm not sure I would recommend using the wire decoder quite yet.
Also note that currently there's a bunch of duplicated code between the wire decoder and the contract decoder. I've decided to just save factoring that for later, sorry.
So, let's start by talking about the interface.
First off, there's no more need to call `decoder.init()`. This is now handled in `forContract()` (or `forProject()`, for the wire decoder). However, correspondingly, `forContract()` is now async. So you'll have to `await` that. (I've also made `forProject()` async for consistency, even though it doesn't actually need to be.)
Next, let's look at the relevant functions -- `decodeTransaction`, `decodeLog`, and `events`. (There's also `decodeLogs`, but of course that works similarly to `decodeLog`.) An important interface change -- all of these are now `async`, so be sure to account for that!
The `decodeTransaction` function, as before, accepts a web3 `Transaction` object. It returns a `DecodedTransaction` object, which consists of the same contents but now with a `decoding` field. The decoding field holds an object, which comes in one of four forms. There is always a `kind` field to distinguish the cases.
If the transaction is a message call to a contract calling one of its external functions, `kind` will be equal to `"function"`. There will also be the fields `class` (a `ContractType` object corresponding to the contract class that was called), `name` (the name of the function called), `selector` (the selector of the function called), and, most importantly, `arguments`. The `arguments` field is an array of objects; each object has a field `value`, which holds a decoder `Result` for that argument, and optionally `name`, for the name of that argument. (The reason `name` may be omitted is that it is legal for arguments to be nameless.)
If instead it's a constructor call, `kind` will be equal to `"constructor"`. There is still the `class` field, but there is no `name`, and in place of `selector` there is `bytecode`, which is the bytecode of the constructor excluding the arguments. Finally of course there is `arguments` as before. In the future I'm hoping also to include information about what libraries the constructor has been linked to, but that isn't included at present.
If it's a message call but doesn't match the ABI, `kind` will be equal to `"fallback"`. In this case we can't really decode; `class` will still be present, but there will be no `arguments`. There will be `data`, which is just another copy of the binary data sent (this is kind of unnecessary, but I thought I'd include it).
Finally, if we can't even tell what class is being called (or created), then `kind` will be `"unknown"`, and there will be no other information included. This is basically the decoding failure case.
(By the way, yes you can decode library creation transactions, although seeing as these have no arguments, there's not much to decode. You cannot decode other transactions sent to libraries, as libraries do not expose an ABI other than for events; if you try the result will register as `"fallback"`. I don't consider this a problem because you shouldn't be sending transactions to libraries, and if the library was compiled with Solidity 0.4.20 or later, it will just reject your transaction anyway. Anyway, contract creations are the only transactions that the decoder will handle without having a corresponding ABI entry, and those will work even for libraries.)
OK, what about `decodeLog`? This works similarly in that it takes a web3 `Log` object as input, and returns a `DecodedLog`, which has the same information but with a `decodings` field added. Note that plural -- a log may have multiple decodings! The `decodings` field is an array of decoding objects. (So, in case of decoding failure, there is not an `"unknown"` case; rather `decodings` will simply be empty.)
Each decoding object can have one of two forms; as before, the `kind` field distinguishes. The two cases are:
- The decoding is for a non-anonymous event; in this case, `kind` is equal to `"event"`. There are, as before, the fields `class`, `arguments`, `name`, and `selector`. The argument objects are slightly different from in the transaction case in that there will also be an `indexed` field.
- The decoding is for an anonymous event; in this case, `kind` is equal to `"anonymous"`. The only difference between this case and the previous one is the lack of a `selector` field.
Before we move on to the `events` function, it's worth making a few additional notes here about the output format of event decoding:
- Indexed reference parameters cannot be decoded. If there's an indexed reference parameter, the `Result` for it will be an `ErrorResult`. Sorry. The `ErrorResult` will contain the raw information, at least.
- Event decoding is done in "strict mode", meaning that a potential decoding for an event will not be included unless it has been encoded strictly correctly; any errors will cause that potential decoding to be skipped. This means that, other than the above exception for indexed reference parameters, all decoded event arguments should hold a `Value`, not an `ErrorResult` (and this applies to any `Result`s contained within these values as well).
- You may be wondering whether, in case of multiple decodings, these multiple decodings occur in any particular order. The answer is yes (to some extent). If there is a decoding corresponding to a non-anonymous event originating from the contract itself (there can be at most one of these), it will always be first. Following that will be any decodings corresponding to non-anonymous events originating from libraries. Following that will be decodings corresponding to anonymous events originating from the contract itself, and following that will be anonymous events originating from libraries.
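To make the ordering rule concrete, here is a minimal sketch of a ranking function that reproduces it. This is not the decoder's actual code: the `LogDecodingSketch` shape is a stand-in, and the `contractKind` and `typeName` fields on `class` are assumptions made only for this illustration.

```typescript
// Hypothetical shapes: only `kind`, `class`, and `name` come from the description;
// `typeName` and `contractKind` are assumptions made for this sketch.
type ContractKind = "contract" | "library";

interface LogDecodingSketch {
  kind: "event" | "anonymous";
  class: { typeName: string; contractKind: ContractKind };
  name: string;
}

// Rank 0: non-anonymous event from the contract itself
// Rank 1: non-anonymous event from a library
// Rank 2: anonymous event from the contract itself
// Rank 3: anonymous event from a library
function decodingRank(decoding: LogDecodingSketch): number {
  const anonymousPart = decoding.kind === "anonymous" ? 2 : 0;
  const libraryPart = decoding.class.contractKind === "library" ? 1 : 0;
  return anonymousPart + libraryPart;
}

// Returns a sorted copy so the caller's array is left untouched.
function inDocumentedOrder(decodings: LogDecodingSketch[]): LogDecodingSketch[] {
  return decodings.slice().sort((a, b) => decodingRank(a) - decodingRank(b));
}
```

The decoder itself produces this order by trying candidates in sequence rather than by sorting afterwards; the sort here is only a compact way to state the rule.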
OK, finally let's describe how to use the `events` function. This one has an additional breaking change, in that now, instead of taking positional arguments, it takes just a single `options` argument. (If you were calling it without arguments, at least, it will continue to work the same.) The three options that can be passed are `name` (meaning you only want events with a specified name; leave this out to not filter by name), `fromBlock`, and `toBlock` (note that these latter two default to `"latest"`, so you'll have to change these if you want older events).
As you might expect, it returns an array of `DecodedLog`s, in the order those events occurred. The thing to note here is that if the `name` option is passed, this has two effects. First, each `decodings` array will be filtered by name; only decodings matching the specified name will be included. Secondly, the array of decoded logs itself will be filtered; any decoded log with no decodings matching the specified name will be excluded entirely. (Note that if `name` is not passed, the return value can contain logs with an empty `decodings` array.)
OK, that's it for the interface changes! If you don't want to know about the internals, you can stop reading here.
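For consumers, the four transaction decoding forms described above can be handled as a discriminated union on `kind`. The following is only a sketch: the type names ending in `Sketch` are hypothetical, the `typeName` field on the class object is an assumption, and `value` is left as `unknown` rather than modelling the real `Result` type.

```typescript
// Stand-in types for illustration only; the field names `kind`, `class`, `name`,
// `selector`, `bytecode`, `data`, and `arguments` follow the description above.
interface ArgumentSketch {
  name?: string; // may be omitted, since arguments can be nameless
  value: unknown; // a decoder Result in the real output
}

interface ClassSketch {
  typeName: string; // assumption: some human-readable name for the contract class
}

type TransactionDecodingSketch =
  | { kind: "function"; class: ClassSketch; name: string; selector: string; arguments: ArgumentSketch[] }
  | { kind: "constructor"; class: ClassSketch; bytecode: string; arguments: ArgumentSketch[] }
  | { kind: "fallback"; class: ClassSketch; data: string }
  | { kind: "unknown" };

// Produce a one-line summary; the switch narrows the union on `kind`.
function describeDecoding(decoding: TransactionDecodingSketch): string {
  switch (decoding.kind) {
    case "function": {
      const argumentNames = decoding.arguments
        .map((argument, index) => (argument.name !== undefined ? argument.name : `arg${index}`))
        .join(", ");
      return `${decoding.class.typeName}.${decoding.name}(${argumentNames})`;
    }
    case "constructor":
      return `new ${decoding.class.typeName} (${decoding.arguments.length} arguments)`;
    case "fallback":
      return `fallback call to ${decoding.class.typeName}`;
    case "unknown":
      return "could not decode";
  }
}
```

Switching on `kind` lets the compiler check that every case reads only the fields that exist for that form, e.g. `selector` is not available on a `"constructor"` decoding.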
So -- how does all this work?
Well, first we need to discuss a few renamings. Because it now handles encoding as well (spoilers!), `truffle-decoder-core` is now `truffle-codec`, and `truffle-decode-utils` is now `truffle-codec-utils`. Also, because we now handle multiple ABI-encoded locations, all the stuff that previously referred to "calldata" in its name (or filename) now refers to "abi". So e.g. `decodeCalldata` is now `decodeAbi`, `getCalldataAllocations` is now `getAbiAllocations`, etc. Be careful not to be confused by this.
Anyway, let's start with allocation! In addition to storage, memory, and ABI allocations, there's now calldata allocation (that means something different now!) and event allocations. These latter two are handled in the same file as ABI allocations, though, since they're still ABI-related.
Calldata allocations are indexed first by contract ID and then by selector (with a separate field for the constructor allocation). They contain allocation information for the arguments, but they also include an overall `offset` indicating how far into the data the arguments start. (This is always `4` for functions, but it can obviously vary for constructors.) This is necessary for decoding any dynamic arguments, as those will be given as pointers relative to this offset.
Event allocations are a bit complicated. They're indexed first by number of topics; we then split into anonymous and non-anonymous cases. So non-anonymous events are indexed overall first by number of topics, then selector, then contract kind, then contract ID. Whereas anonymous events are indexed overall first by number of topics, then contract kind, then contract ID, and then if there are multiple they just go in a big array. Event allocations are similar to calldata allocations, but there's no offset (since it's always zero), and there's a `contractId` field telling us which contract it came from (since that information can't be found in the definition).
(Note: I'm hoping soon to remove all these definitions from all the allocations, but I'm going to save that for a future PR. For now allocations continue to contain definitions.)
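The two-way indexing of event allocations is easier to see as types. This is a sketch of one possible layout under stated assumptions, not the structure used in the PR: all names here (`EventAllocationsSketch`, `bySelector`, `anonymous`, `eventName`) are hypothetical, and the topic count is assumed to include the selector topic.

```typescript
type ContractKind = "contract" | "library";

// Hypothetical allocation record; only `contractId` is named in the description.
interface EventAllocationSketch {
  contractId: number;
  eventName: string; // assumption, included just to make lookups visible
}

type ByContractId<T> = { [contractId: number]: T };
type ByKind<T> = { [K in ContractKind]?: ByContractId<T> };

// Outer key: number of topics (assumed here to count the selector topic too).
interface EventAllocationsSketch {
  [topicCount: number]: {
    // non-anonymous: selector -> contract kind -> contract ID -> one allocation
    bySelector: { [selector: string]: ByKind<EventAllocationSketch> };
    // anonymous: contract kind -> contract ID -> array of allocations
    anonymous: ByKind<EventAllocationSketch[]>;
  };
}

function lookupNonAnonymous(
  allocations: EventAllocationsSketch,
  topics: string[],
  kind: ContractKind,
  contractId: number
): EventAllocationSketch | undefined {
  const selector = topics[0];
  const forTopicCount = allocations[topics.length];
  if (selector === undefined || forTopicCount === undefined) return undefined;
  const forSelector = forTopicCount.bySelector[selector];
  if (forSelector === undefined) return undefined;
  const forKind = forSelector[kind];
  return forKind === undefined ? undefined : forKind[contractId];
}

function lookupAnonymous(
  allocations: EventAllocationsSketch,
  topicCount: number,
  kind: ContractKind,
  contractId: number
): EventAllocationSketch[] {
  const forTopicCount = allocations[topicCount];
  if (forTopicCount === undefined) return [];
  const forKind = forTopicCount.anonymous[kind];
  if (forKind === undefined) return [];
  const found = forKind[contractId];
  return found === undefined ? [] : found;
}
```

Keying on topic count first means a log can rule out most candidate events before any selector comparison or decoding attempt happens.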
This is probably also the time to mention that there's been a change to the pointer format so that they play better with TypeScript. Pointers now all have a `location` field indicating what location they point to. (Pointers returned by the ABI allocator, which has to work for multiple locations, have the generic location of `"abi"`, but these pointers can't be passed to the decoder.) The rest of the pointer format has been appropriately adjusted (although storage pointers still rely on `Range` objects; I chose not to mess with that for now, even though that's probably a bit of unnecessary complication now).
I'll skip describing the actual process of allocation, as it's mostly nothing new, but one thing worth noting is that the calldata and event allocators, in order to perform allocation, start with the ABI entry for the function or event being allocated, and then search through contracts to find the corresponding definition, which they then use for the allocation process. Note that for functions and events, the allocator must search not only the contract itself, but also its parent contracts (in order, obviously, to find the right one in case of multiple inheritance). For constructors, of course, the allocator does not search parent contracts. However for constructors there's another little complication, in that default constructors don't appear in the ABI at all, so the allocator starts by giving the constructor a default allocation, and then later overwrites it if it finds a `constructor` entry in the ABI.
(By the way, also note that we now do computation of selectors ourselves, rather than relying on `web3`; the versioning had simply gotten to be too much of a hassle.)
Anyway, that's allocation, how does decoding work?
Calldata decoding is pretty straightforward. The `decodeCalldata` function (not the old function by that name, but the new function, which lives alongside `decodeVariable` and `decodeEvent` in `lib/interface/decoding.ts`) takes just an `EvmInfo` argument; the calldata itself to decode is (of course) put in the calldata section of the state. Note however that this `EvmInfo` should contain a `currentContext` giving a `context` argument for the contract being called or created -- the caller is responsible for passing this in! If it's not included, an `unknown` decoding will be returned.
(Why is this left up to the caller, when `decodeEvent` doesn't make the caller do this work, and instead just has an `address` argument? Basically, because of the possibility that the context is a constructor, which doesn't fit very well into how the decoder core does things at the moment. I am thinking of changing this in the future, though. In any case, at the moment both the contract decoder and the wire decoder do this work to determine the context before calling `decodeCalldata`, so it all works.)
Anyway, `decodeCalldata` determines whether it's looking at a constructor call or function call based on `currentContext`, looks up the constructor allocation for the context in the former case and the function allocation (by selector) in the latter case, and then uses that allocation to decode the arguments. Pretty straightforward.
(By the way, note of course that `decodeCalldata` and `decodeEvent` are generators, like `decodeVariable`. However the functions that call them at present don't bother to handle the storage request case since that should never occur.)
Event decoding is where it gets messy. The `decodeEvent` function takes three arguments: `info` (which in particular includes the data to decode in the `eventdata` and `eventtopics` part of the state), `address` (the address the event came from, or appeared to come from), and, optionally, `targetName` (if included this will skip any attempts to produce decodings that wouldn't yield the target name; obviously including this parameter in the low-level interface wasn't strictly necessary, but I figured speeding up decoding is a good thing, right?).
So, to summarize, `decodeEvent` first gathers the allocation objects for all possible events this could correspond to, whether from the contract at the given address or from a library, whether non-anonymous or anonymous. It does this by looking at the number of topics, the first topic (if present), and, as mentioned, what address it appears to have come from. Once it's gathered all that up, it's time to try them one by one, and see which ones produce valid decodings! (Note that it tries them in the order I described in the interface section above.)
How do we tell if a given possible allocation works? Well, we attempt to decode, and see if anything goes wrong. To do this we run the decoder in "strict mode", a new option. In strict mode, if the decoder encounters a problem, it won't return an `ErrorResult`; instead, it'll throw a `StopDecodingError`, telling us to skip over that possibility. (The one exception is the new error for indexed reference parameters; those do not stop the decoder for obvious reasons.) Also, strict mode turns on a check for overlong arrays or strings (or bytestrings). Note that I've written this check pretty crudely; it's not intended to actually catch all overlong arrays or strings, just ridiculously long ones so that trying an incorrect allocation doesn't DOS the decoder. But don't worry, other overlong arrays and strings will still be caught by another means!
(Let me pause here to talk about decoder options more generally -- now, for the core `decode` functions, any arguments other than the standard `dataType`, `pointer`, `info` are passed as a single `options` argument. The possible options are `permissivePadding` (which already existed), `strictAbiMode` (just described), `abiPointerBase` (the old `base` argument, now as an option), and `memoryVisited` (not actually used currently but will be in the future for circularity detection).)
Anyway, as I was saying, strict mode still isn't enough to tell us whether something is encoded strictly correctly according to a given allocation. So if an allocation has passed the trials of strict mode, and the decoder has produced a result for it, it must then face... the encoder! Having successfully decoded (or so we think), the resulting decoding (for the non-indexed parameters -- for indexed parameters, strict mode, together with the topic count check at the beginning, should be sufficient to catch any problems) is then fed back through the encoder to check whether it matches the original data. Only if we have a match do we push a new entry onto our array of decodings!
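The try-each-candidate loop with the re-encoding check can be sketched as follows. This is a simplified model under assumptions, not the PR's code: `CandidateSketch`, `validCandidates`, and `StopDecodingErrorSketch` are hypothetical names, and the decode and encode steps are passed in as plain callbacks rather than driven by allocations.

```typescript
// Stand-in for the StopDecodingError thrown in strict mode; identified by `name`
// so the check does not depend on how Error subclassing is compiled.
class StopDecodingErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = "StopDecodingErrorSketch";
  }
}

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical candidate: one possible interpretation of the event data.
interface CandidateSketch<T> {
  label: string;
  decodeStrict: (data: Uint8Array) => T; // throws StopDecodingErrorSketch on invalid input
  encode: (decoded: T) => Uint8Array | undefined; // undefined if the value can't be encoded
}

// Keep only candidates that decode in strict mode AND re-encode to the original bytes.
function validCandidates<T>(data: Uint8Array, candidates: CandidateSketch<T>[]): string[] {
  const accepted: string[] = [];
  for (const candidate of candidates) {
    let decoded: T;
    try {
      decoded = candidate.decodeStrict(data);
    } catch (error) {
      // Strict-mode failure: skip this possibility and try the next one.
      if (error instanceof Error && error.name === "StopDecodingErrorSketch") continue;
      throw error; // anything else is a genuine bug and should surface
    }
    const reencoded = candidate.encode(decoded);
    if (reencoded !== undefined && bytesEqual(reencoded, data)) {
      accepted.push(candidate.label);
    }
  }
  return accepted;
}
```

The re-encoding comparison is what catches inputs that decode without error but are not the canonical encoding of the decoded value.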
That's right, we have an encoder now! Thankfully it's a lot simpler than the decoder. Note that it doesn't really have an interface yet; that will be added in the future, so that our encoder can replace our use of the `ethers` encoder, but for now it's purely used for checking event decodings. Right now it only accepts `Result`s as input.
Unfortunately, due to TypeScript not being as good at type inference as I thought, the encoder contains a bunch of type coercions. Oh well. But anyway it's still pretty simple. It has two main functions -- `encodeAbi`, which takes a single `Result` and encodes it; and `encodeTupleAbi`, which takes an array of `Result`s and encodes them as a tuple. (These obviously both call each other; `encodeTupleAbi` needs to know how the individual components are encoded, and `encodeAbi` needs to call `encodeTupleAbi` in order to encode arrays and structs.) Also note that these functions use the ABI allocations (which they take as an argument) in order to simplify the process of encoding.
Also note that these functions can return either a `Uint8Array` or `undefined` (don't worry, this is accounted for). Why can they return `undefined`? This is what you'll get if you pass in either a value that can't be ABI-encoded (such as a mapping or internal function), or if you pass in an `ErrorResult`. Errors can't be encoded!
Anyway, it's possible I've forgotten something, this is a big PR, but I think that basically sums this up! Hopefully soon both the ABI encoder and wire decoder will have more complete and robust interfaces, and we'll be able to ditch all reliance on the `ethers` encoder and decoder!