deterministic handling of adversarial code calling console.log with a Proxy #1852
Comments
OMG this is a good explanation of how "non-determinism" is relative to each layer of abstraction! I did not have words for this before.
endojs/endo#487 is for the SES and |
#2519 is about adding a "consensus mode" flag to swingset (maybe as part of |
@michaelfig recently landed #4364 which makes the |
@warner, I'm reexamining this discussion, and I believe I have a counterpoint to the following:
I would argue that debugging information can be useful, and having its availability doesn't commit us to doing exactly the same things any more or less than returning immediately. Indeed, the fact that a given vat worker does anything different than another type should be accommodated, and speaks nothing about what that vat worker should or should not do. With do-nothing logging, debugging a chain node looks like this:
With do-something logging, debugging a chain node looks like this:
This do-something approach is the same used by Cosmos everywhere: an archive node, a validator, a follower, a seed node all keep consensus, so you don't have to worry about turning those features on or off. I would argue we should do the same. I do agree that we should also work on improving the debugging experience, but I think this situation has made day-to-day debugging more difficult than it needs to be.
Hazards to worry about:
- GC decisions will almost certainly diverge
- Snapshots are likely to diverge
- Metering may diverge. If we're careful, perhaps metering should not diverge. But we have not yet tested it against divergent gc decisions, much less console changes. Divergent metering will cause meter exhaustion to be outside consensus.
I think @michaelfig's point is that if we always process the log call, there is no source of divergence anymore. We pay a cost upfront to remove the complexity for nodes which wish to print the vat logs.
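A minimal sketch of the "always process the log call" idea, under stated assumptions: `makeAlwaysProcessConsole`, the sink callbacks, and the recording Proxy are all hypothetical names invented for illustration, and `JSON.stringify` stands in for whatever rendering a real console would do. It is not how SwingSet or SES implement their console. The point it tries to show is that when rendering always happens and only the final sink differs, a Proxy passed by the vat sees the same property reads on a node that prints and on a node that does not.

```javascript
// Hypothetical console factory: it ALWAYS renders its arguments the same way,
// then hands the resulting text to a sink that may print it or drop it.
function makeAlwaysProcessConsole(sink) {
  return {
    log(...args) {
      const rendered = args.map(arg => JSON.stringify(arg)).join(' ');
      sink(rendered);
    },
  };
}

// A Proxy that records every property read, as adversarial vat code might use.
function makeRecordingProxy() {
  const reads = [];
  const proxy = new Proxy(
    { secret: 42 },
    {
      get(target, prop, receiver) {
        reads.push(String(prop));
        return Reflect.get(target, prop, receiver);
      },
    },
  );
  return { proxy, reads };
}

// Toy "vat" code: log a Proxy and report what it observed.
function runVat(vatConsole) {
  const { proxy, reads } = makeRecordingProxy();
  vatConsole.log('state:', proxy);
  return reads;
}

const printed = [];
const loggingNode = makeAlwaysProcessConsole(text => printed.push(text));
const quietNode = makeAlwaysProcessConsole(() => {});

const readsWithLogging = runVat(loggingNode);
const readsWithoutLogging = runVat(quietNode);

// The vat's observations do not depend on whether the node keeps the output.
const identicalReads =
  JSON.stringify(readsWithLogging) === JSON.stringify(readsWithoutLogging);
```

The cost mentioned in the comment above shows up here as the unconditional `JSON.stringify` call: the quiet node pays for rendering even though it discards the result. The hazards raised earlier in the thread (GC, snapshots, metering) are outside what this sketch models.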
@michaelfig , is this done with the new |
Yes, actually it was done for the old stuff, too. I'll close this issue.
What is the Problem Being Solved?
@erights is close to landing a `console` wrapper into SES (endojs/endo#440 and endojs/endo#447), after which calling `lockdown()` will change the behavior of `Error` and `console.log`. In the new behavior, creating an `Error` will stash additional stack trace information (including deep stacks) in a private WeakMap where it cannot be seen by code that merely has the `Error` object. But the new `console.log` does get access to that stash, so when you `console.log(error)`, your stdout will get the hidden details. This makes debug information (which is not necessarily deterministic, and probably exposes internal details of everyone on the call stack) unobservable to user-level code.
We generally treat `console.log` as a write-only channel: it is safe to whisper your internal secrets to the log, because only a system-level debugger tool can see them. User-level code should not be able to tell what `console.log` is doing: it might hold on to the objects to be rendered later, it might convert them into strings immediately, or it might just ignore them outright.
However, sneaky user-level code that submits a `Proxy` (or an object with accessor properties) to `console.log` can distinguish between these cases, by watching for its properties to be read.
This introduces a read channel by which user-level code can "read" data out of the kernel (or whatever is providing the supposedly-write-only `console` object). We need to prevent this from enabling non-deterministic behavior in the user-level code.
The way to think about this is that each layer of our platform is required to be a deterministic computation of some specified set of input data. Vat behavior must be a deterministic function of the transcript inputs, so that we can achieve orthogonal persistence by replaying the transcript. A swingset kernel that lives in a Cosmos-SDK blockchain (specifically the data and messages it publishes through Cosmos) must be a deterministic function of the transaction messages it receives from Cosmos, to enable replicated consensus validation.
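The replay requirement in the paragraph above can be illustrated with a toy model. This is only a sketch: `makeCounterVat`, `makeLeakyVat`, `replay`, and the message shapes are hypothetical and far simpler than real SwingSet vats and transcripts. It shows that a vat whose results depend only on its transcript can be rebuilt by replay, while a vat that also reads a value from outside the transcript produces different results on different hosts.

```javascript
// Toy "vat": its results depend only on the deliveries it is given.
function makeCounterVat() {
  let total = 0;
  return {
    deliver(message) {
      if (message.method === 'add') {
        total += message.amount;
      }
      return total; // the observable result of this delivery
    },
  };
}

// Toy vat that ALSO reads a host-provided value that is not in the transcript.
function makeLeakyVat(readAmbient) {
  let total = 0;
  return {
    deliver(message) {
      if (message.method === 'add') {
        total += message.amount + readAmbient();
      }
      return total;
    },
  };
}

// Run a fresh vat over a transcript and collect every observable result.
function replay(makeVat, transcript) {
  const vat = makeVat();
  return transcript.map(entry => vat.deliver(entry));
}

const transcript = [
  { method: 'add', amount: 2 },
  { method: 'add', amount: 5 },
  { method: 'noop' },
];

// Same transcript, same results: replay reconstructs the vat.
const firstRun = replay(makeCounterVat, transcript);
const secondRun = replay(makeCounterVat, transcript);
const replayMatches = JSON.stringify(firstRun) === JSON.stringify(secondRun);

// Two hosts that differ only in the ambient value make the leaky vat diverge.
const onHostA = replay(() => makeLeakyVat(() => 0), transcript);
const onHostB = replay(() => makeLeakyVat(() => 1), transcript);
const leakyVatDiverges = JSON.stringify(onHostA) !== JSON.stringify(onHostB);
```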
When one layer has access to data that is not part of the specified input of the next-(weaker-) layer up, that data represents a source of non-determinism for the upper layer. The lower layer can use that data for its own purposes just fine, but it must carefully defend against allowing this data to leak into the upper layer.
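To make the read channel concrete, here is a small sketch of how user-level code could tell two console implementations apart. Both consoles (`eagerConsole`, `ignoringConsole`) and the helper names are hypothetical, and `JSON.stringify` is only a stand-in for whatever rendering a real console performs. The technique is simply a Proxy `get` trap that records reads.

```javascript
// Build a Proxy that records every property read performed on it.
function makeSpy() {
  const reads = [];
  const spy = new Proxy(
    { secret: 42 },
    {
      get(target, prop, receiver) {
        reads.push(String(prop));
        return Reflect.get(target, prop, receiver);
      },
    },
  );
  return { spy, reads };
}

// Hypothetical console that renders its arguments immediately.
const eagerConsole = {
  log(...args) {
    // Rendering touches the argument's properties, which the Proxy can see.
    args.forEach(arg => JSON.stringify(arg));
  },
};

// Hypothetical console that ignores its arguments outright.
const ignoringConsole = {
  log(..._args) {},
};

// "Sneaky" user-level code: returns true if the console read the object.
function consoleTouchedArgument(someConsole) {
  const { spy, reads } = makeSpy();
  someConsole.log(spy);
  return reads.length > 0;
}

const eagerWasObserved = consoleTouchedArgument(eagerConsole);
const ignoringWasObserved = consoleTouchedArgument(ignoringConsole);
```

In this sketch the two booleans differ, so a vat could branch on them. That one bit is the "data" leaking upward, even though neither `log` call returns anything.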
`console.log()` has no return value, but if the caller can sense something about the implementation, then the "data" being read is that knowledge about what the implementation chooses to do. And if those choices are based upon inputs that are not supposed to be available to the vat layer, this would enable the vat to behave in ways that are non-deterministic relative to the supposed vat inputs.
The pattern for defending against this is very similar to hiding secrets from objects outside some boundary, but more subtle. Any observable difference in behavior would leak the non-determinism. `Proxy` and accessor properties increase the upper layer's ability to observe behavior in the lower layer.
on-chain console.log should be an immediate NOP
@erights argued that when a SwingSet is running on-chain, the `console` object we give to vats should immediately return without touching its arguments. Anything else might reveal to the vat if/when the lower-layer code does something with those arguments, and commits us to doing exactly the same set of accesses in all consensus-critical situations. For example, a validator running their node with logging turned on would allow vats to behave differently than on a node with logging turned off, which would allow vats to violate consensus and cause slashing for some subset of the validators.
What, then, is the point of leaving `console.log` statements in vat code? His idea is clever: we use deterministic replay of the vats in a separate, non-consensus environment, in which `console.log` actually does something. I've been working (#1359) on a scheme to gather enough data from the running chain to let us retroactively replay one or more vats on a local machine (under a debugger). We made vats deterministically replayable to facilitate orthogonal persistence (replaying the transcript in a very similar environment to the original), but a lovely side-effect is that we can replay the transcript in a different environment too: under a different JS engine, with a debugger attached, or with more logging enabled.
So the idea is that our chain nodes (validators) route all `console` methods to a stub that returns immediately without ever examining the arguments (so no `Proxy` hooks get triggered). But we run an extra non-voting follower node, which does have `console.log` turned on. We record the console messages it emits (probably through the `slog.makeConsole` object, which adds `vatID` and `deliveryNum` so we can correlate them with which message is being processed), and publish the results in a block-explorer-style tool.
The vat running in the follower node might be adversarial and use a `Proxy` to discover that it is running in this different environment, and could choose to behave differently than when on-chain. We can compare some amount of behavior (syscalls) against what happened on the chain, to limit the deviation. But the vats running in the validator nodes will all have access to the same data (nothing), so they'll all behave the same, maintaining consensus, even adversarial vats which are trying to sense the kind of machine they're running on.
The upshot is that we'll give off-chain vats access to non-determinism via the observable treatment of objects passed into the `console` methods. Our programming style rule is "don't do that", but we only enforce this by withholding the non-determinism when running in the consensus-critical environment.
Other channels
We need a comprehensive analysis of all points of interaction between the vat and the host it runs in. Everything that crosses this boundary is a potential source of non-determinism, and we must specify exactly what can be observed from the vat and what cannot.
All vat syscalls are defined in terms of a data-only API, which ought to limit how sneaky the vat can be. Some vat workers live in separate (unix) processes entirely, and thus can only communicate with the kernel through a data pipe. But others share a process with the kernel, which means a non-data object might make it far enough into the kernel to sense how it gets accessed. For example, the vat might submit a Proxy as the capdata `args` argument to `syscall.send`, and monitor if/when the `slots` property is accessed.
Even vats in separate processes interact with local code (like liveslots, or the code that serializes syscalls into messages to send over the pipe) that is outside the vat boundary. We must make sure this local code does not leak non-determinism into a vat.
Security Considerations
This is all about security. In particular, vat code must not be able to cause validator slashing or consensus faults (at least not when running on a chain). To achieve this, vat code (in a chain) must not be able to sense anything that isn't part of the execution model (which is what all the other validators are running and comparing against).