This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Deprecation and Removal of Substrate Native Runtime Optimization #7288

Open
pepyakin opened this issue Oct 9, 2020 · 21 comments
Labels
J1-meta A specific issue for grouping tasks or bugs of a specific category.

Comments

@pepyakin
Contributor

pepyakin commented Oct 9, 2020

With this, I want to start a design discussion about a potential major change for substrate, specifically deprecation and removal of the Substrate Native Runtime Optimization and instead rely exclusively on wasm for the runtime execution. It is not a call to action, nor a design document, but rather a request for comments.

Substrate Native Runtime Optimization is an optimization that leverages the fact that both the runtime and the node are written in Rust and, with some effort, can be cross-compiled.

Basically, we take the runtime Rust source code and use it as a regular dependency to the substrate node. This optimization can lead to more than 2x speedups of the runtime code execution.

This design decision was accepted at the very beginning of Substrate, back then Polkadot, AFAIR.

However, I'd like to argue that this optimization doesn't deliver on its promises. Take the following two aspects:

  1. Sync speed.
  2. Transaction throughput

One of the key features of substrate is forkless upgrades. I.e. a chain can update itself without resorting to forking. Typically a healthy chain lives past several runtime upgrades. However, note that the native runtime can be compiled only once.

From the point of view of syncing, that means that throughout the whole chain history only the part with the runtime version that happened to be compiled into the node will get the speedup. Block production doesn't actually benefit from the native runtime either, since wasm is the canonical runtime and validators use that.
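As a rough illustration of the syncing argument, the sketch below counts how many blocks of a chain's history a node could execute natively, given the single spec version compiled into it. The function name, the `(first_block, spec_version)` layout and the numbers in the note that follows are made up for illustration; they are not taken from Substrate or from any real chain.

```rust
/// Count the blocks that a node with `native_spec` compiled in could run natively.
///
/// `upgrades` lists hypothetical `(first_block, spec_version)` pairs in ascending
/// block order; `tip` is the current best block number.
fn natively_executable_blocks(upgrades: &[(u64, u32)], tip: u64, native_spec: u32) -> u64 {
    let mut count = 0;
    for (i, &(start, spec)) in upgrades.iter().enumerate() {
        // A runtime version stays active until the next upgrade,
        // or until the tip for the most recent one.
        let end = upgrades
            .get(i + 1)
            .map(|&(next_start, _)| next_start)
            .unwrap_or(tip + 1);
        if spec == native_spec {
            count += end.saturating_sub(start);
        }
    }
    count
}
```

For an imaginary chain with upgrades at blocks 0, 1000 and 5000 and a tip at block 5999, a node built with the latest runtime would run only the last 1000 of its 6000 blocks natively; everything before that goes through wasm anyway.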

Costs

Turns out, the cost we pay for supporting the native runtime is non-negligible. I identify two major groups of costs associated with supporting the native runtime:

  • The first group consists of slight differences between the substrate wasm environment and the compiled-in native one.
  • The second group is related to the complexity the substrate native runtime requires.

The first category is essentially leaks of abstractions caused in the process of translating the same high-level code to two very different environments. The second category is complexity we introduce to bridge this gap.

Here are some instances of such differences:

  1. x86_64 vs. wasm32
  2. std vs. no_std
  3. multithreaded (and multitasked) vs. exclusively singlethreaded
  4. panic=unwind vs. panic=abort
  5. shared address space vs. exclusively owned by the sandbox address space

Those might seem small, but their significance is not to be underestimated, since they still bear the risk of consensus errors.

Let's examine each of these.

memory and allocator

The native runtime essentially has access to unlimited amounts of memory and the allocator doesn't matter for it.

The wasm runtime has access to finite amounts of memory. Moreover, the amount of memory available to the wasm runtime is made unpredictable because of the inefficiencies of the allocator.

The exact amount of memory available matters less than the fact that one environment has a sharp limit and the other doesn't. Reaching this sharp edge by the wasm runtime is a potential consensus issue.
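To make the "sharp edge" concrete, here is a minimal sketch of a heap with a hard cap. `BoundedHeap` and `OutOfMemory` are hypothetical names used only for this illustration; this is not the allocator Substrate uses. The point is only that the same sequence of allocations either fits or fails depending on the limit, whereas the native environment has no comparable limit.

```rust
/// Hypothetical stand-in for a sandboxed heap with a hard upper bound.
struct BoundedHeap {
    limit: usize,
    used: usize,
}

/// Error returned once the hard limit would be exceeded.
#[derive(Debug, PartialEq)]
struct OutOfMemory;

impl BoundedHeap {
    fn new(limit: usize) -> Self {
        BoundedHeap { limit, used: 0 }
    }

    /// Bump-allocate `size` bytes and return the offset of the new region.
    /// Nothing is ever freed, which exaggerates allocator inefficiency on purpose.
    fn alloc(&mut self, size: usize) -> Result<usize, OutOfMemory> {
        let end = self.used.checked_add(size).ok_or(OutOfMemory)?;
        if end > self.limit {
            return Err(OutOfMemory);
        }
        let offset = self.used;
        self.used = end;
        Ok(offset)
    }
}
```

With a 64-byte limit, allocating 48 and then 16 bytes succeeds, and the next single byte fails. Code that only ever ran against an effectively unlimited heap would never exercise that failure path.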

behavior of mutable globals

From the perspective of wasm, Rust global variables, be they thread_locals, statics, or what have you, are essentially compiled down to globals in a single-threaded context or thread-locals in a multithreaded context.

From the perspective of the native runtime, however, the translation is direct. thread_local will be translated to a thread_local and static global will end up as a static global. That's a problem since the runtime writers would have to be careful and respect the threading aspect.

A more worrying difference though is that the globals in wasm are always restored. I.e. when the wasm runtime receives control it can assume that all globals are initialized to their initial values.

In the native runtime, the behavior depends on the exact type of a global. In the case of a thread_local, it would be the value the last call on that thread left it in. You had better not use static globals in the native runtime.
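A small sketch of the native side of this difference, assuming a hypothetical counter kept in a `thread_local`. `CALLS` and `runtime_entry_point` are illustrative names, not Substrate code.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical piece of global state inside runtime code.
    static CALLS: Cell<u32> = Cell::new(0);
}

/// Stand-in for a runtime entry point that touches a global.
/// Returns how many times it has been called on the current thread.
fn runtime_entry_point() -> u32 {
    CALLS.with(|calls| {
        calls.set(calls.get() + 1);
        calls.get()
    })
}
```

Calling `runtime_entry_point()` twice on the same native thread yields 1 and then 2, because the thread-local keeps whatever the previous call left behind. Under the wasm behavior described above, where globals start from their initial values on each entry, both calls would observe 1. A fresh native thread also starts from 1, so the result depends on which thread happens to serve the call.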

word size differences

While we try to avoid any dependencies on the usize in our codebase, the difference can still be observed in some edge cases.

For example, there was a recent event when a person raised a question whether sort and sort_unstable give the same results. There were different answers to this question from different people.

AFAIR, somebody pointed out that sort_unstable is using the pattern-defeating quicksort, which defeats patterns based on random shuffling and thus cannot be used in a deterministic environment. I was surprised at the time, like, how would it obtain entropy in wasm, and at first thought it wasn't a problem.

Then my investigation showed that there is indeed a PRNG in action which is seeded deterministically. However, they generate usize using different code paths for 32-bit platforms and 64-bit platforms. I haven't dug deeper to find out whether this actually would lead to a problem, especially considering that we migrated from sort_unstable just in case.

The thing I want to draw your attention to is how subtle this difference is and what a treacherous trick libcore played on us here.

A more worrying issue though is that some other person pointed out that the same results are not guaranteed between platforms. I guess that extends even to the point that different versions of rustc (or rather libcore) can have different behavior, and the compilers do differ between native and wasm runtimes.
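The following sketch shows how a deterministically seeded PRNG can still diverge across word sizes. It is loosely modeled on the kind of 32-bit versus 64-bit split described above and is not a copy of libcore's code; `gen_word` and its `word_bits` parameter are hypothetical, and the word size is passed explicitly so that both paths can be compared on one machine.

```rust
/// One step of a 32-bit xorshift generator (deterministic, no entropy source).
fn xorshift32(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}

/// Hypothetical helper: produce a "usize-like" value for a given word size.
/// A 32-bit target takes one PRNG step, a 64-bit target glues two steps together.
fn gen_word(seed: u32, word_bits: u32) -> u64 {
    let mut state = seed;
    if word_bits <= 32 {
        xorshift32(&mut state) as u64
    } else {
        let hi = xorshift32(&mut state) as u64;
        let lo = xorshift32(&mut state) as u64;
        (hi << 32) | lo
    }
}
```

For the same non-zero seed, `gen_word(seed, 32)` and `gen_word(seed, 64)` return different values, since the 64-bit result carries the 32-bit one in its upper half. Whether such a difference ever changes the final sort order is exactly the question left open above; the sketch only shows that the intermediate values are not the same.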

panics

The coding guidelines state that panics in the runtime must be avoided at all costs. Exploitation of a panic leads to a potential DoS vector. It is not game over, though, since there are still additional mitigations in place. For instance, IIUC we ban a peer that sent a panicking transaction to us.

That, however, also has its cost. We must compile the Substrate node with panic=unwind. While it doesn't have a direct impact on performance (the mechanism is designed to be zero-cost), it has every chance of affecting performance indirectly through code bloat and thrashing the icaches.

My very quick and dirty benchmark shows that if you compile the node with panic=abort the syncing will get slightly faster (0.8.24 on rustc 1.48.0-nightly (fc2daaae6 2020-09-28))

Apart from performance, panics actually also suffer from abstraction bleeding. We compile rust code into wasm with panic=abort. This translates into a wasm trap which in turn tears down the instance safely. In native runtime we emulate this behavior by wrapping calls into the native runtime in panic::catch_unwind. A panic raised inside the native runtime will be caught there. Simple, that is in theory.
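A minimal sketch of the emulation described above, using only the standard library. `call_native` is a hypothetical wrapper name; the real executor code is more involved.

```rust
use std::panic::{self, UnwindSafe};

/// Run `f` and turn a panic into an error value instead of unwinding further.
/// This only works when the binary is built with panic=unwind.
fn call_native<F, R>(f: F) -> Result<R, String>
where
    F: FnOnce() -> R + UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // Panic payloads are usually a `&str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

`call_native(|| 1 + 1)` returns `Ok(2)`, while a closure that panics with "boom" comes back as `Err("boom")`. Note that the default panic hook still prints the message to stderr before the error is returned.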

The first complication is that a double panic aborts. That is, during a panic the call stack is unwound to the nearest enclosing catch_unwind, destructing all values found on the stack along the way and calling their Drop implementations, if any. If a drop panics, then the whole process is brought down at once. It might sound unlikely, but this has indeed happened.

The second complication arises from the fact that we want to present the user with a message like the following:

Version: 0.7.0-3778e05-x86_64-macos

   0: backtrace::backtrace::trace
   1: backtrace::capture::Backtrace::new
   2: sp_panic_handler::set::{{closure}}
   3: std::panicking::rust_panic_with_hook
   4: std::panicking::begin_panic
   5: frame_executive::Executive<System,Block,Context,UnsignedValidator,AllModules,COnRuntimeUpgrade>::execute_block
   ... <snip>
  36: tokio::runtime::context::enter
  37: std::sys_common::backtrace::__rust_begin_short_backtrace
  38: core::ops::function::FnOnce::call_once{{vtable.shim}}
  39: std::sys::unix::thread::Thread::new::thread_start
  40: __pthread_start


Thread 'tokio-runtime-worker' panicked at 'Storage root must match that calculated.', /Users/kun/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/macros.rs:13

This is a bug. Please report it at:

	https://github.com/paritytech/substrate/issues/new

Hash: given=1fb606cbe8cf369d3ff130647d53ff61f6a677d0288b6b2c1ac6fb9ed87dc3cc, expected=f7e930bcbf0380e9c1c30b8125e471f2756680b4b37d7f9e94798c144e7821ab

It works like this: there is a process-wide hook maintained by sp-panic-handler. Whenever a panic occurs, the hook prints the message. Apart from that, the panic handler either exits the process or not, depending on a special thread-local flag.

This flag is by default set to abort. However, before entering the native runtime it is set to just unwind, so the mechanism mentioned above handles the panic appropriately. However, we need to set a special guard again when the native runtime calls back into the node through the Substrate Runtime Interface. This is because we assume that the substrate runtime interface implementation doesn't panic (it has a special path to return errors to the node), but if it does panic we want to treat it as a node error and abort.

For example, we expect that the backend can always return the storage entries requested by the runtime. We even have this proof:

const EXT_NOT_ALLOWED_TO_FAIL: &str = "Externalities not allowed to fail within runtime";

Except, this can indeed happen in a light-client. Think of a light-client that re-executes a runtime call with a witness that lacks (inadvertently or maliciously) some trie nodes. In that case, the backend legitimately returns an error, but because of EXT_NOT_ALLOWED_TO_FAIL we bring down the whole process. (We also cannot change the interface of storage functions since that would be even worse, because the wasm runtime would have to deal with inherently unrecoverable errors.)

To mitigate this another flag was introduced called never_abort. This flag is used for exactly this case. So after all we have this tri-state panic handler with quite non obvious and far reaching semantics. I assume should a legit error take place in the host function implementations - it will be attributed to the untrusted backend.
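The tri-state flag can be pictured roughly as below. This is a sketch under assumptions, not the sp-panic-handler API: `OnPanic`, `PanicModeGuard` and `current_mode` are hypothetical names, and making `NeverAbort` sticky is one possible reading of "used for exactly this case". A real panic hook would consult the flag to decide between exiting the process and letting the unwind continue.

```rust
use std::cell::Cell;

/// What the process-wide panic hook should do on the current thread.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OnPanic {
    Abort,      // default: a panic in node code is fatal
    Unwind,     // set while inside the native runtime
    NeverAbort, // set when the backend itself is untrusted (e.g. light client)
}

thread_local! {
    static ON_PANIC: Cell<OnPanic> = Cell::new(OnPanic::Abort);
}

/// RAII guard: switches the mode on creation and restores the old one on drop.
struct PanicModeGuard {
    previous: OnPanic,
}

impl PanicModeGuard {
    fn set(mode: OnPanic) -> Self {
        let previous = ON_PANIC.with(|flag| {
            let current = flag.get();
            // Assumption: once NeverAbort is active, inner guards cannot override it.
            let next = if current == OnPanic::NeverAbort { current } else { mode };
            flag.replace(next)
        });
        PanicModeGuard { previous }
    }
}

impl Drop for PanicModeGuard {
    fn drop(&mut self) {
        ON_PANIC.with(|flag| flag.set(self.previous));
    }
}

/// Read the mode a panic hook would see right now on this thread.
fn current_mode() -> OnPanic {
    ON_PANIC.with(|flag| flag.get())
}
```

Entering the native runtime would create a guard with `OnPanic::Unwind`, and a host function called from inside it would create a nested guard with `OnPanic::Abort`. Each guard restores the previous mode when it goes out of scope, which is where the "non obvious and far reaching semantics" come from: the effective behavior depends on the whole stack of guards above the panic.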

with_std and without_std

Because of the differences between the environments, the way the node interacts with the runtime, and the runtime with the node, differs depending on which environment we are dealing with.

The codepaths do differ between the environments. For instance, the way the parameters are passed between the environments in runtime_interface may differ.

Ironically though most of our tests are going through the native path. We will touch this point more later.

runtime_version

The hassle around bumping the runtime_version is primarily needed for answering the question: can I pass control to the native runtime to handle this call?

If the runtime version doesn't alter any behavior, then we could demote it to something more convenient to use that will still do the job: e.g. maintain a simple upgrade counter, use the block number of the upgrade, or, after all, fetch the crate version from the Cargo.toml of the runtime crate.
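The question runtime_version answers today can be sketched as a simple comparison. The `Version` struct, the `Executor` enum and `choose_executor` below are hypothetical simplifications; the real check may consider more fields than these two.

```rust
/// Simplified, hypothetical view of a runtime version.
#[derive(Clone, Debug, PartialEq)]
struct Version {
    spec_name: &'static str,
    spec_version: u32,
}

/// Which execution path a call should take.
#[derive(Debug, PartialEq)]
enum Executor {
    Native,
    Wasm,
}

/// Use the compiled-in runtime only if it matches the on-chain one exactly;
/// otherwise fall back to the on-chain wasm blob.
fn choose_executor(native: &Version, on_chain: &Version) -> Executor {
    if native == on_chain {
        Executor::Native
    } else {
        Executor::Wasm
    }
}
```

Forgetting to bump the version after a behavior change would make this check pick the native path for a runtime that no longer matches the chain, which is why the bump matters so much while native execution exists. Without native execution, nothing depends on this comparison and the version becomes a plain identifier.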

Life without the native runtime optimization

Hopefully I managed to convince you that the costs of the native runtime are far from trivial.

But what would we gain if we removed the native runtime optimization?

Compile Times

First of all, we won't have to compile the runtime dependency graph twice which should be a nice improvement.

Even more so considering that the runtime is heavy on generics.

Decouple Runtime Releases from Node Releases

Removing the native runtime would allow us to decouple runtime upgrades from node upgrades.

Apart from that, we might gain the ability to introduce The Substrate Node, i.e. a compiled universal node (aka Bring Your Own Runtime) that would serve as a go-to solution for blockchains that are happy with a default out-of-the-box FRAME experience.

Complications

Testing

One thing that won't let us get rid of all of the complexity associated with the native runtime environment is that all of our unit testing for runtime code happens primarily in native.

While that doesn't stop us from proceeding with ripping out the native code, this fact will indeed force us to keep some support for the native environment. At the very least, we could reserve native/std within the context of runtime exclusively for testing. That means we could simplify and deoptimize, perhaps trading for more diagnostics, these codepaths since testing doesn't require utmost efficiency.

However, the fact that we are exercising the native codepaths during testing and not the wasm paths is a bit alarming. It actually was a big source of errors in the early days of the seal contracts pallet, when it finally started to be used within wasm runtimes.

In an ideal world, we would have a way to write runtime tests that is indistinguishable from, or better than, what we have right now, but that exercises more of the parts that are actually present on production paths, i.e. runs in wasm.

That can be introduced incrementally though.

RPC

Theoretically, an RPC substrate gateway equipped with a node with the latest runtime version compiled in can handle 2x+ more throughput compared to a node that relies exclusively on the wasm runtime. I don't think I have a good solution for this problem.

Offchain Workers

To be honest I am not sure about this, but I think one of the selling points of offchain workers is that they are upgradable.

In that case, one would think that it's not a big problem to remove the native runtime, however, in presentations the offchain workers are presented like they could opt-out of wasm.

I guess this still has some advantages over external processes, I am not sure. I.e. a close access to the trie.

For instance, there is substrate-archive, an external process that parses all the data and stuffs it into a relational database. In order to access the data of blockchain it goes... directly to rocksdb.

Which makes me wonder whether we should provide special APIs for those use-cases, and leave offchain workers only in wasm.

Discussion

Now, I have probably missed some things and got some other things wrong. This decision permeates substrate and perhaps we made some things ossified. So please speak up or add if you have anything.

The goal of this discussion is to ultimately agree whether it would be a good change or not and under which circumstances. Only if we reach the consensus we can start talking about the particular steps to achieve this.

@xlc
Contributor

xlc commented Oct 9, 2020

I am all in for this.

Native runtime should be for testing and development purpose only. Although it is crucial to keep the ability to run with native runtime because otherwise I have no idea how to debug wasm code.

I won’t be worried too much about RPC performance. This is something that can be solved by throwing more RPC nodes at it.

A release build shouldn’t build any runtime. It already embeds the chain spec, which includes the wasm runtime.

@pepyakin
Contributor Author

pepyakin commented Oct 9, 2020

Good point on debugging, I was a bit biased because I never spun up a debugger for the runtime code and only got away with println like debugging.

I think the ideal solution is to plainly have a debugger for wasm. Anecdotally, there was one time when @bkchr was debugging some storage root mismatch between wasm and native and I tried to help him. I won't be sharing what we've done to diagnose the issue because it was embarrassing for a living being, but I will say that it took a lot of time and ultimately led us down the wrong path. If we had a wasm debugger it would have taken not a week but rather a few hours.

I already have an idea for a super powerful debugging approach for wasm runtimes, contracts, etc., and have shared it in some circles. I will share it more publicly when it crystallizes a bit more. But at least there is a visible path forward to solve this exact issue.

@bkchr
Member

bkchr commented Oct 9, 2020

Theoretically lldb should be supported by wasmtime already. I think it also worked in some way for me, but wasn't perfect yet. I assume that this will be improved in the future or we could also throw someone at it, but this shouldn't stop us from doing this here.

@kianenigma
Contributor

kianenigma commented Oct 12, 2020

> Reaching this sharp edge by the wasm runtime is a potential consensus issue.

Indeed we've faced this in benchmarks already. Some benchmarks run (-- or "used to run") out of memory with default parameters when executed in wasm but ran fine in native, which is rather a confusing pickle.

> A more worrying difference though is that the globals in wasm are always restored. I.e. when the wasm runtime receives control it can assume that all globals are initialized to their initial values.

This point was totally new to me. I think it even needs to be documented somewhere.

> That means we could simplify and deoptimize, perhaps trading for more diagnostics, these codepaths since testing doesn't require utmost efficiency.

Probably one of the main issues here is that we use lots of types and trait implementations in testing that are only enabled in std. But hypothetically, can't we just remove all these feature flags and then have a testing env which is exactly the same in wasm and native? Then we could use native for times when a module is under diagnosis or development, and wasm as a part of CI, for example when the pallet is not being changed anymore.

(Or am I missing something else?)


All in all, I think this discussion is beyond my scope of vision in substrate to have a categorical opinion about.

But, I can state that anecdotally the executor and runtime interface parts of substrate client/runtime have been always very confusing to me and I learned over time that this is to a high degree because of the dual native+wasm support.

Also a similar comment recently was mentioned in @tomaka's substrate-lite experiment:

> No native runtime. The execution time of wasmtime is satisfying enough that having a native runtime isn't critical anymore. Relying only on the on-chain Wasm code considerably simplifies the code.

@athei
Member

athei commented Oct 12, 2020

I am all for this. The dual runtime made some parts of substrate very hard to understand for me.

I don't think that what we gain by native justifies the complexity, headaches and potential consensus errors. Especially because it does not bring us any more transactions per second or sync speed.

To address the complications:

Testing

Yes. Right now we need to keep around native in order to run tests. But long term we could move all tests to wasm in order to exercise the correct code paths.

RPC

As already stated by @xlc : This can be scaled horizontally without any issue.

Debugging

There is no fundamental reason that prevents us from debugging wasm code. As stated by @bkchr there is already debugger support in wasmtime 1 2

Offchain-Workers

I cannot comment on those.

@cheme
Contributor

cheme commented Oct 12, 2020

I really like the possibility to run native, but it sure is not trivial and always error prone.

That's partly because I see a non-upgradable blockchain built with substrate as a possible use case (which it probably is not :)), and partly because I like the technical details involved.

Still some of the points mentioned are really relevant: I am thinking tests and compile time.
I am also interested by the performance gain from running with 'panic: abort', not sure what is 'slightly' though.
I don't know if all the goals mentioned strictly require dropping native, but native sure is a big additional cost.

> Theoretically lldb should be supported by wasmtime already. I think it also worked in some way for me, but wasn't perfect yet. I assume that this will be improved in the future or we could also throw someone at it, but this shouldn't stop us from doing this here.

My experience with it is that it is indeed far from perfect (I did not manage to read variable content and got some odd cursor positioning sometime), but it certainly is already very useful.

@pepyakin
Contributor Author

pepyakin commented Oct 15, 2020

@cheme

I assumed that native-exclusive runtimes are not a goal for Substrate. I didn't mention it in my write-up, though, because I haven't found any prior conversations on this. But luckily Gav today on Sub0 confirmed that this is the case. I am not sure what you mean by "partly because I like the technical details involved."

> I am also interested by the performance gain from running with 'panic: abort', not sure what is 'slightly' though.

So my mickey mousey benchmark was just to compare two nodes built with panic=abort and panic=unwind.
This is panic=unwind

2020-10-08 13:00:19.923 tokio-runtime-worker INFO substrate  ⚙️  Syncing 393.2 bps, target=#1934198 (10 peers), best: #3143 (0xb83c…7325), finalized #3072 (0xa4d5…5251), ⬇ 316.2kiB/s ⬆ 27.9kiB/s
2020-10-08 13:00:20.108 tokio-runtime-worker INFO sub-libp2p  🔍 Discovered new external address for our node: /ip4/10.0.0.2/tcp/30333/p2p/12D3KooWGvqpWUNrWLqRCykiPdkqzvg9RaYeTnhX1H3SATYcTH4z
2020-10-08 13:00:24.924 tokio-runtime-worker INFO substrate  ⚙️  Syncing 387.0 bps, target=#1934198 (16 peers), best: #5078 (0x283f…49b5), finalized #4608 (0x1079…df89), ⬇ 201.0kiB/s ⬆ 20.2kiB/s
2020-10-08 13:00:29.925 tokio-runtime-worker INFO substrate  ⚙️  Syncing 388.1 bps, target=#1934199 (17 peers), best: #7019 (0x8548…b085), finalized #6656 (0xb3e2…77ff), ⬇ 153.5kiB/s ⬆ 25.3kiB/s
2020-10-08 13:00:34.925 tokio-runtime-worker INFO substrate  ⚙️  Syncing 384.0 bps, target=#1934200 (21 peers), best: #8939 (0x162e…b300), finalized #8704 (0xd5bc…1949), ⬇ 99.5kiB/s ⬆ 33.5kiB/s
2020-10-08 13:00:39.924 tokio-runtime-worker INFO substrate  ⚙️  Syncing 234.6 bps, target=#1934201 (25 peers), best: #10112 (0x83e9…d3c7), finalized #9728 (0xbec3…1d66), ⬇ 227.3kiB/s ⬆ 23.7kiB/s
2020-10-08 13:00:43.871 tokio-runtime-worker INFO sub-libp2p  🔍 Discovered new external address for our node: /ip4/10.20.5.1/tcp/30333/p2p/12D3KooWGvqpWUNrWLqRCykiPdkqzvg9RaYeTnhX1H3SATYcTH4z
2020-10-08 13:00:44.926 tokio-runtime-worker INFO substrate  ⚙️  Syncing 332.7 bps, target=#1934201 (25 peers), best: #11776 (0x9de1…32fb), finalized #11264 (0x6059…4449), ⬇ 40.4kiB/s ⬆ 17.3kiB/s
2020-10-08 13:00:49.925 tokio-runtime-worker INFO substrate  ⚙️  Syncing 200.6 bps, target=#1934203 (25 peers), best: #12779 (0xbaf4…fa94), finalized #12288 (0x006d…48ea), ⬇ 220.8kiB/s ⬆ 13.6kiB/s
2020-10-08 13:00:54.925 tokio-runtime-worker INFO substrate  ⚙️  Syncing 337.0 bps, target=#1934204 (25 peers), best: #14464 (0x10a4…8f87), finalized #14336 (0x6f13…3643), ⬇ 87.4kiB/s ⬆ 10.2kiB/s
2020-10-08 13:00:59.925 tokio-runtime-worker INFO substrate  ⚙️  Syncing 239.0 bps, target=#1934204 (25 peers), best: #15659 (0xd84a…743b), finalized #15360 (0x0e41…a064), ⬇ 137.6kiB/s ⬆ 6.9kiB/s
2020-10-08 13:01:04.926 tokio-runtime-worker INFO substrate  ⚙️  Syncing 196.1 bps, target=#1934205 (25 peers), best: #16640 (0x53f7…d708), finalized #16384 (0x3f65…5f8b), ⬇ 164.1kiB/s ⬆ 6.4kiB/s
2020-10-08 13:01:09.926 tokio-runtime-worker INFO substrate  ⚙️  Syncing 378.2 bps, target=#1934206 (25 peers), best: #18531 (0x5545…4b56), finalized #18432 (0x5ad4…58de), ⬇ 4.7kiB/s ⬆ 1.4kiB/s

and this is with panic=abort


2020-10-08 13:30:25.541 tokio-runtime-worker INFO substrate  ⚙️  Syncing 407.8 bps, target=#1934499 (17 peers), best: #3750 (0x93f5…7e6a), finalized #3584 (0xd1d9…2716), ⬇ 417.8kiB/s ⬆ 40.9kiB/s
2020-10-08 13:30:30.542 tokio-runtime-worker INFO substrate  ⚙️  Syncing 395.6 bps, target=#1934500 (24 peers), best: #5728 (0x6c5b…fc67), finalized #5632 (0x9ed9…5a2a), ⬇ 55.8kiB/s ⬆ 11.7kiB/s
2020-10-08 13:30:35.544 tokio-runtime-worker INFO substrate  ⚙️  Syncing 390.2 bps, target=#1934500 (25 peers), best: #7680 (0x028f…5be1), finalized #7168 (0x3e4e…f3d1), ⬇ 176.7kiB/s ⬆ 17.5kiB/s
2020-10-08 13:30:40.543 tokio-runtime-worker INFO substrate  ⚙️  Syncing 389.3 bps, target=#1934501 (25 peers), best: #9626 (0x8d27…bfe0), finalized #9216 (0xd9aa…8ef9), ⬇ 81.0kiB/s ⬆ 14.2kiB/s
2020-10-08 13:30:41.765 tokio-runtime-worker INFO sub-libp2p  🔍 Discovered new external address for our node: /ip4/10.0.0.2/tcp/30333/p2p/12D3KooWGvqpWUNrWLqRCykiPdkqzvg9RaYeTnhX1H3SATYcTH4z
2020-10-08 13:30:45.543 tokio-runtime-worker INFO substrate  ⚙️  Syncing 388.8 bps, target=#1934502 (25 peers), best: #11570 (0x551f…4682), finalized #11264 (0x6059…4449), ⬇ 240.2kiB/s ⬆ 26.4kiB/s
2020-10-08 13:30:50.543 tokio-runtime-worker INFO substrate  ⚙️  Syncing 285.6 bps, target=#1934503 (25 peers), best: #12998 (0x144e…e664), finalized #12800 (0xad3b…05f5), ⬇ 195.2kiB/s ⬆ 16.6kiB/s
2020-10-08 13:30:55.543 tokio-runtime-worker INFO substrate  ⚙️  Syncing 398.4 bps, target=#1934504 (25 peers), best: #14990 (0x9462…3df9), finalized #14848 (0xdf8f…4f2e), ⬇ 68.0kiB/s ⬆ 20.7kiB/s
2020-10-08 13:31:00.543 tokio-runtime-worker INFO substrate  ⚙️  Syncing 403.0 bps, target=#1934504 (25 peers), best: #17005 (0xc903…ccb4), finalized #16896 (0x7763…777a), ⬇ 313.3kiB/s ⬆ 7.4kiB/s
2020-10-08 13:31:05.544 tokio-runtime-worker INFO substrate  ⚙️  Syncing 418.2 bps, target=#1934505 (25 peers), best: #19096 (0x0449…3c64), finalized #18944 (0x48b9…c40b), ⬇ 158.8kiB/s ⬆ 4.0kiB/s
2020-10-08 13:31:10.544 tokio-runtime-worker INFO substrate  ⚙️  Syncing 401.2 bps, target=#1934506 (25 peers), best: #21102 (0xc46e…d267), finalized #20992 (0x229e…a59f), ⬇ 194.9kiB/s ⬆ 1.4kiB/s

I do not think those are reliable though. Proper benchmarks must be conducted to test this point.


@kianenigma

> I think it even needs to be documented somewhere.

Yeah, that's a good point! Ideally we should document all the specifics of the runtime environment provided by substrate

@gnome32

gnome32 commented Dec 10, 2020

Substrate has so far demonstrated alternative use cases other than PoA/PoS models (see utxo-workshop and PoW consensus). Forkless upgrades (i.e. on-chain governance via the Democracy pallet etc.) are not a compatible upgrade model for Bitcoin-type blockchains. Wasm-only runtime is bad for PoW models where node-performance is king. Native is also better for embedded or slower hardware. Performance is also a key reason for using Rust in the first place.

A major strength of Substrate is its ability to have pluggable consensus, along with being a very modular framework in general. To date, Substrate truly has been agnostic towards other chain models and consensus. Be aware you may push alternative models away by going down this route. I am not pushing for specific scaling methods, crypto politics, and/or specific consensus mechanisms etc. I am simply pointing out that Substrate has not really forced anything too much so far. In fact, it has been very supportive and has actively researched other models (ie utxos, PoW).

As it is, the chain state already appears to lock in the wasm blob into the genesis state. Can these be easily separated? If a developer wanted to go and hard-code the genesis state (like Bitcoin) and be responsible for maintaining consensus themselves (using static native runtimes and softforking etc), they should be able to do so.

Anyways, keep up the great work.

@tomaka
Contributor

tomaka commented Dec 14, 2020

> Wasm-only runtime is bad for PoW models where node-performance is king. Native is also better for embedded or slower hardware. Performance is also a key reason for using Rust in the first place.

We know that right now the Wasm-only runtime is indeed slower in Substrate, as we are for example doing a lot of extra memcpies at the Wasm VM boundary.
In theory, however, assuming we are optimizing these out, apart from the time compiling the runtime when an on-chain runtime upgrade happens (which I assume wouldn't happen if you're creating a native-only chain), there is no reason to believe that the Wasm-only runtime should actually be slower.

The winner between the native and Wasm-only runtimes theoretically only depends on the winner between the Rust+LLVM compiler and the cranelift compiler.

@pepyakin
Contributor Author

While I like the idea of supporting all possible models, I think there is a point where we need to draw a line. Here, I think we have reached the point where supporting this involves a non-trivial amount of investment, not only in terms of immediate support, but also for the future evolution. And as I mentioned, apparently, supporting forkless upgrades is the first priority.

And yes, there is plenty of opportunity to optimize the wasm execution.

One thing I wanted to add to the list of costs is the binary size:

In the polkadot project, we support 4 flavours of runtimes: polkadot, kusama, rococo, westend. All four are compiled and linked into the final binary. AFAIU, all four are also embedded inside the final executable as wasm runtimes. That also influences the compile times.

@athei
Member

athei commented Jan 11, 2021

> Wasm-only runtime is bad for PoW models where node-performance is king. Native is also better for embedded or slower hardware. Performance is also a key reason for using Rust in the first place.

I would think that the act of hashing wouldn't be done inside the runtime anyways. It would most probably be made available by the client as runtime interface.

@gnome32

gnome32 commented Jan 11, 2021

> I would think that the act of hashing wouldn't be done inside the runtime anyways. It would most probably be made available by the client as runtime interface.

Node performance, not the PoW itself. Slow block verification increases the chance of stales etc. If wasm can indeed reach native performance, then that would be good.

Do forkless upgrades imply absolute runtime determinism? Could a substrate-built blockchain support non-substrate clients?

@nuke-web3
Contributor

nuke-web3 commented Jan 18, 2021

I just noticed this note on debugging and logging that I think is relevant: as without native runtime support - we force bloat and possibly slow down of runtimes where we can no longer use native.

One solution could be a global build flag to --dev or something along those lines that would include any logging/debugging, but we would educate on releases globally trying to remove logging (or minimize it) to improve overall runtime size and performance (especially on runtime upgrades)

@pepyakin
Contributor Author

pepyakin commented Jan 18, 2021

All the logging statements are compiled into wasm already, so removing the native runtime wouldn't make anything worse in that regard. Therefore this issue is orthogonal.

UPD: A follow up on this matter #7927

@arkpar
Member

arkpar commented May 25, 2021

+1 for removing the native runtime. I'd like to point out that native block execution does not really help with performance in real-world usage. When syncing individual blocks at the top of the chain, performance does not really matter as long as the node can keep up with the network. And since authoring is done with WASM anyway and weights are computed for WASM execution, having native execution has no benefit. And when syncing historical blocks it is also mostly useless, because 99.9% of blocks are authored with an older runtime and native execution is not applicable.

Debugging argument is valid, but I'd trade it for faster compile times any day.

@athei
Member

athei commented May 25, 2021

Keeping native does not help with throughput. The only thing one could argue is that the regular full node that is keeping up with the network is running more efficiently and thus consumes less energy.

@stale

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@pepyakin pepyakin added J1-meta A specific issue for grouping tasks or bugs of a specific category. and removed A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. labels Jul 7, 2021
@kianenigma
Contributor

seems like #8893 is a step toward removing native execution already.

@dartdart26

I've been thinking about chain extensions and the ability to use external std libraries. What I am getting at is that the native runtime (very conveniently) allows using any Rust library (std or no_std) and exposing functionality from it to smart contracts via a chain extension. Can that be achieved with the Wasm runtime only?

@athei
Member

athei commented Jun 7, 2022

You would need to add a new host function to your client in order to do that. However, by making use of that host function your runtime would become incompatible to be run on a relay chain validator. But using the native runtime would have the same effect. So this is essentially only useful for stand alone chains. But keep in mind that everything put in native makes forkless runtime upgrades much harder because it requires validators to update their binary.

@dartdart26

Thank you for the feedback!

Yes, I agree it has limitations. I guess my point was that using the native runtime with chain extensions in that way can be thought of as a Substrate feature in the same way as forkless runtime upgrades. I think it adds more flexibility to Substrate and allows for implementing different use cases.
