This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

[Optimization] Speed-up boot time #5519

Closed
crystalin opened this issue May 12, 2022 · 17 comments

@crystalin

Starting the polkadot node (the same applies to parachain nodes) takes about 20 CPU-seconds, which is a lot for some use cases.
On a high-end CPU that is ~3s of wall-clock time, which is fine for humans. However, for automated CI that needs to launch a node 500-1000 times, it adds up quickly.

└--╼ time ./target/release/polkadot --dev --tmp
2022-05-12 18:19:10 Parity Polkadot    
2022-05-12 18:19:10 ✌️  version 0.9.19-2c934ed2cdd    
2022-05-12 18:19:10 ❤️  by Parity Technologies <[email protected]>, 2017-2022    
2022-05-12 18:19:10 📋 Chain specification: Development    
2022-05-12 18:19:10 🏷  Node name: certain-pocket-4508    
2022-05-12 18:19:10 👤 Role: AUTHORITY    
2022-05-12 18:19:10 💾 Database: RocksDb at /tmp/substrateLQblx7/chains/dev/db/full    
2022-05-12 18:19:10 ⛓  Native runtime: polkadot-9200 (parity-polkadot-0.tx12.au0)    
2022-05-12 18:19:11 [0] 💸 generated 1 npos voters, 1 from validators and 0 nominators    
2022-05-12 18:19:11 Took active validators from set with wrong size    
2022-05-12 18:19:11 Took active validators from set with wrong size    
2022-05-12 18:19:11 Took active validators from set with wrong size.    
2022-05-12 18:19:11 Took active validators from set with wrong size    
2022-05-12 18:19:12 🔨 Initializing Genesis block/state (state: 0x19c3…2642, header-hash: 0x3db5…d36c)    
2022-05-12 18:19:12 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.    
2022-05-12 18:19:12 👶 Creating empty BABE epoch changes on what appears to be first startup.    
2022-05-12 18:19:12 🏷  Local node identity is: 12D3KooWQTrgvEvYwcFkLf8c3GgDbwmPGnAkR4ZdFk2Gapw9NFMm    
2022-05-12 18:19:13 💻 Operating system: linux    
2022-05-12 18:19:13 💻 CPU architecture: x86_64    
2022-05-12 18:19:13 💻 Target environment: gnu    
2022-05-12 18:19:13 💻 CPU: AMD Ryzen 9 5950X 16-Core Processor    
2022-05-12 18:19:13 💻 CPU cores: 16    
2022-05-12 18:19:13 💻 Memory: 32068MB    
2022-05-12 18:19:13 💻 Kernel: 5.10.60.1-microsoft-standard-WSL2    
2022-05-12 18:19:13 💻 Linux distribution: Ubuntu 20.04.3 LTS    
2022-05-12 18:19:13 💻 Virtual machine: yes    
2022-05-12 18:19:13 📦 Highest known block at #0    
2022-05-12 18:19:13 〽️ Prometheus exporter started at 127.0.0.1:9615    
2022-05-12 18:19:13 Running JSON-RPC HTTP server: addr=127.0.0.1:9933, allowed origins=None    
2022-05-12 18:19:13 Running JSON-RPC WS server: addr=127.0.0.1:9944, allowed origins=None    
2022-05-12 18:19:13 🏁 CPU score: 1330MB/s    
2022-05-12 18:19:13 🏁 Memory score: 6680MB/s    
2022-05-12 18:19:13 🏁 Disk score (seq. writes): 1282MB/s    
2022-05-12 18:19:13 🏁 Disk score (rand. writes): 590MB/s    
2022-05-12 18:19:13 👶 Starting BABE Authorship worker    
^C
real    0m3.127s
user    0m21.210s
sys     0m0.764s

If you have any suggestions to improve this, that would be great :)
(cc @rphmeier )

@shawntabrizi
Member

In this case, --dev generates a chain specification on the fly and then generates genesis. Have you tried timing with an exported chain spec, importing it with --chain path/to/spec.json?

Make sure the chain spec is raw to save time there too.
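
A minimal sketch of this suggestion, assuming the node's `build-spec` subcommand and its `--raw` flag are available in the build being tested. The names `POLKADOT`, `SPEC`, `DRY_RUN` and the file name `dev.raw.json` are placeholders introduced here for illustration, not taken from the issue. By default the script only prints the commands it would run.

```shell
# Sketch only. POLKADOT, SPEC and DRY_RUN are illustrative names (assumptions).
POLKADOT="${POLKADOT:-./target/release/polkadot}"
SPEC="${SPEC:-dev.raw.json}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually run them

# Export the development chain spec in raw form (done once, e.g. per CI build).
export_spec() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$POLKADOT build-spec --chain dev --raw > $SPEC"
  else
    "$POLKADOT" build-spec --chain dev --raw > "$SPEC"
  fi
}

# Start the node from the exported file instead of generating the spec via --dev.
start_node() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$POLKADOT --chain $SPEC --tmp"
  else
    "$POLKADOT" --chain "$SPEC" --tmp
  fi
}

export_spec
start_node
```

With `DRY_RUN=0` the node keeps running until it is interrupted, so prefix the start command with `time` and stop it with Ctrl-C to compare against the `--dev` run.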

@crystalin
Author

crystalin commented May 13, 2022

Yes @shawntabrizi, it takes longer with the chain spec (I think reading/parsing the JSON is slower than loading it from the binary):

└--╼ time ./target/release/polkadot --chain=westend-local.raw.json --tmp 
2022-05-12 21:48:43 Parity Polkadot    
2022-05-12 21:48:43 ✌️  version 0.9.19-2c934ed2cdd    
2022-05-12 21:48:43 ❤️  by Parity Technologies <[email protected]>, 2017-2022    
2022-05-12 21:48:43 📋 Chain specification: Westend Local Testnet    
2022-05-12 21:48:43 🏷  Node name: hesitant-sound-6664    
2022-05-12 21:48:43 👤 Role: FULL    
2022-05-12 21:48:43 💾 Database: RocksDb at /tmp/substrate4j9QrU/chains/westend_local_testnet/db/full    
2022-05-12 21:48:43 ⛓  Native runtime: westend-9200 (parity-westend-0.tx11.au2)    
2022-05-12 21:48:46 🔨 Initializing Genesis block/state (state: 0x6b29…a425, header-hash: 0x4361…e9f2)    
2022-05-12 21:48:46 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.    
2022-05-12 21:48:46 👶 Creating empty BABE epoch changes on what appears to be first startup.    
2022-05-12 21:48:46 🏷  Local node identity is: 12D3KooWSv5sWKyLyYQBDpekFdVazTgzrUyBe8iYhonuBZadDGJz    
2022-05-12 21:48:46 💻 Operating system: linux    
2022-05-12 21:48:46 💻 CPU architecture: x86_64    
2022-05-12 21:48:46 💻 Target environment: gnu    
2022-05-12 21:48:46 💻 CPU: AMD Ryzen 9 5950X 16-Core Processor    
2022-05-12 21:48:46 💻 CPU cores: 16    
2022-05-12 21:48:46 💻 Memory: 32068MB    
2022-05-12 21:48:46 💻 Kernel: 5.10.60.1-microsoft-standard-WSL2    
2022-05-12 21:48:46 💻 Linux distribution: Ubuntu 20.04.3 LTS    
2022-05-12 21:48:46 💻 Virtual machine: yes    
2022-05-12 21:48:46 📦 Highest known block at #0    
2022-05-12 21:48:46 〽️ Prometheus exporter started at 127.0.0.1:9615    
2022-05-12 21:48:46 Running JSON-RPC HTTP server: addr=127.0.0.1:9933, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])    
2022-05-12 21:48:46 Running JSON-RPC WS server: addr=127.0.0.1:9944, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])    
2022-05-12 21:48:46 🏁 CPU score: 1348MB/s    
2022-05-12 21:48:46 🏁 Memory score: 19461MB/s    
2022-05-12 21:48:46 🏁 Disk score (seq. writes): 1973MB/s    
2022-05-12 21:48:46 🏁 Disk score (rand. writes): 789MB/s    
2022-05-12 21:48:46 creating instance on iface 172.26.200.125    
^C
real    0m3.546s
user    0m21.041s
sys     0m1.515s

@arkpar
Member

arkpar commented May 13, 2022

IIRC most of that time is spent compiling the current runtime WASM code. That's why it scales well with the number of CPUs.
Try running with --wasm-execution=interpreted-i-know-what-i-do
You can also use --no-hardware-benchmarks to skip performance benchmarks on start.
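
A small sketch that combines both flags into one launch command for CI. The two flags are the ones named above; the variable names `POLKADOT` and `DRY_RUN` and the function `fast_start` are assumptions made up for this example. By default it only prints the command.

```shell
# Sketch only. POLKADOT, DRY_RUN and fast_start are illustrative names (assumptions).
POLKADOT="${POLKADOT:-./target/release/polkadot}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = actually run it

fast_start() {
  # Build the argument list once so the printed and executed commands match.
  set -- "$POLKADOT" --dev --tmp \
    --wasm-execution=interpreted-i-know-what-i-do \
    --no-hardware-benchmarks
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

fast_start
```

With `DRY_RUN=0` the node runs until it is interrupted, so a CI harness would still need its own way to stop it.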

@crystalin
Author

Thank you @arkpar, this makes a huge difference:

└--╼ time ./target/release/polkadot --dev --tmp --wasm-execution=interpreted-i-know-what-i-do 
2022-05-13 08:21:40 Parity Polkadot    
2022-05-13 08:21:40 ✌️  version 0.9.19-2c934ed2cdd    
2022-05-13 08:21:40 ❤️  by Parity Technologies <[email protected]>, 2017-2022    
2022-05-13 08:21:40 📋 Chain specification: Development    
2022-05-13 08:21:40 🏷  Node name: bashful-arithmetic-4273    
2022-05-13 08:21:40 👤 Role: AUTHORITY    
2022-05-13 08:21:40 💾 Database: RocksDb at /tmp/substrateLy5yQi/chains/dev/db/full    
2022-05-13 08:21:40 ⛓  Native runtime: polkadot-9200 (parity-polkadot-0.tx12.au0)    
2022-05-13 08:21:42 [0] 💸 generated 1 npos voters, 1 from validators and 0 nominators    
2022-05-13 08:21:42 Took active validators from set with wrong size    
2022-05-13 08:21:42 Took active validators from set with wrong size    
2022-05-13 08:21:42 Took active validators from set with wrong size.    
2022-05-13 08:21:42 Took active validators from set with wrong size    
2022-05-13 08:21:42 🔨 Initializing Genesis block/state (state: 0x19c3…2642, header-hash: 0x3db5…d36c)    
2022-05-13 08:21:42 👴 Loading GRANDPA authority set from genesis on what appears to be first startup.    
2022-05-13 08:21:42 👶 Creating empty BABE epoch changes on what appears to be first startup.    
2022-05-13 08:21:42 🏷  Local node identity is: 12D3KooWB1yKj1FtjWi98ZFqTXAqhgkpmaU54iMAihp8Mp6enMST    
2022-05-13 08:21:42 💻 Operating system: linux    
2022-05-13 08:21:42 💻 CPU architecture: x86_64    
2022-05-13 08:21:42 💻 Target environment: gnu    
2022-05-13 08:21:42 💻 CPU: AMD Ryzen 9 5950X 16-Core Processor    
2022-05-13 08:21:42 💻 CPU cores: 16    
2022-05-13 08:21:42 💻 Memory: 32068MB    
2022-05-13 08:21:42 💻 Kernel: 5.10.60.1-microsoft-standard-WSL2    
2022-05-13 08:21:42 💻 Linux distribution: Ubuntu 20.04.3 LTS    
2022-05-13 08:21:42 💻 Virtual machine: yes    
2022-05-13 08:21:42 📦 Highest known block at #0    
2022-05-13 08:21:42 〽️ Prometheus exporter started at 127.0.0.1:9615    
2022-05-13 08:21:42 Running JSON-RPC HTTP server: addr=127.0.0.1:9933, allowed origins=None    
2022-05-13 08:21:42 Running JSON-RPC WS server: addr=127.0.0.1:9944, allowed origins=None    
2022-05-13 08:21:42 🏁 CPU score: 1348MB/s    
2022-05-13 08:21:42 🏁 Memory score: 18231MB/s    
2022-05-13 08:21:42 🏁 Disk score (seq. writes): 1909MB/s    
2022-05-13 08:21:42 🏁 Disk score (rand. writes): 816MB/s    
2022-05-13 08:21:42 👶 Starting BABE Authorship worker    
^C
real    0m2.329s
user    0m0.750s
sys     0m0.687s

With --no-hardware-benchmarks added, the real time goes down to:

real    0m1.507s
user    0m0.528s
sys     0m0.293s

@crystalin
Author

@arkpar how stable is it to use interpreted wasm?

@bkchr
Member

bkchr commented May 13, 2022

How stable is what? :P The node? Interpreted should always work and be more "stable" than compiled.

@crystalin
Author

crystalin commented May 13, 2022

@bkchr "Stable" in the sense of being consistent with compiled wasm? We know that "native", for example, doesn't always produce the same result as the compiled wasm. I'm wondering whether the interpreter has the same problem, or whether it is considered to always produce the same output as the compiled one :)

@crystalin
Author

@arkpar also, wouldn't it be possible to provide a way to pre-compile the wasm (into a file) so that the binary can load it directly?
I believe this would also allow more optimization than is currently used. IIRC the compiler skips some optimizations so that validators don't spend too much time compiling the parachain runtimes.

@bkchr
Member

bkchr commented May 13, 2022

> @bkchr "Stable" in the sense of being consistent with compiled wasm ? We know that "native" for exemple doesn't always provide the same result as the compiled wasm. I'm wondering if the interpreter is in the same case, or is it considered to provide always the same output as the compiled one :)

As I already said, interpreted is "more correct". So, yes, it should always provide the same results.

@bkchr
Member

bkchr commented May 13, 2022

> @arkpar also, isn't it possible to provide a system to pre-compile the wasm (into a file) to be loaded directly by the binary ?

Please no :P I mean wasmtime supports this and we somehow also support this, but currently not for "node runtimes".

@crystalin
Author

@bkchr I see. Are you afraid this would lead to more issues with inconsistent runtimes overall (wrong file, wrong version, wrong compilation)?

@bkchr
Member

bkchr commented May 13, 2022

I'm more afraid of adding features for niche use cases. ;)

@crystalin
Author

Sounds good :)
I don't think there is much more to do, so I'm closing the ticket.

@arkpar
Member

arkpar commented May 13, 2022

An on-disk WASM compilation cache is something we considered for POV/parachain code, but I'm not sure what its status is. CC @pepyakin

@pepyakin
Contributor

This is done in multiple flavors.

The PVF host executor now implements its own cache. Previously, we used the cache provided by wasmtime. It's possible to enable it by setting cache_path to a directory inside the db path.

@arkpar
Member

arkpar commented May 13, 2022

Is there a way to use it for the relay chain runtime in substrate as well?

@pepyakin
Contributor

For the former, no [1]. For the latter, yes: the client should just specify Some with the cache dir.

Footnotes

  1. well, there is https://github.com/paritytech/polkadot-sdk/issues/2344 but it's just an idea at this point which, even if accepted, will take quite some time.
