This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Weights to u64 + Balances Weights #5446

Merged
merged 62 commits on Apr 16, 2020

Conversation

@shawntabrizi (Member) commented Mar 28, 2020

This PR begins the updates to Weights in Substrate FRAME.

polkadot companion: paritytech/polkadot#994

  • Use u64 for Weights
  • Set FRAME standard that 1_000_000_000_000 weight represents 1 second of compute and is 1:1 with gas in Contracts.
    • So 1 ns of compute is 1_000 weight
  • Set MaximumBlockWeight for Node and Node Template to 2_000_000_000_000 for 2 seconds of compute out of an average 6 second block time.
  • Introduce DbWeight type configuration in the System pallet, allowing you to define the weight of DB operations in your runtime for your selected database.
  • Introduce RuntimeDbWeight weight struct which easily integrates into existing Weight syntax to allow you to extend your weight definition.
  • Introduce weights to all extrinsics in Balances.
  • Scale all function weights up by a factor of 1_000 to keep their share of the block mostly the same (the maximum weight in a block was allowed to double, but this should be okay).
  • Reduce the weight-to-fee coefficient to 1 to keep the fees of all existing pallets the same.
  • Remove SimpleDispatchInfo::default() and introduce const MINIMUM_WEIGHT

High level overview

Ignoring the bullet points above, the high-level goal of this PR is to start standardizing the Substrate runtime Weight system and to give practical weights to extrinsics in the runtime.

Standard Weight System

Substrate has the goal to create an open and fluid Pallet ecosystem, where many developers can design and publish their pallets, and other developers can use those pallets in a clean and simple way. From a technical standpoint, we are already there, but from a financial standpoint, the Weight system has not been standardized such that different pallets from different developers would all work on the same runtime and be economically secure.

To fix this, we establish a standard definition of the FRAME Weight unit.

  • Weight is a u64 value
  • 1_000 weight represents 1 nanosecond of computation on the following device specifications:
    • Digital Ocean: ubuntu-s-2vcpu-4gb-ams3-01
    • 2x Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
    • 4GB RAM
    • Ubuntu 19.10 (GNU/Linux 5.3.0-18-generic x86_64)
    • rustc 1.42.0 (b8cedc004 2020-03-09)
Full Details

root@ubuntu-s-2vcpu-4gb-ams3-01:~# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping : 1
microcode : 0x1
cpu MHz : 2199.998
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt md_clear
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 4399.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
stepping : 1
microcode : 0x1
cpu MHz : 2199.998
cache size : 4096 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt md_clear
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 4399.99
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

root@ubuntu-s-2vcpu-4gb-ams3-01:~# fio --randrepeat=1 --ioengine=posixaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=64
fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=12.5MiB/s,w=4388KiB/s][r=3191,w=1097 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=9576: Wed Apr 1 22:03:41 2020
read: IOPS=3304, BW=12.9MiB/s (13.5MB/s)(3070MiB/237807msec)
bw ( KiB/s): min= 2248, max=17040, per=100.00%, avg=13221.27, stdev=2214.00, samples=475
iops : min= 562, max= 4260, avg=3305.31, stdev=553.50, samples=475
write: IOPS=1104, BW=4418KiB/s (4524kB/s)(1026MiB/237807msec); 0 zone resets
bw ( KiB/s): min= 824, max= 6000, per=100.00%, avg=4418.67, stdev=754.61, samples=475
iops : min= 206, max= 1500, avg=1104.66, stdev=188.65, samples=475
cpu : usr=1.03%, sys=0.14%, ctx=203171, majf=0, minf=38
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=9.4%, 16=25.0%, 32=53.1%, >=64=12.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=98.6%, 8=0.1%, 16=0.0%, 32=0.0%, 64=1.4%, >=64=0.0%
issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: bw=12.9MiB/s (13.5MB/s), 12.9MiB/s-12.9MiB/s (13.5MB/s-13.5MB/s), io=3070MiB (3219MB), run=237807-237807msec
WRITE: bw=4418KiB/s (4524kB/s), 4418KiB/s-4418KiB/s (4524kB/s-4524kB/s), io=1026MiB (1076MB), run=237807-237807msec

Disk stats (read/write):
vda: ios=785372/262750, merge=0/86, ticks=178704/44878, in_queue=15464, util=95.25%

With this definition, we give the default Substrate runtime 2 seconds of total block compute time (2_000_000_000_000 Maximum Block Weight), which is 1/3 of the 6 second total block time of the default substrate node. This allows for delay caused by network propagation, etc...
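The standard above boils down to a few constants. As a sketch (the constant names here are illustrative, not necessarily the identifiers FRAME uses):

```rust
// Sketch of the weight standard described above. Constant names are
// illustrative; they are not necessarily the identifiers FRAME uses.
pub type Weight = u64;

/// 1_000 weight represents 1 nanosecond of compute on the reference hardware.
pub const WEIGHT_PER_NANOS: Weight = 1_000;
/// 1 second of compute is therefore 1_000_000_000_000 weight.
pub const WEIGHT_PER_SECOND: Weight = 1_000_000_000 * WEIGHT_PER_NANOS;
/// The default runtime budgets 2 seconds of compute per 6-second block.
pub const MAXIMUM_BLOCK_WEIGHT: Weight = 2 * WEIGHT_PER_SECOND;

fn main() {
    assert_eq!(WEIGHT_PER_SECOND, 1_000_000_000_000);
    assert_eq!(MAXIMUM_BLOCK_WEIGHT, 2_000_000_000_000);
}
```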

DbWeight Usage

With this definition of Runtime Weight, we now need to define the different weights of the different extrinsics in the runtime.

We have found that the underlying DB has a significant impact on the overall performance of the runtime. Since Substrate is a fully modular framework, we should expect that different chains may choose to use a different DB layer. As such we have integrated DB performance directly into the weight calculation of extrinsics.

An extrinsic weight should be calculated as:

extrinsic_weight =
    computation_and_memory_weight +
    num_unique_db_reads * db_read_weight_factor +
    num_unique_db_writes * db_write_weight_factor

Here we separate any general computation and memory usage from the raw read/write DB operations that an extrinsic can invoke.
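The formula can be written as a runnable sketch (a hypothetical helper, not an actual FRAME API), using the same saturating arithmetic the weight types use:

```rust
type Weight = u64;

/// Sketch of the extrinsic weight formula above; an illustrative helper,
/// not an actual FRAME API.
fn extrinsic_weight(
    computation_and_memory_weight: Weight,
    num_unique_db_reads: Weight,
    num_unique_db_writes: Weight,
    db_read_weight_factor: Weight,
    db_write_weight_factor: Weight,
) -> Weight {
    computation_and_memory_weight
        .saturating_add(num_unique_db_reads.saturating_mul(db_read_weight_factor))
        .saturating_add(num_unique_db_writes.saturating_mul(db_write_weight_factor))
}

fn main() {
    // 200 µs of compute (200_000_000 weight), 1 read costing 100, 1 write costing 1_000.
    assert_eq!(extrinsic_weight(200_000_000, 1, 1, 100, 1_000), 200_001_100);
}
```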

The read/write operations of an extrinsic can be measured empirically by creating benchmarks and using trace logging from the benchmarking and database layers.

--log state-trace=trace,benchmark=trace

NOTE: We only count unique read/write operations for that block as part of the extrinsic weight. Things like the sender nonce, block number, event storage, etc... are assumed to be updated every block, and do not need to be included as unique read/write operations for that extrinsic.

Since they are already accessed once per block (at least), we can treat them as part of the Runtime Storage Overlay, which is included as part of the computation/memory usage weight.

Then the runtime developer can define a single global DbWeight configuration trait in FRAME System to define the cost of reads and writes for their respective database.

// In this example each read operation an extrinsic performs adds 100 weight, while
// each write operation adds 1000.
parameter_types! {
	pub const DbWeight: RuntimeDbWeight = RuntimeDbWeight {
		read: 100,
		write: 1000,
	};
}

Computation and memory weight can be calculated using the same benchmarking framework, but using the in-memory DB, isolating the computation + memory weight from any effect that the DB may have.

Using these two pieces of information, a user can then define the weight of their extrinsic like so:

// Transfer has (1 read, 1 write) to the underlying runtime DB, and takes 200 microseconds
// to execute with an in-memory db.
#[weight = T::DbWeight::get().reads_writes(1, 1) + 200_000_000]
pub fn transfer(
	origin,
	dest: <T::Lookup as StaticLookup>::Source,
	#[compact] value: T::Balance
) {
	let transactor = ensure_signed(origin)?;
	let dest = T::Lookup::lookup(dest)?;
	<Self as Currency<_>>::transfer(&transactor, &dest, value, ExistenceRequirement::AllowDeath)?;
}

Here, DbWeight is of type RuntimeDbWeight which implements the following functions:

impl RuntimeDbWeight {
	pub fn reads(self, r: Weight) -> Weight {
		self.read.saturating_mul(r)
	}

	pub fn writes(self, w: Weight) -> Weight {
		self.write.saturating_mul(w)
	}

	pub fn reads_writes(self, r: Weight, w: Weight) -> Weight {
		let read_weight = self.read.saturating_mul(r);
		let write_weight = self.write.saturating_mul(w);
		read_weight.saturating_add(write_weight)
	}
}
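Putting the pieces together, the transfer weight annotation above expands to a plain u64. A self-contained sketch, copying the RuntimeDbWeight implementation shown here and the example read/write costs from the parameter_types block above:

```rust
type Weight = u64;

// Self-contained copy of the RuntimeDbWeight shown above, so the example
// runs on its own.
#[derive(Clone, Copy)]
pub struct RuntimeDbWeight {
    pub read: Weight,
    pub write: Weight,
}

impl RuntimeDbWeight {
    pub fn reads(self, r: Weight) -> Weight {
        self.read.saturating_mul(r)
    }
    pub fn writes(self, w: Weight) -> Weight {
        self.write.saturating_mul(w)
    }
    pub fn reads_writes(self, r: Weight, w: Weight) -> Weight {
        self.reads(r).saturating_add(self.writes(w))
    }
}

fn main() {
    // The example configuration from above: each read costs 100, each write 1_000.
    let db_weight = RuntimeDbWeight { read: 100, write: 1_000 };
    // `transfer`: 1 read, 1 write, plus 200 µs of compute (200_000_000 weight).
    let transfer_weight = db_weight.reads_writes(1, 1) + 200_000_000;
    assert_eq!(transfer_weight, 200_001_100);
}
```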

By default, this weight is assumed to have DispatchClass::Normal and PaysFee = true. More complex weight definitions can use more verbose syntax such as FunctionOf.

Balances Weights

We have created and run benchmarks on all pallets as described above. The results can be found here: https://www.shawntabrizi.com/substrate-graph-benchmarks/

Extracting the data for just the Balances pallet we have the following:

| Call | Reads | Writes | Computation | Notes |
| --- | --- | --- | --- | --- |
| transfer | 1 | 1 | ~200 µs | Assumed that the sender is already in the overlay and updated for the block, so only the recipient storage item gets read and written to. |
| set_balance | 1 | 1 | ~100 µs | |
| force_transfer | 2 | 2 | ~200 µs | Exactly the same as transfer, but the sender is not assumed to be in the overlay. |
| transfer_keep_alive | 1 | 1 | ~150 µs | Computation is slightly less than transfer because the sender account is never killed. |

The weights of the Balances pallet have been updated to reflect this.

@shawntabrizi shawntabrizi added the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 28, 2020
@shawntabrizi shawntabrizi changed the title Updates to Weights (u64, Fees) Weights to u64 + Balances Weights Mar 28, 2020
@gavofyork (Member)

Set FRAME standard that 1_000_000_000 weight represents 1 second of compute

This is fine, but we should be clear that this only holds in the impossibly specific circumstances we have now:

  • with the current rustc,
  • and libraries like wasmtime, rocksdb, libc, ...,
  • on a specific piece of hardware,
  • running a particular OS with specific updates.

In general, it will not hold and we will need to rebalance the weights over time, likely with every major version update. However the idea that 1 unit of weight === 1 ns is, I think, very useful since it allows many different chains with their own blocktimes to make use of the same underlying weights easily.

@athei (Member) commented Apr 9, 2020

However the idea that 1 unit of weight === 1 ns is, I think, very useful since it allows many different chains with their own blocktimes to make use of the same underlying weights easily.

I really like the idea. However, for the contract modules it would be very helpful to scale up the resolution. In my opinion it would be ideal to unify the concept of gas and weight. That means we describe the costs of the individual wasm instructions in weight.

Having the smallest unit be 1 ns is too coarse. We can assume that most CPUs run faster than 1 GHz (i.e. less than 1 ns per cycle) and retire more than one instruction per cycle.

1 weight == 1 picosecond would be better. Of course we could work with a scaling factor, but that would make things more complicated than just using bigger numbers.

@shawntabrizi shawntabrizi requested a review from sorpaas as a code owner April 9, 2020 17:32
@shawntabrizi (Member, Author)

@athei @pepyakin I have added an extra 1_000 to all weights, so 1 ns = 1_000 weight

@xlc (Contributor) commented Apr 15, 2020

Just throwing out some ideas.

Instead of #[weight = T::DbWeight::get().reads_writes(1, 1) + 200_000_000]

How about

#[weight = T::DbWeight::add_reads_writes(1, 1).add(200_000_000)]

or

#[weight = T::DbWeight::add_reads(1).add_writes(1).add(200_000_000)]

or

#[weight = T::Weight::add_db_reads(1).add_db_writes(1).add_ns(200_000)]

or

#[weight = Weight::add_db_reads(1).add_db_writes(1).add_compute(200_000)]

or

#[weight = db_reads(1) + db_writes(1) + compute(200_000)]

and for the latter two syntaxes, pallet-system converts this struct Weight { read, write, compute } into the actual weight number using System::DbWeight & System::ComputeWeight

@kianenigma kianenigma added A8-mergeoncegreen and removed A0-please_review Pull request needs code review. labels Apr 15, 2020
@shawntabrizi (Member, Author) commented Apr 15, 2020

@xlc happy to make any changes to the syntax (I actually expect that things will evolve anyway as we put weights on non-constant functions), but can you provide some context on what is gained for you with the change?

Let's take any feedback about this syntax into a new PR, though. Need to keep moving forward here.
