Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Refactor evm Instruction to be a c-like enum #8914

Merged
merged 11 commits into from
Jun 27, 2018
Merged

Conversation

sorpaas
Copy link
Collaborator

@sorpaas sorpaas commented Jun 18, 2018

rel #6744

This allows some more type checking for the type, and also allows to put functions in a more Rust-y way.

  • A small macro is used to implement from_u8 using match. A small part of this marco is from enum_derive crate. But that crate defines other functions (like from_u64, etc) which we don't use.
  • INSTRUCTIONS table is made private, and it's accessed from Instruction::info(&self).
  • Put exec_stack_instruction to exec_instruction. This allows Rust to type check match arms, and refuse to compile if new opcode is added, but its execution logic is not defined.

@sorpaas sorpaas added A0-pleasereview 🤓 Pull request needs code review. F6-refactor 📚 Code needs refactoring. labels Jun 18, 2018
@sorpaas sorpaas added this to the 1.12 milestone Jun 18, 2018
arr[REVERT as usize] = InstructionInfo::new("REVERT", 2, 0, GasPriceTier::Zero);
static ref INSTRUCTIONS: [Option<InstructionInfo>; 0x100] = {
let mut arr = [None; 0x100];
arr[STOP as usize] = Some(InstructionInfo::new("STOP", 0, 0, GasPriceTier::Zero));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tab formatting displays correctly on my Emacs, but looks like it doesn't work on Github interface.
I'm thinking about maybe it's better to just use a single space as separator -- we don't gain much readability by using multiple tabs.

@sorpaas sorpaas added the M4-core ⛓ Core client code / Rust. label Jun 18, 2018
@debris debris requested a review from tomusdrw June 19, 2018 13:10
Copy link
Collaborator

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, although I would insists on having some representative benchmarks for the baseline before merging any EVM changes.
The earlier we catch performance regressions or improvements the better.

#[doc = "Convert from u8 to the given enum"]
pub fn from_u8(value: u8) -> Option<Self> {
match value {
$( $discriminator => Some($variant) ),+,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure it's equally performant to borrowing from the static array?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instruction is repr(u8), and here we use match to convert a raw u8 to an repr(u8). We're still borrowing instruction info from the INSTRUCTIONS static array, in info(&self).

Anyway I'll try a benchmark on the new code.

if let Some(instruction) = instruction {
if instruction == instructions::JUMPDEST {
jump_dests.insert(position);
} else if instruction.is_push() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change that to if let Some(xx) = instructions.push_bytes() { ?


while reader.position < code.len() {
let instruction = code[reader.position];
let opcode = code[reader.position];
let instruction = Instruction::from_u8(opcode);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mapping the code to Instructions during the first phase along with jumpdestination analysis.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current issue is that we only calculate valid jump destinations upon first JUMP instruction, so on L129 that info may not be available.

I'll try and see (probably in a future PR) whether incrementally calculating jumpdests along with the exec loop would be possible, or another idea is to store jumpdest result to the db -- that info is static and never changes for a particular code array.

@sorpaas
Copy link
Collaborator Author

sorpaas commented Jun 20, 2018

Here's the result got from evmbin/bench.sh:

This PR:

Output: 0x606060405200
Gas used: 29fb170
Time: 0.511926861s
^^^^ usize
Output: 0x606060405200
Gas used: 29fb170
Time: 0.519975655s
^^^^ U256
Output: 0x606060405200
Gas used: 8865053
Time: 1.749029145s
^^^^ usize
Output: 0x606060405200
Gas used: 8865053
Time: 1.787731034s
^^^^ U256

Master Branch:

Output: 0x606060405200
Gas used: 29fb170
Time: 0.573969108s
^^^^ usize
Output: 0x606060405200
Gas used: 29fb170
Time: 0.562567334s
^^^^ U256
Output: 0x606060405200
Gas used: 8865053
Time: 2.61541081s
^^^^ usize
Output: 0x606060405200
Gas used: 8865053
Time: 2.23390354s
^^^^ U256

Looks like somehow the conversion to enum makes it run just slightly faster..

Don't know whether this is representative enough. We can also try some benchmarks using some of the jsontests.

@sorpaas sorpaas mentioned this pull request Jun 21, 2018
@sorpaas
Copy link
Collaborator Author

sorpaas commented Jun 25, 2018

I got some benchmarks using #8944. Currently, running the comparison requires manually checking out two branches so it's not hassle-free. Another issue is that many tests only run for a really short time, and I cannot find a reliable way to make some of the results really comparable.

Below is a tuple of tests that run faster in this PR, 0-10% slower, 10-20% slower, 20-30% slower, 30-40% slower, 40-50% slower or 50%+ slower:

(256, 150, 66, 23, 43, 62, 73)

This tuple varies a lot each time I run the tests and gather the csv.

In particular, the performance tests in vm folder might be representive:

{'ackermann31-INT': 0.9897959183673469,
 'ackermann32-INT': 0.9167478091528725,
 'ackermann33-INT': 0.9596136962247586,
 'fibonacci10-INT': 1.0860832137733143,
 'fibonacci16-INT': 0.9300284017557449,
 'loop-add-10M-INT': 0.9226587206771758,
 'loop-divadd-10M-INT': 0.9229079794970425,
 'loop-divadd-unr100-10M-INT': 0.8998631955175739,
 'loop-exp-16b-100k-INT': 1.0188090390141178,
 'loop-exp-1b-1M-INT': 0.9923591588497622,
 'loop-exp-2b-100k-INT': 0.9466506030511994,
 'loop-exp-32b-100k-INT': 1.0991487959483188,
 'loop-exp-4b-100k-INT': 1.12556930650952,
 'loop-exp-8b-100k-INT': 1.1088218849334053,
 'loop-exp-nop-1M-INT': 0.8219694300785856,
 'loop-mul-INT': 0.8684106995830124,
 'loop-mulmod-2M-INT': 0.9754786210274101,
 'manyFunctions100-INT': 0.8409570724841661}

The value is timing in this PR / timing on master.

Those results are gathered by running evm_jsontests_bench.sh and then using a half-finished Jupyter notebook. I don't think that notebook is publishable, but I'll put it up as a gist once I cleared it up.

@tomusdrw
Copy link
Collaborator

@sorpaas Great, that's good enough for me. I think the performance tests and perhaps evmbin/benches.rs should be a good indicator, so I'd focus on running those.
Maybe we add some more tests to evmbin/benches (separate PR) and use only this in future? Will simplify comparing the changes in the future.

Copy link
Contributor

@jimpo jimpo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor! (and speedup)

GasPriceTier::Invalid
impl GasPriceTier {
/// Returns the index in schedule for specific `GasPriceTier`
pub fn idx(&self) -> usize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these also be numbered with a C-style enum instead of having the separate mapping method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That "c-style enum" is actually just GasPriceTier. :)

We still need this idx function. It's used in schedule, where we dispatch based on the index to know about the default gas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't realize the values were assigned implicitly. In that case, seems like this method could just be implemented as *self as usize instead of the match, or inlined.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure about that -- the performance is mostly same, but I'm a little bit concerned about clarity.

Unlike Instruction, We don't have enum-to-number natural corresponding for GasPriceTier. I tried adding the index to enum and make GasPriceTier a c-like enum. The issue, however, is that there're other number corresponding for GasPriceTier (like default gas cost). And the result might be a little bit confusing.

instructions::LOG0...instructions::LOG4 => {
let no_of_topics = instructions::get_log_topics(instruction);
instructions::LOG0 | instructions::LOG1 | instructions::LOG2 | instructions::LOG3 | instructions::LOG4 => {
let no_of_topics = instruction.log_topics().expect("log_topcis always return some for LOG* instructions; qed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/topcis/topics/

self.verify_instruction(ext, instruction, info, &stack)?;

// Calculate gas cost
let requirements = gasometer.requirements(ext, instruction, info, &stack, self.mem.size())?;
if do_trace {
ext.trace_prepare_execute(reader.position - 1, instruction, requirements.gas_cost.as_u256());
ext.trace_prepare_execute(reader.position - 1, instruction as u8, requirements.gas_cost.as_u256());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could use opcode instead of instruction as u8.

if info.tier == instructions::GasPriceTier::Invalid {
return Err(vm::Error::BadInstruction {
instruction: instruction
instruction: instruction as u8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could use opcode instead of instruction as u8.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have opcode variable in verify_instruction. Because instruction as u8 is basically a noop, I think this would be okay?

@5chdn 5chdn added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Jun 26, 2018
@5chdn
Copy link
Contributor

5chdn commented Jun 26, 2018

Please rebase on latest master to fix CI.

Copy link
Contributor

@ascjones ascjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@5chdn 5chdn merged commit 0bed597 into master Jun 27, 2018
@5chdn 5chdn deleted the sorpaas/evm-instructions branch June 27, 2018 11:33
ordian added a commit to ordian/parity that referenced this pull request Jun 27, 2018
dvdplm added a commit that referenced this pull request Jun 27, 2018
* master:
  Refactor evm Instruction to be a c-like enum (#8914)
  Fix deadlock in blockchain. (#8977)
  snap: downgrade rust to revision 1.26.2, ref snapcraft/+bug/1778530 (#8984)
  Use local parity-dapps-glue instead of crate published at crates.io (#8983)
  parity: omit redundant last imported block number in light sync informant (#8962)
  Disable hardware-wallets on platforms that don't support `libusb` (#8464)
  Bump error-chain and quick_error versions (#8972)
dvdplm added a commit that referenced this pull request Jun 27, 2018
* master:
  Refactor evm Instruction to be a c-like enum (#8914)
  Fix deadlock in blockchain. (#8977)
  snap: downgrade rust to revision 1.26.2, ref snapcraft/+bug/1778530 (#8984)
  Use local parity-dapps-glue instead of crate published at crates.io (#8983)
  parity: omit redundant last imported block number in light sync informant (#8962)
  Disable hardware-wallets on platforms that don't support `libusb` (#8464)
  Bump error-chain and quick_error versions (#8972)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. F6-refactor 📚 Code needs refactoring. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants