Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support vyper jumptables #253

Open
karmacoma-eth opened this issue Jan 26, 2024 · 3 comments
Open

Support vyper jumptables #253

karmacoma-eth opened this issue Jan 26, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@karmacoma-eth
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

Newer versions of vyper use jumptable-based dispatchers. We don't currently handle them very gracefully because of the mod operation.

For instance say a vyper contract implements 3 functions, the dispatcher will do something like this:

jump_table_offset = XXX   # (typically at the end of bytecode)
jump_table_index = input % 3
jump_target = codecopy(jump_table_offset + jump_table_index * 2)
JUMP(jump_target)

Because of the mod operation, halmos will fail with NotConcreteError(f"symbolic JUMP target: {dst}") or a symbolic CODECOPY error

Doing this from memory, so I'm not sure exactly what input is used, but if it is concrete then we should be able to resolve it. If it is symbolic, we may have to run a solver query to know which are the possible values, or identify the pattern and loop over the jumptable values

cc @zobront @charles-cooper

@karmacoma-eth karmacoma-eth added the enhancement New feature or request label Jan 26, 2024
@karmacoma-eth
Copy link
Collaborator Author

Here is an annotated example of a jump table dispatcher, looks like an example with 27 functions. And the input is the usual first 4 bytes of calldata (4byte)

PUSH0 CALLDATALOAD PUSH1 0xE0 SHR    || stack: [4byte]
PUSH1 0x2 PUSH1 0x1B DUP3 MOD        || stack: [4byte % 27, 2, 4byte]
PUSH1 0x1 SHL                        || stack: [2 * (4byte % 27), 2, 4byte]
PUSH2 0x125 ADD                      || stack: [0x125 + 2 * (4byte % 27), 2, 4byte]
PUSH1 0x1E CODECOPY PUSH0 MLOAD JUMP || stack: [4byte] => jump to loc at `0x125 + 2 * (4byte % 27)`

@karmacoma-eth
Copy link
Collaborator Author

Not perfect, but here is a potential idea for how to solve this in CODECOPY:

    elif opcode == EVM.CODECOPY:
        loc: int = ex.st.mloc()
        offset = ex.st.pop()

        if is_concrete(offset):
            concrete_offset = int_of(offset)
            print(green(f"XXX CONCRETE CODECOPY {hexify(ex.this)}@{ex.pc} XXX"))

            size: int = int_of(ex.st.pop(), "symbolic CODECOPY size")
            wextend(ex.st.memory, loc, size)

            codeslice = ex.pgm[concrete_offset : concrete_offset + size]

            actual_size = byte_length(codeslice)
            if actual_size != size:
                raise HalmosException(
                    f"CODECOPY: expected {size} bytes but got {actual_size}"
                )

            ex.st.memory[loc : loc + size] = iter_bytes(codeslice)

        else:
            offsetSolver = SolverFor("QF_BV")
            codesize = con(len(ex.pgm))

            # TODO: we might want to also add a solution check for offset > codesize
            offsetSolver.add(ULE(offset, codesize))

            offsets = set([])
            result = offsetSolver.check()
            while result == sat:
                value = offsetSolver.model().evaluate(offset).as_long()
                offsetSolver.add(offset != value)
                offsets.add(value)
                result = offsetSolver.check()

            if self.options.get("debug"):
                print(f"Splitting execution for CODECOPY with possible offsets {offsets}")

            # say we found 2 possible offset values a and b, we need to
            # to create 2 new executions where we're going to reexecute
            # this CODECOPY instruction (same pc), but one for each offset
            for offset_value in offsets:
                new_ex = self.create_branch(ex, offset == offset_value, ex.pc)

                # replace the items we have consumed on the stack
                new_ex.st.push(con(offset_value))
                new_ex.st.push(con(loc))

                stack.append((new_ex, step_id))

            # we're done with this execution
            continue

note the use of a separate offsetSolver to enumerate possible values

some problems with this:

  • we may have to deal with uninterpreted functions like evm_bvurem -- might need to replace with the real SMT bvurem operation?
  • the solver may return unknown, for completeness we may want to keep iterating until result == unsat (meaning that we know there is no other possible solution)

Potentially, we could also fallback to another mechanism. Specifically, we could do pattern matching on the symbolic expression and see that it corresponds to something like offset + evm_bvurem(4byte, num_functions) * 2 and avoid calling a solver for this

@charles-cooper
Copy link

btw the layout of the dispatcher is in the initcode cbor metadata. using this you should be able to reliably tell what kind (-Ocodesize vs -Ogas) of jumptable it is. here's an example signature with -Ocodesize:

cbor2.loads(bytes.fromhex(initcode.removeprefix("0x"))[-22:])
[1264, [5, 84], 0, {'vyper': [0, 4, 0]}]

and with -Ogas:

(1364, [24], 0, {'vyper': (0, 4, 0)})

(you can ignore the tuple vs list distinction; they are equivalent in cbor).

the codesize optimized bytecode has 2 data sections; the gas-optimized has 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants