Apply strict uint64 casting #1746

Merged
merged 15 commits into from
Jul 15, 2020

hwwhww
Contributor

@hwwhww hwwhww commented Apr 23, 2020

Child of #1707, #1739, and #1701.

Issue

We've already stated that all calculations should be in the uint64 domain (#1739). However, that rule isn't enforced for Python's native int.

This draft applies some strict uint64 castings to known function parameter typings by fixing the int vs. uint64 incompatibility errors that mypy found in #1707.

Pros:

  • We can have more safety checks from SSZ classes (remerkleable) layer!

Cons:

  • Annoying casting everywhere.
  • Clients may already handle the uint64 in different ways with different languages' features. This PR may cause some implementation burden.
  • Note that this PR probably didn't find all calculations that need casting.

Set it to draft since I'm wondering if it would introduce a serious implementation burden for the client teams.

@hwwhww hwwhww added general:enhancement New feature or request phase0 general:RFC Request for Comments labels Apr 23, 2020
@djrtwo
Contributor

djrtwo commented Apr 23, 2020

I doubt this would introduce any requisite changes on clients.

Note that this PR probably didn't find all calculations that need casting.

What do you mean? It did add a bunch of casts. Are you saying that the linter didn't force any of these?

Annoying casting everywhere.

Slightly resistant for this reason. Does the uint64 provide any additional functionality other than static analysis? Does the type actually overflow, or can it still grow past 2**64?

@hwwhww
Contributor Author

hwwhww commented Apr 24, 2020

@djrtwo

Does the uint64 provide any additional functionality other than static analysis? Does the type actually overflow, or can it still grow past 2**64?

Thanks to @protolambda, it does check the boundaries and raise ValueError. For example:

>>> uint64(2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hwwang/.pyenv/versions/env38a/lib/python3.8/site-packages/remerkleable/basic.py", line 43, in __new__
    raise ValueError(f"value out of bounds for {cls}")
ValueError: value out of bounds for <class 'remerkleable.basic.uint64'>
>>> uint64(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hwwang/.pyenv/versions/env38a/lib/python3.8/site-packages/remerkleable/basic.py", line 40, in __new__
    raise ValueError(f"unsigned type {cls} must not be negative")
ValueError: unsigned type <class 'remerkleable.basic.uint64'> must not be negative
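
The bounds check in the traceback above can be imitated with a small stand-in class. This is only a sketch: `BoundedUint64` and `is_valid_uint64` are hypothetical names, not remerkleable's implementation, and the error messages are merely modeled on the output shown.

```python
class BoundedUint64(int):
    """Hypothetical stand-in for a bounds-checked uint64 (not remerkleable's code)."""

    def __new__(cls, value: int) -> "BoundedUint64":
        # Reject anything outside [0, 2**64 - 1] at construction time
        if value < 0:
            raise ValueError(f"unsigned type {cls.__name__} must not be negative")
        if value >= 2**64:
            raise ValueError(f"value out of bounds for {cls.__name__}")
        return super().__new__(cls, value)


def is_valid_uint64(value: int) -> bool:
    """Return True if ``value`` fits in the stand-in uint64 type."""
    try:
        BoundedUint64(value)
    except ValueError:
        return False
    return True
```

With a class like this, the largest accepted value is `2**64 - 1`, and both `2**64` and `-1` are rejected when the object is created rather than silently wrapping.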

What do you mean? It did add a bunch of casts. Are you saying that the linter didn't force any of these?

I only added casting for the variables where mypy panicked. It's still possible to write computations on Python native variables that overflow or underflow the uint64 range. For example, this edge case:

x = 1
y = -1
return uint64(x + y)

The linter can't detect that y is not uint64 type.
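
To make the edge case concrete, here is a sketch using a hypothetical stand-in type (`BoundedUint64`, not remerkleable's class). Casting only the final result checks the sum, so a negative operand goes unnoticed; casting each operand first surfaces it. The function names are illustrative assumptions.

```python
class BoundedUint64(int):
    """Hypothetical stand-in for a bounds-checked uint64."""

    def __new__(cls, value: int) -> "BoundedUint64":
        if not 0 <= value < 2**64:
            raise ValueError(f"value out of bounds for {cls.__name__}")
        return super().__new__(cls, value)


def unchecked_sum(x: int, y: int) -> BoundedUint64:
    # Only the final result is range-checked; a negative operand slips through
    return BoundedUint64(x + y)


def checked_sum(x: int, y: int) -> BoundedUint64:
    # Casting each operand first rejects out-of-range inputs before the addition
    return BoundedUint64(BoundedUint64(x) + BoundedUint64(y))
```

Here `unchecked_sum(1, -1)` succeeds because the sum is 0, while `checked_sum(1, -1)` raises `ValueError` on the negative operand.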

We could cover more cases once (i) remerkleable also checks __mul__ and __floordiv__ and (ii) all constants are converted to uint64, as in this PR.

@protolambda
Collaborator

@hwwhww Remerkleable already does that, I implemented that shortly after that issue. See changelog: https://github.com/protolambda/remerkleable/blob/master/CHANGELOG.rst
Just update the version referenced in the spec 👍

@hwwhww hwwhww force-pushed the hwwhww/strict-uint64 branch from 051a9ee to b8e5941 Compare April 28, 2020 10:33
@protolambda
Collaborator

Is this ready to go out of draft mode? And which release would we like to target with this?

@hwwhww hwwhww force-pushed the hwwhww/strict-uint64 branch from b8e5941 to 224ef35 Compare May 15, 2020 13:50
@hwwhww hwwhww marked this pull request as ready for review May 15, 2020 13:54
@hwwhww
Contributor Author

hwwhww commented May 15, 2020

@protolambda @djrtwo
Ready to review now!

It does make the spec a bit more lengthy.

  • I'd say we should at least add castings to the constant definition section.
  • For the remaining casts, it's good to have the protection from remerkleable that comes with them. What do you think?

Collaborator

@protolambda protolambda left a comment

Agree with some of the changes, but feel uneasy about some of the other changes. Would like a review from others here. At least it's all cosmetic/presentation/type checks.

@@ -729,14 +728,18 @@ def compute_shuffled_index(index: uint64, index_count: uint64, seed: Bytes32) ->

     # Swap or not (https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf)
     # See the 'generalized domain' algorithm on page 3
-    for current_round in range(SHUFFLE_ROUND_COUNT):
-        pivot = bytes_to_int(hash(seed + int_to_bytes(current_round, length=1))[0:8]) % index_count
+    for current_round in map(uint64, range(SHUFFLE_ROUND_COUNT)):
Collaborator

This is the type of casting I'm most uneasy with. It just feels like a Python hack.
Does the following work for mypy?

current_round: uint64
for current_round in range(SHUFFLE_ROUND_COUNT):

Or is that too verbose for spec also?

Contributor Author

It doesn't work for mypy, since the issue is that iterating over range(SHUFFLE_ROUND_COUNT) yields int.

Collaborator

The current_round is also only converted to a single byte (see other comment about uint8). Avoiding this map would be nice.

Contributor

@ericsson49 ericsson49 Jun 19, 2020

What about changing the type of the first parameter of int_to_bytes to int?
current_round can be kept as int then.
Or is the goal to get rid of unnecessary ints?

Alternatively, one can introduce something like a urange method, though it would be the same map applied to range.

def urange(*args):
   return map(uint64, range(*args))

As uint64 is heavily used in pyspec, introducing unsigned range looks pretty reasonable.
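
As a sketch of how such a helper might be generalized to any unsigned type, the version below takes the target type as its first argument. `StubUint`, `StubUint8`, and `StubUint64` are hypothetical stand-ins for the SSZ types, and the extra `uint_type` parameter is an assumption, not something proposed in this thread.

```python
from typing import Iterator, Type, TypeVar


class StubUint(int):
    """Hypothetical bounds-checked unsigned integer base class."""
    BITS = 0

    def __new__(cls, value: int) -> "StubUint":
        if not 0 <= value < 2**cls.BITS:
            raise ValueError(f"value out of bounds for {cls.__name__}")
        return super().__new__(cls, value)


class StubUint8(StubUint):
    BITS = 8


class StubUint64(StubUint):
    BITS = 64


U = TypeVar("U", bound=StubUint)


def urange(uint_type: Type[U], *args: int) -> Iterator[U]:
    """Like range(*args), but every yielded value is cast to ``uint_type``."""
    return map(uint_type, range(*args))
```

Because the cast happens lazily per element, an out-of-range value only fails when the loop reaches it, for example at 256 when iterating with `StubUint8`.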

Contributor Author

@hwwhww hwwhww Jun 22, 2020

@protolambda

  1. Agreed that current_round can be uint8.

  2. Currently, uint8 is not compatible with uint64 for mypy in the remerkleable layer when we want to pass it to int_to_bytes. That said, it's fine to set current_round to uint8 in pyspec right now, since our CI linter settings aren't yet as strict as in Add SSZ utils to mypy checking scope #1707.

  3. Implementation-wise, I feel it would be neater to have int_to_bytes be an instance method at the SSZ level.

    • We can define uint.uint_to_bytes in remerkleable:
    def uint_to_bytes(self, byteorder: str) -> bytes:
        return self.to_bytes(self.type_byte_length, byteorder=byteorder)
    • To reduce how much pyspec readers need to understand about remerkleable, we can modify int_to_bytes to:
    def int_to_bytes(n: uint) -> bytes:
        """
        Return the serialization of ``n`` in ``ENDIANNESS``-endian in the byte length of ``uint`` type.
        """
        return n.uint_to_bytes(byteorder=ENDIANNESS)

^^^^ Updated: should be n.uint_to_bytes(byteorder=ENDIANNESS) instead of uint.uint_to_bytes(byteorder=ENDIANNESS)
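
A self-contained sketch of this proposal, using hypothetical stand-in classes rather than remerkleable itself, might look as follows. `StubUint` and its `BYTE_LENGTH` attribute are assumptions made for illustration, and `ENDIANNESS` is set to little-endian here only so the example runs on its own.

```python
ENDIANNESS = "little"  # assumed for this sketch; the spec defines its own constant


class StubUint(int):
    """Hypothetical unsigned integer whose serialized width is fixed by its type."""
    BYTE_LENGTH = 0

    def __new__(cls, value: int) -> "StubUint":
        if not 0 <= value < 2**(8 * cls.BYTE_LENGTH):
            raise ValueError(f"value out of bounds for {cls.__name__}")
        return super().__new__(cls, value)

    def uint_to_bytes(self, byteorder: str) -> bytes:
        # The output length comes from the type, not from a caller-supplied argument
        return self.to_bytes(self.BYTE_LENGTH, byteorder=byteorder)


class StubUint8(StubUint):
    BYTE_LENGTH = 1


class StubUint64(StubUint):
    BYTE_LENGTH = 8


def int_to_bytes(n: StubUint) -> bytes:
    """Serialize ``n`` in ``ENDIANNESS``-endian, using the byte length of its type."""
    return n.uint_to_bytes(byteorder=ENDIANNESS)
```

The design point is that the `length=` argument disappears: `int_to_bytes(StubUint8(5))` yields one byte and `int_to_bytes(StubUint64(5))` yields eight, so the type alone determines the encoding width.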

What do you think?


@ericsson49
Hi!

  1. To me, the goal is setting a boundary of all variables in pyspec since Python int is unbounded. I hope it would decrease the ambiguity and make other implementers' lives better?
  2. IMHO it seems better to have urange support all uint types, especially if we want to set current_round to uint8 as @protolambda suggested. I'd say if we really want to avoid map for readability, the simplest way is:
for current_round in range(SHUFFLE_ROUND_COUNT):
    current_round = uint8(current_round)  # cast `int` to `uint8` type

What do you think?

  3. What do you think about the above proposal for the int_to_bytes change?

Thank you both! 🙂

Collaborator

We can define uint.uint_to_bytes in remerkleable:

We already have that, as well as stream variants, for every type :).

But for spec style and less hidden things, we generally avoid it.

The methods:

  • def encode_bytes(self) -> bytes:
  • def serialize(self, stream: BinaryIO) -> int:
  • def decode_bytes(cls: Type[V], bytez: bytes) -> V:
  • def deserialize(cls: Type[V], stream: BinaryIO, scope: int) -> V:

So uint8(123).encode_bytes() would just work to get it as bytes.

And there are a ton more util methods for size bounds, working with merkle backings, etc., but it's meant for tooling more than the spec.

Contributor Author

@protolambda

Oh great!

But for spec style and less hidden things, we generally avoid it.

Agreed that we should generally avoid exposing SSZ APIs. Another solution is to describe int_to_bytes(n: uint) in the same way we describe hash_tree_root.

Contributor Author

Created an alternative solution PR: #1935

Contributor

@hwwhww
Hi! Sorry, I missed a notification, so responding somewhat late.

To me, the goal is setting a boundary of all variables in pyspec since Python int is unbounded. I hope it would decrease the ambiguity and make other implementers' lives better?

That's a great goal. It definitely helps. Basically, a typical developer would prefer using a bounded integer type whenever possible, and working out the possible range can be a problem.
In this particular case, however, it's obvious that current_round is bounded.
But as current_round is intended to be uint8, it's definitely worth using an explicit type/conversion here.

However, unbounded int cannot be completely avoided, I guess, e.g. legendre_bit uses it and private keys are sometimes used. But it would definitely be great to separate the cases where bounded ints are enough from those where arbitrary-precision ints are more appropriate.

IMHO it seems better to have urange support all uint types, especially if we want to set current_round to uint8 as @protolambda suggested. I'd say if we really want to avoid map for readability, the simplest way is:

I think it's a matter of taste. Basically, all variants are clear, but some are slightly less concise.

specs/phase0/beacon-chain.md
         index=state.eth1_deposit_index,
         root=state.eth1_data.deposit_root,
     )

     # Deposits must be processed in order
-    state.eth1_deposit_index += 1
+    state.eth1_deposit_index = uint64(state.eth1_deposit_index + 1)
Contributor Author

Hmmm, it's so strange. exit_queue_epoch += Epoch(1) is okay, but state.eth1_deposit_index += uint64(1) returns an error:

Incompatible types in assignment (expression has type "uint", variable has type "uint64")

Strange dark mypy hole. 🤦‍♀️

Collaborator

The integer additions etc. are implemented on the uint base class in remerkleable, to not duplicate it everywhere. I will try to add typevar annotations that might help mypy understand it, then you can try this in the spec.
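
One way such TypeVar annotations could look is sketched below. The classes are hypothetical stand-ins (`StubUint`, `StubUint64`, `StubEpoch`), not remerkleable's actual code: the point is only that annotating `self` with a bound TypeVar lets a checker infer that adding to a subclass returns that same subclass instead of the base type.

```python
from typing import TypeVar

T = TypeVar("T", bound="StubUint")


class StubUint(int):
    """Hypothetical base class that implements arithmetic once for all widths."""
    BITS = 0

    def __new__(cls, value: int) -> "StubUint":
        if not 0 <= value < 2**cls.BITS:
            raise ValueError(f"value out of bounds for {cls.__name__}")
        return super().__new__(cls, value)

    def __add__(self: T, other: int) -> T:
        # Re-wrap in the caller's own class so the result is bounds-checked
        # and a type checker sees the subclass, not the base class
        return self.__class__(int(self) + int(other))


class StubUint64(StubUint):
    BITS = 64


class StubEpoch(StubUint64):
    pass


def bump(index: StubUint64) -> StubUint64:
    index += 1  # falls back to __add__, which returns StubUint64
    return index
```

With the `self: T` annotation, `StubEpoch(1) + 1` is typed (and constructed) as `StubEpoch`, which is the kind of inference that would make `+=` on a `uint64` field type-check.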


Collaborator

If that epoch thing works, then state.eth1_deposit_index += DepositIndex(1) should also work.

Collaborator

Oh nvm, DepositIndex doesn't have its own type in the pyspec. Hmm, that error is weird.

Collaborator

I still wonder about this thing, but I suppose the other typing fixes outweigh this style issue.

@hwwhww hwwhww requested a review from djrtwo May 18, 2020 13:48
@hwwhww hwwhww force-pushed the hwwhww/strict-uint64 branch from 9a789cd to cd91380 Compare May 19, 2020 19:05
@hwwhww hwwhww changed the base branch from v012x to dev May 19, 2020 19:05
@protolambda
Collaborator

Hey, we didn't include this issue since the v0.12 release was kept small where possible, but it could still be useful for better type checking. I'll try and resolve the merge conflicts, and I think we can define a def copy(v: V) -> V helper function to maybe clean up that added BeaconState type annotation.
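
A minimal sketch of what a generically typed `copy` helper could look like follows. It uses the standard library's `deepcopy` and a hypothetical `StubState` class so that it runs on its own; the actual helper would presumably delegate to the SSZ object's own copy mechanism, which is an assumption here.

```python
import copy as stdlib_copy
from typing import TypeVar

V = TypeVar("V")


def copy(v: V) -> V:
    """Return a copy whose static type matches the input's type."""
    # The V -> V signature lets a type checker keep the concrete type,
    # so callers need no extra annotation on the result
    return stdlib_copy.deepcopy(v)


class StubState:
    """Hypothetical stand-in for a state object."""

    def __init__(self, slot: int) -> None:
        self.slot = slot
```

With this signature, `pre_state = copy(state)` is inferred to have the same type as `state`, which is what would remove the need for an explicit annotation.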

@protolambda
Collaborator

See strict-uint64-merge-dev branch to resolve merge conflict. This PR builds on a fork of the specs repository, so I pushed it to another branch.

@hwwhww hwwhww force-pushed the hwwhww/strict-uint64 branch from 50252ec to 4428a6a Compare June 18, 2020 05:50
@hwwhww hwwhww requested a review from protolambda June 18, 2020 07:03
Collaborator

@protolambda protolambda left a comment

The typing fixes are generally good, but we need to iterate some more on things like the shuffle rounds type, otherwise we only confuse implementers with the new typing.

@@ -729,14 +728,18 @@ def compute_shuffled_index(index: uint64, index_count: uint64, seed: Bytes32) ->

     # Swap or not (https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf)
     # See the 'generalized domain' algorithm on page 3
-    for current_round in range(SHUFFLE_ROUND_COUNT):
-        pivot = bytes_to_int(hash(seed + int_to_bytes(current_round, length=1))[0:8]) % index_count
+    for current_round in map(uint64, range(SHUFFLE_ROUND_COUNT)):
Collaborator

The current_round is also only converted to a single byte (see other comment about uint8). Avoiding this map would be nice.

-        source = hash(seed + int_to_bytes(current_round, length=1) + int_to_bytes(position // 256, length=4))
+        source = hash(
+            seed
+            + int_to_bytes(current_round, length=1)
Collaborator

Does this count as a limitation to the shuffle-round counts, making its maximum value 255 (inclusive)? Should we change it to a uint8 type? And the SHUFFLE_ROUND_COUNT too? I don't think clients actually use uint64 here at least. E.g. lighthouse uses uint8.
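
The one-byte limit can be checked directly with Python's built-in `int.to_bytes`. The helper name `round_to_byte` is hypothetical and stands in for the `int_to_bytes(current_round, length=1)` call in the diff; little-endian is assumed, though for a single byte the byte order makes no difference.

```python
def round_to_byte(current_round: int) -> bytes:
    """Serialize a shuffle round number into exactly one byte."""
    # int.to_bytes raises OverflowError when the value does not fit in 1 byte
    return current_round.to_bytes(1, "little")


def fits_in_one_byte(current_round: int) -> bool:
    try:
        round_to_byte(current_round)
    except OverflowError:
        return False
    return True
```

Round 255 still serializes, but round 256 does not, so a one-byte encoding does cap the usable round count in the way the question suggests.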

Contributor Author

@hwwhww hwwhww Jun 22, 2020

see my response above #1746 (comment)

@protolambda
Collaborator

Some of the int -> uintN casts may become safely unnecessary once protolambda/remerkleable#4 is tested and integrated. We have to find some balance; in some places the explicit conversions still help the reader.

@protolambda
Collaborator

Merged new arithmetic operators for better typing into remerkleable master branch. Let's try them out here, and if it works well, I'll cut a remerkleable release with the feature and we can do final polish + merge here.

@hwwhww
Contributor Author

hwwhww commented Jun 25, 2020

@protolambda

There are some issues with the remerkleable master branch:

It disallows the Python sugar syntax like:

    num_validators = spec.SLOTS_PER_EPOCH * 8
    return [spec.MAX_EFFECTIVE_BALANCE] * num_validators

Error message:

eth2spec/test/context.py:120: in default_balances
    return [spec.MAX_EFFECTIVE_BALANCE] * num_validators
../../../../remerkleable/remerkleable/basic.py:95: in __rmul__
    return self.__mul__(other)
../../../../remerkleable/remerkleable/basic.py:92: in __mul__
    return self.__class__(super().__mul__(self.__class__.coerce_view(other)))
../../../../remerkleable/remerkleable/basic.py:178: in coerce_view
    return cls(v)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'remerkleable.basic.uint64'>, value = [32000000000]

    def __new__(cls, value: int):
>       if value < 0:
E       TypeError: '<' not supported between instances of 'list' and 'int'

../../../../remerkleable/remerkleable/basic.py:72: TypeError

We can change it to:

    num_validators = spec.SLOTS_PER_EPOCH * 8
    default_balances = []
    for _ in range(num_validators):
        default_balances.append(spec.MAX_EFFECTIVE_BALANCE)
    return default_balances

But it would be much more verbose than before. Is it possible to allow [spec.MAX_EFFECTIVE_BALANCE] * num_validators?

@protolambda
Collaborator

Ah yes, forgot about non-integer __mul__ there. I'll fix that after the eth2 call today.

@protolambda
Collaborator

Updated remerkleable to do multiplication python-style if the other input is not an int. And tested with lists.
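
The general technique for this kind of fix can be sketched with a hypothetical stand-in class (`BoundedUint64`, not remerkleable's implementation): return `NotImplemented` from `__mul__` and `__rmul__` when the other operand is not an integer, so that Python falls back to the other operand's behavior, such as list repetition.

```python
class BoundedUint64(int):
    """Hypothetical bounds-checked uint64 with list-friendly multiplication."""

    def __new__(cls, value: int) -> "BoundedUint64":
        if not 0 <= value < 2**64:
            raise ValueError(f"value out of bounds for {cls.__name__}")
        return super().__new__(cls, value)

    def __mul__(self, other):
        if not isinstance(other, int):
            # Let Python fall back, e.g. to sequence repetition for lists
            return NotImplemented
        return self.__class__(int(self) * int(other))

    def __rmul__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        return self.__class__(int(other) * int(self))


def default_balances(balance: BoundedUint64, count: BoundedUint64) -> list:
    # The list-repetition shorthand works again with the fallback in place
    return [balance] * count
```

Integer products stay bounds-checked and keep the `BoundedUint64` type, while `[balance] * count` behaves like ordinary list repetition.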

@protolambda
Collaborator

Made remerkleable pass mypy checks (within remerkleable, not yet the spec). Hopefully it helps sort out some typing issues here. PR: protolambda/remerkleable#6

@protolambda
Collaborator

Update: releasing a new remerkleable version which infers a lot of the types here, making the changes in the spec less invasive.

@protolambda
Collaborator

Released remerkleable 0.1.17, integrated it into the spec, removed redundant type casts, and added type coercions in new places. Test speed took a hit, however: the constant coercions in uints and the length checks make things slower. At least it catches more things now.

See type-infer branch, branched off from this PR. https://github.com/ethereum/eth2.0-specs/tree/type-infer

remerkleable==0.1.17 + infer type
@protolambda
Collaborator

Updating some python code that uses the spec, and noticed that with these new changes, we can change previous v0.12 type hacks like state.slot = Slot(state.slot + 1) in the spec back to state.slot += 1

@protolambda protolambda added the general:typing Spec typing, no breaking changes label Jul 8, 2020
@djrtwo
Contributor

djrtwo commented Jul 15, 2020

Where are we at on this PR @hwwhww?

I'm interested in getting this into v0.12.2

@eisaeidy

Ok

@hwwhww
Contributor Author

hwwhww commented Jul 15, 2020

@djrtwo
It's ready from my side! I just merged dev back into it again in case anything has changed recently.
Need approval from you or @protolambda here.

I'm slightly inclined to merge this one before #1935. But it's okay either way.

Also, to double confirm with you and @protolambda, do you think #1935 is ready to go? I haven't seen any objections from client teams so far. 😄

Collaborator

@protolambda protolambda left a comment

LGTM.
@djrtwo I think this can go into next release, when would that be?

@djrtwo
Contributor

djrtwo commented Jul 15, 2020

within the week.
Want to get in gossipsub params #1958
and clean up phase 1 test gen #1957

@djrtwo djrtwo merged commit 6a7a47d into dev Jul 15, 2020
@djrtwo djrtwo deleted the hwwhww/strict-uint64 branch July 15, 2020 17:10
Labels
general:enhancement New feature or request general:RFC Request for Comments general:typing Spec typing, no breaking changes phase0

5 participants