Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VIP: uint256 as a first class citizen #764

Closed
jacqueswww opened this issue Apr 6, 2018 · 21 comments
Closed

VIP: uint256 as a first class citizen #764

jacqueswww opened this issue Apr 6, 2018 · 21 comments
Labels
VIP: Approved VIP Approved VIP: Discussion Used to denote VIPs and more complex issues that are waiting discussion in a meeting

Comments

@jacqueswww
Copy link
Contributor

jacqueswww commented Apr 6, 2018

Preamble

VIP: #764
Title: Uint256 as a first class citizen.
Author: Jacques Wagener
Type: Standard Track 
Status: Draft
Created: 2018-04-06
Requires (*optional): <VIP number(s)>
Replaces (*optional): <VIP number(s)>

Simple Summary

Allow generic arithmetic operations to work instead of using specific functions.

Abstract

Currently using uint256 can be very tedious and difficult because all arithmetic has to occur with

Motivation

Initially it was thought that working with unsigned integer are needed and the very large signed integer types could be used and converted. However time, and time again the need for uint256 to exist as arisen. Using uint256_* functions do do any arbitrary calculation becomes a serious issue for readability.

Specification

function operator
bitwise_and remains
bitwise_or remains
bitwise_xor remains
bitwise_not remains
uint256_add +
uint256_sub -
uint256_mul *
uint256_div /
uint256_exp **
uint256_mod %
uint256_addmod remains
uint256_mulmod remains
uint256_gt >
uint256_ge >=
uint256_lt <
uint256_le <=
shift remains

NB comparison between uint256 and other types should not be implicit.

It is important double check SafeMath is inserted, however working of the functions templates should insert the existing SafeMath behaviour, as seen in:
https://github.com/ethereum/vyper/blob/master/vyper/functions/functions.py#L559

Literal integer assignment cases to handled on assignment (as discussed below):

Case 1. L < -2**127. Reject (preferably compile-time).
Case 2. -2**127 <= L < 0. L must be of int128. No conversion.
Case 3. 0 <= L < 2**127. The type of L will be identified based on its context.
Case 4. 2**127 <= L < 2**256. L must be of uint256. No conversion.
Case 5. L >= 2**256. Reject (preferably compile-time).

Now, in Case 3, the type of the literal can be identified as follows.

First, we reject an integer expression (arithmetic, comparison, etc.) whose sub-expressions are all integer literals. (This restriction should be acceptable by users, as @fubuloubu pointed out.)

Backwards Compatibility

We can consider keeping uint256_ functions. But I think removing those functions is best to avoid confusion for new comers.

Copyright

Copyright and related rights waived via CC0

@jacqueswww jacqueswww added the VIP: Discussion Used to denote VIPs and more complex issues that are waiting discussion in a meeting label Apr 6, 2018
@jacqueswww jacqueswww changed the title uint256 as a first class citizen VIP: uint256 as a first class citizen Apr 6, 2018
@fubuloubu
Copy link
Member

Agree with remove functions.

All type comparisons and assignments will no longer perform any implict conversions under this suggestion, therefore conversions will have to be explicitly provided.

This should round out our type system upgrade for the beta release.

@jacqueswww
Copy link
Contributor Author

jacqueswww commented Apr 9, 2018

From the meeting it become clear this is some syntax to consider:

x: uint256
x += 1

should not be allowed.

x: uint256
x += convert(1, 'uint256')
x += uint256(1) ??

should be allowed.

Thoughts?

@fubuloubu
Copy link
Member

I disagree.

I think x += 1 we could figure out the implicit casting and do a compilation range check against that type.

x += convert(1, 'uint256') is much more opaque from a readability standpoint, affecting auditibility.

@yzhang90
Copy link

yzhang90 commented Apr 9, 2018

Implicit casting may introduce some complexity in the compiler because you need to check different range for number literal under different context. For example,
x:unit256 x+= 2^127
should be allowed because 2^127 is within the range of uint256.
However,
x:int128 x+= 2^127
should be rejected because 2^127 is not within the range of int128.
This implies that parsing a integer literal depends on the context it is in.

@daejunpark
Copy link
Contributor

daejunpark commented Apr 9, 2018

There are cases where it is not trivial to identify (or define a semantics of) the "context".

For example, is the following inequality true or rejected?

-1 < 170141183460469231731687303715884105728 - 170141183460469231731687303715884105728 + 1

There are two possible interpretations of the type of the expression:

170141183460469231731687303715884105728 - 170141183460469231731687303715884105728
  • uint256: Since 170141183460469231731687303715884105728 is of uint256, their arithmetic expression can be seen as uint256. But in this interpretation, the above should be rejected since the RHS is of uint256 while the LHS is of int128.
  • int128: Since the computation result is 0, the type of expression can be seen as int128. In this case, the above will be true. But this interpretation requires additional reasoning (e.g., considering computation results), which could be not easily implemented (or error-prone).

One more cute example. The following would probably be rejected, although it seems trivially true:

-1 < 170141183460469231731687303715884105728    # -1 < 2**127

@fubuloubu
Copy link
Member

I think these corner cases could throw "undetermined type" by an implict literal type checking algorithm I hesitate to call it an implict conversion, because you use contextual information from the other types to determine what the type is of the literal conversion. I think at first it should even reject mathematical operations, preferring instead to explicitly write the literal value out. Together with the decimal demarcation differentiating decimal literals from integer literals, we should have enough information to determine type in a vast majority of cases without confusion on the reader's behalf.

@daejunpark
Copy link
Contributor

@fubuloubu I agree with the fact that my example is not common, and no sane developer will write this kind of programs. :)

OK. Then, just to make sure, let me spell out your type checking algorithm a bit.


Let's consider only the two types: int128 and uint256.
Let L be an integer literal.

Case 1. L < -2**127. Reject (preferably compile-time).
Case 2. -2**127 <= L < 0. L must be of int128. No conversion.
Case 3. 0 <= L < 2**127. The type of L will be identified based on its context.
Case 4. 2**127 <= L < 2**256. L must be of uint256. No conversion.
Case 5. L >= 2**256. Reject (preferably compile-time).

Now, in Case 3, the type of the literal can be identified as follows.

First, we reject an integer expression (arithmetic, comparison, etc.) whose sub-expressions are all integer literals. (This restriction should be acceptable by users, as @fubuloubu pointed out.)

Then, for any integer expression, there is a non-literal sub-expression whose type can be deterministically identified, and we can infer the type of a literal based on the type of its non-literal sibling.

(TODO: prove/check that any non-literal expression's type can be deterministically identified.)

@fubuloubu
Copy link
Member

I was picturing it more like this: a statement is a collection of literal and non literal members, connected through expressions and operators. On an assignment or assertion expression, there is a clearly identified type on the LHS. The RHS must therefore clearly evaluate it's type to match the LHS. Evaluation of the expression through it's operators must match down to the literal and non literal values that make it up. Since non literals have explicit types, that means it should be clearly defined through the operators what type the matching literal should have. This may become more complex if arbitrary combinations of literals are allowed to take place, but it should still be possible to figure it out if even one node of the AST is a statically typed non-literal value. Corner cases occur when an operator accepts more than one type and implicitly casts, but since we are removing that more generally we shouldn't have to worry as much.

@daejunpark
Copy link
Contributor

I think we are on the same page. I can take a look again once it is implemented.

@dani-corie
Copy link

I think the way Go handles literals should be followed post-beta.

(Ie. numeric literals have no type, they have arbitrary precision, operators between literals are executed at compile time with unlimited precision math. On assignment or comparison between a variable and a literal, it is checked if the literal cleanly converts to the variable's type (eg. is integer if the variable is integer, fits in variable range, etc.), and if yes, it's treated as that type, if not, a compile time error is thrown.)

@daejunpark
Copy link
Contributor

@daniel-jozsef that makes sense to me.

@tpmccallum
Copy link

tpmccallum commented May 14, 2018

Hi @jacqueswww , @daniel-jozsef , @DavidKnott could I please get some advice on a Vyper equivalent to Solidity’s SafeMath functionality. I would like to add a paragraph to this Vyper chapter < https://github.com/ethereumbook/ethereumbook/blob/develop/vyper.asciidoc > but need your help answering the following questions for the readers:

  1. Do Vyper developers have access to a SafeMath equivalent?
  2. Do Vyper developers have access to documentation on how to implement SafeMath syntax?

Alternatively to the above, perhaps the following is more accurate

  1. Does Vyper compilation deal with the nuances of SafeMath implementation on behalf of the developer?

I write about this in light of the recent overflow issue in a Solidity smart contract < https://medium.com/@peckshield/alert-new-batchoverflow-bug-in-multiple-erc20-smart-contracts-cve-2018-10299-511067db6536 > where a mistake was made even though SafeMath was available.

It would be great to make this crystal clear for future Vyper developers seeing’s how Vyper is so safety focussed.

Any advice (big or small) and links to code would be greatly appreciated.
Thanks so much guys
Kind regards
Tim

@ben-kaufman
Copy link
Contributor

@tpmccallum Vyper has implemented SafeMath at the compiler level so there is no need to worry about that.
I don't remember the exact part of the code for it but I can try to find it out later.

@tpmccallum
Copy link

tpmccallum commented May 14, 2018

Hi @ben-kaufman
Thank you so much!
Hey, I see that the Vyper functions
< https://github.com/ethereum/vyper/blob/master/vyper/functions/functions.py >
are able to enforce safety (SafeMath equivalent). Is this the area of code which I should be referencing? For example should developers be using these functions instead of writing their own functions? If these Vyper functions file are as deep as the safety implementation goes, what is stopping developers from by passing the functions with their own code?
Is the compiler really able to ensure safety if developers write bad code?
Sorry about the overload of questions, I am really interested in maximum safety of Vyper.
Kind regards
Tim

@jacqueswww
Copy link
Contributor Author

@tpmccallum

Yes SafeMath is built into vyper: https://github.com/ethereum/vyper/blob/master/vyper/parser/expr.py#L275

So any airithmetic with a uint256 will already include the necessary exception cases.

int128 will also be supported soon #475 (it's on the roadmap for the beta release).

@jacqueswww
Copy link
Contributor Author

To answer the question on safety, vyper has 2 methods of protecting against overflows.

1.) safemath on ints (as indicated above).
2.) clamps

Clamps are enforced whenever a literal constant is loaded, a value is passed into a function or a variable is assigned.
We have custom functions in LLL for this. So if you write a sample program and use the -f ir function you'll see the IR has statements such as uclamplt etc. in them.

Hope that answers your questions - I am on the gitter channel as well if you need further explanations 😄

@tpmccallum
Copy link

Hi @jacqueswww
Sorry to bother you, does this mean that only code compiled into LLL will be safe in relation to integer overflow/underflow?
For example can a smart contract be compiled into only bytecode (not LLL) and then be successfully executed on the network?
If so, does this still leave opportunities for Vyper code (not compiled to LLL) to have integer overflow problems?
Many thanks
Tim

@jacqueswww
Copy link
Contributor Author

jacqueswww commented May 18, 2018

Hey Tim!

LLL is the IR (https://en.m.wikipedia.org/wiki/Intermediate_representation) we use for vyper.
The stages of compilation are roughly AST -> LLL -> bytcode.

So that means these safety measures will always be included and can not be switched off if you use vyper (at beta release since int128 safemath work will be done soon, uint256 has already been done).

@tpmccallum
Copy link

Hi @jacqueswww
Thank you so much, that was a great response. I understand.

@tpmccallum
Copy link

Hi again @jacqueswww
Just a quick additional question. I see that the very early version of the parser.py file entertained variable typecasting. I see that the most recent version of that file (and any other files for that matter) do not entertain implicit or explicit variable typecasting.

Is it safe to say that Vyper does not support variable typecasting?

I would be pleased if this is the case, seeings how typecasting can cause odd behaviour and confusion (truncating etc). Just fact checking as I tick off the final TODO in the Vyper book chapter.
Thanks so much for your help.
Kind regards
Tim

@jacqueswww
Copy link
Contributor Author

jacqueswww commented May 18, 2018

Hi @tpmccallum ,

We do allow type casting (using the convert function), however if you are trying to convert a too big uint256 to uint128 for example the casting will cause a transaction error. So truncating will never occur, instead an error will be thrown (transaction error / runtime error).

Additionally we do precompilation of literals, and disallow assigning a too large uint256 to int128 at compile time (see the special cases in this VIP above).

Up until recently we did allow implicit conversions in arithmetic ie. between uint256/int128/decimal, but this is partly (and soon fully) removed.

Please checkout the discussions on the below VIPs, the comments are useful to the reasoning behind this removal ;)
#775
#716
#737

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
VIP: Approved VIP Approved VIP: Discussion Used to denote VIPs and more complex issues that are waiting discussion in a meeting
Projects
None yet
Development

No branches or pull requests

7 participants