diff --git a/proposals/p0144.md b/proposals/p0144.md
new file mode 100644
index 0000000000000..11aa139acd0a2
--- /dev/null
+++ b/proposals/p0144.md
@@ -0,0 +1,318 @@
+# Numeric literal semantics
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/144)
+
+## Table of contents
+
+- [Problem](#problem)
+- [Background](#background)
+- [Proposal](#proposal)
+- [Details](#details)
+    - [Prelude support](#prelude-support)
+    - [Implicit conversions](#implicit-conversions)
+    - [Examples](#examples)
+- [Alternatives considered](#alternatives-considered)
+    - [Use an ordinary integer or floating-point type for literals](#use-an-ordinary-integer-or-floating-point-type-for-literals)
+    - [Use same type for all literals](#use-same-type-for-all-literals)
+    - [Allow leading `-` in literal tokens](#allow-leading---in-literal-tokens)
+
+## Problem
+
+When a numeric literal appears in a program, we need to understand its
+semantics:
+
+- What type does it have?
+- What value is produced by operations on it?
+- When can it validly be used to initialize an object?
+
+## Background
+
+In C++, numeric literals have either an integral type or a floating-point type.
+C++ provides permission for implementations to add extended integral types, but
+in practice (for bad reasons relating to `intmax_t`) implementations do not do
+so, so there is a small finite set of types that any given numeric literal
+might have:
+
+- `int`, `long`, `long long`, or `unsigned` versions of these
+- `float`, `double`, or `long double`
+
+The choice of type is determined solely by the literal.
+
+The C++ approach is error-prone and problematic:
+
+- Lossy conversions from literals in initializers are permitted.
+- Lossy operations on literals are permitted; for example, on a typical
+  implementation, `1 << 60` has value `0` because `1` has a 32-bit type.
+- Attempting to naturally express some values has undefined behavior; for
+  example, `int x = -2147483648;` typically results in undefined behavior even
+  when -2147483648 is a valid `int` value.
+- Integer literals with value 0 have special semantics that are lost when the
+  integer is passed to a function: "perfect" forwarding doesn't work for such
+  literals.
+- The built-in types are privileged: only the types listed above have
+  literals. There is no syntax for a 64-bit integer literal, only for (for
+  example) a `long int` literal, which may or may not be 64 bits wide.
+- The type of a literal can be unpredictable in portable code, as it can
+  depend on which type a particular value happens to fit into.
+
+## Proposal
+
+Numeric literals have a type derived from their value, and can be converted to
+any type that can represent that value.
+
+Simple operations such as arithmetic that involve only literals also produce
+values of literal types.
+
+## Details
+
+Numeric literals have a type derived from their value. Two integer literals have
+the same type if and only if they represent the same integer. Two real number
+literals have the same type if and only if they represent the same real number.
+
+That is:
+
+- For every integer, there is a type representing literals with that integer
+  value.
+- For every rational number, there is a type representing literals with that
+  real value.
+- The types for real numbers are distinct from the types for integers, even
+  for real numbers that represent integers. `var x: i32 = 1.0;` is invalid.
+
+Primitive operators are available between numeric literals, and produce values
+with numeric literal types. For example, the type of `1 + 2` is the same as the
+type of `3`.
+
+Numeric types can provide conversions to support initialization from numeric
+literals. Because the value of the literal is carried in the type, a type-level
+decision can be made as to whether the conversion is valid.
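
As a rough model of the exact literal arithmetic described above, Python's built-in `int` (arbitrary precision) and `fractions.Fraction` (exact rational) can stand in for integer and rational literal values. This is only an analogy under that assumption: Python tracks the value at run time, whereas this proposal carries it in the type. The variable names are illustrative and do not come from the proposal.

```python
from fractions import Fraction

# Integer literal arithmetic is exact: there is no fixed width to overflow.
shifted = 1 << 60

# The sum of two literals has the same value as the literal for the result.
three = 1 + 2

# Rational literal arithmetic is exact: one third stays exactly one third.
third = Fraction(1) / Fraction(3)

# Rounding happens only when the exact value is converted to a
# fixed-precision type (Python's float is an IEEE 754 double here).
third_as_double = float(third)
```

Rounding is deferred to the final conversion, which is why no intermediate result in this sketch can be lossy.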
+
+The integer types defined in the standard library permit conversion from integer
+literal types whose values are representable in the integer type. The
+floating-point types defined in the Carbon library permit conversion from
+integer and rational literal types whose values are between the minimum and
+maximum finite values representable in the floating-point type.
+
+### Prelude support
+
+The following types are defined in the Carbon prelude:
+
+- An arbitrary-precision integer type.
+
+    ```
+    class BigInt;
+    ```
+
+- A rational type, parameterized by a type used for its numerator and
+  denominator.
+
+    ```
+    class Rational(T:! Type);
+    ```
+
+    The exact constraints on `T` are not yet decided.
+
+- A type representing integer literals.
+
+    ```
+    class IntLiteral(N:! BigInt);
+    ```
+
+- A type representing floating-point literals.
+
+    ```
+    class FloatLiteral(X:! Rational(BigInt));
+    ```
+
+All of these types are usable during compilation. `BigInt` supports the same
+operations as `Int(n)`. `Rational(T)` supports the same operations as
+`Float(n)`.
+
+The types `IntLiteral(n)` and `FloatLiteral(x)` also support primitive integer
+and floating-point operations such as arithmetic and comparison, but these
+operations are typically heterogeneous: for example, an addition between
+`IntLiteral(n)` and `IntLiteral(m)` produces a value of type
+`IntLiteral(n + m)`.
+
+### Implicit conversions
+
+`IntLiteral(n)` converts to any sufficiently large integer type, as if by:
+
+```
+impl [template N:! BigInt, template M:! BigInt]
+    IntLiteral(N) as ImplicitAs(Int(M))
+    if N >= Int(M).MinValue as BigInt and N <= Int(M).MaxValue as BigInt {
+  ...
+}
+impl [template N:! BigInt, template M:! BigInt]
+    IntLiteral(N) as ImplicitAs(Unsigned(M))
+    if N >= Unsigned(M).MinValue as BigInt and N <= Unsigned(M).MaxValue as BigInt {
+  ...
+}
+```
+
+The above is for exposition purposes only; various parts of this syntax are not
+yet decided.
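
The range constraint in the expository `impl` declarations above can be sketched in Python, whose `int` is arbitrary-precision and so plays the role of `BigInt`. The helper names `int_range` and `literal_converts` are hypothetical, and the sketch assumes `Int(M)` and `Unsigned(M)` are two's-complement types of width `M`, which this proposal does not state.

```python
from typing import Tuple


def int_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return (min, max) for an integer type of the given width.

    Assumption: signed types use two's complement.
    """
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def literal_converts(value: int, bits: int, signed: bool = True) -> bool:
    """Model the `if` constraint: the literal's value must be representable."""
    lowest, highest = int_range(bits, signed)
    return lowest <= value <= highest
```

Under these assumptions, `literal_converts(-2147483648, 32)` holds while `literal_converts(300, 8)` does not, matching the `i32` and `i8` cases in the examples section.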
+
+Similarly, `IntLiteral(x)` and `FloatLiteral(x)` convert to any sufficiently
+large floating-point type, and produce the nearest representable floating-point
+value. Conversions in which `x` lies exactly half-way between two values are
+rejected, as
+[previously decided](/docs/design/lexical_conventions/numeric_literals.md#ties).
+Conversions in which `x` is outside the range of finite values of the
+floating-point type are also rejected, rather than saturating to the finite
+range or producing an infinity.
+
+### Examples
+
+```carbon
+// This is OK: the initializer is of the integer literal type with value
+// -2147483648 despite being written as a unary `-` applied to a literal.
+var x: i32 = -2147483648;
+
+// This initializes y to 2^60.
+var y: i64 = 1 << 60;
+
+// This forms a rational literal whose value is one third, and converts it to
+// the nearest representable value of type `f64`.
+var z: f64 = 1.0 / 3.0;
+
+// This is an error: 300 cannot be represented in type `i8`.
+var c: i8 = 300;
+
+fn f[template T:! Type](v: T) {
+  var x: i32 = v * 2;
+}
+
+// OK: x = 2_000_000_000.
+f(1_000_000_000);
+
+// Error: 4_000_000_000 can't be represented in type `i32`.
+f(2_000_000_000);
+
+// No storage required for the bound when it's of integer literal type.
+struct Span(template T:! Type, template BoundT:! Type) {
+  var begin: T*;
+  var bound: BoundT;
+}
+
+// Returns 1, because 1.3 can implicitly convert to f32, even though conversion
+// to f64 might be a more exact match.
+fn G() -> i32 {
+  match (1.3) {
+    case _: f32 => { return 1; }
+    case _: f64 => { return 2; }
+  }
+}
+
+// Can only be called with a literal 0.
+fn PassMeZero(_: IntLiteral(0));
+
+// Can only be called with integer literals in the given range.
+fn ConvertToByte[template N:! BigInt](_: IntLiteral(N)) -> i8
+    if N >= -128 and N <= 127 {
+  return N as i8;
+}
+
+// Given any int literal, produces a literal whose value is one higher.
+fn OneHigher(L: IntLiteral(template _:!
BigInt)) -> auto {
+  return L + 1;
+}
+// Error: 256 can't be represented in type `i8`.
+var v: i8 = OneHigher(255);
+```
+
+## Alternatives considered
+
+### Use an ordinary integer or floating-point type for literals
+
+We could decide on a fixed-width type based on the form of the literal, for
+example using a type suffix with some rules to determine what type to pick for
+unsuffixed literals.
+
+Advantages:
+
+- This follows what C++ does.
+- Can determine the type of a floating-point number without requiring
+  contextual information.
+
+Disadvantages:
+
+- Surprising behavior when applying an operator to a literal would result in
+  overflow. Even if we diagnose this, a diagnostic that `-2147483648` is
+  invalid because it overflows is surprising.
+- Creates additional literal syntax that users will need to understand.
+- May select types that don't match the programmer's expectations.
+- Whatever types we pick are privileged.
+
+### Use same type for all literals
+
+We could give literals a single, arbitrary-precision type (say, `Integer` for
+integer literals and `Rational` for real literals).
+
+Advantages:
+
+- Only introduces two new types, not an unbounded parameterized family of
+  types.
+- Writing a function that takes any integer literal can be done with more
+  obvious syntax and less syntactic overhead. Instead of:
+  ```
+  fn OneHigher(L: IntLiteral(template _:! BigInt));
+  ```
+  we could write
+  ```
+  fn OneHigher(template L:! Integer);
+  ```
+  However, with this proposal, a function taking any integer expression that
+  can be evaluated to a constant can be written as
+  ```
+  fn F(template N:! BigInt);
+  ```
+  and such a function would accept all integer literals, as well as
+  non-literal constants.
+
+Disadvantages:
+
+- Our mechanism for specifying the behavior of operations such as arithmetic
+  is based on interface implementations, which are looked up by type.
+  Supporting `impl` selection based on values would introduce substantial
+  complexity.
+- If we introduce an arbitrary-precision integer type, it would be
+  inconsistent to support it only during compilation. However, if we allow its
+  use at runtime, programs may use it accidentally, with an invisible
+  performance cost. For example, `var x: auto = 123;` would result in `x`
+  having an infinite-precision type, possibly involving invisible dynamic
+  allocation.
+    - Under this proposal, the type of `x` is a type that can only represent
+      the value `123`; as such, `x` is effectively immutable. The
+      arbitrary-precision integer type introduced in this proposal can only be
+      used explicitly by programs naming it.
+
+### Allow leading `-` in literal tokens
+
+We could treat a leading `-` character as part of a numeric literal token, so
+that -- for example -- `-123` would be a single `-123` token rather than a unary
+negation applied to a literal `123`.
+
+Advantages:
+
+- This would narrowly solve the problem that `INT_MIN` cannot be written
+  directly, without any of the other implications of this proposal.
+
+Disadvantages:
+
+- Makes the behavior of unary `-` less uniform.
+- Prevents the introduction of infix or postfix operators that bind more
+  tightly than unary `-`, such as an infix exponentiation operator: `-2**2`
+  may be expected to evaluate to -4, not to +4.
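
Python is one existing language in which `**` binds more tightly than unary `-`, so it illustrates the expectation described in the last bullet above. This is an illustration from another language, not a statement about Carbon's eventual operators; the variable names are only for exposition.

```python
# `**` binds tighter than unary `-`, so this is -(2 ** 2).
negated_power = -2 ** 2

# Parentheses force the negation to happen first.
power_of_negative = (-2) ** 2
```

If `-2` were lexed as a single literal token, the first expression would instead behave like the second.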