
Untyped number literals #2995

Closed
oprypin opened this issue Jul 14, 2016 · 28 comments · Fixed by #6074

Comments

@oprypin
Member

oprypin commented Jul 14, 2016

I suggest that number literals like 7 should not be immediately bound to the type Int32, but instead they should be implicitly convertible to any Number. Similarly, fractional number literals like 3.43 should be implicitly convertible to any Float. To be more exact, no "conversion" would ever take place, ideally the literals would be untyped until they're actually used.

In situations where a concrete type has to be chosen anyway (e.g. when writing a = 0), the literals default to Int32/Float64 as they do now.
Typed literals like 1.2f32 keep working unchanged.
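
A quick illustration of those defaults and suffixes (this is just current behaviour, which the proposal keeps):

a = 0       # no context forces a type: defaults to Int32
b = 3.43    # defaults to Float64
c = 1.2f32  # explicit suffix: Float32

puts typeof(a)  # => Int32
puts typeof(b)  # => Float64
puts typeof(c)  # => Float32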

This change is expected to be backwards compatible; it just leads to less verbose code and more permissive compilation, with no downsides (other than compiler complexity), in my opinion.


In other words, I'm sick of these errors:

def shoot(x : Float32, y : Float32)
end

shoot(1.23, 4.56)
# no overload matches 'shoot' with types Float64, Float64
record Point, x : Float64, y : Float64

Point.new(0, 0)
# no overload matches 'Point.new' with types Int32, Int32
record Color, r : UInt8, g : UInt8, b : UInt8, a : UInt8 = 0

Color.new(255, 128, 0)
# no overload matches 'Color.new' with types Int32, Int32, Int32

Color.new(255u8, 128u8, 0u8) 
# still broken, can you see why?
# instance variable '@a' of Color must be UInt8, not Int32

This change would make these examples work.
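
For comparison, this is roughly what the same calls have to look like under the current rules, with explicit suffixes everywhere (including the default value of a):

shoot(1.23f32, 4.56f32)

Point.new(0.0, 0.0)

record Color, r : UInt8, g : UInt8, b : UInt8, a : UInt8 = 0u8
Color.new(255u8, 128u8, 0u8)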

@refi64
Contributor

refi64 commented Jul 14, 2016

To an extent, this would probably be classified as a basic form of type deduction.

@oprypin
Member Author

oprypin commented Jul 14, 2016

This probably shouldn't change (still a union of two types):

a = 5i8
a = 10 if true

An interesting complicated case:

struct Point(T)
  def initialize(@x : T, @y : T)
  end
end
Point.new(1, 0.5)  # Should be Point(Float64) - this is arguable

@asterite
Member

Yes, literals should be special. It's a bit hard to implement, but it's definitely a good idea, and we'd also like the language to work this way.

Question: should 1 pass as a BigInt too? :-)

@oprypin
Member Author

oprypin commented Jul 14, 2016

@asterite I think we should consider only numbers that have literals.

@asterite
Member

I mean:

def foo(x : BigInt)
end

foo(1)
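
For reference, today that call only compiles with an explicit conversion, along these lines:

require "big"

def foo(x : BigInt)
end

foo(BigInt.new(1))  # or foo(1.to_big_i)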

@asterite
Member

We could probably make it work in those cases too, by hardcoding the rules in the compiler. That should cover most of the cases. I still need to think about how to implement this...

@oprypin
Member Author

oprypin commented Jul 14, 2016

@asterite I mean you can't create a BigInt directly with a literal, so it does not apply. It's not the reason why it shouldn't apply, just a criterion.

I think data types that are not tied to the compiler shouldn't be considered. It's very easy for this to get out of hand with all these special cases.

@ozra
Contributor

ozra commented Jul 14, 2016

I wouldn't mind also seeing BigInt / BigFloat get some more status - it would make scientific computing with Crystal a breeze!

@trans

trans commented Sep 25, 2016

Could there be a set of literal types, e.g. Int::Any, Float::Any, etc. These types are never used concretely but are coerced using conversion methods defined in them. That way it would be extensible. Is that a helpful way to approach it?

@ozra
Contributor

ozra commented Sep 25, 2016

I think a compiler-level, pragmatic "literal to type" mapping procedure would be most appropriate, instead of involving types and conversion methods. This would be deterministic and reasonable enough, imo:

Subject: number literal without kind-suffix:

Order of Choice:

  • If it's a real number (has . or e): try the literal as one of these, in order: Float64, Float32, BigDecimal?
  • If it's a whole number, try, in order: Int32, Int64, Float64, Float32, UInt64, UInt32, Int8, UInt8, BigInt?, Int16, UInt16

Contexts:

  • When used as an arg in a call: there could of course be a bunch of permutations and signature-match attempts when there are several arguments, unless the first one matches immediately. Often Int is used as the restriction, and then there will be no retries, so compilation time will hardly be bogged down. As for combinations, the same type on all loose args should be tried first, step by step, before combinations of different types (most reasonable earliest match, less processing).
  • When the lvalue is an ivar: look up the possible types of the ivar (available since the top-level phase) and select the first matching type according to the precedence list above (normally int ivars will have just one type rather than a union of ints). If none match, the ivar is obviously not of a number type; error as usual.

I find the orders above reasonable for the preferred match (should several possible signature matches exist). The 16-bit types really take a back seat, since they're unlikely to ever be used in signatures except where there is an overload for every int type (low-level stuff), and even then they're not the preferred match when slapping on an unspecified literal. The reasonable use case for them is compact data in structs, arrays and such.
Whether Int32 or Int64 should come first is debatable. Right now it's Int32 "all the way", so that's why I imagined it like that.

As I've already mentioned, I like the idea of letting BigNums in on the game here :-)

What do you think?

@trans

trans commented Sep 25, 2016

Thinking about it some more, me thinks it's complicated 😉

For instance, would the compiler be smart enough to know:

200.times do |i|
    i  # is Int8
end

But then

200.times do |i|
    j = i + 57  # ruroh
end

And it would have no idea for:

def x(n)
  n + 1
end
x(1)

And then

def x(n)
  n + 5000000000
end
x(1)

If n is int32 we are almost certainly going to get the wrong answer there. We could put the literal first, but most literals are put at the end in my experience. And in any case it is weird that + would not be commutative.

Maybe this is all easier to do than I realize, but it sure seems like a hairy mess.

Maybe a simple way to handle it is to always default to the smallest size (for speed), then have operations always up-size to the next size (for accuracy):

Int8 + Int8 -> Int16

And then at certain points in execution the compiler can override this if it can figure out where they can be reduced. But the programmer can also help by telling it where to do it, e.g.

    def x(n : Int8)
       ...
    end

    a = 10         # Int8
    b = a + a      # Int16
    x(b)           # error no x(Int16)
    ~b             # pseudo-code for type reduce b, if possible
    x(b)           # ok, x(Int8)

I would expect the upscaling to max out at UInt64, but BigInt could get in on the act if a compiler flag is set?

Just sort of thinking out loud here.

@ozra
Contributor

ozra commented Sep 25, 2016

That's why I specifically left local variables out of the possible contexts: only the call-signature and ivar-assignment cases keep it simple.

@lbguilherme
Contributor

Order of Choice:

  • If it's a real number (has . or e): try the literal as one of these, in order: Float64, Float32, BigDecimal?
  • If it's a whole number, try, in order: Int32, Int64, Float64, Float32, UInt64, UInt32, Int8, UInt8, BigInt?, Int16, UInt16

I would go as far as saying that if there are multiple overloads taking different types of Int and the literal fits in more than one of them, then there should be an ambiguity error, not a predefined order. This is what I mean:

def foo(x : Int)
  p typeof(x)
end

foo(5) # Int32
foo(3.1) # Float64
foo(999999999999999999) # Int64

def bar(x : Int8)
  p typeof(x)
end

def bar(x : UInt16)
  p typeof(x)
end

def bar(x : Float32)
  p typeof(x)
end

bar(5) # ambiguous (between Int8 and UInt16)
bar(500) # UInt16
bar(-10) # Int8
bar(1.5) # Float32
bar(99999999) # Float32

def baz(x : Int8)
  p typeof(x)
end

baz(99) # Int8
baz(9999) # no overload matches

@RX14
Contributor

RX14 commented Sep 26, 2016

Shouldn't Int64 be the default now that 64-bit is the standard word size in most of the world's computers?

@lbguilherme
Contributor

@RX14 There are still advantages to using Int32 instead of Int64: it takes less memory in classes/structs, it is more cache friendly, and integer multiplication is a little faster on Int32 than on Int64, even on 64-bit hardware. I don't think those alone are reasons to keep using Int32, but I don't see a reason to change the default either.

@ozra
Contributor

ozra commented Sep 27, 2016

@lbguilherme I don't agree it should error in that situation.

I think @RX14 is on the right track with Int64 as the default, but this would cause classes, arrays, etc. in existing Crystal programs to potentially double in size, as @lbguilherme mentions, since many people rely on the compiler guessing ivar types.

Personally I'd like:

  • ivars must be type declared or default assigned - guessing will only be done from that.
  • if default assigned, thereby resulting in a guess: any number literals in the initializing expression must be suffixed with an explicit type.
  • literal numbers can now be tried as Int64 first.

The literal-typing should also apply to container-literals (array, etc.) that might grow substantially in size. As long as at least one literal is typed, the rest would be inferred from that in a given (T)-context (an array literal for instance).
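
A hypothetical sketch of how that could play out for an array literal (not current behaviour; today mixing 1u8 with plain literals produces a union element type):

a = [1u8, 2, 3]  # one suffixed literal fixes T, so this would be Array(UInt8)
b = [1, 2, 3]    # no suffixed literal: the default applies (Array(Int32) today, Array(Int64) if the default changes)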

A small amount of work in type-defs (class, struct, container-literals) lets us get away with less unnecessary explicitness in general code. It also makes it clearer what the type members are.

Regarding Int64 vs Int32, @lbguilherme - I did extensive benchmarks with real programs (doing copious amounts of calcs on large amounts of data, thereby affecting cycles and caching) in C++ a whole bunch of years ago, keeping all data-structures the same, only switching main int-size for code, and I found no measurable differences - except for integer division, where Int32 was noticeably faster. That last point might have changed since then with new processors though. (Important note: x86 tests only, and only three different machines)

So in general, using Int64 in code is only a pro ("higher roof").

@RX14
Contributor

RX14 commented Sep 27, 2016

Hmmn, yes. I can't imagine space is really an issue because we're only changing the typing of local vars. They live on the stack or in registers. Anything that makes it off the stack has to have its size declared anyway (class, etc).

@ozra
Contributor

ozra commented Sep 28, 2016

@RX14 - no, as mentioned: with the current way Crystal guesses (preliminary inference) ivar types in classes, it would affect the size of a lot of classes in existing code bases. Or did you mean in the context of a change similar to what I mentioned?

In that case, yep. The biggest jump in space consumption on 64-bit archs is the unavoidable pointers, since they make up a huge amount of the data structures as "references". But the joy of 64-bit (48, really) addressing does make that a price worth paying.

@RX14
Contributor

RX14 commented Sep 28, 2016

ivars must be type declared or default assigned - guessing will only be done from that.

Isn't this currently the case?

@ozra
Contributor

ozra commented Sep 28, 2016

No, they're also guessed from assigns in initialize() etc., and literals are not required to be explicitly typed in that context. That would make all @my_ivar = 1 initializations change their type from Int32 to Int64, if the literal were deemed "Int64 primarily", without any other changes.

@faustinoaq
Contributor

faustinoaq commented Sep 25, 2017

Why not use Num as an alias?

I mean

alias Num = Int | Float
def shoot(x : Num, y : Num)
end

shoot(1.23, 4.56)
record(Point, x : Num, y : Num)

Point.new(0, 0)
record(Color, r : Num, g : Num, b : Num, a : Num = 0)

Color.new(255, 128, 0)

Currently the above alias outputs:

Error in line 1: can't use Int in unions yet, use a more specific type

However, I can use an Int | Float union in def parameters:

def shoot(x : Int | Float, y : Int | Float)
end

shoot(1.23, 4.56)
shoot(1.23, 4)
shoot(1.23_f32, 4u8)

https://carc.in/#/r/2sis

@lbguilherme
Contributor

@faustinoaq This is slow! The compiler will have to keep runtime information about which type of int it is, and for every method call on it, check all possible runtime types.
For method arguments this works because they are restrictions, not casts. But then the literal values will simply be Int32 and Float64 and nothing changes from the original issue here.

@andy-twosticks

There's a lot of clever stuff going on here, but all I really want, personally, is this:

def foo(x : Int8)
  puts x
end

foo(4)

4 is a perfectly valid Int8, and the compiler knows that foo takes an Int8. So why do I need to put 4_i8?

@watzon
Contributor

watzon commented Mar 15, 2018

Really the same could be said for many literals. Take this simple example:

class FooClass
  
  def initialize(@foo : Hash(String, String | Int32))
  end
  
  def puts_foo
    puts @foo
  end
  
end

foo = FooClass.new({"hello" => "world"})
foo.puts_foo

which throws the error

Error in line 12: instantiating 'FooClass:Class#new(Hash(String, String))'

in line 3: instance variable '@foo' of FooClass must be Hash(String, Int32 | String), not Hash(String, String)

even though the hash clearly should fit the constraints. The same thing happens with Arrays.
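
For reference, a workaround today is to give the literal an explicit value type so it matches the ivar:

foo = FooClass.new({"hello" => "world"} of String => String | Int32)
foo.puts_foo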

@straight-shoota
Member

@watzon What you're describing is about generic variance (see #3803). That's got nothing to do with literals. FooClass.new Hash(String, String).new would not match either.

@RX14
Contributor

RX14 commented Mar 15, 2018

@straight-shoota no, that can be solved. Just in the same way as you'd promote the integer type to the literal, you'd promote the hash type to the literal.

For example

FooClass.new({"hello" => "world"})

could compile but

hash = {"hello" => "world"}
foo = FooClass.new(hash)

would not.

I would support a simple first fix: when a literal is in method args, its exact type is taken from the method definition, if one is defined and it's unambiguous.
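
Applied to the earlier integer example, that rule would read roughly like this (hypothetical behaviour, not current Crystal):

def foo(x : Int8)
end

foo(4)  # would compile: the restriction fixes the literal's type to Int8
x = 4
foo(x)  # would still fail: x is an ordinary Int32 variable, not a bare literal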

@straight-shoota
Member

@RX14 To me it makes no sense for literals to behave differently depending on context. When you refactor an argument out to a variable, everything breaks. And you just ask WTF?

@RX14
Contributor

RX14 commented Mar 15, 2018

@straight-shoota well, would you prefer something or nothing? Tracking types through variables is far harder than tracking them through method args, and likely even more fragile. Would you rather something broke because you took it out of some method args, or because you used the variable in an expression and the compiler couldn't change the variable type because of an edge case?

I know which one I'd prefer; at least whether a literal is in or out of method args is predictable.
