Certain negative integer literals are broken #302

perlun · 2022-03-13T07:28:23Z

I just realized that this logic is flawed:

Lines 484 to 503 in 11f48ce

    if (value < Int32.MaxValue)
    {
        AddToken(NUMBER, (int)value);
    }
    else if (value < UInt32.MaxValue)
    {
        AddToken(NUMBER, (uint)value);
    }
    else if (value < Int64.MaxValue)
    {
        AddToken(NUMBER, (long)value);
    }
    else if (value < UInt64.MaxValue)
    {
        AddToken(NUMBER, (ulong)value);
    }
    else // Anything else remains a BigInteger
    {
        AddToken(NUMBER, value);
    }

The problem is that this does not handle all negative numbers correctly, which is evidenced by running the following statement in the REPL:

> var v = -2147483648;
[line unknown] Unsupported target type System.UInt32


perlun · 2022-03-14T18:49:53Z

Regarding "this does not handle all negative numbers correctly": this seems, in fact, to have been a problem specifically with -2147483648. 😂 The reason for this is that "it's complicated":

The scanning takes place in the snippet quoted above, i.e. lines 484 to 503 of perlang/src/Perlang.Parser/Scanner.cs in 11f48ce.

The scanner sees the literal without its minus sign, i.e. 2147483648, which is larger than Int32.MaxValue and hence (with our current master codebase) yields a UInt32 value.

Then, the code in PerlangInterpreter will do the actual conversion to a negative number:

perlang/src/Perlang.Interpreter/PerlangInterpreter.cs

Lines 704 to 705 in 11f48ce

    case UInt32 value:
        return -value;

I have two things to say about this:

1. The reason it breaks is specifically that -2147483648 is a nasty outlier. Two's complement, the method used to denote negative integers in most computer architectures, has the interesting property that the signed 32-bit integer range goes from -2147483648 to 2147483647, i.e. the negative "number space" is one number larger than the positive one. This breaks the algorithm above.
2. I'm starting to question the idea of using the MINUS + Expr.Literal approach here. It would, in fact, be simpler (from the above POV) if the whole number (including the negative sign) were a single, unified expression instance instead. This is unfortunately a much bigger rewrite though. I'll see if I can find some reasonable pragmatic approach forward here for now...
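
To make the asymmetry concrete, here is a small C# sketch. It is not taken from the Perlang codebase; the class and method names are made up for illustration. It shows that the magnitude of the most negative Int32 does not itself fit in an Int32, which is exactly why scanning the digits first and negating later goes wrong for this one value.

```csharp
using System;

// Hypothetical illustration, not Perlang code.
public static class TwosComplementRange
{
    // Magnitude of Int32.MinValue, computed in 64 bits so it cannot overflow.
    public static long MinValueMagnitude() => -(long)int.MinValue;

    // True when a non-negative magnitude can be stored as a (positive) Int32.
    public static bool MagnitudeFitsInInt32(long magnitude) => magnitude <= int.MaxValue;

    // Negation in a checked context throws OverflowException for Int32.MinValue,
    // because +2147483648 has no Int32 representation.
    public static int NegateChecked(int x) => checked(-x);
}
```

With these definitions, `MinValueMagnitude()` is 2147483648, `MagnitudeFitsInInt32(2147483648)` is false, and `MagnitudeFitsInInt32(2147483647)` is true.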

perlun · 2022-03-18T21:23:46Z

Some more details on why this breaks, while I remember it:

1. The scanning code above parses the number to the "smallest integer type available": 2147483646 => Int32; 2147483647 => UInt32 (because of an off-by-one bug in the existing code); 2147483648 => UInt32, because this number is unrepresentable as a (signed) Int32.
2. The implementation for the unary prefix operator will then throw the Unsupported target type exception.
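
The mapping in the first point can be reproduced with a standalone sketch that mirrors the comparison chain from the scanner snippet. The assumptions here are that `value` is a `BigInteger` (as the "remains a BigInteger" comment suggests) and that returning the chosen `Type` is an acceptable stand-in for the `AddToken` call; the `ScanTimeClassification` class is hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for the scan-time logic: same comparisons as the
// quoted snippet, but returning the selected type instead of adding a token.
public static class ScanTimeClassification
{
    public static Type Classify(BigInteger value)
    {
        if (value < Int32.MaxValue) return typeof(int);          // strict '<' excludes 2147483647
        else if (value < UInt32.MaxValue) return typeof(uint);
        else if (value < Int64.MaxValue) return typeof(long);
        else if (value < UInt64.MaxValue) return typeof(ulong);
        else return typeof(BigInteger);                          // anything else stays a BigInteger
    }
}
```

Calling `Classify` with 2147483646 selects `int`, while both 2147483647 and 2147483648 select `uint`, matching the list above.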

The easiest way to "get rid of" this problem is to not convert the number to a numeric data type until we know whether it is negative or positive.
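
A minimal sketch of that idea, assuming the digits are first read into a `BigInteger` magnitude and the sign is known by the time a type is picked. This is not the code from the eventual PR; `SignAwareLiteral.Narrow` and its parameters are hypothetical names used only for illustration.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: narrow a literal only once its sign is known.
public static class SignAwareLiteral
{
    public static object Narrow(BigInteger magnitude, bool isNegative)
    {
        // Apply the sign first, while the value is still arbitrary-precision.
        BigInteger value = isNegative ? -magnitude : magnitude;

        // Inclusive bounds on both ends, so Int32.MinValue and Int32.MaxValue
        // are both treated as Int32.
        if (value >= int.MinValue && value <= int.MaxValue) return (int)value;
        if (value >= uint.MinValue && value <= uint.MaxValue) return (uint)value;
        if (value >= long.MinValue && value <= long.MaxValue) return (long)value;
        if (value >= ulong.MinValue && value <= ulong.MaxValue) return (ulong)value;
        return value; // anything else remains a BigInteger
    }
}
```

Under these assumptions, `Narrow(2147483648, true)` yields the Int32 value -2147483648, whereas `Narrow(2147483648, false)` still yields a UInt32, since the positive value does not fit in an Int32.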

As for my previous comment:

This is unfortunately a much bigger rewrite though. I'll see if I can find some reasonable pragmatic approach forward here for now...

I am going for the "much bigger rewrite" in this case. PR upcoming as soon as I have time to complete it.

This commit changes the logic to do the parsing of numeric literals at parse-time instead of scan-time, where we have better access to the actual context of how the literal is being used. This makes it possible to avoid making expressions like "-100" be `Expr.UnaryPrefix` and instead be a simple `Expr.Literal` with the value of the expected type, even for the edge cases described in #302. Fixes #302

perlun added the bug Something isn't working as expected label Mar 13, 2022

perlun added this to the 0.2.0 milestone Mar 13, 2022

perlun mentioned this issue Mar 20, 2022

(parser) Fix broken support for certain negative integer literals #306

Merged

perlun changed the title ~~Fix support for negative numbers~~ Fix broken support for certain negative integer literals Mar 20, 2022

perlun changed the title ~~Fix broken support for certain negative integer literals~~ Certain negative integer literals are broken Mar 20, 2022

perlun added the language Language features (or bugs) label Mar 20, 2022

perlun closed this as completed in #306 Mar 21, 2022
