Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain negative integer literals are broken #302

Closed
perlun opened this issue Mar 13, 2022 · 2 comments · Fixed by #306
Closed

Certain negative integer literals are broken #302

perlun opened this issue Mar 13, 2022 · 2 comments · Fixed by #306
Labels
bug Something isn't working as expected language Language features (or bugs)
Milestone

Comments

@perlun
Copy link
Collaborator

perlun commented Mar 13, 2022

I just realized that this logic is flawed:

if (value < Int32.MaxValue)
{
AddToken(NUMBER, (int)value);
}
else if (value < UInt32.MaxValue)
{
AddToken(NUMBER, (uint)value);
}
else if (value < Int64.MaxValue)
{
AddToken(NUMBER, (long)value);
}
else if (value < UInt64.MaxValue)
{
AddToken(NUMBER, (ulong)value);
}
else // Anything else remains a BigInteger
{
AddToken(NUMBER, value);
}

The problem is that this does not handle all negative numbers correctly, which is evidenced by running the following statement in the REPL:

> var v = -2147483648;
[line unknown] Unsupported target type System.UInt32
@perlun perlun added the bug Something isn't working as expected label Mar 13, 2022
@perlun perlun added this to the 0.2.0 milestone Mar 13, 2022
@perlun
Copy link
Collaborator Author

perlun commented Mar 14, 2022

The problem is that this does not handle all negative numbers correctly, which is evidenced by running the following statement in the REPL:

This seems to have been a problem specifically with -2147483648 in fact. 😂 The reason for this is that "it's complicated":

  1. The scanning takes place in the code described above, i.e. this snippet.
    if (value < Int32.MaxValue)
    {
    AddToken(NUMBER, (int)value);
    }
    else if (value < UInt32.MaxValue)
    {
    AddToken(NUMBER, (uint)value);
    }
    else if (value < Int64.MaxValue)
    {
    AddToken(NUMBER, (long)value);
    }
    else if (value < UInt64.MaxValue)
    {
    AddToken(NUMBER, (ulong)value);
    }
    else // Anything else remains a BigInteger
    {
    AddToken(NUMBER, value);
    }
    This will encounter 2147483648 which is larger than Int32.MaxValue and hence (with our current master codebase) will yield a UInt32 value.
  2. Then, the code in PerlangInterpreter will do the actual conversion to a negative number:
    case UInt32 value:
    return -value;

I have two things to say about this:

  • The reason is breaks is specifically because -2147483648 is a nasty outlier. This is because of the two's complement method being used to denote negative integers in most computer architectures, which has the interesting semantic detail that the signed integer range (for 32-bit values) goes from -2147483648 to 2147483647, i.e. the negative "number space" is 1 number larger than the positive one. This breaks the algorithm above.

  • I'm starting to question the idea of using the MINUS + Expr.Literal approach here. It would, in fact, be simpler (from the above POV) if the whole number (including the negative sign) would be a single, unified expression instance instead. This is unfortunately a much bigger rewrite though. I'll see if I can find some reasonable pragmatic approach forward here for now...

@perlun
Copy link
Collaborator Author

perlun commented Mar 18, 2022

Some more details on why this breaks, while I remember it:

  • The scanning code above parses the number to "smallest integer available". 2147483646 => Int32. 2147483647 (because of off-by-one bug in the existing code) => UInt32. 2147483648 => UInt32 because this number is unrepresentable as a (signed) Int32.
  • The implementation for the unary prefix operator will then throw the Unsupported target type exception.

The easiest way to "get rid of" this problem is to not convert the number to numeric data type until we know whether it is negative or positive.


As for my previous comment:

  • This is unfortunately a much bigger rewrite though. I'll see if I can find some reasonable pragmatic approach forward here for now...

I am going for the "much bigger rewrite" in this case. PR upcoming as soon as I have time to complete it.

perlun added a commit that referenced this issue Mar 20, 2022
This commit changes the logic to do the parsing of numeric literals at
parse-time instead of scan-time, where we have better access to the
actual context of how the literal is being used. This makes it possible
to avoid making expressions like "-100" be `Expr.UnaryPrefix` and
instead be a simple `Expr.Literal` with the value of the expected type,
even for the edge cases described in #302.

Fixes #302
@perlun perlun changed the title Fix support for negative numbers Fix broken support for certain negative integer literals Mar 20, 2022
@perlun perlun changed the title Fix broken support for certain negative integer literals Certain negative integer literals are broken Mar 20, 2022
@perlun perlun added the language Language features (or bugs) label Mar 20, 2022
perlun added a commit that referenced this issue Mar 20, 2022
This commit changes the logic to do the parsing of numeric literals at
parse-time instead of scan-time, where we have better access to the
actual context of how the literal is being used. This makes it possible
to avoid making expressions like "-100" be `Expr.UnaryPrefix` and
instead be a simple `Expr.Literal` with the value of the expected type,
even for the edge cases described in #302.

Fixes #302
perlun added a commit that referenced this issue Mar 21, 2022
This commit changes the logic to do the parsing of numeric literals at
parse-time instead of scan-time, where we have better access to the
actual context of how the literal is being used. This makes it possible
to avoid making expressions like "-100" be `Expr.UnaryPrefix` and
instead be a simple `Expr.Literal` with the value of the expected type,
even for the edge cases described in #302.

Fixes #302
perlun added a commit that referenced this issue Mar 21, 2022
This commit changes the logic to do the parsing of numeric literals at
parse-time instead of scan-time, where we have better access to the
actual context of how the literal is being used. This makes it possible
to avoid making expressions like "-100" be `Expr.UnaryPrefix` and
instead be a simple `Expr.Literal` with the value of the expected type,
even for the edge cases described in #302.

Fixes #302
perlun added a commit that referenced this issue Mar 21, 2022
This commit changes the logic to do the parsing of numeric literals at
parse-time instead of scan-time, where we have better access to the
actual context of how the literal is being used. This makes it possible
to avoid making expressions like "-100" be `Expr.UnaryPrefix` and
instead be a simple `Expr.Literal` with the value of the expected type,
even for the edge cases described in #302.

Fixes #302
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected language Language features (or bugs)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant