Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Operators.string to be used in more places (prevent FS0670); optimize generated IL code #890

Closed
5 tasks done
abelbraaksma opened this issue Jun 28, 2020 · 9 comments

Comments

@abelbraaksma
Copy link
Member

abelbraaksma commented Jun 28, 2020

Allow use of 'string' everywhere, and optimize generated IL

I propose:

  1. that we change ^T to 'T from the Operators.string function. This allows its usage in places where you would now get an error that it "escapes its scope" (report: Using string obj vs obj.ToString() leads to problems: the type parameter can escape its scope dotnet/fsharp#7958). Consider:

    type Either<'L, 'R> =
        | Left of 'L 
        | Right of 'R
        override this.ToString() =
            match this with
            | Left x -> string x    // not possible
            | Right x -> string x   // not possible

    this code fails to compile with:

    FS0670 This code is not sufficiently generic. The type variable ^T could not be generalized because it would escape its scope.

  2. that we optimize the generated IL. Currently, due to what can be considered a bug, regardless of input, the generated IL is huge (report: The 'string' operator produces suboptimal code when the object's type is known. dotnet/fsharp#9153 Since this is a statically-inlined function, each time you call string, it expands to this:

    object obj = 4;
    if (obj != null)
    {
        IFormattable formattable = obj as IFormattable;
        if (formattable != null)
        {
            IFormattable formattable2 = formattable;
            return formattable2.ToString(null, CultureInfo.InvariantCulture);
        }
        object obj2 = obj;
        return obj2.ToString();
    }
    return "";

    But need to be only this in most cases:

    myValue.ToString(null, CultureInfo.InvariantCulture);
  3. that we improve speed. Currently, a double null-check is done. This is unnecessary, in fact, many code paths don't need a null-check at all. It would also be nice to remove callvirt and use call instead (or a constrained callvirt), in cases where this is possible.

I was already implementing this (see dotnet/fsharp#9549), but while doing so, and discussing it with @KevinRansom, we found that there's a tension between the above requests and that not all of them can be supported for all scenarios. This is in part due to the nature of enum: it's treated as one of eight integer types.

The existing way of approaching this problem in F# is: roll your own string function (this has long been my approach in my own code, and this is easy if you don't have to deal with case-printing for enum).

Details are in a tentative RFC.

Pros and Cons

The advantages of making this adjustment to F# are:

  • Able to use string everywhere (except with refs, but that's another discussion)
  • Removal of dead code: current implementation has special paths for int, float etc that are never hit
  • Improved performance in most scenarios
  • Inlining will be done by the JIT in most cases (I tested this, this is true)
  • Much, much shorter generated IL code

The disadvantages of making this adjustment to F# are:

  • Some people may frown on losing inline
  • Code that ends with string x, where x is an input parameter, if never used will be inferred as type obj. After this change, it will remain generic (this is more of advantage than a disadvantage, I think)

Extra information

Estimated cost (XS, S, M, L, XL, XXL): XS

While researching this was more challenging than I thought, the final changes will be just a few lines, and probably some new test cases.

Related suggestions: None (but see the linked issues for the bug reports that lead to this)

Of note:

Affidavit (please submit!)

Please tick this by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.

Please tick all that apply:

@abelbraaksma
Copy link
Member Author

@KevinRansom, a specific note to you. After our talk on the current PR, you asked me to investigate if we could have better performance and less IL for the native types. I thought at the time that it was not possible to have (A) static resolution (needed for special-casing native types) and (B) allowing string in more contexts where normally inlined functions are disallowed.

It turns out that getting both (A) and (B) are possible by using the compiler-only syntax of when 'T: type instead of when ^T: type. I didn't know this was supported, and it behaves surprisingly well.

However, statically distinguishing between enum and any type of int is not possible as far as I can tell (@dsyme, or do you know of a tricky here? I think the limitation comes from Enum being technically an abstract class, and we cannot statically test polymorphically). Since an enum is printed by its case-name (string ConsoleKey.Backspace gives "Backspace"), we still need to take care of that case. However, knowing that a struct will never be null, allows to optimize the code a little bit more.

I'll update in the next hour or so in the PR, so you can see more detailed what I mean.

@TIHan
Copy link

TIHan commented Jun 29, 2020

@abelbraaksma , I don't know if we could un-inline it without suffering a performance degradation. string is currently defined as follows:

        [<CompiledName("ToString")>]
        let inline string (value: ^T) = 
             anyToString "" value
             // since we have static optimization conditionals for ints below, we need to special-case Enums.
             // This way we'll print their symbolic value, as opposed to their integral one (Eg., "A", rather than "1")
             when ^T struct = anyToString "" value
             when ^T : float      = (# "" value : float      #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : float32    = (# "" value : float32    #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : int64      = (# "" value : int64      #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : int32      = (# "" value : int32      #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : int16      = (# "" value : int16      #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : nativeint  = (# "" value : nativeint  #).ToString()
             when ^T : sbyte      = (# "" value : sbyte      #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : uint64     = (# "" value : uint64     #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : uint32     = (# "" value : uint32     #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : int16      = (# "" value : int16      #).ToString("g",CultureInfo.InvariantCulture)
             when ^T : unativeint = (# "" value : unativeint #).ToString()
             when ^T : byte       = (# "" value : byte       #).ToString("g",CultureInfo.InvariantCulture)

It's mainly because of statically resolved type parameters that the error occurs. But, we need to use statically resolved type parameters in order to use the internal/FSharp.Core-only generic specializations on the primitive types.; this is for performance optimizations. Otherwise, we would have to fallback to reflection.

@abelbraaksma
Copy link
Member Author

abelbraaksma commented Jun 29, 2020

@TIHan, that was my original thought as well. However, have a look at the code you just posted, the first two lines. The first is hit for classes, the second four structs. None of the others are ever hit (confirmed by testing).

anyToString "" value  // general case and body of function
when ^T struct = anyToString "" value   // special case for structs, lines after this are not hit
when ^T : float = (# "" value : float #).ToString("g",CultureInfo.InvariantCulture)  // special case for float, never hit, because it is also a struct
when ^T: float32 ... // also not hit, etc

So the status quo has a bug. But removing the struct line and placing it below creates a compat issue, because enums will not be printed correctly.

I found a way to remove the 'inline' mark and still have the compiler issue the compiler optimizations. The linked PR has an implementation of that (I need to issue new timings though, they were from the original, simpler idea of only removing 'inline')

@abelbraaksma
Copy link
Member Author

abelbraaksma commented Jun 29, 2020

For reference, the current PR looks as follows UPDATED on July 2, there's now no removal of inline, which proved controversial, and the current solution still solves the underlying problem. Note that the compiler still very much inlines this at the call-site, even though the inline modifier is gone n/a anymore: it is guaranteed to inline always.

It now won't raise an error when used in places where the "type could escape its scope", because of the change from ^T (SRTP) to 'T syntax.

[<CompiledName("ToString")>]
let inline string (value: 'T) = 
        anyToString "" value
        when 'T : string     = (# "" value : string #)     // force no-op

        // Using 'let x = (# ... #) in x.ToString()' leads to better IL, without it, an extra stloc and ldloca.s (get address-of)
        // gets emitted, which are unnecessary. With it, the extra address-of-variable is not created
        when 'T : float      = let x = (# "" value : float #)      in x.ToString(null, CultureInfo.InvariantCulture)
        when 'T : float32    = let x = (# "" value : float32 #)    in x.ToString(null, CultureInfo.InvariantCulture)
        when 'T : decimal    = let x = (# "" value : decimal #)    in x.ToString(null, CultureInfo.InvariantCulture)
        when 'T : BigInteger = let x = (# "" value : BigInteger #) in x.ToString(null, CultureInfo.InvariantCulture)
             
        // no IFormattable
        when 'T : char       = let x = (# "" value : char #)       in x.ToString()
        when 'T : bool       = let x = (# "" value : bool #)       in x.ToString()
        when 'T : nativeint  = let x = (# "" value : nativeint #)  in x.ToString()
        when 'T : unativeint = let x = (# "" value : unativeint #) in x.ToString()

        // For the int-types:
        // It is not possible to distinguish statically between Enum and (any type of) int.
        // This way we'll print their symbolic value, as opposed to their integral one 
        // E.g.: 'string ConsoleKey.Backspace' gives "Backspace", rather than "8")
        when 'T : sbyte      = (box value :?> IFormattable).ToString(null, CultureInfo.InvariantCulture)
        when 'T : int16      = (box value :?> IFormattable).ToString(null, CultureInfo.InvariantCulture)
        when 'T : int32      = (box value :?> IFormattable).ToString(null, CultureInfo.InvariantCulture)
        when 'T : int64      = (box value :?> IFormattable).ToString(null, CultureInfo.InvariantCulture)

        // unsigned integral types have equal behavior with ToString() vs IFormattable::ToString
        // this allows us to issue the 'constrained' opcode with 'callvirt'
        when 'T : byte       = let x = (# "" value : 'T #) in x.ToString()
        when 'T : uint16     = let x = (# "" value : 'T #) in x.ToString()
        when 'T : uint32     = let x = (# "" value : 'T #) in x.ToString()
        when 'T : uint64     = let x = (# "" value : 'T #) in x.ToString()


        // other common mscorlib System struct types
        when 'T : DateTime         = let x = (# "" value : DateTime #) in x.ToString(null, CultureInfo.InvariantCulture)
        when 'T : DateTimeOffset   = let x = (# "" value : DateTimeOffset #) in x.ToString(null, CultureInfo.InvariantCulture)
        when 'T : TimeSpan         = let x = (# "" value : TimeSpan #) in x.ToString(null, CultureInfo.InvariantCulture)
        when 'T : Guid             = let x = (# "" value : Guid #) in x.ToString(null, CultureInfo.InvariantCulture)
        when 'T struct = 
        match box value with
        | :? IFormattable as f -> f.ToString(null, CultureInfo.InvariantCulture)
        | _ -> value.ToString()

        // other commmon mscorlib reference types
        when 'T : StringBuilder = let x = (# "" value : StringBuilder #) in x.ToString()
        when 'T : IFormattable = let x = (# "" value : IFormattable #) in x.ToString(null, CultureInfo.InvariantCulture)

Before, when you did string 42, the IL (converted to C#) looked as point (2) in the issue.
After, it just looks like: ((IFormattable) ((uint) x)).ToString(null, CultureInfo.InvariantCulture);. Still not ideal, I'm checking with @gusty, @dsyme and @TIHan if we can do something about that and turn it into a call instead of callvirt. EDIT: this proves to be challenging and may go in the optimizer, see this comment by @dsyme: dotnet/fsharp#9549 (comment)

Note that the code above will optimize where it can:

  • string types get erased completely
  • signed integral types like int etc still have boxing with callvirt semantics because of enum issue
  • unsigned integral types like uint now use the constrained callvirt, which doesn't need boxing
  • float, decimal etc use call semantics and no boxing
  • some other known structs like char just become myChar.ToString() and issue the constrained IL opcode
  • some well-known reference types do not issue the runtime type-check for IFormattable
  • anything else, including unknown structs, work as before and still have the type-check (we cannot statically check for presence of IFormattable).

@abelbraaksma abelbraaksma changed the title Allow Operators.string to be used in more places, by removing "inline"; optimize generated IL code Allow Operators.string to be used in more places (prevent FS0670); optimize generated IL code Jul 2, 2020
@abelbraaksma
Copy link
Member Author

Issue updated, previous comment updated, as new insights show that we can reach the same behavior without removing inline (while the effective generated code is equal in both cases due to when-optimizer syntax, there were worries that some cases might be missed by the optimizer). Tx @dsyme for pointing this out: dotnet/fsharp#9549 (comment).

@abelbraaksma
Copy link
Member Author

abelbraaksma commented Jul 23, 2020

@TIHan, I think this is now ready. I verified the RFC to make sure your last comments have been reflected. Can this be "approved in principle"? It is a benign change (if it is a change at all).

RFC: fsharp/fslang-design#479

@abelbraaksma
Copy link
Member Author

abelbraaksma commented Aug 23, 2020

@cartermp / @TIHan and others, the PR is accepted and merged, concerns have been addressed, I've updated the RFC to reflect this. Can this be approved-in-principle and closed, and the RFC merged?

@cartermp
Copy link
Member

Yes, for some reason I had thought that was already done

@abelbraaksma
Copy link
Member Author

RFC was merged and can now be found here: https://github.com/fsharp/fslang-design/blob/master/preview/FS-1089-allow-string-everywhere.md.

I'll close this, as it's now done and implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants