Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We should obsolete the String.Copy method #27515

Closed
GrabYourPitchforks opened this issue Oct 1, 2018 · 18 comments
Closed

We should obsolete the String.Copy method #27515

GrabYourPitchforks opened this issue Oct 1, 2018 · 18 comments
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

The String.Copy static method takes an input string instance and duplicates it, creating a new reference and copying the original contents into the new object. The method is contracted per MSDN to return a new object. (Contrast this with String.Clone, which returns the original instance.)

The only real use case for this is to create a new string instance so that the caller can use unsafe code to modify the string contents. However, as discussed in https://github.com/dotnet/coreclr/issues/14208 (the string deduping feature), it is technically illegal to mutate string instances. Furthermore, once the string deduping feature comes online, there'd no longer be any distinction between two different strings that happen to contain the same data. In that world it would make sense for String.Copy just to return its input argument as its return value.

Given all of this, there seems to be no legitimate use for the method going forward. I propose we obsolete it both to discourage new code from calling it and to signal to existing code bases that its behavior may imminently change.

One alternative would be to change its behavior to return its input argument without also obsoleting it, but I do not believe this is a good course of action. It means that developers who rely on this method to return a duplicate instance (because they're about to perform an unsafe operation) won't ever be notified that the method is no longer safe for their particular use case.

@benaadams
Copy link
Member

It means that developers who rely on this method to return a duplicate instance (because they're about to perform an unsafe operation)

string.Create(int length, TState state, SpanAction<char, TState> action) formalizes the mutate newly allocated zero'd string prior to use pattern; while introducing safety via the Span and ensuring its not used prior to mutation.

For passing to native code which may mutate the string; shouldn't one of the Marshal.StringToXxx methods (+ free) be used instead for a defensive copy?

This info could be included in the obsoletion message.

@zpodlovics
Copy link

Could somebody please clarify why the public delegate void SpanAction<T, in TArg>(Span<T> span, TArg arg); designed in this way? The Span<T> argument itself is a readonly and immutable struct - at least from F# point of view - so we can change the content with the indexers but cannot change the span itself (so slicing is not possible). For some reason this is allowed in C# but not allowed in F#. More details here: dotnet/fsharp#5856

@stephentoub
Copy link
Member

stephentoub commented Nov 2, 2018

Could somebody please clarify why the public delegate void SpanAction<T, in TArg>(Span span, TArg arg); designed in this way?

What clarification are you looking for? It's equivalent to if Action<Span<T>, TArg> were used, but Span<T> can't be used as a generic type argument, so a custom delegate was added to match.

so we can change the content with the indexers but cannot change the span itself

Yes. The whole point of this usage is so that a method like string.Create can allocate the string and allow the delegate to fill it in. The string has already been allocated; there's nothing to slice. The caller needs to tell string.Create how big a string to create.

If you just want to slice up some existing span and create a string from it, you can just use new string(span) or span.ToString().

@zpodlovics
Copy link

zpodlovics commented Nov 3, 2018

Thank you for the clarification. So the original intention was to have a preallocated memory + writable memory window (Span) and allow the delegate to fill it. Thats perfectly fine. The only missing piece to me is how / why C# allowed to mutate the passed in argument (the writable memory window itself in the line: span = span.Slice(segment.Length);) when based on the public delegate void SpanAction<T, in TArg>(Span<T> span, TArg arg); type signature should not allow the argument mutation. In the generated IL the span argument clearly mutated as of IL_0033. However an another signature would allow the argument mutation: public delegate void SpanAction<T, in TArg>(ref Span<T> span, TArg arg);.

        string GetAsciiString(ReadOnlySequence<byte> buffer)
        {
            if (buffer.IsSingleSegment)
            {
                return Encoding.ASCII.GetString(buffer.First.Span);
            }

            return string.Create((int)buffer.Length, buffer, (span, sequence) =>
            {
                foreach (var segment in sequence)
                {
                    Encoding.ASCII.GetChars(segment.Span, span);

                    span = span.Slice(segment.Length);
                }
            });
        }
    .method assembly hidebysig instance void 
            '<GetAsciiString>b__0_0'(valuetype [System.Runtime]System.Span`1<char> span,
                                     valuetype [System.Memory]System.Buffers.ReadOnlySequence`1<uint8> sequence) cil managed
    {
      // Code size       63 (0x3f)
      .maxstack  3
      .locals init (valuetype [System.Memory]System.Buffers.ReadOnlySequence`1/Enumerator<uint8> V_0,
               valuetype [System.Runtime]System.ReadOnlyMemory`1<uint8> V_1)
      IL_0000:  ldarga.s   sequence
      IL_0002:  call       instance valuetype [System.Memory]System.Buffers.ReadOnlySequence`1/Enumerator<!0> valuetype [System.Memory]System.Buffers.ReadOnlySequence`1<uint8>::GetEnumerator()
      IL_0007:  stloc.0
      IL_0008:  br.s       IL_0035

      IL_000a:  ldloca.s   V_0
      IL_000c:  call       instance valuetype [System.Runtime]System.ReadOnlyMemory`1<!0> valuetype [System.Memory]System.Buffers.ReadOnlySequence`1/Enumerator<uint8>::get_Current()
      IL_0011:  stloc.1
      IL_0012:  call       class [System.Runtime]System.Text.Encoding [System.Runtime]System.Text.Encoding::get_ASCII()
      IL_0017:  ldloca.s   V_1
      IL_0019:  call       instance valuetype [System.Runtime]System.ReadOnlySpan`1<!0> valuetype [System.Runtime]System.ReadOnlyMemory`1<uint8>::get_Span()
      IL_001e:  ldarg.1
      IL_001f:  callvirt   instance int32 [System.Runtime]System.Text.Encoding::GetChars(valuetype [System.Runtime]System.ReadOnlySpan`1<uint8>,
                                                                                         valuetype [System.Runtime]System.Span`1<char>)
      IL_0024:  pop
      IL_0025:  ldarga.s   span
      IL_0027:  ldloca.s   V_1
      IL_0029:  call       instance int32 valuetype [System.Runtime]System.ReadOnlyMemory`1<uint8>::get_Length()
      IL_002e:  call       instance valuetype [System.Runtime]System.Span`1<!0> valuetype [System.Runtime]System.Span`1<char>::Slice(int32)
      IL_0033:  starg.s    span
      IL_0035:  ldloca.s   V_0
      IL_0037:  call       instance bool valuetype [System.Memory]System.Buffers.ReadOnlySequence`1/Enumerator<uint8>::MoveNext()
      IL_003c:  brtrue.s   IL_000a

      IL_003e:  ret
    } // end of method '<>c'::'<GetAsciiString>b__0_0'

@benaadams
Copy link
Member

benaadams commented Nov 3, 2018

The span is passed in by value, so its just mutating the function local argument; byref would have a totally different meaning (changing the Span in the caller).

Without mutating the argument the example method would look like the following

string GetAsciiString(ReadOnlySequence<byte> buffer)
{
    if (buffer.IsSingleSegment)
    {
        return Encoding.ASCII.GetString(buffer.First.Span);
    }

    return string.Create((int)buffer.Length, buffer, (span, sequence) =>
    {
        Span<char> output = span;
        foreach (var segment in sequence)
        {
            Encoding.ASCII.GetChars(segment.Span, output);

            output = output.Slice(segment.Length);
        }
    });
}

Mutating the argument just skips the copy

Span<char> output = span;

@zpodlovics
Copy link

@benaadams Thank you for the clarification. So the missing piece was a Roslyn optimization for function local argument mutation.

@benaadams
Copy link
Member

Yeah, the Span is initially pointing at the window over the allocated string (of size buffer.Length).

The foreach loop is writing out each segment that makes up that length; then slicing the input span to advance its starting position (and reduce its remaining length) for the next segment to write into.

It doesn't want to be passed byref as that would suggest it was changing where the string is allocated when it changes the span. Also at the end of the function the span would be of .Length zero (as there is none remaining to write to); which wouldn't make much sense for the string length.

@jkotas
Copy link
Member

jkotas commented Jan 23, 2019

Tracked by dotnet/corefx#33602

@jkotas jkotas closed this as completed Jan 23, 2019
@AdamJachocki
Copy link

I am a little shocked seeing that Copy method is now obsolete :| What about deep cloning a object that contains strings? We can do now new string(oldStr), but was it only just some kind of weird cleaning that I don't understand?

@stephentoub
Copy link
Member

What about deep cloning

Strings are immutable. Why do you need to create a copy?

@GrabYourPitchforks
Copy link
Member Author

Do you have a tool that walks an object graph looking for the ICloneable interface and calling ICloneable.Clone() over and over again? That scenario should continue to work, as string.Clone() is not being obsoleted. The contract of that method is that it returns this.

@JohnGalt1717
Copy link

The reason you want to create a copy is simple:

let's say you load a value from a resource string.

And you want to go and do find and replace within that string.

Every iteration of a loop you copy the original string that was expensive to get out of resources so you got it before the string, and then you do find and replace on the original string.

This happens all of the time.

@GrabYourPitchforks
Copy link
Member Author

Find and replace operations always return a copy of the string. You don't need to make a manual copy.

Example:

string a = "Hello world!";
string b = a.Replace("Hello", "Goodnight");

Console.WriteLine(a); // "Hello world!" <-- still keeps its original value
Console.WriteLine(b); // "Goodnight world!" <-- new string with mutated value

@JohnGalt1717
Copy link

I thought this wasn't the case with StringBuilder? (i.e. create a stringbuilder off of a string, then act on it, I thought it was always working with pointers and mutating instead of copying?)

And since replace and all of the rest create copies, why wouldn't string.copy be valid for the same reason that Replace creates a copy instead of mutating the original like other languages?

@stephentoub
Copy link
Member

I thought this wasn't the case with StringBuilder?

System.String is immutable. That means in order to create one, you need to first mutate something that's mutable to contain the content you want, and then create an immutable string from it. One way to do that is to just fill up a char[] and pass that to a string ctor, but that lacks useful helpers for easily building up the content, means you need to manually manage the growth of the array, etc. StringBuilder is effectively a convenience wrapper around such a char[], providing those helper methods, managing the memory, and then creating the immutable string from what you built up (in practice StringBuilder is a bit smarter in its implementation, utilizing basically a linked list of char[]s, but it's the same idea).

String.Copy has nothing to do with StringBuilder, though.

why wouldn't string.copy be valid for the same reason that Replace creates a copy

Because Replace doesn't just create a copy; it creates a new string with the same content as the old but with the specified replacements performed. It'd be like creating a StringBuilder, populating the StringBuilder with the original string contents, making the replacements in the StringBuilder, and then getting the resulting immutable string from StringBuilder.ToString, but much more efficiently.

In contrast, String.Copy doesn't make any changes: it takes in the immutable string and creates a new immutable string instance with the exact same contents as the original. As such, there's no reason to use String.Copy... you might as well just use the immutable original.

Worse, the problem with String.Copy is it actually encourages breaking the safety of the system. The only reason you would use String.Copy is to violate the immutableness of string. That's why it's been obsoleted.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
@gewarren
Copy link
Contributor

This section of the docs relies on String.Copy to avoid string interning: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/compare-strings#reference-equality-and-string-interning. Should we just remove this section? cc @BillWagner

@danmoseley
Copy link
Member

I think so.

@stephentoub
Copy link
Member

Yup

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

10 participants