Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes, strings: add Clone #45038

Closed
icholy opened this issue Mar 15, 2021 · 68 comments
Closed

bytes, strings: add Clone #45038

icholy opened this issue Mar 15, 2021 · 68 comments

Comments

@icholy
Copy link

icholy commented Mar 15, 2021

Working directly with []byte is very common and needing to copy it comes up fairly often. A Clone helper would be nice to have.

What did you expect to see?

dup := bytes.Clone(data)

What did you see instead?

dup := make([]byte, len(data))
copy(dup, data)

Implementation

package bytes

// Clone returns a copy of b
func Clone(b []byte) []byte {
  b2 := make([]byte, len(b))
  copy(b2, b)
  return b2
}
@bradfitz
Copy link
Contributor

If accepted, Clone would probably be a more clear and matching name:

go1.txt:pkg html/template, method (*Template) Clone() (*Template, error)
go1.txt:pkg text/template, method (*Template) Clone() (*Template, error)
go1.8.txt:pkg crypto/tls, method (*Config) Clone() *Config
go1.13.txt:pkg net/http, method (Header) Clone() Header
go1.13.txt:pkg net/http, method (*Transport) Clone() *Transport
go1.13.txt:pkg net/http, method (*Request) Clone(context.Context) *Request

Copy makes it sounds like the built-in copy.

@ianlancetaylor ianlancetaylor added this to the Proposal milestone Mar 15, 2021
@jimmyfrasche
Copy link
Member

Would this be a better fit for the likely generic slices package, so it could work on any []T?

@ulikunitz
Copy link
Contributor

ulikunitz commented Mar 15, 2021

There is already a generic way to do it in one line.

b2 := append([]byte(nil), b...)

This approach is even faster because the allocated slice will not be initialized to zero values.

@icholy
Copy link
Author

icholy commented Mar 15, 2021

@ulikunitz I think that's an argument for adding a bytes.Copy function. Most of the code I've seen uses the two step approach. A helper would probably result in more people using the faster approach.

@martisch
Copy link
Contributor

martisch commented Mar 16, 2021

@ulikunitz

b2 := append([]byte(nil), b...)

This approach is even faster because the allocated slice will not be initialized to zero values.

For simple cases like in the example code at the thread start (no complex computation in length argument) make + copy should be faster than append since go1.15: #26252 (comment) as the go compiler avoids the zero value intialization there too.

@DeedleFake
Copy link

I think that even with append() being available for this, a simple generic Clone() or Copy() function in a slices package that basically just wraps that would be worth it simply to avoid the need to match the type manually:

b2 := append([]byte(nil), b...)
// vs.
b2 := slices.Clone(b)

@zigo101
Copy link

zigo101 commented Mar 16, 2021

@DeedleFake
Yout don't need to write out the type []byte manually in an append call. Just do append(b[:0:0], b...). So personally, I think the append call is more graceful the slices.Clone call. The only drawback of append calls is that they might zero some redundant elements so they are often a little slower than make+copy calls since Go toolchain 1.15 (as mentioned by @martisch above).

@ericlagergren
Copy link
Contributor

Just do append(b[:0:0], b...).

While that works, I think it's definitely confusing, especially if somebody else owns b (like io.Reader). I'd personally object to that in a code review.

That said, does this proposal also require strings.Copy? IIRC there has been some confusion both in the gophers slack and in GitHub issues about whether string([]byte(s)) will always make a copy of s.

@icholy
Copy link
Author

icholy commented Mar 16, 2021

@ericlagergren strings are immutable, there's no need to copy them.

@ericlagergren
Copy link
Contributor

ericlagergren commented Mar 16, 2021

There is need to copy them. For example:

s := "big long string"
x := s[:3]
// no more uses of s, but x is kept
// alive for a long time, keeping the
// entirety of s in memory until x is
// collected because the GC does not
// collect partial objects. 

This has implications for things like string interning.

@zigo101
Copy link

zigo101 commented Mar 16, 2021

In Go programming, there are many cases in which a small piece of a memory block is active prevents the whole memory block being released, not limited to slices and strings.

@josharian
Copy link
Contributor

See #25834.

@earthboundkid
Copy link
Contributor

earthboundkid commented Mar 24, 2021

The append trick works well, but you have to have read the Go slices tricks wiki page to know to do it. If generics add a slices package, it would make sense to have slices.Copy[T any](s []T) []T as a function if only because it would have better discoverability for new gophers. That said, I don't think it makes any sense to add bytes.Copy today because we're so close to generics and the append trick works already.

@rsc
Copy link
Contributor

rsc commented Apr 7, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@hherman1
Copy link

hherman1 commented Apr 7, 2021

It would be nice to get a sense of how widely used this idiom is. As it stands, a Clone() function saves almost no lines of code, and it's a new API which contributes to the system complexity of the STL. I think this can only be worthwhile if it is extremely widely used.

@dotaheor
Copy link

dotaheor commented Apr 8, 2021

IMHO, a builtin merge function will have more use scenarios than clone.

@rsc
Copy link
Contributor

rsc commented Apr 14, 2021

Whatever we add here will need to be in strings too (forcing a copy, to address #40200) for symmetry.
And whatever we add here will probably need to go into the yet-undefined generic slices package, for symmetry.
So we need to get the name right for all three.
Copy is no good because this is different from the builtin copy.

Clone seems OK.
In some cases Clone means "deep clone" (like in http.Request).
In other cases Clone means "shallow clone" (like in tls.Config).
This would be a shallow clone.

Any objections to Clone?

@DeedleFake
Copy link

DeedleFake commented Apr 14, 2021

bytes.Clone() sounds good to me. What would the point of strings.Clone() be? Decreasing the memory footprint of heavily substringed strings? That could be useful; just have to make sure that the documentation is clear about the intent.

That being said, I don't think that it has to exist just because bytes.Clone() does. strings.Equal() doesn't.

@dsnet
Copy link
Member

dsnet commented Apr 14, 2021

What would the point of strings.Clone() be? Decreasing the memory footprint of heavily substringed strings?

I think that has benefit in itself. I occasionally see string([]byte(s)) in source code and it's not clear to me that the author was trying to re-allocate a smaller string from some other massive string. Seeing strings.Clone(s) communicates intent much more cleanly.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Apr 14, 2021

strings.Equal(a, b) would be exactly the same as a == b. There is no way to write the equivalent of strings.Clone. Even string([]byte(s)) is not the same, as the compiler could in principle optimize that away (and it might even be a useful optimization after inlining).

@DeedleFake
Copy link

That's fair. I just used strings.Equal() as an example of something that isn't duplicated from one to the other. bytes.Buffer isn't exactly duplicated, either, as strings.Builder has a different name and API for obvious reasons, and it didn't exist at all for quite a while. bytes.Replacer also doesn't exist.

@creker
Copy link

creker commented Apr 14, 2021

@ericlagergren

There is need to copy them. For example:

s := "big long string"
x := s[:3]
// no more uses of s, but x is kept
// alive for a long time, keeping the
// entirety of s in memory until x is
// collected because the GC does not
// collect partial objects. 

This has implications for things like string interning.

  1. Currently string literals are stored in a read-only section. They're not collected and probably treated as globals by GC.
  2. Slicing string literals references the same read-only section. This also doesn't affect GC.

So there's no need to copy strings in that case. You probably meant cases where strings are built at runtime?

@ericlagergren
Copy link
Contributor

@creker yes. Sorry, I should have been more exacting.

@rsc rsc changed the title proposal: add bytes.Copy function proposal: bytes, strings: add Clone Apr 21, 2021
@rsc
Copy link
Contributor

rsc commented Apr 21, 2021

Based on the reactions to last week's comment, retitled to include bytes.Clone and strings.Clone, as in:

package bytes

// Clone returns a fresh copy of b.
// Specifically, the result is like b[0 : len(b) : len(b)] but in a new backing array.
func Clone(b []byte) []byte

and

package strings

// Clone returns a fresh copy of s.
// It guarantees to make a copy of s into a new allocation,
// which can be important when retaining only a small substring
// of a much larger string. Using Clone can help such programs
// use less memory. Of course, since using Clone makes a copy,
// overuse of Clone can make programs use more memory.
// Clone should typically be used only rarely, and only when
// profiling indicates that it is needed.
func Clone(s string) string

@martisch
Copy link
Contributor

I dont think we need more CLs for bytes.Clone. One has existed since Oct 2021: https://go-review.googlesource.com/c/go/+/359675 but was not approved for merge in go1.18. If approvable it can now.

@zigo101
Copy link

zigo101 commented Mar 14, 2022

The two PR differs at that one returns nil for an empty slice, the other returns an empty slice instead.

@martisch
Copy link
Contributor

martisch commented Mar 14, 2022

Both Clone functions in CL 392194 and CL359675 when given a nil slice as input return a nil slice as ouput. Which is aligned with the behaviour of String.Clone.

Update: They also return nil slices for empty slices.

@zigo101
Copy link

zigo101 commented Mar 14, 2022

Ah, yes, their behaviors are the same.

Which is aligned with the behaviour of String.Clone.

There is not nil strings, so this should be a slice specific detail.

Personally, I recommend to return an empty slice for an empty slice.

@martisch
Copy link
Contributor

martisch commented Mar 14, 2022

There is not nil strings, so this should be a slice specific detail.

There is no nil string but there is a stringheader with length and backing array pointer of 0 which can not be compared differently as string but is different as struct header value from a string with length 0 and backing array pointer that is not nil.

Personally, I recommend to return an empty slice for an empty slice.

What should the backing array point of that empty slice be? If its the same as input it is not really a clone. If its a different one each time it requires an allocation. When only returning nil for nil we only have that special case for nil.

I like the consistency with strings. If length is zero then nil struct header is returned both for strings and bytes clone.

@icholy
Copy link
Author

icholy commented Mar 14, 2022

Here's an example of the nil behaviour being confusing:

package main

import (
	"encoding/json"
	"fmt"
)

func Clone(b []byte) []byte {
	if len(b) == 0 {
		return nil
	}
	c := make([]byte, len(b))
	copy(c, b)
	return c
}

func main() {
	b := []byte{}

	data, _ := json.Marshal(b)
	fmt.Println("Original:", string(data))

	data, _ = json.Marshal(Clone(b))
	fmt.Println("Cloned:  ", string(data))
}

https://go.dev/play/p/v-1dAVqDoGt

What should the backing array point of that empty slice be?

cap(make([]byte, 0)) // is 0

No backing array is allocated.

@martisch
Copy link
Contributor

martisch commented Mar 14, 2022

No backing array is allocated.
Its not possible to see the backing array pointer with cap function. Two slices can have cap 0 and different backing array pointers.

Make with 0 uses mallocgc which uses runtime.zerobase as backing array. Which makes the property of not reusing the backing array when cloning not true anymore even for non 0 backing array pointers.

Here's an example of the nil behaviour being confusing

I guess this generally in the direction of APIs differing for nil and empty slices which is generally discouraged in code bases I work on. I guess it cant be changed for json anymore and it is how it is.

@earthboundkid
Copy link
Contributor

I think the questions about nil and whether bytes.Clone should use the same cap, cap == len, or cap == size class point in the direction of just doing nothing. Cloning a string is tricky because you might think you did it correctly, but then the something about the runtime changes in Go N+1 and it isn't copied anymore. Having strings.Clone ensures that you won't get bitten after Go changes. bytes.Clone is just a convenient way to do something you could already do, but at the cost of being potentially misleading if the user has an incorrect expectation about the behavior.

@icholy
Copy link
Author

icholy commented Mar 14, 2022

Which makes the property of not reusing the backing array when cloning not true anymore even for non 0 backing array pointers.

Why does this matter?

@martisch
Copy link
Contributor

martisch commented Mar 16, 2022

Why does this matter?

Because thats what Clone in this proposal is supposed to be in contrast to a copy according to the earlier comment and discussion: #45038 (comment)

// Clone returns a fresh copy of b.
// Specifically, the result is like b[0 : len(b) : len(b)] but in a new backing array.

The reason to have a new backing array is that even a zero len and cap slice with a pointer into a memory allocation keeps that allocation from being garbage collected by the Go runtime. If that pointer is from a former large slice it may keep a huge amount of memory from being reused.

To avoid those references users can use strings.Clone and bytes.Clone to make sure there isnt more memory used by the slice then needed and the old memory is not reused.

So we want the new slice to have a different backing array, but we also want to avoid unnecessary allocations. Nil is already a language concept with a special name and the null pointer at any rate will exist often in a program. There is usually also nothing new allocated or written to what it points to. So its easy to carve out the one exception in documentation and behaviour to when the new slice will not have a new backing array and that is when passing in a nil slice.

@martisch
Copy link
Contributor

martisch commented Mar 16, 2022

@rsc @ianlancetaylor

Now that slices.Clone exists I propose to not implement bytes.Clone at all.

The original requests intention of wanting a simple helper to create a copy of a slice has been met by slices.Clone.

The slices.Clone seems to be a slice copy with the added property that the new cap can be larger/smaller than the old cap depending on the length of the input. But gives no explicit guarantees of not reusing memory. e.g. a 0 len, 0 cap slice might just be returned as is.

So we seem to have 3 options:

  • align bytes.Clone with the semantics of slices.Clone (efficient cap usage and nilness preserving) -> bad because its redundant and doesnt add new semantics and misaligns in subtle ways with strings.Clone
  • align with strings.Clone and always (except for nil) have different backing array -> misaligns semantics with slices.Clone
  • not add bytes.Clone to not misalign it with either strings.Clone or slices.Clone

@icholy
Copy link
Author

icholy commented Mar 16, 2022

@martisch yes, I understand what bytes.Clone is for ... I'm asking why it matters for empty slices with zerobase as the data pointer.

@martisch
Copy link
Contributor

martisch commented Mar 16, 2022

@martisch yes, I understand what bytes.Clone is for ... I'm asking why it matters for empty slices with zerobase as the data pointer.

Because Clone would not have the property of returning a new backing array for slices with zerobase as it is also returns the slice with zerobase again. So it would return the identical slice, not a new one. That means the documentation needs to clarify that excemption and document that this isnt not met when one passes in special slices with a backing array pointing to zerobase. But that is an implementation detail of the current runtime and it will be hard to comprehend for users what that means. It also means the runtime can not change in that regard without changing the semantics of the Clone function.

Instead giving the property that the excemption is only for nil slices is runtime independent and its easier to comprehend what slices this applies to as nilness is defined in the language and can be tested.

Rather than exposing a function with semantics that need to document runtime internals such as runtime.zerobase I rather not expose such a function at all.

@icholy
Copy link
Author

icholy commented Mar 16, 2022

The slices.Clone docs don't mention any of that.

// Clone returns a copy of the slice.
// The elements are copied using assignment, so this is a shallow clone.
func Clone[S ~[]E, E any](s S) S {
	// Preserve nil in case it matters.
	if s == nil {
		return nil
	}
	return append(S([]E{}), s...)
}

@rsc
Copy link
Contributor

rsc commented Mar 16, 2022

It seems like we should still implement bytes.Clone, for parallelism with strings.Clone.
We want to keep package bytes and package strings as aligned as possible.

It's also worth noting that slices.Clone is in x/exp, not the standard library.

It also seems fine to me to have bytes.Clone([]byte{}) == []byte{} in the sense that both point at zerobase.

@martisch
Copy link
Contributor

The slices.Clone docs don't mention any of that.

Because slices.Clone has different semantics than discussed here and does not guarantee a new backing array is returned. It also can return a different cap.

I updated https://go-review.googlesource.com/c/go/+/359675 but only mentioned for len and cap != 0 a new backing array is returned as for other slices it might be the same as I do not see how we can describe without mentioning and pinning runtime internals (which we likely dont want to expose or forever keep the same) for which ones.

@icholy
Copy link
Author

icholy commented Jun 16, 2022

@martisch do you plan on continuing work on https://go-review.googlesource.com/c/go/+/359675 ?

@rsc rsc moved this to Accepted in Proposals Aug 10, 2022
@rsc rsc added this to Proposals Aug 10, 2022
@dmitshur dmitshur modified the milestones: Backlog, Go1.20 Sep 28, 2022
@icetech233

This comment was marked as abuse.

@ianlancetaylor

This comment was marked as resolved.

tmm1 pushed a commit to fancybits/go that referenced this issue May 26, 2023
The new Clone function returns a copy of b[:len(b)]
for the input byte slice b.
The result may have additional unused capacity.
Clone(nil) returns nil.

Fixes golang#45038

Change-Id: I0469a202d77a7b491f1341c08915d07ddd1f0300
Reviewed-on: https://go-review.googlesource.com/c/go/+/359675
Run-TryBot: Martin Möhrmann <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Joseph Tsai <[email protected]>
Reviewed-by: Martin Möhrmann <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
(cherry picked from commit 7b45edb)
@rsc rsc removed this from Proposals Aug 16, 2023
@golang golang locked and limited conversation to collaborators Jan 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests