Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slices: add Concat #56353

Closed
earthboundkid opened this issue Oct 20, 2022 · 57 comments
Closed

slices: add Concat #56353

earthboundkid opened this issue Oct 20, 2022 · 57 comments

Comments

@earthboundkid
Copy link
Contributor

This came up in the discussion of #55358 and prior discussions. It would be nice to have a simple, generic way to concatenate a slice of slices.

Here's the proposed signature:

func Concat[S ~[]E, E any](dst S, slices ...S) S 

Usage:

// Join slices into a new slice
a := []int{ 1, 2, 3 }
b := []int{ 4, 5, 6 }
c := slices.Concat(nil, a, b) 
// c == int{ 1, 2, 3, 4, 5, 6 }

// overwrite an existing slice
a = slices.Concat(a[:0], c[0:1], c[2:3], c[4:5]) 
// a == int{ 1, 3, 5 }
@gopherbot gopherbot added this to the Proposal milestone Oct 20, 2022
@rittneje
Copy link

a := []int{1, 2, 3}
b := []int{4}
c := slices.Concat(a[:2], b, a[2:])

What is the value of c? Depending on the order of events weird things could happen.

For example, consider the following (perhaps naive) implementation:

func Concat[S ~[]E, E any](dst S, slices ...S) S {
	for _, s := range slices {
		dst = append(dst, s...)
	}
	return dst
}

This will result in c being [1 2 4 4] which is probably unexpected.

@ericlagergren
Copy link
Contributor

Isn't append already generic?

@earthboundkid
Copy link
Contributor Author

Isn't append already generic?

Append can't concatenate multiple slices. This means you need to jump through hoops to precalculate the capacity and to make a one-liner you end up with awkward things like append(s1, append(s2, s3...)...).

@earthboundkid
Copy link
Contributor Author

This will result in c being [1 2 4 4] which is probably unexpected.

I think it's only worth doing Concat if it does one shot preallocation, like

func Concat[S ~[]E, E any](dst S, ss ...S) S {
	c := 0
	for _, s := range ss {
		c += cap(s)
	}
	dst = Grow(dst, c)
	for _, s := range ss {
		dst = append(dst, s...)
	}
	return dst
}

If you run this, you get 1, 2, 4, 3.

@rittneje
Copy link

@carlmjohnson I don't think it should use cap as that could greatly overestimate the size requirement. For example, slices.Concat(nil, make([]int, 0, 1000)) should not grow the dest to 1000 elements.

Really it should be len, but that still causes problems. https://go.dev/play/p/zb1d-MhD6Pj

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Oct 20, 2022
@earthboundkid
Copy link
Contributor Author

😅 You're right!

@earthboundkid
Copy link
Contributor Author

I think at a certain point, if the user is clobbering the slice they are reading from, it's a user error. You can probably make up an example using slices.Insert that is confusing too.

@rittneje
Copy link

@carlmjohnson It would be all too easy for someone not to realize that the first parameter is special in that regard, especially because it isn't in other languages that have similar APIs. And due to the nature of append/slices.Grow it will sometimes appear to work the way they expect, meaning they might not even notice the problem until it causes some production incident.

What if it were a method instead of a function, to reinforce the fact that the first parameter is actually the output buffer?

type Buffer[T any] []T

func (buf Buffer[T]) Concat(ss ...[]T) Buffer[T] {
	size := 0
	for _, s := range ss {
		size += len(s)
	}
	if size == 0 {
		return buf
	}
	buf = Grow(buf, len(buf)+size)
	for _, s := range ss {
		buf = append(buf, s...)
	}
	return buf
}

(This type name is just for example purposes.)

Then it would be slices.Buffer[int](a).Concat(b, c). For people who don't care to optimize it, they can call slices.Buffer[int](nil).Concat(a, b, c).

@earthboundkid
Copy link
Contributor Author

The objection to destination slices already applies to several functions in exp/slices: Insert, Delete, the upcoming Replace, and arguably Grow and Clip (although not as confusing as the others). If you think it's causing bugs, you should open an issue to make a v2 of the package before the current slices API gets into the standard library.

@rittneje
Copy link

Yes, it is possible to break them, but it is a bit more contrived. https://go.dev/play/p/4lLBHhxrfdN

Regardless, adding a typedef with methods does not itself necessitate a v2. It just means some (or all) of the existing functions would presumably be deprecated.

@AndrewHarrisSPU
Copy link

I feel like two versions of concatenation could make sense - a minimally complicated, more variadic 'Append'-like concatenation, and a profligately eager, maximally allocating 'Collect'-ing concatenation:
https://go.dev/play/p/13N05VZPjwS

@earthboundkid
Copy link
Contributor Author

I don't think it makes sense to have two very similar concatenation functions.

@DeedleFake
Copy link

Especially because the Append() case could be handled by allowing multiple spread operations in variadic arguments, such as append(s, a..., b..., c...).

@rittneje
Copy link

@DeedleFake The Go language spec defines a... to reuse the backing slice rather than copying. Allowing multiple spread operations could not preserve this essential behavior.

@DeedleFake
Copy link

It could, but only for the case of a single spread operator being used. It doesn't seem that different to me to just having multiple elements without using the spread operator.

@AndrewHarrisSPU
Copy link

AndrewHarrisSPU commented Oct 24, 2022

I don't think it makes sense to have two very similar concatenation functions.

The edit distance in LOC from append(...) to a variation with copy-then-append, or clone-then-append, isn't large. I do think they are not at all similar in terms of actual behavior, the invariants broken or preserved, algorithmic uses etc. That said I think it'd be fair enough to just have the append-like behavior in slices (a different take on Collect in an iter package would be more interesting.)

I think what I'm trying to suggest is that, on a first glance Concat can read either way. To me, I'd expect the Collect-like behavior from a slices.Concat. FWIW, there are only a few uses of a func concat or a method concat in the standard library (plus one in x/exp/slog); they all maintain the "no clobbering" invariant, mostly by slice copying rather than slice growing.

@rittneje
Copy link

It could, but only for the case of a single spread operator being used. It doesn't seem that different to me to just having multiple elements without using the spread operator.

It cannot.

func foo(x ...int) {
    // ...
}

a := []int{1, 2, 3}
b := []int{4, 5, 6}
foo(a..., b...) // cannot work

Within foo, the type of x is []int, which means it must be a standard slice backed by a single contiguous array. Thus there is no way to pass both a... and b... without copying at least one of them.

With regards to the original proposal, I think naming it something like AppendAll would be better, as that would at least suggest that it has analogous behavior to built-in append as far as the first parameter goes.

@DeedleFake
Copy link

Thus there is no way to pass both a... and b... without copying at least one of them.

Right, that's what I said. It wouldn't change anything for existing cases where only a single spread operator is used, which is all that's currently legal.

@ianlancetaylor
Copy link
Contributor

Discussion about passing both a... and b... should perhaps go to #18605 rather than here. Thanks.

@zigo101
Copy link

zigo101 commented Nov 11, 2022

concat should be built-in to get the most out of it.
Otherwise, the implementation is hard to be performant, even correct.

@aclements
Copy link
Member

I think we should consider this proposal together with #4096, since these are two different solutions to the same problem.

Personally, I would much prefer to have #4096, since that feels like a natural and ergonomic extension to the way people already concatenate multiple slices. While we can do this with a generic function, doing so creates two different ways to do almost the same thing, where neither subsumes the other. It also has some (probably slight) performance disadvantages compared to doing it directly in append.

@rsc
Copy link
Contributor

rsc commented May 24, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals May 24, 2023
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/504882 mentions this issue: slices: add Concat

@6543
Copy link
Contributor

6543 commented Aug 7, 2023

@6543
Copy link
Contributor

6543 commented Aug 7, 2023

well this should impl. the proposed api: #61817

correct me if I did messed up somewhere :)

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/516656 mentions this issue: slices: add Concat

@fzipp
Copy link
Contributor

fzipp commented Aug 7, 2023

@6543 There is already an implementation; see the linked change list in comment #56353 (comment)

@6543
Copy link
Contributor

6543 commented Aug 7, 2023

@fzipp yes but it does not implement the proposed api and looks stale

@earthboundkid
Copy link
Contributor Author

It implements the approved API. It's not stale, just waiting for the Go release cycle: https://github.com/golang/go/wiki/Go-Release-Cycle

@zigo101
Copy link

zigo101 commented Sep 1, 2023

Maybe this function should be specially optimized, just like maps.Clone? https://docs.go101.org/std/src/maps/maps.go.html#line-41

@earthboundkid
Copy link
Contributor Author

If you have a performance improvement to benchmark, that can be a new issue.

@zigo101
Copy link

zigo101 commented Sep 1, 2023

It is impossible to write normal benchmarks without hacking the runtime code. Someone ever hacked it.

There is no need to create a new issue.

eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Fixes golang#56353

Change-Id: I985e1553e7b02237403b833e96fb5ceec890f5b8
GitHub-Last-Rev: 96a35e5
GitHub-Pull-Request: golang#60929
Reviewed-on: https://go-review.googlesource.com/c/go/+/504882
Auto-Submit: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
eric pushed a commit to fancybits/go that referenced this issue Sep 7, 2023
Fixes golang#56353

Change-Id: I985e1553e7b02237403b833e96fb5ceec890f5b8
GitHub-Last-Rev: 96a35e5
GitHub-Pull-Request: golang#60929
Reviewed-on: https://go-review.googlesource.com/c/go/+/504882
Auto-Submit: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
eric pushed a commit to fancybits/go that referenced this issue Dec 23, 2023
Fixes golang#56353

Change-Id: I985e1553e7b02237403b833e96fb5ceec890f5b8
GitHub-Last-Rev: 96a35e5
GitHub-Pull-Request: golang#60929
Reviewed-on: https://go-review.googlesource.com/c/go/+/504882
Auto-Submit: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@justinhwang
Copy link

@ianlancetaylor / @earthboundkid could one of you elaborate on what this comment means and why we shouldn't be using make here?

https://go-review.googlesource.com/c/go/+/504882/3..6/src/slices/slices.go#b510

Since we don't make any promises about the capacity, I recommend

newslice := Grow[S](nil, size)

That should lift the capacity up to the next size class.

@earthboundkid

This comment was marked as off-topic.

@justinhwang
Copy link

If it used make there, the capacity would be exactly the combined size of the slices, but we want it to be rounded up to the next size class, so people don't rely on len(s) == cap(s) being true, and box ourselves into a corner.

Still not quite following, what do we mean by rounded up to the next size class? Do we mean:

cap(Grow[S](nil, size)) != size

@earthboundkid

This comment was marked as off-topic.

@OhJuhun
Copy link

OhJuhun commented Jul 15, 2024

@earthboundkid
I also don't quite understand why slices.Grow should be used instead of make in this case. I tested it with Go 1.22 using the link you provided, and contrary to what the post claims, this code does not allocate memory.

func BenchmarkSortStrings(b *testing.B) {
        s := []string{"heart", "lungs", "brain", "kidneys", "pancreas"}
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                sort.Strings(s)
        }
}
image

Therefore, I believe that this post is no longer valid.

In this case, make seems to be superior in terms of performance, and there doesn't seem to be a significant difference in readability either. Can you explain in more detail why slices.Grow should be used in this scenario please?

@earthboundkid
Copy link
Contributor Author

This is a closed issue. Your benchmark is invalid because sort.Strings isn't relevant to slices.Concat and the slice is sorted after the first loop. If you can make a change with better performance, submit the change with a benchmark from benchstat.

@zigo101
Copy link

zigo101 commented Jul 16, 2024

@OhJuhun
It looks the APIs in the slices package try to round up memory size to the next class size, except strings.Repeat. Honestly, I don't understand the cause of the inconsistency.

The current design of the slices package lacks of custom configurations:

  • whether or not round up to the next memory class size.
  • whether or not zero freed-up elements (in Delete, Replace and Compact etc)
  • whether or not keep element order (such as in Delete)
  • ...

Because of these facts, the slices package doesn't satisfy the needs of all cases.
So sometimes, you might need to write your own implementations.

@Sec32fun32 Sec32fun32 mentioned this issue Jul 18, 2024
@rsc rsc removed this from Proposals Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.