Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: unsafe.Slice for slightly safer type-punning #38203

Closed
seebs opened this issue Apr 1, 2020 · 16 comments
Closed

proposal: unsafe.Slice for slightly safer type-punning #38203

seebs opened this issue Apr 1, 2020 · 16 comments

Comments

@seebs
Copy link
Contributor

seebs commented Apr 1, 2020

What version of Go are you using (go version)?

$ go version
1.14

Does this issue reproduce with the latest release?

Sure.

What operating system and processor architecture are you using (go env)?

N/A.

What did you do?

var a [65536]byte
a64 := (*[16384]uint64)(unsafe.Pointer(&a))[:16384]

(In my defense, I was not quite awake yet.)

What did you expect to see?

Probably a bounds check error?

What did you see instead?

Nothing.

And yes, I understand: unsafe is what it says on the tin. But I think there's a bit of a flaw in the design space right now. There's no middle ground between "only completely strictly-supported conversions" and "no sanity-checks or error-checking of any kind".

A while back, I was confused about some of the SliceHeader docs, and filed an issue (#33794) about that, and Ian helpfully pointed out a canonical idiom:

asUint16 := (*[1<<29]uint16)(unsafe.Pointer(obj.Ptr))[:obj.len:obj.len]

But this idiom suppresses any possible bounds-checking.

My initial thought was that it might make sense to abuse/extend the [...]foo array syntax for this kind of thing -- have that tell the compiler to use The Actual Available Size, if it can tell, and if it can't, to reject the code as invalid. Or something like that. But that's a mess.

Alternative proposal: unsafe.Slice, which is like unsafe.Pointer, but knows about len and cap.

So, for instance: Given a []byte which holds... something, anyway, possibly data, such as the return value from mmap:

asUint16 := ([]uint16)(unsafe.Slice(data))

This doesn't actually need an explicit setting of len and cap -- it can compute the correct len and cap values from the existing slice. If you do want to slice it, you get bounds checking on the bounds you provide.

This doesn't let you do anything new, exactly, but it does replace one of the more common uses of unsafe.Pointer with a slightly-safer thing, and I think this would functionally replace any real-world usage of SliceHeader I've ever seen with a variant that is guaranteed not to make the stupid mistakes that real programmers do.

This is similar to, but not the same as, 19367. I am thinking of unsafe.Slice as mostly being useful for casting, rather than being a struct that users would interact with as a struct. So you wouldn't access s.Len, you'd call len(s). Or possibly not, because it's meaningless to ask about the length of an abstract slice without a data type in mind.

So, for instance:

uint64s := ([]uint64)(unsafe.Slice(uint16s))```

The `unsafe.Slice` would know that it has a cap of 46 bytes and a length of 34 bytes, so it would produce a []uint64 with len 4 and cap 5.
@gopherbot gopherbot added this to the Proposal milestone Apr 1, 2020
@ianlancetaylor
Copy link
Member

This space seems pretty well covered by #19367, which seems to include essentially this proposal.

@seebs
Copy link
Contributor Author

seebs commented Apr 2, 2020

I think I got confused by reading the comments, which seemed to be drifting towards a Slice(ptr, len) function or something like that. I think the big distinction is that, in this proposal, the len/cap are type-aware, and the type specifically allows casting-to-slice-types, so there's no fancy shuffling around with slicing pointers-to-arrays and such, and you don't have to think about or compute length and capacity; they become the compiler's problem in some way.

@bcmills
Copy link
Contributor

bcmills commented Apr 2, 2020

#19367 addresses the use-case of expanding a C-style pointer-to-object to a slice of essentially the same type of object. (For example, expanding a *C.char obtained from C to a []byte usable in Go, or expanding a *C.uint16_t to a []uint16.)

If I understand this proposal correctly, it aims to address the (related but distinct) use-case of punning an object of known size to an equivalently-sized slice of a different type.

The change in underlying type introduces some additional concerns: since in this proposal the underlying element type is changed, it is sensitive to architectural endianness and alignment constraints in a way that mere pointer-to-slice expansion is not. I think that sensitivity makes the operation in this proposal more dangerous (and less fundamental) than #19367.

@bcmills
Copy link
Contributor

bcmills commented Apr 2, 2020

However, note that this could be implemented as a library using essentially the same approach as in #13656 (comment).

(The key addition would be the calculation of the length and capacity of the new slice from the length and capacity of the previous slice, but that seems straightforward enough.)

@bcmills
Copy link
Contributor

bcmills commented Apr 2, 2020

(And, along the lines of #19367 (comment), given generics it may be possible to implement this function without the use of reflect; but see the caveat in #19367 (comment).)

@seebs
Copy link
Contributor Author

seebs commented Apr 2, 2020

I agree that it's possibly more dangerous and less fundamental. My basic motivation for this proposal is: I think this is right about exactly the dividing line where the compiler can give us nice clean semantics. Any more-manual thing is significantly more vulnerable to user error.

In the code I work with, I don't think there's a single example of pointer-to-slice conversion that isn't really about type-punning (although possibly type-punning "to []byte" which is sort of a special case). In all of these cases, a conversion that preserved length and cap, and was managed by the compiler, would be sufficient, and anything that it wasn't sufficient for would be an outright error that could be caught by slice bounds checking if we had a way to clearly express "the bounds this slice would have as this other type".

So basically, in the code I work on, this would replace every existing use of SliceHeader, and every existing use of casting pointers to (pointer-to-array) and slicing the array, and would in every case introduce additional bounds checks at the point of conversion which happen to be desireable.

And I recognize that it does nothing for other use cases, but that's sort of the appeal; it doesn't try to solve the hard problems, but if you don't have one of the hard problems, this makes it a lot easier to express things clearly and get a bit of safety back. And if you do have the hard problems, you still have all the existing unsafe.Pointer tools available.

bcmills pushed a commit to bcmills/unsafeslice that referenced this issue Apr 2, 2020
@jimmyfrasche
Copy link
Member

Couldn't you do this with the unsafe.Slice defined in #19367 by:

var s []S = f()
t := unsafe.Slice((*T)(unsafe.Pointer(&s[0])), len(s))

@bcmills
Copy link
Contributor

bcmills commented Apr 2, 2020

The implementation for this turns out not to be entirely trivial — it's easy to write without considering overflow and truncation, but getting those correct (portably) is not entirely trivial.

A couple of interesting examples:

  • What should happen if the length or capacity of the source slice is not an integer multiple of the destination element type? (Probably a run-time panic?)

  • What should happen if the slices are misaligned? (Probably the same thing that happens under checkptr..? Does it need to be checked explicitly?)

@bcmills
Copy link
Contributor

bcmills commented Apr 2, 2020

@jimmyfrasche, at least in theory both the length and capacity should be ported over, and don't forget to adjust the length if S and T are different sizes.

@jimmyfrasche
Copy link
Member

I left the cap off for brevity. I missed the part about the case |T| < |S| being requested.

@seebs
Copy link
Contributor Author

seebs commented Apr 2, 2020

My intuition is to think that "misaligned" should probably be handled by checkptr or something like it. My preference for "not integer multiple" would be that len and cap are computed rounded-down, because in one of my real cases, I have a [5]uint16 that I want to treat, sometimes, as a slice of struct{from, to uint16}.

And yeah, the issue, from my point of view, of the 19367 unsafe.Slice is that I have to do several extra things: I have to think about length and cap, and if I do it wrong, nothing can check that for me.

Basically: If I do it that way, I am taking a slice, throwing away all of its extra bounds-checking safety to turn it into a bare pointer, and then reinventing a new slice from it that then has to have the bounds-checking reintroduced.

My feeling is that it's a lot nicer to not throw the stuff away and then recreate it.

So compare:

func asUint64s(data []byte) (res []uint64) {
  uslice := (*unsafe.Slice)(&res)
  uslice.Cap = cap(data) / 8
  uslice.Len = cap(data) / 8
  uslice.Pointer = (unsafe.Pointer)(&data[0])
  return res
}

func asUint64s(data []byte) []uint64 {
  return []uint64(unsafe.Slice(data))
}

I think the second is easier to read and significantly less error-prone. For instance, the error in the above isn't intentional, it just happened because I was typing quickly and not thinking it through carefully enough, and the compiler wouldn't spot it. Note also that I think you have to be careful about the order of the assignments of cap and len, and actually bcmills has pointed out in another example that you really ought to set Cap and Len to 0 first, then assign the pointer, then assign the corrected values. And that's a lot of fiddly details to easily get wrong.

@jimmyfrasche
Copy link
Member

So this could be implemented with a combination of the other unsafe.Slice and generics, and there are multiple valid ways of defining it?

@seebs
Copy link
Contributor Author

seebs commented Apr 2, 2020

I'm not sure it can be done with the other one and generics, because a key component of this is that it's type-aware in some way. If you just cast a pointer to a slice to a pointer to struct{pointer,len,cap}, you now have no way of meaningfully interpreting the len and cap of that structure.

Think about what happens if you cast both &data and &res in the first function to *unsafe.Slice. What are the len and cap values? They pretty much have to be correct for their target types.

My proposal, I think, requires that unsafe.Slice have the length and capacity that would be expected from a []byte covering that space, probably. Or that it not have length and capacity that are externally visible, and the only way to get to len/cap is to cast it to a specific slice type.

@jimmyfrasche
Copy link
Member

I am referring to the proposal for func Slice(source *ArbitraryType, cap int) []ArbitraryType which takes care of converting from cap contiguous T in memory to a []T.

I may still be misunderstanding but you'd still have unsafe.Sizeof and you could use that to compute the correct len/cap.

func PunSlice(type S, T)(source []S) []T {
  var s S
  var t T
  ssz := int(unsafe.Sizeof(s))
  tsz := int(unsafe.Sizeof(t))
  newLen := f(len(source), ssz, tsz)
  newCap := f(cap(source), ssz, tsz)
  return unsafe.Slice((*T)(unsafe.Pointer(&source[0])), newCap)[:newLen]
}

It just needs a suitable definition of f that handles the desired len/cap conversion when the sizes differ. It would be up to the author to ensure that that conversion was correct but it only needs to be done once per choice of how to go about that.

@seebs
Copy link
Contributor Author

seebs commented Apr 3, 2020

Yeah, that one works, but it seems a lot larger and more potentially error-prone than a direct cast.

As a side note; assuming an initial slice which is Maximally Aligned, one of the things I sort of like about this is that it allows unsafe.Slice to be a fairly generic form for a memory buffer; you don't need to have any idea what the previous type was, after all; you get a thing which can be converted to any slice type and Just Works.

@rsc
Copy link
Contributor

rsc commented Apr 15, 2020

Closing as a duplicate of #19367.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants