Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RawVec stores a capacity field even if T is zero-sized #45431

Open
gnzlbg opened this issue Oct 21, 2017 · 15 comments
Open

RawVec stores a capacity field even if T is zero-sized #45431

gnzlbg opened this issue Oct 21, 2017 · 15 comments
Labels
A-layout Area: Memory layout of types A-specialization Area: Trait impl specialization A-zst Area: Zero-sized types (ZST). C-feature-accepted Category: A feature request that has been accepted pending implementation. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 21, 2017

When std::mem::size_of::<T>() == 0 RawVec<T> has always a capacity equal to std::usize::MAX elements. It currently stores a cap: usize that always contains this value but doing so is unnecessary in this case.

@withoutboats this would be a not so far fetched case for allowing struct specialization like @arielb1 proposed.

@bluss
Copy link
Member

bluss commented Oct 21, 2017

Time to interpret what https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#guarantees actually means for this

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized.

Plain reading says that the capacity field must be there. It's a bit silly, since the only interface with the representation should be with Vec::from_raw_parts, where this "triplet" guarantee isn't needed.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 21, 2017

@bluss I am finishing overhauling the growth-strategy for vectors (#29931, #27627, I'd like to find a mentor to discuss it before submitting a PR) but I'd like to improve on this afterwards.

My proposal would be to amend those guarantees as follows:

Most fundamentally, Vec where T is not zero-sized is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized. When T is zero-sized the layout of Vec<T> is unspecified.

And then to replace the cap field of RawVec with a zero-sized type when T is zero-sized.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 21, 2017

FWIW I think that the standard library should make more guarantees than it currently does, in particular with respect to the space and time complexity of operations.

Yet I think this is an example of something that the standard library should never guarantee. The wording kills me:

Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified

What is this guarantee for? It sounds like I can rely on this for serialization purposes, but then the Vec type has different sizes in 32 and 64-bit architectures. This wording also prevents other equally-sized implementations, e.g., a triple of pointers, that might optimize better on a different backend, and the current topic, optimizations for zero-sized types.

Changing this wording would be a breaking change, but if we can't change this, this should become an example of how the types of guarantees that standard library types should not be making.

@bluss
Copy link
Member

bluss commented Oct 21, 2017

Cool. Does it include using usable_size as well? (Don't worry if not, I think that's fine to address as a separate little project too.)

I think the guarantees are definitely overreaching there, but since they are not well defined enough to actually be used for much of anything, maybe the whole triplet thing can be reworded — the Vec has a representation equivalent to pointer, capacity and length, it's just that the capacity doesn't need to be stored for some element types.

By the way, do RawVec and Vec need to store a pointer field for ZST?

For questions, I'll certainly answer anything I can if you find me here or on IRC. Then I suppose @gankro can as well, if he has time.

I haven't heard about struct field specialization, and would love to read more about that.

@cuviper
Copy link
Member

cuviper commented Oct 21, 2017

I agree it could also forgo the pointer. All a ZST Vec should need to store is a length. I guess it only depends on whether from_raw_parts followed by as_ptr needs to preserve arbitrary pointer values.

@TimNN TimNN added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 22, 2017
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 24, 2017

@cuviper I am talking about RawVec here, but I agree, Vec only needs to store a length, RawVec doesn't need to store anything.

@bluss
Copy link
Member

bluss commented Oct 24, 2017

We can't implement this until some kind of specialization allows it in a backwards compatible way. Aside from doubts about associated type specialization, the problem of keeping Vec covariant in T., Which is not the case with a.t.spec. now AFAIU.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 25, 2017

@bluss can you elaborate about how this kind of specialization would change the variance of Vec ? AFAIK the only way in which a user of Vec<T> can tell whether this specialization is applied is by observing the mem::size_of::<Vec<T>>. We would need some "hacks" to make iteration on these vectors work, but that should be doable (e.g. using a static variable and returning references to that).

@bluss
Copy link
Member

bluss commented Oct 27, 2017

@gnzlbg Yeah, I was meaning to come back to you with an example.

Here's some code that shows that code that compiles for current Vec<T> does not compile if we instead use an associated type (that depends on T) in the representation.

(playground link)

struct CurrentVec<T> {
    elements: Box<T>
}

struct SmartRepresentationVec<T> {
    elements: <T as Smart>::Representation,
}

trait Smart {
    type Representation;
}

impl<T> Smart for T {
    type Representation = Box<T>;
}

// This compiles!
fn change_lifetime1<'a, T>(input: CurrentVec<&'static T>) -> CurrentVec<&'a T> {
    input
}

// This does not! The SmartRepresentationVec is invariant in its parameter.
fn change_lifetime2<'a, T>(input: SmartRepresentationVec<&'static T>) -> SmartRepresentationVec<&'a T> {
    input
}

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Oct 28, 2017

@bluss why is that?

@bluss
Copy link
Member

bluss commented Oct 28, 2017

I don't know exactly why the rule is like this.

It looks like niko explains the rule here: #21726 (comment) by showing how the old "covariant" rule (for us here, "covariant in T" means "be like Vec<T>") was unsound in some situations.

@dtolnay dtolnay added C-feature-accepted Category: A feature request that has been accepted pending implementation. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Nov 18, 2017
@dtolnay
Copy link
Member

dtolnay commented Nov 18, 2017

I would like to see an implementation of this in a PR.

My reading of the Guarantees section seems like the "no more, no less" claim is just a generalization of the following claims, not a further claim of its own. It just means to say that Vec can't be a rope, can't do small-vector optimization, must support efficient conversion to Box<[T]> when len == capacity, etc all the other following claims. I agree with #45431 (comment) that the "no more, no less" claim is not actionable by itself and so there is no problem rewording it.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jan 10, 2020

Notice that in rust-lang/unsafe-code-guidelines#224 a different extension is discussed. That is, adding a mem::CompressedNonNull<T> type that's a ZST if T is a ZST, in which case the address is always just mem::align_of::<T>().

With this and that optimizations, Vec<T> would "semantically" still have 3 fields (ptr, size, cap), but ptr and cap would be ZSTs when T is a ZST, making Vec<ZST> 1-word wide.

A way to provide wording that guarantees that these optimizations will never happen would be to, e.g., guarantee that the size_of::<Vec<T>>() == size_of::<usize>() * 3 for all Ts. The current wording is a bit ambiguous on what's allowed or not.

@bluss
Copy link
Member

bluss commented Apr 13, 2020

Is there a tangible use case for making Vec smaller for some types?

@camelid camelid added the A-zst Area: Zero-sized types (ZST). label Oct 7, 2020
@Jules-Bertholet
Copy link
Contributor

Jules-Bertholet commented Nov 9, 2023

As noted in #117763, fixing this could allow giving Cow<[T]> a more optimal layout for T of all sizes. Special-casing the type of the capacity field when T is a ZST, would allow capping it to isize::MAX in all other cases. This would create a new niche, which Cow's layout could exploit to good effect.

@rustbot label A-layout A-specialization I-heavy I-slow

@rustbot rustbot added A-layout Area: Memory layout of types A-specialization Area: Trait impl specialization I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Area: Memory layout of types A-specialization Area: Trait impl specialization A-zst Area: Zero-sized types (ZST). C-feature-accepted Category: A feature request that has been accepted pending implementation. I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants