Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: IndexGet and IndexSet #2953

Closed
wants to merge 5 commits into from
Closed

RFC: IndexGet and IndexSet #2953

wants to merge 5 commits into from

Conversation

carbotaniuman
Copy link

@carbotaniuman carbotaniuman commented Jul 6, 2020

This RFC adds IndexGet and IndexSet types, which allows for additional options when implementing containers, and changing the behavior of assignments.

Rendered

text/0000-index-get-set.md Outdated Show resolved Hide resolved
@programmerjake
Copy link
Member

Additional benefits include more support for DSLs similar to nmigen and its use of Python's index operator to indicate bit slicing. nmigen is based on using Python's operators to construct a description of what the outputted hardware will do rather than executing the operations directly as a normal programming language might.

Co-authored-by: kennytm <[email protected]>
@cuviper
Copy link
Member

cuviper commented Jul 8, 2020

It's a little sad that "set" as a verb or noun has such different meanings, so your trait IndexSet would collide with this struct IndexSet. Maybe trait IndexAssign is a good alternative here? That would also fit in with ops::AddAssign et al. It wouldn't mirror trait IndexGet as nicely, but maybe something like IndexMove would work there.

@clarfonthey
Copy link
Contributor

I also think that IndexAssign is better than IndexSet, as it matches the other operators with Assign.

As far as IndexGet is concerned -- honestly, this is a bit weird, as it basically is trying to allow indexing to return arbitrary values. I don't really know if that's needed, TBQH.

Hear me out -- bitset and arbitrary bit slicing seems like a good candidate for this, as you can't just reference an arbitrary bit slice. But what if you could, and the bit offset were stored in the value as a fat pointer? In most cases, if you were directly copying the data out, this would be mostly optimised away, but otherwise you'd be able to do everything as usual.

If it's copying or cloning data out, it also makes sense to just require a method call or dereference.

If we were moving out, I'd see the need for an explicit trait, but that seems a bit weird and magical IMHO. For example, if map[key] by value were to basically remove an element from the map, I'd say that's super confusing and unexpected.

Basically, IMHO, we should ditch IndexGet.

@carbotaniuman
Copy link
Author

IndexAssign fits another set of names that was bikeshedded, IndexVal and IndexAssign. I do think IndexGet is pretty fundamental to this proposal though, as not including relegates IndexSet to a very niche use, basically only HashMap situations. In addition, IndexSet would allow the setting of arbitrary values anyways, and IMO it doesn't make sense that you wouldn't have a way to get arbitrary values if you could set them.

@coolreader18
Copy link

coolreader18 commented Jul 9, 2020

I feel like what IndexGet proposes can already be done with ops::Index, though -- the a[b] operator returns <a as Index>:: Output, which you only have to reborrow if Output is not Copy. That BitSet example, at least the immutable part of it, can already be implemented on stable, (and here is where I realize the distinction between returning a reference borrowed from self with Index vs created with IndexGet, but I already got this far) though somewhat hackily:

fn index(&self, x: u8) -> &bool {
    static TRUE: bool = true;
    static BOOL: bool = false;
    if self.foo(x) { &TRUE } else { &FALSE }
}

(You could probably use const propagation instead of explicit statics, but w/e)

Still, this does seem unnecessary -- I thought the plan for a while has been to add an IndexSet trait that just complements the existing Index/IndexMut? I still fail to see why it's integral to the proposal, Index has always existed to "get arbitrary values if you could set them", and I don't think implicit cloning would be very desirable for many data structures.

@burdges
Copy link

burdges commented Jul 9, 2020

We should definitely explore the custom fat pointer idea before doing this.

In fact, we could implement such fat pointers without necessarily touching rustc by using ATCs that default to references:

pub trait Index<Idx: ?Sized> {
    type Output: ?Sized;
    type Borrowed<'a>: Deref<Output> = &'a Self::Output;
    fn index(&self, index: Idx) -> Borrowed<'_>;
}
pub trait IndexMut<Idx: ?Sized>: Index<Idx> {
    type BorrowedMut<'a>: Deref<Output>+DerefMut = &'a mut Self::Output;
    fn index_mut(&mut self, index: Idx) -> BorrowedMut<'_>;
}

We'd provide the bitvec functionality by returning a guard that tracks an offset, which then work with *Assign ops types too. We'd extend hashmap by returning some Entry-like guard here, which seemingly provides a far more powerful interface than proposed here.

Aside from likely being suboptimal, I'm afraid this IndexGet/IndexSet solution makes understanding indexing operations quite difficult. You must internalize these subtle rules and "think like the compiler", which provides one big reason not to do IndexGet/IndexSet regardless.

I think this ATC solution however avoids massive confusion about which traits handle which functionality, which keeps the language simpler and more accessible. I think even code that needs strange guards like bitvec and hashmap becomes much more comprehensible with the ATC solution. Any code without strange guards becomes no more complex.

We could explore using similar ATCs to avoid trait proliferation elsewhere too, maybe becoming a common easily understandable design pattern within std.

@burdges
Copy link

burdges commented Jul 10, 2020

We've currently no "assignment" trait really. Would one make sense?

pub trait Assign<T> {
    fn assign(self, t: T);  //  Or &mut self?
}

impl<'a> Assign<V> for BitSetGuard<'a> { .. }

impl<'a, K, V> Assign<V> for ::std::collections::hash_map::Entry<'a, K, V> {
    fn assign(self, t: T) {
        match self {
            Occupied(mut entry) => { entry.insert(value); },
            Vacant(entry) => entry.insert_entry(value),
        }
    }
}

An ATC solution still handles BitSet better than IndexSet even without an assignment trait because BitOrAssign, BitAndAssign, and BitXorAssign are safer and more natural than outright assignment.

@carbotaniuman
Copy link
Author

I really like the ATC solution, but it breaks backwards compatibility. If your concern is with the teachability of IndexGet, perhaps making it mutually exclusive with Index would alleviate some of the mental burden?

@coolreader18
Copy link

How does the ATC solution break backwards compatibility? AFAICT it's still the same if Borrowed defaults to just &Output, and I don't think there's any place in Rust where you need to exhaustively specify all default and non-default associative types (?)

@burdges
Copy link

burdges commented Jul 11, 2020

In the end, index notation for bitsets and hashmap sounds nice, but should not be considered too important really, certainly not important enough to make reading code harder.

An unambiguous notation like foo[[bar]] for IndexGet/IndexAssign would resolves any confusion obviously. You cannot make traits mutually exclusive normally, but yeah anything this magic sure okay, but doing that still makes identifying the impl tricky. Would HashMap get one for Copy types and one drop types? At first blush this sounds messy..

Yes, I noticed the ATC solution would "break" generic types that delegate Index/IndexMut, but that's the only situation, right? Do any generic types that delegate Index/IndexMut exist now? I'd think otherwise the defaults for those types prevent breakage.

Also, any generic types that delegate Index/IndexMut should still work with all current types that impl Index/IndexMut, just not with new impls using guards, like bitset and hashmaps. It should therefore not break any binary builds, maybe acceptable.

Am I missing another breakage?

@Lokathor
Copy link
Contributor

One use case I've wanted, which might be similar to bit sets, is a way to "index" into a 2d slice (eg: image data) and get a 2d sub-slice.

I'm not sure if this would need to change design to support that, but I've also been told that we'd probably need some custom DST stuff in the language as well to make anything like that be practical.

@carbotaniuman
Copy link
Author

carbotaniuman commented Jul 11, 2020

Well, code like this would require authors to specify another type bound, as shown here.

I'd assume making the traits exclusive (if we were to go down that path) would be compiler magic, but I'm not sure what you mean by HashMap and Copy and drop types?

@burdges
Copy link

burdges commented Jul 11, 2020

Interesting, the ATC trick breaks any code that uses explicit Index* bounds, so the ATC solution requires separate traits too, maybe named IndexDeref* or IndexGuard*. These could be either exclusive vs Index*, or else be invoked by some new foo[[index]] notation. It's plausible some compiler magic fat pointer might work here, but it does not cover the other use cases.

I'd suspect any solution involving exclusive traits would be rejected just because rust worked so hard to avoid those in the past, so maybe we should be discussing the foo[[index]] notation more seriously?

I actually think two dimensional indexing sounds more valuable than indexing for bitsets or hashmap. If we'd some foo[[index]] notation that invoked some IndexGuard trait then you could have

pub struct Matrix<..> { .. }

/// Index into elements with self[row,col]
impl<..> Index<(usize,usize)> for Matrix<..> { .. }

/// Index into rows with self[row]
impl<..> Index<usize> for Matrix<..> { .. }

pub struct MatrixColumnGuard<'a,..> { .. }

/// Index into columns with self[[col]]
impl<..> IndexGuard<usize> for Matrix<..> { .. }

Amusingly, one could already make bitset work using existing dyn Trait fat pointers:

/// A pointer of unknown mutability, only accessible via double indirection `&MutMaybe` or `&mut MutMaybe`.
pub struct MutMaybe<T>(usize, PhantomData<*mut T>);

impl<T: ?Sized> Deref for MutMaybe<T> { .. }
impl<T: ?Sized> DerefMut for MutMaybe<T> { .. }

pub trait Bool :
    BitAnd<bool>+BitAndAssign<bool>
    +BitOr<bool>+BitOrAssign<bool>
    +BitXor<bool>+BitXorAssign<bool>
{
    fn get(&self) -> bool;
    fn set(&mut self, b: bool);
}
// Can we encode const vs mut without unsafe casts from 
pub struct BoolOffset<T,const OFFSET: usize>(MutMaybe<T>);

impl<T,const OFFSET: usize> Bit* for BoolOffset<T,OFFSET> { .. }
impl<T,const OFFSET: usize> Bool for BoolOffset<T,OFFSET> { .. }

impl Index<Idx> for BitSet {
    type Output: dyn Bool;
    fn index(&self, index: Idx) -> &'a dyn Bool { .. }
}
pub trait IndexMut<Idx> for BitSet {
    fn index_mut(&mut self, index: Idx) -> &'a mut dyn Bool { .. }
}

We cannot play exactly this trick for HashMap or Matrix because they cannot encode their indexed value with type level information, but I think even more unsafe code could make those work:

pub trait CrazyEntry {}

// Wild unsafe methods that butcher the fat pointer into the desired reference, do assignments, etc.
impl<'a> &dyn CrazyEntry { .. }
impl<'a> &mut dyn CrazyEntry { .. }

impl<K,V,R> Index<K> for HashMap<K,V,R> {
    type Output: dyn CrazyEntry;
    fn index(&self, index: K) -> &'a dyn CrazyEntry { .. }
}
...

In any case, there was considerable discussion and RFCs around doing DerefMove, etc. so maybe those should be revisited before doing anything here, although I think some foo[[index]] notation that returns guards sounds moderately future proof.

@carbotaniuman
Copy link
Author

carbotaniuman commented Jul 11, 2020

What would your IndexGuard trait look like? I'm also uncertain of how your CrazyEntry would let map[key] = value work, which is one of the main points of this proposal. Could you provide a playground link of what you mean?

The main issue is that none of the existing Index traits use GATs, which kind of preclude 2d subslicing. I think if we had a design with GATs (this was hastily sketched up) we could support it, but that would basically be a complete overhaul of how indexing works right now.

The main reason I don't want to just add <'a> to IndexGet without changing the others is because there's no reason the normal Index traits shouldn't get it too, or else we'll just see a proliferation of returning structs that implement Deref and DerefMut. There also should be some compiler check to not return references in IndexGet, but that's besides the point.

The more I think about it though, IndexGet really is just IndexGuard by a different name. I'm unsure of what the purpose of the [[]] is, just a more explicit way of knowing whether you're taking by value or by reference?

@cuviper
Copy link
Member

cuviper commented Jul 11, 2020

I think today, foo[[index]] would try to use impl Index<[T; 1]> for Foo, which could plausibly exist.

@burdges
Copy link

burdges commented Jul 11, 2020

Yes, I only proposed foo[[index]] so that traits could avoid the confusion caused by overlap rules, but yeah impl Index<[T; 1]> for Foo breaks this. :(

I no longer think CrazyEntry works, which sounds unsurprising, so dyn Bool looks unique.

Edit: We'd want hm[key] to return some dyn CrazyEntry where impl CrazyEntry for OccupiedEntry and impl CrazyEntry for VacantEntry, but doing so requires that OccupiedEntry and VacantEntry be nothing but table buckets, which requires inserting the key into the table during the hm[key]. I think worse this breaks hashmaps tracking the keys and values in separate tables, which harms performance unacceptably.

@yellowsquid
Copy link

If Rust had associated types with generic lifetime parameters, the Index and IndexGet problem could be trivially solved with a blanket impl:

impl<T: Index<Idx>, Idx> IndexGet<Idx> for T {
    type Output<'a> = &'a <T as Index<Idx>>::Output;
    fn index_get(&'a self, idx: Idx) -> Self::Output<'a> {
        self.index(idx)
    }
}

You can do the same for IndexSet as well. That way, the compiler can switch to only use index_get and index_set exclusively. It also avoids any issues with new syntax.

Feel free to point out all the problems with this approach. I will admit I didn't read all of the proposal before commenting this, but I didn't see any mention during a skim.

@burdges
Copy link

burdges commented Jul 12, 2020

We dropped another question up thread: Is doing an assignment with hm[key] = value; ever wise for hashmaps?

It appears no because it does not propagate errors correctly. hm.insert(key,value); ignores the returned option, which rust permit by Option not being #[must_use], but code should normally be assert!( hm.insert(key,value).is_noce() ); or hm.insert(key,value).ok_or(SomeError) ?; or hm.entry(key)...

It's maybe best not to give users a "more ergonomic" option that discourages error handling. I felt hm[key] should return roughly an Entry for exactly this reason.

@carbotaniuman
Copy link
Author

I mean vec[index] = val already panics if index is invalid, and so does &map[key]. The pattern if Index* panicking is already baked into the standard library. It feels like a holdover from C++, but I think it's too late to change that.

I've backtracked on my claim that we would need to GAT the rest of the traits, a simple GAT in the associated type of IndexGet would be good enough. I tried doing what @yellowsquid was saying, but I ran into obtuse lifetime errors. There's no reason this shouldn't work, but I don't know if I need HRTBs or if the current GAT impl just doesn't support this.

@Lokathor
Copy link
Contributor

hm[k] = v; should absolutely allow assignment. Both as a way to assign a new key with a value or as a way to replace an old value at that key.

If I just write hm[k] = v; I clearly want to just make that key be that value and I don't care what the old value is. If you need something more complicated, like checking that you never overwrite a key, then you can call the other methods.

@clarfonthey
Copy link
Contributor

clarfonthey commented Jul 13, 2020

IndexSet for Vec could allow inserting at Vec::len as a push.

I dunno, I think that assigning index is pretty idiomatic in a lot of cases. If you want to cover all cases, something like Entry is better, but it seems natural especially for prototyping.

@carbotaniuman
Copy link
Author

So after looking at #159's Alternatives block, I came up with this design that should solve most of @burdges concerns regarding mutually exclusive traits as well as compelx reasoning. It's use of a GAT also makes 2d subslicing possible, but it comes with a plethora of Sized and lifetime issues.

@carbotaniuman
Copy link
Author

carbotaniuman commented Jul 14, 2020

pub trait IndexMut<Idx: ?Sized>: Index<Idx> {
    fn index_mut(&mut self, index: Idx) -> &mut Self::Output;
    
    fn index_set(&mut self, index: Idx, value: Self::Output)
    where
        <Self as Index<Idx>>::Output: Sized,
    {
        *self.index_mut(index) = value;
    }
}

pub trait IndexGet<Idx: ?Sized> {
    type Output<'a>: ?Sized;
    fn index_get(&'a self, index: Idx) -> Self::Output;
}

pub  trait IndexSet<Idx: ?Sized> {
    type Input: ?Sized;
    fn index_set(&mut self, index: Idx, value: Self::Input);
}

We get around complex reasoning by making the desugaring obvious. We can get around mutually exclusive traits by simply relying on coherence and blanket impls. This design allows for 2d subslicing and other use cases, in addition to the simple ones.

I think we could even propose the IndexMut change first and follow up with the the others once GATs are more stable.

@burdges
Copy link

burdges commented Jul 14, 2020

How does the compiler decide between the methods in IndexMut?

@carbotaniuman
Copy link
Author

carbotaniuman commented Jul 14, 2020

&mut container[key] desugars to index_mut, container[key] = value desugars to the index_set.

@burdges
Copy link

burdges commented Jul 14, 2020

Is *container[key] still valid syntax?

@carbotaniuman
Copy link
Author

No? Well, if Self::Output is Deref I guess.

@coolreader18
Copy link

Wait, but something like vec[idx] = foo works now; would that be changed to still use IndexSet? Wouldn't that break backwards compatibility?

@carbotaniuman
Copy link
Author

There would be an blanket forwarding impl for IndexSet. I took a stab at IndexGet and imo it's not possible to implement it without exclusive traits or the overload mechanism in this RFC.

@pickfire
Copy link
Contributor

pickfire commented Jul 16, 2020

I still don't quite understand the motivation on why Index and IndexMut is not necessary. Can there an example in motivation section to show case why it is not enough and how IndexGet and IndexSet can solve this?

@carbotaniuman
Copy link
Author

@pickfire Some examples are in the RFC, but it's for cases where returning a reference is impossible.

@burdges
Copy link

burdges commented Jul 17, 2020

At this point your rules ensure that any existing code still compiles with the existing reference methods, right? If not, then we should watch for performance regressions due to using moves over copies? At the extreme one might consider Copy bounds for some new methods, which breaks doing this for hashmap I guess.

@carbotaniuman
Copy link
Author

Yup, without any user-side changes everything will desguar identically.

@pickfire
Copy link
Contributor

pickfire commented Jul 18, 2020

So IndexGet and IndexSet can have impl IndexGet<u8> for T and impl IndexGet<usize> for T at the same time?

@burdges
Copy link

burdges commented Jul 18, 2020

Yes. In principle, we might exploit that either to avoid the if-else trait matching logic, and/or for fancier interfaces, ala

impl<..> Index<(usize,usize)> for Matrix<..> {
    type Output = ..;
}
impl<..> Index<Row<usize>> for Matrix<..> {
    type Output = [..];
}
impl<..> Index??<Column<usize>> for Matrix<..> {
    type Output<'a> = MatrixColumn<'a,..>;  // Some ATC guard type for accessing the column
}

We've discussed such options relatively well here so far, although a bit more exploration maybe warranted.

All these ideas incur some soundness risks too because they interact with unsafe code everywhere. At first blush, I'd think the if-else trait matching logic incurs less soundness risks than using ATCs, which probably still invoke some even more flexible form of that logic on the ATC itself, but unclear.

@burdges
Copy link

burdges commented Jul 18, 2020

We could explore the custom fat pointer suggestion by @clarfon slightly more..

I showed up thread in #2953 (comment) that bitsets could work without IndexGet etc. by encoding offsets into the trait objects &dyn Bool defined by some type:

pub struct BoolOffset<T,const OFFSET: usize>(MutMaybe<T>);
impl<T,const OFFSET: usize> Bool for BoolOffset<T,OFFSET> { .. }

I think const generics can encode any index at the type level so long as it is foreseeable at compile time, but that last restriction prevents doing this trick for hashmap.

At the same time, we do risk a massive code size explosion from const generics being used with trait objects, which might require expanding our fat pointer repertoire eventually. In other words, should/does rust reject making a trait object from &BoolOffset? If yes, should rust provide some mechanism for passing along those type parameters as values at runtime? Some 24byte &dyn<usize> Bool trait obejct? If yes, then this might provide our cleanest solution because you could push the borrowed vs moved key question outwards. Anything like this sounds very far away obviously.

Impl<'a,K,V,R> Index<&'a K> for HashMap<K,V,R> {
    type Output = dyn<&'a K> RefEntry<K,V>;  // Includes only a reference to the index
    fn index(&self, index: &'a K) -> &Self::Output;
}
Impl<'a,K,V,R> Index<&'a K> for HashMap<K,V,R> {
    fn index_mut(&mut self, index: &'a K) -> &Self::Output;
}
Impl<K,V,R> Index<K> for HashMap<K,V,R> {
    type Output = dyn<K> MoveEntry<K,V>;  // Includes the moved index
    fn index(&self, index: K) -> &Self::Output;
}
Impl<K,V,R> Index<K> for HashMap<K,V,R> {
    fn index_mut(&mut self, index: K) -> &Self::Output;
}

I'd some thoughts about Index: Copy too, but never made sense of them..

cc #1403

@carbotaniuman
Copy link
Author

carbotaniuman commented Jul 19, 2020

One problem with this proposal is this bakes into IndexSet that Rust does not allow assignments to ?Unsized, but eddyb mentioned to me on Zulip that it would be impossible because you don't know you the concrete type for a trait object.

I suspect you've read #2594 and all of it's precursors, some of which sound a lot like what you're trying to do.

@clarfonthey
Copy link
Contributor

FWIW I'm in favour of having assignment be a method on IndexMut if it can be done right. It conveys the fact that IndexMut in a sense implies that you can set values whether you override it or not, whereas something like AddAssign would prevent you from using the operator if it doesn't exist.

@carbotaniuman
Copy link
Author

One thing I've still been debating is the GAT on IndexGet. Is there a purpose for the GAT that would not be better served by #2594? @burdges looking for your thoughts on this. The main thjng it would allow you to do is return a reference to &self, but the lack of proper mut support do hamper it, GATs are still a ways away, and custom DSTs are arguably the better move.

I guess the main question is does a GAT here open up anything custok DSTs won't?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants