Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: Stabilize the std::hash module #20654

Merged
merged 1 commit into from
Jan 8, 2015

Conversation

alexcrichton
Copy link
Member

This commit aims to prepare the std::hash module for alpha by formalizing its
current interface whileholding off on adding #[stable] to the new APIs. The
current usage with the HashMap and HashSet types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.

The primary change of this commit is to separate the Hasher trait into a
Hasher and a HashState trait. Conceptually the old Hasher trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old Hasher trait was
actually fairly unrelated to hashing.

This commit redesigns the existing Hasher trait to match what the notion of a
Hasher normally implies with the following definition:

trait Hasher {
    type Output;
    fn reset(&mut self);
    fn finish(&self) -> Output;
}

This Hasher trait emphasizes that hashing algorithms may produce outputs other
than a u64, so the output type is made generic. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.

The corresponding Hash trait becomes:

trait Hash<H: Hasher> {
    fn hash(&self, &mut H);
}

The old default of SipState was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement Hasher. Note that the type parameter H remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.

Note that Writer is not mentioned in either of Hash or Hasher, it is
simply used as part derive and the implementations for all primitive types.

With these definitions, the old Hasher trait is realized as a new HashState
trait in the collections::hash_state module as an unstable addition for
now. The current definition looks like:

trait HashState {
    type Hasher: Hasher;
    fn hasher(&self) -> Hasher;
}

The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of Hasher can be created. This
conceptually represents the two keys from which more instances of a
SipHasher can be created, and a HashState is what's stored in a
HashMap, not a Hasher.

Implementors of custom hash algorithms should implement the Hasher trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the HashState trait.

The entire module and HashState infrastructure remains #[unstable] due to it
being recently redesigned, but some other stability decision made for the
std::hash module are:

  • The Writer trait remains #[experimental] as it's intended to be replaced
    with an io::Writer (more details soon).
  • The top-level hash function is #[unstable] as it is intended to be generic
    over the hashing algorithm instead of hardwired to SipHasher
  • The inner sip module is now private as its one export, SipHasher is
    reexported in the hash module.

And finally, a few changes were made to the default parameters on HashMap.

  • The RandomSipHasher default type parameter was renamed to RandomState.
    This renaming emphasizes that it is not a hasher, but rather just state to
    generate hashers. It also moves away from the name "sip" as it may not always
    be implemented as SipHasher. This type lives in the
    std::collections::hash_map module as #[unstable]
  • The associated Hasher type of RandomState is creatively called...
    Hasher! This concrete structure lives next to RandomState as an
    implemenation of the "default hashing algorithm" used for a HashMap. Under
    the hood this is currently implemented as SipHasher, but it draws an
    explicit interface for now and allows us to modify the implementation over
    time if necessary.

There are many breaking changes outlined above, and as a result this commit is
a:

[breaking-change]

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

This is a reopening of #19673, and does not currently yet build as it is waiting on a new snapshot. The first commit, however, was created in preparation for a new snapshot.

r? @aturon

@Gankra
Copy link
Contributor

Gankra commented Jan 6, 2015

CC @pczarn @thestinger @cgaebel

@cgaebel
Copy link
Contributor

cgaebel commented Jan 6, 2015

HashMap changes LGTM

@pczarn
Copy link
Contributor

pczarn commented Jan 6, 2015

LGTM.

@alexcrichton alexcrichton force-pushed the stabilize-hash branch 2 times, most recently from 6a74f96 to 71e522e Compare January 7, 2015 09:06
@alexcrichton
Copy link
Member Author

The commit message just gets longer! I've rebased with as many goodies as possible (those that have been implemented). This removes all the #[old_impl_check] tags related to hashing now that associated types are leveraged.

I've purged "sip" as a name from everything other than SipHasher itself, updated the commit message/PR description, and I think that this is ready to go.

r? @aturon

@pczarn
Copy link
Contributor

pczarn commented Jan 7, 2015

continuing from #19673

When it comes to deriving, unless we hard-code one specific algorithm, I don't think we have any option other than incrementally updating a hash (e.g. just calling .hash(state) on all members).

@alexcrichton That's right, such an algorithm would have to be aware of generics. There is another approach to this problem: attempting to optimize these calls to .hash(state) while still doing them for all members.

For example the &[u8] type likely cannot hash well due to the lack of specialization implementations, whereas the &str type can likely hash very well due to actually being one call to .write().

I think the impl of Hash for &str has two calls to .write().

I'm sorry I didn't quite follow your comment, but are you basically saying that requiring .write() + .finish() is imposing overhead, even in the &str case for example? Are you also saying that this is not inherent to the SipHasher implementation, but rather inherent to the design of the hashing traits?

Perhaps using .finish() imposes inevitable overhead, but I wouldn't worry much about it. In my comment, I tried to make an implementation that would do much less work incrementally, yes, because @thestinger didn't support his ideas with a concrete design. Nothing came out of that. I have overestimated LLVM's understanding of pointer arithmetic optimization.

@alexcrichton
Copy link
Member Author

I think the impl of Hash for &str has two calls to .write().

Ah yes good point, I forgot about the byte we include at the end!

Perhaps using .finish() imposes inevitable overhead, but I wouldn't worry much about it.

Thanks for clarifying!

impl<S: hash::Writer, T: Hash<S>> Hash<S> for Rc<T> {
#[inline]
fn hash(&self, state: &mut S) {
(**self).hash(state);
}
}
#[cfg(not(stage0))]
impl<S: hash::Writer + hash::Hasher, T: Hash<S>> Hash<S> for Rc<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need the hash::Writer constraint here any longer?

#[old_impl_check]
impl<T: Eq + Hash<S>, S, H: Hasher<S> + Default> Default for HashSet<T, H> {
impl<T, S> Default for HashSet<T, S>
where T: Eq + Hash<<S as HashState>::Hasher>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion re: factoring out an H with S: HashState<Hasher=H>

@alexcrichton alexcrichton force-pushed the stabilize-hash branch 4 times, most recently from 07ee0f4 to aae9a28 Compare January 7, 2015 19:23
@alexcrichton
Copy link
Member Author

Ok, I believe I took care of all unnecessary Writer bounds as well as the formatting in map.rs/set.rs. I left a bunch of Writer bounds because the implementations require it one way or another (they might hash a primitive, for example), but implementations like pointers just forward the H parameter though.

@aturon
Copy link
Member

aturon commented Jan 7, 2015

Awesome work, @alexcrichton

@alexcrichton alexcrichton force-pushed the stabilize-hash branch 2 times, most recently from 4ce42a1 to 11fa2ec Compare January 7, 2015 20:15
This commit aims to prepare the `std::hash` module for alpha by formalizing its
current interface whileholding off on adding `#[stable]` to the new APIs.  The
current usage with the `HashMap` and `HashSet` types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.

The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old `Hasher` trait was
actually fairly unrelated to hashing.

This commit redesigns the existing `Hasher` trait to match what the notion of a
`Hasher` normally implies with the following definition:

    trait Hasher {
        type Output;
        fn reset(&mut self);
        fn finish(&self) -> Output;
    }

This `Hasher` trait emphasizes that hashing algorithms may produce outputs other
than a `u64`, so the output type is made generic. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.

The corresponding `Hash` trait becomes:

    trait Hash<H: Hasher> {
        fn hash(&self, &mut H);
    }

The old default of `SipState` was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement `Hasher`. Note that the type parameter `H` remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.

Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is
simply used as part `derive` and the implementations for all primitive types.

With these definitions, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an unstable addition for
now. The current definition looks like:

    trait HashState {
        type Hasher: Hasher;
        fn hasher(&self) -> Hasher;
    }

The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of `Hasher` can be created.  This
conceptually represents the two keys from which more instances of a
`SipHasher` can be created, and a `HashState` is what's stored in a
`HashMap`, not a `Hasher`.

Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.

The entire module and `HashState` infrastructure remains `#[unstable]` due to it
being recently redesigned, but some other stability decision made for the
`std::hash` module are:

* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
  with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
  over the hashing algorithm instead of hardwired to `SipHasher`
* The inner `sip` module is now private as its one export, `SipHasher` is
  reexported in the `hash` module.

And finally, a few changes were made to the default parameters on `HashMap`.

* The `RandomSipHasher` default type parameter was renamed to `RandomState`.
  This renaming emphasizes that it is not a hasher, but rather just state to
  generate hashers. It also moves away from the name "sip" as it may not always
  be implemented as `SipHasher`. This type lives in the
  `std::collections::hash_map` module as `#[unstable]`

* The associated `Hasher` type of `RandomState` is creatively called...
  `Hasher`! This concrete structure lives next to `RandomState` as an
  implemenation of the "default hashing algorithm" used for a `HashMap`. Under
  the hood this is currently implemented as `SipHasher`, but it draws an
  explicit interface for now and allows us to modify the implementation over
  time if necessary.

There are many breaking changes outlined above, and as a result this commit is
a:

[breaking-change]
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Jan 8, 2015
This commit aims to prepare the `std::hash` module for alpha by formalizing its
current interface whileholding off on adding `#[stable]` to the new APIs.  The
current usage with the `HashMap` and `HashSet` types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.

The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old `Hasher` trait was
actually fairly unrelated to hashing.

This commit redesigns the existing `Hasher` trait to match what the notion of a
`Hasher` normally implies with the following definition:

    trait Hasher {
        type Output;
        fn reset(&mut self);
        fn finish(&self) -> Output;
    }

This `Hasher` trait emphasizes that hashing algorithms may produce outputs other
than a `u64`, so the output type is made generic. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.

The corresponding `Hash` trait becomes:

    trait Hash<H: Hasher> {
        fn hash(&self, &mut H);
    }

The old default of `SipState` was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement `Hasher`. Note that the type parameter `H` remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.

Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is
simply used as part `derive` and the implementations for all primitive types.

With these definitions, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an unstable addition for
now. The current definition looks like:

    trait HashState {
        type Hasher: Hasher;
        fn hasher(&self) -> Hasher;
    }

The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of `Hasher` can be created.  This
conceptually represents the two keys from which more instances of a
`SipHasher` can be created, and a `HashState` is what's stored in a
`HashMap`, not a `Hasher`.

Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.

The entire module and `HashState` infrastructure remains `#[unstable]` due to it
being recently redesigned, but some other stability decision made for the
`std::hash` module are:

* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
  with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
  over the hashing algorithm instead of hardwired to `SipHasher`
* The inner `sip` module is now private as its one export, `SipHasher` is
  reexported in the `hash` module.

And finally, a few changes were made to the default parameters on `HashMap`.

* The `RandomSipHasher` default type parameter was renamed to `RandomState`.
  This renaming emphasizes that it is not a hasher, but rather just state to
  generate hashers. It also moves away from the name "sip" as it may not always
  be implemented as `SipHasher`. This type lives in the
  `std::collections::hash_map` module as `#[unstable]`

* The associated `Hasher` type of `RandomState` is creatively called...
  `Hasher`! This concrete structure lives next to `RandomState` as an
  implemenation of the "default hashing algorithm" used for a `HashMap`. Under
  the hood this is currently implemented as `SipHasher`, but it draws an
  explicit interface for now and allows us to modify the implementation over
  time if necessary.

There are many breaking changes outlined above, and as a result this commit is
a:

[breaking-change]
bors added a commit that referenced this pull request Jan 8, 2015
This commit aims to prepare the `std::hash` module for alpha by formalizing its
current interface whileholding off on adding `#[stable]` to the new APIs.  The
current usage with the `HashMap` and `HashSet` types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.

The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was
actually just a factory for various states, but hashing had very little control
over how these states were used. Additionally the old `Hasher` trait was
actually fairly unrelated to hashing.

This commit redesigns the existing `Hasher` trait to match what the notion of a
`Hasher` normally implies with the following definition:

    trait Hasher {
        type Output;
        fn reset(&mut self);
        fn finish(&self) -> Output;
    }

This `Hasher` trait emphasizes that hashing algorithms may produce outputs other
than a `u64`, so the output type is made generic. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.

The corresponding `Hash` trait becomes:

    trait Hash<H: Hasher> {
        fn hash(&self, &mut H);
    }

The old default of `SipState` was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement `Hasher`. Note that the type parameter `H` remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.

Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is
simply used as part `derive` and the implementations for all primitive types.

With these definitions, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an unstable addition for
now. The current definition looks like:

    trait HashState {
        type Hasher: Hasher;
        fn hasher(&self) -> Hasher;
    }

The purpose of this trait is to emphasize that the one piece of functionality
for implementors is that new instances of `Hasher` can be created.  This
conceptually represents the two keys from which more instances of a
`SipHasher` can be created, and a `HashState` is what's stored in a
`HashMap`, not a `Hasher`.

Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.

The entire module and `HashState` infrastructure remains `#[unstable]` due to it
being recently redesigned, but some other stability decision made for the
`std::hash` module are:

* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
  with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
  over the hashing algorithm instead of hardwired to `SipHasher`
* The inner `sip` module is now private as its one export, `SipHasher` is
  reexported in the `hash` module.

And finally, a few changes were made to the default parameters on `HashMap`.

* The `RandomSipHasher` default type parameter was renamed to `RandomState`.
  This renaming emphasizes that it is not a hasher, but rather just state to
  generate hashers. It also moves away from the name "sip" as it may not always
  be implemented as `SipHasher`. This type lives in the
  `std::collections::hash_map` module as `#[unstable]`

* The associated `Hasher` type of `RandomState` is creatively called...
  `Hasher`! This concrete structure lives next to `RandomState` as an
  implemenation of the "default hashing algorithm" used for a `HashMap`. Under
  the hood this is currently implemented as `SipHasher`, but it draws an
  explicit interface for now and allows us to modify the implementation over
  time if necessary.

There are many breaking changes outlined above, and as a result this commit is
a:

[breaking-change]
@bors bors merged commit 511f0b8 into rust-lang:master Jan 8, 2015
@alexcrichton alexcrichton deleted the stabilize-hash branch January 8, 2015 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants