Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reservation-based methods #12

Open
jonhoo opened this issue Jan 23, 2020 · 6 comments
Open

Add reservation-based methods #12

jonhoo opened this issue Jan 23, 2020 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jonhoo
Copy link
Owner

jonhoo commented Jan 23, 2020

Add computeIfAbsent, computeIfPresent, compute

I have a suspicion that ReservationNode could also be used to implement an Entry-API like the one on std::collections::HashMap, so that may be worth looking into.

@jonhoo jonhoo added enhancement New feature or request help wanted Extra attention is needed labels Jan 23, 2020
@voidc
Copy link
Contributor

voidc commented Jan 25, 2020

I started working on this in #32. For now, only compute_if_present is implemented.

@jhinch
Copy link
Contributor

jhinch commented Jan 28, 2020

I would like to propose the following for the entry API:

impl<K, V, S> HashMap<K, V, S> {
    pub fn entry<'g>(&'g self, key: K, guard: &'g Guard) -> Entry<'g, K, V, S> {
        todo!()
    }
}

pub enum Entry<'g, K, V, S> {
    Vacant(VacantEntry<'g, K, V, S>),
    Occupied(OccupiedEntry<'g, K, V, S>),
}

pub struct VacantEntry<'g, K, V, S> {
    _todo: PhantomData<('g, K, V, S)>,
}

pub struct OccupiedEntry<'g, K, V, S> {
    _todo: PhantomData<('g, K, V, S)>,
}

impl<'g, K, V, S> Entry<'g, K, V, S> {
    pub fn or_insert(self, default: V) -> &'g V {
        todo!()
    }

    pub fn or_insert_with<F: FnOnce() -> V>(self, default: F) -> &'g V {
        todo!()
    }

    pub fn key(&'g self) -> &'g K {
        todo!()
    }

    pub fn and_modify<F: FnOnce(&'g V) -> V>(self, f: F) -> Self {
        todo!()
    }
}

impl<'g, K, V: Default, S> Entry<'g, K, V, S> {
    pub fn or_default(self) -> &'g V {
        todo!()
    }
}

impl<'g, K, V, S> VacantEntry<'g, K, V, S> {
    pub fn key(&self) -> &K {
        todo!()
    }

    pub fn into_key(self) -> K {
        todo!()
    }

    pub fn insert(self, value: V) -> &'g V {
        todo!()
    }
}

impl<'g, K, V, S> OccupiedEntry<'g, K, V, S> {
    pub fn key(&self) -> &K {
        todo!()
    }

    pub fn remove_entry(self) -> (&'g K, &'g V) {
        todo!()
    }

    pub fn get(&self) -> &'g V {
        todo!()
    }

    pub fn insert(&mut self, value: V) -> &'g V {
        todo!()
    }

    pub fn remove(self) -> &'g V {
        todo!()
    }
}

Notable changes from the std::collections::HashMap API:

  • the entry method takes a guard
  • All value references returned from Entry are immutable. It is never safe to share a mutable reference for the flurry hashmap
  • and_modify takes a FnOnce(&V) -> V the function type instead of a FnOnce(&mut V) for the same reasons as above

The API shape would imply that the lock is held for both the vacant and occupied cases for the entirety of the Entry being in scope. This does mean that a poorly written could possibly deadlock itself:

let guard = &epoch::pin();
let map = HashMap::<usize, usize>::new();
let entry = map.entry(42, guard);
let entry2 = map.entry(42, guard); // this would deadlock

I don't have a good suggestion on how to prevent this with the API above. An alternative is to make the entry API 'lazy' and only actually interact with the map when the methods of the Entry are called. This would have the benefit of entry(key).or_insert(value) being able to be implemented using the existing put method with no_replacement set to true and allow for a lock-free insertion. But it does have the downside of the entry API being quite different to that of the regular hashmap or btreemap in std. One way to workaround this with the API design above is to also provide an insert_if_absent API on the flurry hashmap as a more efficient alternative to the entry API.

I thought I would post the API contract in here first instead of getting too far into an implementation as I think the API is important to get right.

@jonhoo
Copy link
Owner Author

jonhoo commented Jan 28, 2020

I like the suggestion and the thinking behind it. It's very similar to what I imagined. The point about deadlock is a good one though, and I worry that it might become an unexpectedly large problem since we have to lock the entire bin, not just the one key... That said, that is what the Java code does, so it sounds like it's still (possibly) a reasonable use-case. We just need to make sure we document that behavior well. We could also perhaps find a way to name the top-level method such that it becomes clear that it might deadlock. Something like entry_exclusive maybe? I'm not sure. It'd be really good if there was a way for us to indicate this to the user at a type level.

As for the suggestion to make Entry lazy, I like that as a separate API. It would also take a Guard, and it would find the bucket on creation, but then just remember the table at the time, the bucket i, and the current key/value rather than hold a reference to the bucket or node directly. Then it'd re-do the "last" part of the search (which may require crossing BinEntry::Moved) when the actual insert happens. Documenting the semantics of that will also be its own challenge.

As an aside, I took a look into the Java code, and it turns out that ReservationNode is actually only used for if someone uses something like Entry or compute_if_absent and the bin they are accessing is currently empty. That shouldn't be too hard to port, but it also means that they aren't that useful to us. Notice that it takes the lock on the reservation node before it swaps it into the null bucket, so it's still holding a lock in that case.

@jhinch
Copy link
Contributor

jhinch commented Feb 1, 2020

I've been thinking about the entry API considerably over the last few days, thinking through the outstanding problems with the API proposed. To reiterate the issues outstanding as I see them, they are:

  • Locks are taken out on the entire bin (not just for the single BinEntry) and held by the Entry until the Entry is dropped
  • Misuse of the entry API can cause a deadlock on the current thread

IMO more sophisticated deadlocks scenarios which could be contrived which span coordination across multiple threads should not be considered in any solution and simply documented that they can occur. Due to the Entry requiring a hold on a Guard to work, it will automatically be !Send and !Sync so a user would need to go out of their way to deadlock in this situation.

I have not been able to think of a solution to the first problem. I think if there was a straight forward way to only allow the single BinEntry to be locked and still allow for inserts and removals, the Java ConcurrentHashMap would already be using it.

In terms of the misuse of the entry API, I don't think its possible to prevent via the type system but it could detect the misuse at runtime and panic using a reentrantlock.

Given these outstanding problems, I'm thinking it may be better to avoid the entry API, at least for now, and build APIs similar to those on the Java API:

impl HashMap {
    pub fn insert_if_absent<'g>(&'g self, key: K, value: V, guard: &'g Guard) -> Option<&'g V> {
        todo!()
    }
    pub fn compute_if_absent<'g, F: FnOnce() -> V>(&'g self, key: K, value: F, guard: &'g Guard) -> &'g V {
        todo!()
    }
    pub fn default_if_absent<'g>(&'g self, key: K, guard: &'g Guard) -> &'g V where V: Default {
        todo!()
    }
}

At least the insert_if_absent has a benefit for existing anyway irrespective of the entry API as it can allow for lock-free insertion under the happy path which the entry API cannot achieve.

@jonhoo
Copy link
Owner Author

jonhoo commented Feb 17, 2020

@jhinch Sorry it took me so long to get back to this -- yes, I agree with you, we should probably stick with the Java API for now. I think deadlocks are just as likely (people will end up doing something blocking in the compute_if_absent method), but it's maybe slightly less likely.

One alternative is to implement an internal entry-based API, and then for the time being just expose the Java-like versions. I think that should be about the same complexity.

@jhinch
Copy link
Contributor

jhinch commented Feb 18, 2020

No worries. Thanks for your response! Sounds like a plan then. I might take a look over the next few days. If no one hears from me over the next week, I likely haven't had the time and people should feel free to go ahead and pick this up.

jonhoo pushed a commit that referenced this issue Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants