Add a priority queue implementation built on top of Heap #51

Closed
AquaGeek wants to merge 9 commits

@AquaGeek AquaGeek commented May 28, 2021

Description

This introduces a double-ended PriorityQueue as discussed in #3.

Detailed Design

Heap (a min-max heap) is used as the storage. One of the big advantages of using a min-max heap is that we don't need to bifurcate into a min heap or max heap and keep track of which kind is used — you can pop min or max, as needed.

The main difference between Heap and PriorityQueue is that the latter separates the value from the comparable priority of it. This is useful in cases where a type doesn't conform to Comparable directly but it may have a property that does — e.g. Task.priority. It also keeps track of insertion order, so dequeueing of elements with the same priority should happen in FIFO order.

public struct PriorityQueue<Value, Priority: Comparable> {
    public typealias Element = (value: Value, priority: Priority)

    public var isEmpty: Bool
    public var count: Int

    // Initializers
    public init()
    public init<S: Sequence>(_ elements: S) where S.Element == Element
    public init(arrayLiteral elements: Element...)
    public init(dictionaryLiteral elements: (Value, Priority)...)

    // Mutations/querying
    public mutating func insert(_ value: Value, priority: Priority)
    public func min() -> Value?
    public func max() -> Value?
    public mutating func popMin() -> Value?
    public mutating func popMax() -> Value?
    public mutating func removeMin() -> Value
    public mutating func removeMax() -> Value
}

To-Do

  • Merge @AmanuelEphrem's Sequence conformance
  • Reformat the code to match the 2 space indent/80 char width style
  • Optimize _indexOfChildOrGrandchild(of:sortedUsing:) (probably by splitting it back out into 2 separate functions)
  • Chase down suggestion from @timvermeulen about skipping nodes that have children
  • Figure out if @inlinable is required
  • Add checks for heap invariants
  • Make a call on whether or not we want to add removeMin/Max
  • Make _minMaxHeapIsMinLevel an instance method
  • Merge @hassila's changes to make _bubbleUpMin(startingAt:) and _trickleDown(startingAt:) iterative instead of recursive
  • Add implementation of insert(contentsOf:)
  • Rebase this on top of the Heap implementation merged into main

Future Directions

  • Add support for merging two priority queues
  • Add replaceMin and replaceMax

Documentation

The public APIs have largely been documented. An overview document has been added to the Documentation directory.

Testing

There are unit tests for the added PriorityQueue type.

Performance

Performance tests have been added. @lorentey has a PR up (#76) which adds a std::priority_queue benchmark — it would be interesting to compare against that with more complex types.

We may want to revisit the library JSON to ensure we have the desired comparisons defined.

Source Impact

This is purely additive. No existing APIs were changed, deprecated, or removed.

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (to the extent possible).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexpected benchmark regressions.
  • I've updated the documentation (if appropriate).

@AquaGeek AquaGeek requested a review from lorentey as a code owner May 28, 2021 21:38
@kylemacomber

This looks really good!

@timvermeulen Could you lend a second pair of eyes and review the heap logic?

The PriorityQueue type is a very thin wrapper on top of MinMaxHeap. I went through and renamed some of the functions in MinMaxHeap based on feedback @lorentey provided in #43; after doing so, it became an even thinner wrapper. Maybe it's not even needed?

I agree. I don't think the separate MinMaxHeap type is needed. Let's just have a single type called PriorityQueue!

It might be nice to add a few convenience APIs for bulk insertion/creation:

  • public init<S: Sequence>(_ elements: S) where S.Element == Element
  • public mutating func insert<S: Sequence>(contentsOf newElements: S) where S.Element == Element
  • ExpressibleByArrayLiteral conformance

I have not yet run the performance tests to ensure the performance matches what is expected.

I think this would be the most profitable next step to take. It'd be great to see how the performance of PriorityQueue compares to CFBinaryHeap!

@kylemacomber kylemacomber requested a review from timvermeulen June 1, 2021 19:00
@hassila
Contributor

hassila commented Jun 2, 2021

Super to see this in progress! Just one quick initial comment/question while browsing, push/pop is documented as Complexity: O(log n) - although if skipping levels as mentioned in article referenced previously in #3 the time complexity should be O(log n) / 2 - different implementation or documentation?

@AquaGeek
Contributor Author

AquaGeek commented Jun 3, 2021

Just one quick initial comment/question while browsing, push/pop is documented as Complexity: O(log n) - although if skipping levels as mentioned in article referenced previously in #3 the time complexity should be O(log n) / 2 - different implementation or documentation?

The implementation does skip levels (see _bubbleUpMin and _bubbleUpMax — they reach for the grandparent nodes). My understanding is that when using big-O notation coefficients are usually not included.
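To make the point about coefficients concrete, here is a rough sketch that counts hops toward the root in an array-backed binary heap, once level by level and once via grandparents. The helper names `parentHops` and `grandparentHops` are made up for illustration and are not part of this PR; the only assumption is the usual layout where the parent of index i sits at (i - 1) / 2.

```swift
// Hypothetical helpers, not part of the PR: count the hops needed to climb
// an array-backed binary heap from a given index.

/// Hops when moving one level at a time (parent of i is (i - 1) / 2).
func parentHops(from index: Int) -> Int {
  var i = index
  var hops = 0
  while i > 0 {
    i = (i - 1) / 2
    hops += 1
  }
  return hops
}

/// Hops when moving two levels at a time (grandparent of i is (i - 3) / 4).
/// Stops once no grandparent exists, i.e. at the root or one of its children.
func grandparentHops(from index: Int) -> Int {
  var i = index
  var hops = 0
  while i >= 3 {
    i = (i - 3) / 4
    hops += 1
  }
  return hops
}

let lastIndex = 1022  // last slot of a full heap holding 1023 elements
print(parentHops(from: lastIndex), grandparentHops(from: lastIndex))
// prints "9 4"
```

Skipping levels roughly halves the hop count, but both counts still grow with log n, so the documented O(log n) stays the same.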

@hassila
Contributor

hassila commented Jun 3, 2021

Ok, thanks for clarification!

@AquaGeek
Contributor Author

AquaGeek commented Jun 3, 2021

In the case of Sequence conformance, how do we handle the double-endedness of the PriorityQueue? i.e. Should next() iterate from lowest to highest priority or vice-versa?

@kylemacomber

In the case of Sequence conformance, how do we handle the double-endedness of the PriorityQueue? i.e. Should next() iterate from lowest to highest priority or vice-versa?

Maybe PriorityQueue shouldn't conform directly to Sequence but have (warning placeholder names!) a minHeap and maxHeap "view":

let q: PriorityQueue = ...

for elt in q.minHeap {
  // iterates from min to max
}

for elt in q.maxHeap {
  // iterates from max to min
}

wdyt?

@AquaGeek
Contributor Author

AquaGeek commented Jun 4, 2021

In the case of Sequence conformance, how do we handle the double-endedness of the PriorityQueue? i.e. Should next() iterate from lowest to highest priority or vice-versa?

Maybe PriorityQueue shouldn't conform directly to Sequence but have (warning placeholder names!) a minHeap and maxHeap "view":

let q: PriorityQueue = ...

for elt in q.minHeap {
  // iterates from min to max
}

for elt in q.maxHeap {
  // iterates from max to min
}

wdyt?

Another approach I was mulling over is exposing separate functions for creating the iterator (e.g. makeLowToHighPriorityIterator and makeHighToLowPriorityIterator). That doesn't address the issue that will come up if somebody tries to iterate directly (though I guess that could be addressed by not conforming to Sequence directly).

I think exposing a view that conforms to Sequence makes more sense.

@hassila
Contributor

hassila commented Jun 5, 2021

@kylemacomber

https://developer.apple.com/documentation/swift/bidirectionalcollection ?

If we can provide a bidirectional collection view I think that'd work great:

for elt in q.sortedElements {
  // iterates from min to max
}

for elt in q.sortedElements.reversed() {
  // iterates from max to min
}

However, it wasn't clear to me if we're able to provide an efficient bidirectional view over the heap.

@AmanuelEphrem
Contributor

If we can provide a bidirectional collection view I think that'd work great:

Would the priority queue be able to conform to the Collection protocol? Because min and max values are determined during the remove operation, the priority queue does not lend itself easily to random access of elements via indexes.

let q: PriorityQueue = ...

for elt in q.minHeap {
  // iterates from min to max
}

for elt in q.maxHeap {
  // iterates from max to min
}

I like this method of traversing the priority queue, as it is the least ambiguous.

That doesn't address the issue that will come up if somebody tries to iterate directly

In this case, the iterator could default to iterating from min to max (or max to min if that makes more sense)

@hassila
Contributor

hassila commented Jun 5, 2021

I don’t think the iterator necessarily should default to any sorted order - it could just provide “some” order unless asked for sorted / reversed as suggested? Just trying to find reuse of existing protocols if possible if performant and natural (that being said, can’t say whether efficiency would be good - but just wanted to point out that there is an existing interface that might be useful) - that would also argue for popFirst/popLast methods for dequeueing from respective ends

@kylemacomber

kylemacomber commented Jun 6, 2021

Because min and max values are determined during the remove operation, the priority queue does not lend itself easily to random access of elements via indexes.

This was my impression.

I don’t think the iterator necessarily should default to any sorted order - it could just provide “some” order unless asked for sorted / reversed as suggested?

I think both could be useful:

let q: PriorityQueue<Foo> = ...
let heap: [Element] = q.heap // direct access to the underlying heap array
for elt in q.minToMax { ... } // a sequence that iterates from min to max
for elt in q.maxToMin { ... } // a sequence that iterates from max to min

Direct access to the underlying heap array could be useful if you want to efficiently hand off all the elements to an API that expects an Array and order isn't important.

@AmanuelEphrem
Contributor

Direct access to the underlying heap array could be useful if you want to efficiently hand off all the elements to an API that expects an Array and order isn't important.

I feel that exposing the underlying heap to the developer would introduce unnecessary complexity, and a separate function could return an array view of the priority queue.

Also, because elements of the highest priority are at the front of the priority queue, another way to conform to the Sequence protocol would be like this

let q: PriorityQueue = ...

for elt in q {
  // iterates from max to min
}

for elt in q.reversed() {
  // iterates from min to max
}

While I think this is the most straightforward way to traverse the priority queue, I'm not sure how intuitive it is for others.

@AquaGeek
Contributor Author

AquaGeek commented Jun 7, 2021

I think conforming to Collection is out — I don't think there's a way to do this efficiently.

Direct access to the underlying heap array could be useful if you want to efficiently hand off all the elements to an API that expects an Array and order isn't important.

I feel that exposing the underlying heap to the developer would introduce unnecessary complexity, and a separate function could return an array view of the priority queue.

I think we should expose it as var unorderedElements: [Element].

Also, because elements of the highest priority are at the front of the priority queue[…]

Because we're using a min-max heap, the lowest-priority element is at the front of the backing array not the highest.

[…] another way to conform to the Sequence protocol would be like this

let q: PriorityQueue = ...

for elt in q {
  // iterates from max to min
}

for elt in q.reversed() {
  // iterates from min to max
}

While I think this is the most straightforward way to traverse the priority queue, I'm not sure how intuitive it is for others.

I'm still in the camp of not adding conformance to Sequence to the priority queue type itself; I think that conformance belongs on the min and max views. I don't have good suggestions as far as naming those, though. I think calling them minHeap and maxHeap exposes some of the implementation details.

Meta question: does this sort of discussion belong in the forums or would we rather it happen here?

@AmanuelEphrem
Contributor

I think that conformance belongs on the min and max views

I can get behind this idea as well!

I don't have good suggestions as far as naming those, though. I think calling them minHeap and maxHeap exposes some of the implementation details.

Maybe have something like this? The priority queue itself would not conform to the Sequence protocol; instead, whatever q.orderedIncreasing() and q.orderedDecreasing() return would conform to Sequence.

 let q: PriorityQueue = ...
 
 for elt in q.orderedIncreasing() {
   // iterates from min to max
 }
 
 for elt in q.orderedDecreasing() {
   // iterates from max to min
 }

Assuming we are exposing the underlying heap as var unorderedElements: [Element], the naming of q.orderedIncreasing() and q.orderedDecreasing() would follow in the same fashion.

For brevity, however, it might be better just to have it as q.increasing() and q.decreasing() as saying "increasing"/"decreasing" already implies that it is ordered.

Meta question: does this sort of discussion belong in the forums or would we rather it happen here?

I think discussion here is better, as we would be able to see code changes and implementation suggestions all in one place.

@pyrtsa

pyrtsa commented Jun 8, 2021

Other possible naming options: ascendingElements/descendingElements, or increasingElements/decreasingElements. I think the former of these two could go better together with the Foundation type ComparisonResult with its cases orderedAscending etc.

@kylemacomber

Other possible naming options: ascendingElements/descendingElements, or increasingElements/decreasingElements. I think the former of these two could go better together with the Foundation type ComparisonResult with its cases orderedAscending etc.

I also like riffing off of ascending and descending. I think we can get away with dropping the "elements" suffix and parentheses. For example, OrderedSet has an unordered view.

That would leave us with:

let q: PriorityQueue<Element> = ...
let heap: [Element] = q.unordered // direct read-only access to the underlying heap array
for elt in q.ascending { ... } // a sequence that iterates from min to max
for elt in q.descending { ... } // a sequence that iterates from max to min

I think I still prefer heap to unordered because it provides additional useful information (the elements aren't arbitrarily unordered like a hash table). However, this seems to me like a solid direction regardless of the names. I think @AquaGeek, you should proceed with your preferred names.

Meta question: does this sort of discussion belong in the forums or would we rather it happen here?

I think discussion here is better, as we would be able to see code changes and implementation suggestions all in one place.

I think this discussion is working well in this thread—we seem to be making progress and converging. If we find ourselves at a stalemate, that might be a good time to gather more opinions by bringing it to the forums. And before we merge the PR, it'd prob be good to solicit feedback and do some kind of API review on the forums.

@AquaGeek
Contributor Author

AquaGeek commented Jun 8, 2021

@AmanuelEphrem Do you want to work on implementing ascending and descending and open a PR against my branch? I was thinking of returning an Iterator from both of those that conforms to both IteratorProtocol and Sequence:

extension PriorityQueue {
    public struct Iterator: IteratorProtocol, Sequence {
        private var _base: PriorityQueue
        private let _direction: IterationDirection

        // […]
    }

    var ascending: Iterator {
        Iterator(_base: self, _direction: .ascending)
    }

    var descending: Iterator {
        Iterator(_base: self, _direction: .descending)
    }
}
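
For anyone who wants to play with the idea, here is a self-contained sketch of an iterator that is also its own Sequence. Everything in it is an assumption for illustration: `TinyQueue` is a hypothetical stand-in that scans an unsorted array (O(n) per pop) rather than the PR's min-max heap, and `Direction` is a made-up name. The views iterate over a copy, so the queue itself is left untouched.

```swift
// Hypothetical stand-in for the PR's PriorityQueue. It scans an unsorted
// array instead of maintaining a min-max heap, so pops are O(n) here.
struct TinyQueue<Element: Comparable> {
  var storage: [Element] = []

  mutating func insert(_ element: Element) {
    storage.append(element)
  }

  mutating func popMin() -> Element? {
    guard let i = storage.indices.min(by: { storage[$0] < storage[$1] })
    else { return nil }
    return storage.remove(at: i)
  }

  mutating func popMax() -> Element? {
    guard let i = storage.indices.max(by: { storage[$0] < storage[$1] })
    else { return nil }
    return storage.remove(at: i)
  }

  enum Direction { case ascending, descending }

  /// Acts as both iterator and sequence; it drains its own copy of the queue.
  struct Iterator: IteratorProtocol, Sequence {
    var base: TinyQueue
    let direction: Direction

    mutating func next() -> Element? {
      switch direction {
      case .ascending: return base.popMin()
      case .descending: return base.popMax()
      }
    }
  }

  var ascending: Iterator { Iterator(base: self, direction: .ascending) }
  var descending: Iterator { Iterator(base: self, direction: .descending) }
}

var queue = TinyQueue<Int>()
for n in [3, 1, 2] { queue.insert(n) }

for elt in queue.ascending { print(elt) }   // 1, 2, 3
for elt in queue.descending { print(elt) }  // 3, 2, 1
```

Because each view copies the queue up front, creating one should be cheap with copy-on-write storage, but the first pop will trigger a full copy of the backing array.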

I'll work on adding the performance tests.

@AquaGeek
Contributor Author

AquaGeek commented Jun 8, 2021

Here's a first pass at performance tests:

[Image: chart-pq]

@AmanuelEphrem
Contributor

@AmanuelEphrem Do you want to work on implementing ascending and descending and open a PR against my branch? I was thinking of returning an Iterator from both of those that conforms to both IteratorProtocol and Sequence:

Yes! I was thinking of the exact same implementation as well.

input: [Int].self
) { input in
return { timer in
var queue = PriorityQueue(input)
Contributor Author

Should this go above the closure (so that we're not also timing the instantiation)?

Contributor

I think it should still stay within the outer closure, but outside the inner closure (essentially swapping lines 54 and 55) so it stays consistent with the other benchmarks.

@hassila
Contributor

hassila commented Jun 9, 2021

Here's a first pass at performance tests:

[Image: chart-pq]

Interesting! Out of curiosity what hardware is this running on? (And to confirm it is built in release mode?)

@AquaGeek
Contributor Author

AquaGeek commented Jun 9, 2021

Interesting! Out of curiosity what hardware is this running on? (And to confirm it is built in release mode?)

This was run on a 2019 MacBook Pro (2.3 GHz 8-Core Intel Core i9). Yes, it was built in release mode.

@kylemacomber

Here's a first pass at performance tests:

[Image: chart-pq]

@AquaGeek Nice! It'd be great to see how the performance compares to CFBinaryHeap for equivalent operations (e.g. CFBinaryHeapAddValue and CFBinaryHeapGetMinimum)!

@AquaGeek
Contributor Author

@AquaGeek Nice! It'd be great to see how the performance compares to CFBinaryHeap for equivalent operations (e.g. CFBinaryHeapAddValue and CFBinaryHeapGetMinimum)!

@kylemacomber Here's a quick pass at adding CFBinaryHeap implementations. I'll get the code added here shortly. Somebody definitely needs to check my work here — take the benchmarks with a hefty grain of salt.

[Image: CFBinaryHeap Comparison]

@hassila
Contributor

hassila commented Jun 14, 2021

I just spent a little time profiling (the benchmark runtime seems quite long compared to the C++ version referenced in the article linked to previously). There seems to be an issue with using swapAt: (probably the same issue discussed here). I did a simple measurement of PriorityQueue<Int> insert and could see around 7.5M transient allocations. Doing the simple change of manually doing the swapAt: like this in all places:

storage.swapAt(largestDescendantIdx, index)

changed to

let tmp = storage[index]
storage[index] = storage[largestDescendantIdx]
storage[largestDescendantIdx] = tmp

brought this test down to 165K transient allocations and the runtime was basically halved.

And the original benchmark went from:
[Image: chart-old]

to:

[Image: chart-new]

Not sure if this is a known issue with the swapAt: method, but it seems a bit rough to force two memory allocations per swapAt: operation for a simple small value type.

Also, looking at the removeMax test, there is also a large amount of transient allocations with this backtrace, can't quite understand why though (haven't looked at any others yet).

[Image]

@hassila
Contributor

hassila commented Jun 14, 2021

Tracked down the removeMax transient allocations too (disabled inlining of _trickleDownMax and used Instruments). Removing them changed the runtime for one iteration of the benchmark from 95s -> 83s (and removed several M allocations too).

It was triggered by a couple of guard statements looking like:

guard let (smallestDescendantIdx, isChild) = _indexOfChildOrGrandchild(of: index, sortedUsing: <) else {
 return
}

So I just tried removing 'sortedUsing' and created two duplicate methods for _indexOfChildOrGrandchild (one for greater than, one for less than) and the allocations went away (it seems an implicit closure is created by passing the operator function as an argument, which caused all the allocations).
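
The shape of that change can be sketched roughly as follows. This is a simplified illustration, not the PR's code: it only looks at direct children of a node in an `[Int]` array, and the function names are hypothetical. Whether the closure-taking variant actually allocates depends on the compiler version and optimization settings, so treat this as a picture of the refactor rather than a benchmark.

```swift
// Simplified illustration (children only, no grandchildren); all names here
// are hypothetical and not the PR's actual helpers.

/// Closure-taking variant: the comparison is passed in as an argument.
func indexOfBestChild(
  of index: Int,
  in storage: [Int],
  sortedUsing areInOrder: (Int, Int) -> Bool
) -> Int? {
  let left = 2 * index + 1
  guard left < storage.count else { return nil }
  let right = left + 1
  guard right < storage.count else { return left }
  return areInOrder(storage[right], storage[left]) ? right : left
}

/// Specialized variant with `<` written inline.
func indexOfSmallerChild(of index: Int, in storage: [Int]) -> Int? {
  let left = 2 * index + 1
  guard left < storage.count else { return nil }
  let right = left + 1
  guard right < storage.count else { return left }
  return storage[right] < storage[left] ? right : left
}

/// Specialized variant with `>` written inline.
func indexOfLargerChild(of index: Int, in storage: [Int]) -> Int? {
  let left = 2 * index + 1
  guard left < storage.count else { return nil }
  let right = left + 1
  guard right < storage.count else { return left }
  return storage[right] > storage[left] ? right : left
}

let sample = [5, 9, 7]
// Both spellings agree on the result; only the call shape differs.
print(indexOfBestChild(of: 0, in: sample, sortedUsing: <) == indexOfSmallerChild(of: 0, in: sample))
print(indexOfBestChild(of: 0, in: sample, sortedUsing: >) == indexOfLargerChild(of: 0, in: sample))
```

The cost of the specialized version is duplicated code, which is the trade-off the to-do item about splitting _indexOfChildOrGrandchild(of:sortedUsing:) refers to.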

[Image]

@glessard
Contributor

storage.swapAt(largestDescendantIdx, index)

I wonder if storage.withUnsafeMutableBufferPointer { $0.swapAt(largestDescendantIdx, index) } would also exhibit the excessive allocations problem?

The type stored in the underlying heap has more than 2 values, so Pair is a misnomer. This has been renamed to 'Element.' The existing typealias Element has been renamed to 'Pair,' as it only includes the Value and its Priority.
These were removed from the underlying Heap in another PR. Many of the same reasons for doing that apply here.
@AquaGeek
Contributor Author

@lorentey I rebased on main again and removed the ordered views from PriorityQueue.

struct _Element: Comparable {
@usableFromInline let value: Value
let priority: Priority
let insertionCounter: UInt64
Contributor Author

Anyone have any thoughts on this approach? Is using UInt64 overkill for this? Should we overflow back to 0 on insertion instead of trapping?

@jnssnmrcs

jnssnmrcs commented Dec 20, 2021

What about decreasePriority/increasePriority functions? They're used by some algorithms, for example Dijkstra's algorithm, to become faster.

@datwelk

datwelk commented Sep 1, 2022

What is the status of this PR, is it pending review still? @AquaGeek

@AquaGeek
Contributor Author

AquaGeek commented Sep 1, 2022

What is the status of this PR, is it pending review still? @AquaGeek

Yes, it's still pending review. In particular, I'm looking for feedback on the best way to handle FIFO ordering — right now, I'm using a UInt64 insertion counter.

I'd love to get this landed.

@datwelk

datwelk commented Sep 1, 2022

@AquaGeek Have you considered using a timestamp such as Date() to enforce FIFO in case of equal priorities instead of the counter? What would be the disadvantages of that approach? I guess precision.

@AquaGeek
Contributor Author

AquaGeek commented Sep 2, 2022

@AquaGeek Have you considered using a timestamp such as Date() to enforce FIFO in case of equal priorities instead of the counter? What would be the disadvantages of that approach? I guess precision.

I think that would be effectively the same thing but with the downside of depending on Foundation. There may be performance or space differences worth digging into.

@hassila
Contributor

hassila commented Sep 2, 2022

@AquaGeek Have you considered using a timestamp such as Date() to enforce FIFO in case of equal priorities instead of the counter? What would be the disadvantages of that approach? I guess precision.

I think that would be effectively the same thing but with the downside of depending on Foundation. There may be performance or space differences worth digging into.

If exploring this, I'd aim for using Instant (available from Swift 5.7) to keep it Foundation-free (https://github.com/apple/swift-evolution/blob/main/proposals/0329-clock-instant-duration.md). Space-wise it would be twice as large (128 bits). The performance of getting the current time is fairly decent these days (used to suck big time), but hardly as cheap as maintaining a counter, which is the best thing I can think of, at least for maintaining FIFO.

/// - Complexity: O(log `count`)
@inlinable
public mutating func popMax() -> Value? {
_base.popMax()?.value

Can we also return the priority along with value like in Python?

next = q.get()
print(next)
# (1, 'Jones')

I'm also confused about why we can't get back the priorities we previously set.

@lorentey
Member

lorentey commented Sep 25, 2022

My gut feeling is that if someone wants to ensure that this preserves insertion order for items with the same priority, then they can simply define their Priority type to account for this.

Using Date (or, rather, ContinuousClock.now) would not be appropriate -- clocks do not return strictly monotonically increasing instants.

@lorentey
Member

lorentey commented Sep 25, 2022

Then again, if they're defining a type anyway, they could just as well define a custom Comparable wrapper over their element type, and just use Heap.

I don't quite get the value this adds over Heap at all. The case for PriorityQueue isn't nearly as self-evident as it is with Set vs Dictionary -- Dictionary adds a wealth of functionality that isn't directly expressible using Set. PriorityQueue is effectively just a glorified typealias:

struct Prioritized<Element, Priority: Comparable>: Comparable {
  let priority: Priority
  let element: Element

  init(_ element: Element, _ priority: Priority) { 
    self.priority = priority
    self.element = element
  }

  static func ==(left: Self, right: Self) -> Bool { left.priority == right.priority }
  static func <(left: Self, right: Self) -> Bool { left.priority < right.priority }
}

typealias PriorityQueue<Element, Priority: Comparable> = 
  Heap<Prioritized<Element, Priority>>
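
As a small usage sketch of the wrapper idea, the snippet below repeats the Prioritized type so that it compiles on its own and orders a non-Comparable payload by priority. `Request` is a hypothetical type invented for this example, and the standard library's `min()` and `max()` stand in for Heap so that no package import is needed.

```swift
// Same shape as the wrapper above, repeated so the example is self-contained.
struct Prioritized<Element, Priority: Comparable>: Comparable {
  let priority: Priority
  let element: Element

  init(_ element: Element, _ priority: Priority) {
    self.priority = priority
    self.element = element
  }

  // Only the priority takes part in comparisons.
  static func == (left: Self, right: Self) -> Bool { left.priority == right.priority }
  static func < (left: Self, right: Self) -> Bool { left.priority < right.priority }
}

// Hypothetical payload type that is deliberately not Comparable.
struct Request {
  let path: String
}

let pending = [
  Prioritized(Request(path: "/b"), 2),
  Prioritized(Request(path: "/a"), 1),
  Prioritized(Request(path: "/c"), 3),
]

// Standard-library min()/max() stand in for Heap's min and max here.
let lowest = pending.min()
let highest = pending.max()
print(lowest?.element.path ?? "none", highest?.element.path ?? "none")
```

Note that this wrapper says nothing about ties: two entries with equal priority compare equal, so their relative order is whatever the container happens to produce.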

Can y'all remind me of the point in defining a separate PriorityQueue type? Is it just for convenience?

If stable ordering is needed often enough to deserve its own place here, then ideally it should be offered for Heap as well, by a StableHeap wrapper type.

The PriorityQueue type in the current PR would then become

typealias StablePriorityQueue<Element, Priority: Comparable> = 
  StableHeap<Prioritized<Element, Priority>>

@AquaGeek
Contributor Author

Can y'all remind me of the point in defining a separate PriorityQueue type? Is it just for convenience?

Honestly, it has been so long that I've lost a lot of the context here.

If stable ordering is needed often enough to deserve its own place here, then ideally it should be offered for Heap as well, by a StableHeap wrapper type.

At the moment, that would have been my primary argument for PriorityQueue — I imagine the typical use case would prefer FIFO ordering and not having to conform to Comparable (i.e. splitting out the element from the priority).

@lorentey
Member

Not having to conform to Comparable is not necessarily a primary goal here, but I certainly see how guaranteeing stable ordering would be annoying to implement, especially if one had to repeatedly do that.

My hesitation is really just me getting a bit worried about the hidden costs of having to store a full UInt64 value with every single item, just on the off chance some of them might compare equal. Clients that do not care about preserving insertion order shouldn't have to pay the storage overhead of these. (A hidden 8-byte value usually doesn't sound too bad, but it can end up significant enough to be worth worrying about.)

Automatically & implicitly bundling stable ordering with the Element+Priority separation sounds unappealing to me -- both aspects are interesting ideas on their own, but combining them together under the name PriorityQueue seems less palatable on the whole.

What if we waited until we had some actual users for this module before we start elaborating it? If potential adopters came back complaining about the pain of having to define wrapper types and/or asking how they could preserve insertion order, then we'd probably be in a better position to figure out the right way to serve them. (Which may well turn out to be your original design -- I may just not be seeing that far ahead!)

Note: I'm not aware of any high-profile heap implementations in other languages that provide out-of-the-box support for preserving insertion order. Then again, I only did a quick & very basic search and may have missed something obvious.

(Now of course we could also preserve insertion ordering using a different design -- such as using UInt32 values, and renumbering all existing items if we ever wrap around, or by tweaking the underlying Heap implementation in some way. But then again, the effort of thinking about & implementing these may well prove unnecessary, and why bother until we know we'll need it...)

@AquaGeek
Contributor Author

Not having to conform to Comparable is not necessarily a primary goal here, but I certainly see how guaranteeing stable ordering would be annoying to implement, especially if one had to repeatedly do that.

My hesitation is really just me getting a bit worried about the hidden costs of having to store a full UInt64 value with every single item, just on the off chance some of them might compare equal. Clients that do not care about preserving insertion order shouldn't have to pay the storage overhead of these. (A hidden 8-byte value usually doesn't sound too bad, but it can end up significant enough to be worth worrying about.)

Yeah, I think that's valid. I'm hesitant to commit to too many design decisions in absence of compelling use cases.

There's always the "if you care about eking out every last drop of performance you can drop down to Heap directly" argument. 🤷

What if we waited until we had some actual users for this module before we start elaborating it? If potential adopters came back complaining about the pain of having to define wrapper types and/or asking how they could preserve insertion order, then we'd probably be in a better position to figure out the right way to serve them. (Which may well turn out to be your original design -- I may just not be seeing that far ahead!)

@phausler was one of the people who brought up FIFO ordering in the forum thread. The comparison was against NSOperationQueue.

The original use case for me starting to implement this at all was a priority queue for pending network requests. FIFO ordering is important, as we don't want a request that was enqueued first waiting forever in the queue because subsequent requests with the same priority happen to get inserted into a slot ahead of it in the heap. That being said, we haven't cut over to this library's implementation yet, partly because we were hoping to get the FIFO ordering and the removal of the Comparable requirement provided by PriorityQueue first (network requests being Comparable directly doesn't really make sense, though I guess we could always create a wrapper type).

(Now of course we could also preserve insertion ordering using a different design -- such as using UInt32 values, and renumbering all existing items if we ever wrap around, or by tweaking the underlying Heap implementation in some way. But then again, the effort of thinking about & implementing these may well prove unnecessary, and why bother until we know we'll need it...)

I mainly used UInt64 out of an abundance of caution. UInt32 still provides 4B inserts before rolling over — my first reaction is "nobody will hit that," but network servers that run for a long time might. There is probably prior art to draw on here. One quick thought I had was to zero the insertion counter when the queue becomes empty.

@lorentey lorentey added the Heap Min-max heap module label Nov 16, 2022
static func < (lhs: Self, rhs: Self) -> Bool {
if lhs.priority < rhs.priority {
return true
} else if lhs.priority == rhs.priority {
Copy link
Contributor Author

One issue we've run into in our upstream implementation of this is that this Comparable implementation only works when trying to dequeue the minimum element. Dequeueing the maximum element doesn't respect FIFO ordering due to the default implementation of > being to flip the arguments and use < (i.e. a > b ⇢ b < a). Implementing > to also account for the insertion counter starts violating invariants in the heap: it creates weird situations where an element can be both greater than and less than another if it was inserted before and has the same priority.

One way we've come up with to fix this is to only insert the priorities into the underlying heap and use them as a key for looking up an array (or linked list or deque) in a dictionary that actually contains the values. That eliminates the need to store a UInt64 per element but comes at some cost (especially if you use an array) and requires that the Priority be Hashable. None of these changes affect the underlying Heap.

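struct PriorityQueue<Value, Priority: Comparable & Hashable> {
    // Distinct priorities only; Heap requires its elements to be Comparable.
    var _base: Heap<Priority>
    // Values for each priority, in insertion order.
    var _elements: Dictionary<Priority, Deque<Value>>

    ...
}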
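A minimal, self-contained sketch of this bucketing idea follows. It is not the upstream implementation: `BucketedQueue` is a hypothetical name, a sorted array of distinct priorities stands in for `Heap<Priority>`, and a plain `Array` stands in for `Deque<Value>`, so that the snippet needs only the standard library. Those stand-ins make insertion and removal linear-time, which a real heap and deque would avoid.

```swift
// Hypothetical sketch; names and storage choices are assumptions.
struct BucketedQueue<Value, Priority: Comparable & Hashable> {
    // Distinct priorities, kept sorted ascending (stand-in for Heap<Priority>).
    private var priorities: [Priority] = []
    // One FIFO bucket per priority (stand-in for Deque<Value>).
    private var buckets: [Priority: [Value]] = [:]

    mutating func insert(_ value: Value, priority: Priority) {
        if buckets[priority] == nil {
            // First value with this priority: record the priority in order.
            let index = priorities.firstIndex { $0 > priority } ?? priorities.endIndex
            priorities.insert(priority, at: index)
        }
        buckets[priority, default: []].append(value)
    }

    mutating func popMin() -> Value? {
        guard let priority = priorities.first else { return nil }
        return popOldest(in: priority)
    }

    mutating func popMax() -> Value? {
        guard let priority = priorities.last else { return nil }
        return popOldest(in: priority)
    }

    // Removes the earliest-inserted value of the given priority and
    // drops the priority once its bucket is empty.
    private mutating func popOldest(in priority: Priority) -> Value? {
        guard var bucket = buckets[priority], !bucket.isEmpty else { return nil }
        let value = bucket.removeFirst()
        if bucket.isEmpty {
            buckets[priority] = nil
            priorities.removeAll { $0 == priority }
        } else {
            buckets[priority] = bucket
        }
        return value
    }
}

var queue = BucketedQueue<String, Int>()
queue.insert("a", priority: 1)
queue.insert("b", priority: 1)
queue.insert("c", priority: 0)
queue.insert("d", priority: 0)

let fromMax = [queue.popMax(), queue.popMax()]  // ["a", "b"]
let fromMin = [queue.popMin(), queue.popMin()]  // ["c", "d"]
```

Because ties are resolved inside each bucket rather than by the comparison operator, both ends return equal-priority values oldest first.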

Are there other approaches for maintaining FIFO ordering?

@lorentey
Copy link
Member

lorentey commented Sep 29, 2023

One issue we've run into in our upstream implementation of this, is that this Comparable implementation only works when trying to dequeue the minimum element. Dequeueing the maximum element doesn't respect FIFO ordering due to the default implementation of > being to flip the arguments and use < (i.e. a > b ⇢ b < a).

This is a good point, but only insofar as it underscores a broader one. From my viewpoint, the fact that popMin's ordering is precisely the reverse of popMax is not a bug; rather, it is the entire point.

Expecting popMin and popMax to both observe FIFO ordering would be an undesirable semantic complication that I don't believe would carry its weight -- for the data structure to make sense, the < relation must define a total ordering, and the pop operations need to observe it.

Clients have full flexibility to resolve the ordering of items as they wish, by conforming their Element (or if you like, Priority) type to Comparable as they choose. If a client wishes to have popMin return elements with equal priority in FIFO order, then it's straightforward to implement that by adding a serial number to the custom element type, then defining < to compare that in addition to the priority value. popMax will then return items in the opposite order of that, by definition. If a client wants popMax to use FIFO ordering, then they can simply reverse the polarity of the serial number comparison.
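The serial-number approach described above can be sketched roughly as follows. This is an illustration only, not code from the PR or the package: `Entry` and `serial` are hypothetical names, and the standard library's `sorted()` stands in for a `Heap` so that the snippet is self-contained.

```swift
// Hypothetical element type; `Entry` and `serial` are illustrative names.
struct Entry<Value, Priority: Comparable>: Comparable {
    let value: Value
    let priority: Priority
    let serial: UInt64  // assigned by the caller, increasing with each insert

    static func == (lhs: Self, rhs: Self) -> Bool {
        lhs.priority == rhs.priority && lhs.serial == rhs.serial
    }

    // Ties on priority are broken by insertion order, so among equal
    // priorities the smallest element is the one inserted first.
    static func < (lhs: Self, rhs: Self) -> Bool {
        if lhs.priority != rhs.priority { return lhs.priority < rhs.priority }
        return lhs.serial < rhs.serial
    }
}

let entries = [
    Entry(value: "a", priority: 1, serial: 0),
    Entry(value: "b", priority: 1, serial: 1),
    Entry(value: "c", priority: 0, serial: 2),
]

// The order in which repeated popMin calls would return values.
let ascending = entries.sorted().map(\.value)         // ["c", "a", "b"]
// The order in which repeated popMax calls would return values.
let descending = entries.sorted(by: >).map(\.value)   // ["b", "a", "c"]
```

Among the equal-priority pair, the ascending order is FIFO ("a" before "b") and the descending order is its exact reverse. Flipping the tie-break to `lhs.serial > rhs.serial` would make the descending order FIFO instead, at the cost of the ascending one.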

I believe Heap is a great implementation (given the constraints of today's Swift), and it is ready to ship as is. If we also want to add a second heap-like type, I'd need that to provide something that (1) is useful to many clients, but (2) isn't trivial to implement.

The idea of adding a serial number to incoming items to implement a FIFO popMin operation (like this PR does) would pass the first criterion, but it doesn't seem complex enough to be worth the expense of adding it to this package. (Keep in mind that this package has never seen a feature release beyond 1.0, and it has a very significant unreleased backlog of highly important work -- potential new additions need to pass an incredibly high importance check.)

Adding a second heap-like data structure that provides both a FIFO popMin and a FIFO popMax (as suggested) would certainly pass the second criterion, but I do not see how it can possibly pass the first. The double-ended nature of our min-max heap is a bonus convenience feature that we got for free; I'm not aware of important use cases that inherently require it, much less ones that need both directions to implement a FIFO ordering.

Given how backlogged this package is, I don't see us shipping second-order additions like this in the foreseeable future -- accordingly, I'm going to close this PR. Apologies for leaving it open this long!

I'm not against trying again after we've managed to ship the current set of data structures in limbo -- however, my concerns above will likely still apply. An example of a perhaps more interesting direction (that could better satisfy both criteria above) might be to have a PriorityQueue type that supports an O(log(n)) remove/update operation.

@lorentey lorentey closed this Sep 29, 2023
@fbartho
Copy link

fbartho commented Dec 31, 2023

Sorry for resurrecting the discussion here if people are exhausted by it, but if this library is never going to add PriorityQueue, would an acceptable alternative be some examples in the docs showing how to use the Heap type to roll your own priority queue?

Looking at the diff in this PR, there are a lot of details that could be omitted if users are implementing only the PriorityQueue they need, so I think an example could be short enough to fit in the docs.
