Implement `SortedSet`, `SortedDictionary` #1

lorentey · 2021-04-05T18:05:26Z

OrderedSet and OrderedDictionary work great when we need to keep their elements in the order they were inserted, or if we only need to infrequently reorder/sort them. However, inserting (or removing) an element from the middle of the collection takes linear time in both cases, which makes these generic types less suitable for maintaining a sorted collection. It would be very much desirable for Swift developers to have access to efficient sorted collection types.

Self-balancing search trees naturally keep their elements sorted, and they implement efficient (O(log(n))) insertions and removals from anywhere in the collection -- so they seem an ideal choice for the implementation of a standard suite of sorted collections.

Binary search trees and similar low-fanout search tree variants (such as AVL trees or red-black trees) need to maintain a multitude of internal references, so they come with an inappropriately large memory overhead. High fanout search trees such as in-memory B-trees organize their elements into a tree of small contiguous buffers (up to a couple thousand items or so), and this makes them far more efficient in practice -- in terms of both space and time.

Unlike collections that store their elements in a single, contiguous buffer, tree-based collections also allow different versions of the same tree to share some of their storage, by simply referring to the same nodes within the tree. This makes them potentially more efficient than the existing standard collection types when implementing the copy-on-write optimization. (In B-tree's case, we can maintain a nice balance between lookup and CoW performance by having the inner nodes of the tree have a lower maximum fanout number than leaves.)

We'd like this package to implement a great set of sorted collection types based on an in-memory B-tree implementation.

struct SortedSet<Element: Comparable>: BidirectionalCollection, SetAlgebra {....}
struct SortedDictionary<Key: Comparable, Value> { ... }

struct SortedBag<Element: Comparable> { ... }
struct SortedMultimap<Key: Comparable, Value> { ... }

The text was updated successfully, but these errors were encountered:

codeman7 · 2021-04-07T22:22:30Z

Hey @lorentey have you kicked this work off? I'd love to contribute and help out if possible.

Jeehut · 2021-04-10T12:33:25Z

Just want to add that the types SortedSet and SortedDictionary would also allow for finally adding binary search to Swift. The API could look very familiar, but under the hoods it would be much faster by using binary search:

let sortedNumbers: SortedSet = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
if let firstMatchingIndex = sortedNumbers.index(where: { $0 > 5 }) {
  let firstMatch = sortedNumbers[firstMatchingIndex]
}

The index(where:) method would then include the binary search algorithm.

Also, @lorentey I don't quite understand why you're restricting adding sorted variants to Set and Dictionary only, I think Array deserves one just as much. The only reason there's no OrderedArray in this library is because Array is order by definition. But it's not sorted, so we should also have a SortedArray.

See also my own SortedArray implementation that I have been already using in production for years.

kylemacomber · 2021-04-10T13:58:41Z

I think the SortedBag that @lorentey mentions at the end of the issue would effectively be the same as a SortedArray. It's interesting to consider what the more approachable name is.

Jeehut · 2021-04-11T17:39:45Z

@kylemacomber I was already wondering what the SortedBag is supposed to mean. Never heard Bag as a type name before, maybe it's just me but I find it confusing. We should use known terms, I would say. Array seems an obvious one.

lorentey · 2021-04-14T01:02:57Z

"Bag" is a well-established name for a set that allows duplicates; it's less clinical sounding as "multiset", but that is definitely a viable name for it. (For what it's worth, multiset seems to be more popular these days; see e.g. Guava, LLVM, or Boost. However, contrast this with Eclipse, Apache Commons, and others. Knuth also prefers the term multiset, and that probably seals the deal for me. 😉)

At this early in the implementation stage though, I think it would probably be better to think about the abstract construct and the operations it would provide, rather than the exact name we would end up labeling it under.

(E.g., note that there is a distinction between a multiset that stores each individual duplicate, and one that merely counts duplicates. For a sorted collection intended to serve as a replacement for O(n^2) sorted array implementations, I expect we'd need the former variant.)

The word "array" has some strong connotations with random-access indexing to me. I don't know if I'd like to use the term SortedArray to mean anything other than an Array that is sorted. (And to be clear, I don't think adding a SortedArray in this sense would be a high priority for this package.)

<aside>(My problem with names for less universally established mathematical (or programming) concepts is the same as my problem with indentation styles and football clubs -- I find people tend to stick to whatever they got first exposed to, and they sometimes form overly strong attachments to them. This includes myself, of course.)</aside>

codeman7 · 2021-05-03T18:46:48Z

What would the process generally look like for someone to contribute? Both from a personal aspect as well as from a professional one if possible?

CTMacUser · 2021-07-08T05:26:46Z

Just want to add that the types SortedSet and SortedDictionary would also allow for finally adding binary search to Swift.

That should probably be in the Algorithms library, so any collection that stores its elements in-order can take advantage of it, not just specific container types. (In fact, I think it has one right now, and I've proposed at least another in the past.)

lorentey · 2022-09-19T23:09:52Z

Provisionally putting on the 1.1.0 milestone. This may get bumped off if we take too much time on persistent collections, but let's assume otherwise for now.

lorentey · 2022-10-11T20:28:24Z

Bumping to 1.2.0 acknowledging just how much work the rest of the new features in 1.1.0 is taking.
(I'm hoping 1.2.0 will come soon after 1.1.0 though.)

mdznr · 2022-12-03T09:43:23Z

It would be nice if SortedSet didn’t require Element to conform to Comparable, and instead could take a comparator closure ((Element, Element) -> Bool), like the sort(by:) family of functions. This would be useful in cases where you might want to have the same type sorted in different ways in different collections. Of course, if Element is Comparable, there’d be a convenience initializer for that.

For example, if you wanted a sorted list of people (SortedSet<Person>), you may wish to have them sorted by either their given name or family name. This different valid mechanisms for sorting isn’t something that could really easily be defined in a Comparable implementation on a Person type.

Before:

class Person: Equatable, Hashable {
  let givenName: String
  let familyName: String
  
  init(givenName: String, familyName: String) {
    self.givenName = givenName
    self.familyName = familyName
  }
  
  static func == (lhs: Person, rhs: Person) -> Bool {
    return (lhs.givenName == rhs.givenName) &&
           (lhs.familyName == rhs.familyName)
  }
  
  func hash(into hasher: inout Hasher) {
    hasher.combine(self.givenName)
    hasher.combine(self.familyName)
  }
}

class GivenNameSortedFirstPerson: Person, Comparable {
  static func < (lhs: GivenNameSortedFirstPerson, rhs: GivenNameSortedFirstPerson) -> Bool {
    if lhs.givenName < rhs.givenName {
      return true
    } else if lhs.givenName > rhs.givenName {
      return false
    } else {
      return lhs.familyName < rhs.familyName
    }
  }
}

let givenNameSortedPeople = SortedSet<GivenNameSortedFirstPerson>()

class FamilyNameSortedFirstPerson: Person, Comparable {
  static func < (lhs: FamilyNameSortedFirstPerson, rhs: FamilyNameSortedFirstPerson) -> Bool {
    if lhs.familyName < rhs.familyName {
      return true
    } else if lhs.familyName > rhs.familyName {
      return false
    } else {
      return lhs.givenName < rhs.givenName
    }
  }
}

let familyNameSortedPeople = SortedSet<FamilyNameSortedFirstPerson>()

After:

struct Person: Equatable, Hashable {
  let givenName: String
  let familyName: String
}

extension Person {
  static func compareByGivenNameFirst(_ lhs: Self, _ rhs: Self) -> Bool {
    if lhs.givenName < rhs.givenName {
      return true
    } else if lhs.givenName > rhs.givenName {
      return false
    } else {
      return lhs.familyName < rhs.familyName
    }
  }
  
  static func compareByFamilyNameFirst(_ lhs: Self, _ rhs: Self) -> Bool {
    if lhs.familyName < rhs.familyName {
      return true
    } else if lhs.familyName > rhs.familyName {
      return false
    } else {
      return lhs.givenName < rhs.givenName
    }
  }
}

let givenNameSortedPeople = SortedSet<Person>(comparator: Person.compareByGivenNameFirst(_:_:))
let familyNameSortedPeople = SortedSet<Person>(comparator: Person.compareByFamilyNameFirst(_:_:))

lorentey · 2022-12-06T22:09:37Z

It would be nice if SortedSet didn’t require Element to conform to Comparable, and instead could take a comparator closure ((Element, Element) -> Bool), like the sort(by:) family of functions.

This would have too many undesirable consequences to accept. The core sorted collection types will definitely not work this way.

Closure values are inherently not Equatable. Therefore, there is no way to prove that a collection that's sorted by a closure is using a particular ordering -- the order is neither reflected in the type system, nor it is verifiable at runtime without manually scanning the entire contents of the collection. This makes closure-based collection designs inherently weaker than ones that make use of protocol conformances.

In my (very, very strongly held) opinion, two SortedSet values of the same type must be guaranteed to sort their members precisely the same way. This is a crucial feature of the core sorted collection types; violating it is not an option.

A function that takes a SortedSet must encode its ordering expectation within the Swift type system, and the natural way to do this is to rely on the Comparable protocol. SortedSet and SortedDictionary will therefore follow Set and Dictionary in making full use of Swift's protocols. Instead of Hashable, they will naturally require their Element type to conform to Comparable.

Beyond the type-safety/expressivity issue of not actually expressing the ordering in the type system, using a closure-based ordering would also have dire technical consequences.

It would make it impossible to efficiently implement crucial methods for combining sets, such as intersection or union: there would be no way to quickly verify if two sets use the same ordering relation, so combining two sets would require element-wise processing with O(n*log(n)) worst-case performance. This would not be acceptable behavior for the basic sorted collection types. Instead, the statically typed SortedSet will guarantee a linear upper bound for merges. Additionally, it will be able to recognize identical subtrees in its inputs and process them in O(1) time -- which is going to be a crucial feature for discovering differences between diverged values.
In addition, while the Swift compiler is able to specialize the implementation of generic types for specific type arguments, it is nowhere near capable of specializing functions for specific closure arguments -- so a closure-based API is expected to be generally slower (by a constant, but potentially significant factor) than ones that make good use of the type system.

Having said all this, though, I would be also open to shipping additional collection types that take closures -- once we have shipped the core sorted collection types. I would very strongly object to shipping closure-based collections before we ship the type-safe ones, or to implement type-safe sorted collections on top of lesser, closure-based variants.

Instead of designing closure-taking APIs, we also have the option of using mixin type parameters instead:

enum ComparisonResult {
  case orderedAscending
  case orderedSame
  case orderedDescending
}

protocol ComparatorProtocol {
  associatedtype Value
  static func compare(_ first: Value, _ second: Value) -> ComparisonResult
}

struct GeneralizedSortedSet<Element, Comparator: ComparatorProtocol> {
 ...
}

struct Person: Equatable, Hashable {
  let givenName: String
  let familyName: String

  struct ComparatorByGivenNameFirst: ComparatorProtocol {
    typealias Value = Person
    static func compare(_ first: Value, _ second: Value) -> ComparisonResult {
      ...
    }
  }

  struct ComparatorByFamilyNameFirst: ComparatorProtocol {
    typealias Value = Person
    static func compare(_ first: Value, _ second: Value) -> ComparisonResult {
      ...
    }
  }
}

let givenNameSortedPeople = GeneralizedSortedSet<Person, Person.ComparatorByGivenNameFirst>()
let familyNameSortedPeople = GeneralizedSortedSet<Person, Person.ComparatorByFamilyNameFirst>()

However, this is by no means necessary -- it's simply a mechanical transformation of the far more pedestrian option of creating trivial wrapper types over Person:

struct PersonOrderedByGivenName: Comparable {
  var value: Person

  func ==(left: Self, right: Self) -> Bool { left.value == right.value }
  func <(left: Self, right: Self) -> Bool { ... }
}

struct PersonOrderedByGivenName: Comparable {
  var value: Person

  func ==(left: Self, right: Self) -> Bool { left.value == right.value }
  func <(left: Self, right: Self) -> Bool { ... }
}

let givenNameSortedPeople = SortedSet<PersonOrderedByGivenName>()
let familyNameSortedPeople = SortedSet<PersonOrderedByFamilyName>()

GeneralizedSortedSet can be built on top of SortedSet, or vice versa.

Note that GeneralizedSortedSet would share the type safety of the core SortedSet -- you cannot pass a set that's sorted by given name to a function that expects to get a set that's ordered by family name. I generally consider this a feature, not a bug.

(Like closure-based data structures, GeneralizedSortedSet is also firmly in the "possible future direction" box -- the fundamental problem we need to tackle first is shipping the SortedSet type that requires its Element to be Comparable. Once we have done that, we'll have a sound technical foundation on which we can build other things.)

fix autoclosure capture issue

pbk20191 · 2024-08-18T10:03:57Z

In current implementation. It is not possible to look up for non existing Key.

It would great if SortedDictionary and SortedSet provides this kind of feature.
Like java.util.NavigableMap and java.util.NavigableSet

extension SortedDictionary {
// returns the existing Index, if key exist, else return Index which contains bigger key than the given key.
   func ceilingIndex(for key:Key) -> Index
// returns the existing Index, if key exist, else return Index which contains smaller key than the given key.
   func floorIndex(for key:Key) -> Index
}

extension SortedSet {
   func ceilingIndex(for item:Element) -> Index
   func floorIndex(for item:Element) -> Index
}

I hope it won't be too hard, since _BTree already has lastIndex(forKey:) and firstIndex(forKey:)

lorentey added the enhancement New feature or request label Apr 5, 2021

lorentey self-assigned this Apr 5, 2021

lorentey mentioned this issue Apr 14, 2021

Counted Set #20

Open

lorentey assigned vihanb Jun 4, 2021

lorentey mentioned this issue Jun 4, 2021

BTreeIterator is very slow attaswift/BTree#17

Open

vihanb mentioned this issue Jul 21, 2021

Sorted collections #65

Merged

7 tasks

lorentey added this to the 1.1.0 milestone Sep 19, 2022

lorentey modified the milestones: 1.1.0, 1.2.0 Oct 11, 2022

JonathanStorey mentioned this issue Jan 29, 2023

Alternative implementation of MutableCollection conformance for OrderedDictionary.Values #262

Closed

Azoy added a commit that referenced this issue Jun 14, 2024

Merge pull request #1 from glessard/inout

dbf90bd

fix autoclosure capture issue

lorentey mentioned this issue Oct 21, 2024

SortedCollections is missing is when using fixed versions #425

Closed

2 tasks

ximlu mentioned this issue Oct 31, 2024

compile to android failed #432

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement `SortedSet`, `SortedDictionary` #1

Implement `SortedSet`, `SortedDictionary` #1

lorentey commented Apr 5, 2021

codeman7 commented Apr 7, 2021

Jeehut commented Apr 10, 2021 •

edited

Loading

kylemacomber commented Apr 10, 2021

Jeehut commented Apr 11, 2021

lorentey commented Apr 14, 2021

codeman7 commented May 3, 2021

CTMacUser commented Jul 8, 2021

lorentey commented Sep 19, 2022

lorentey commented Oct 11, 2022

mdznr commented Dec 3, 2022 •

edited

Loading

lorentey commented Dec 6, 2022 •

edited

Loading

pbk20191 commented Aug 18, 2024

Implement SortedSet, SortedDictionary #1

Implement SortedSet, SortedDictionary #1

Comments

lorentey commented Apr 5, 2021

codeman7 commented Apr 7, 2021

Jeehut commented Apr 10, 2021 • edited Loading

kylemacomber commented Apr 10, 2021

Jeehut commented Apr 11, 2021

lorentey commented Apr 14, 2021

codeman7 commented May 3, 2021

CTMacUser commented Jul 8, 2021

lorentey commented Sep 19, 2022

lorentey commented Oct 11, 2022

mdznr commented Dec 3, 2022 • edited Loading

lorentey commented Dec 6, 2022 • edited Loading

pbk20191 commented Aug 18, 2024

Implement `SortedSet`, `SortedDictionary` #1

Implement `SortedSet`, `SortedDictionary` #1

Jeehut commented Apr 10, 2021 •

edited

Loading

mdznr commented Dec 3, 2022 •

edited

Loading

lorentey commented Dec 6, 2022 •

edited

Loading