Implement SortedSet, SortedDictionary #1
Comments
Hey @lorentey have you kicked this work off? I'd love to contribute and help out if possible. |
Just want to add that the types
let sortedNumbers: SortedSet = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
if let firstMatchingIndex = sortedNumbers.firstIndex(where: { $0 > 5 }) {
    let firstMatch = sortedNumbers[firstMatchingIndex]
}
The Also, @lorentey I don't quite understand why you're restricting adding sorted variants to See also my own |
I think the |
@kylemacomber I was already wondering what the |
"Bag" is a well-established name for a set that allows duplicates; it's less clinical sounding than "multiset", but that is definitely a viable name for it. (For what it's worth, multiset seems to be more popular these days; see e.g. Guava, LLVM, or Boost. However, contrast this with Eclipse, Apache Commons, and others. Knuth also prefers the term multiset, and that probably seals the deal for me. 😉)

This early in the implementation stage though, I think it would probably be better to think about the abstract construct and the operations it would provide, rather than the exact name we would end up labeling it under. (E.g., note that there is a distinction between a multiset that stores each individual duplicate, and one that merely counts duplicates. For a sorted collection intended to serve as a replacement for O(n^2) sorted array implementations, I expect we'd need the former variant.)

The word "array" has some strong connotations with random-access indexing to me. I don't know if I'd like to use the term

<aside>(My problem with names for less universally established mathematical (or programming) concepts is the same as my problem with indentation styles and football clubs -- I find people tend to stick to whatever they were first exposed to, and they sometimes form overly strong attachments to them. This includes myself, of course.)</aside> |
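To make the distinction between the two multiset variants concrete, here is a minimal sketch. `CountedBag`, `SortedBag`, and `Job` are hypothetical names invented for this illustration; they are not part of any proposed API, and the array-backed storage is only a stand-in for a tree.

```swift
// Hypothetical element type: equality and ordering look only at `priority`,
// so two jobs can compare equal while still being distinguishable by name.
struct Job: Comparable, Hashable {
    let priority: Int
    let name: String
    static func == (lhs: Job, rhs: Job) -> Bool { lhs.priority == rhs.priority }
    static func < (lhs: Job, rhs: Job) -> Bool { lhs.priority < rhs.priority }
    func hash(into hasher: inout Hasher) { hasher.combine(priority) }
}

/// Variant 1 (hypothetical): merely counts duplicates.
/// Equal elements collapse into a single key, so their identities are lost.
struct CountedBag<Element: Hashable> {
    private(set) var counts: [Element: Int] = [:]
    init() {}

    mutating func insert(_ element: Element) {
        counts[element, default: 0] += 1
    }

    func count(of element: Element) -> Int {
        counts[element] ?? 0
    }
}

/// Variant 2 (hypothetical): stores every duplicate individually, in sorted order.
/// An array is used here only for brevity; insertion into it is linear time.
struct SortedBag<Element: Comparable> {
    private(set) var elements: [Element] = []
    init() {}

    mutating func insert(_ element: Element) {
        // Place the new element after any existing elements that compare equal.
        let index = elements.firstIndex(where: { element < $0 }) ?? elements.endIndex
        elements.insert(element, at: index)
    }
}

var counted = CountedBag<Job>()
var stored = SortedBag<Job>()
let jobs = [
    Job(priority: 2, name: "b"),
    Job(priority: 1, name: "a"),
    Job(priority: 2, name: "c"),
]
for job in jobs {
    counted.insert(job)
    stored.insert(job)
}
```

The counting variant remembers only that two priority-2 jobs were inserted, while the storing variant keeps both of them, which is what a replacement for a sorted array would need.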
What would the process generally look like for someone to contribute? Both from a personal aspect as well as from a professional one if possible? |
That should probably be in the Algorithms library, so any collection that stores its elements in-order can take advantage of it, not just specific container types. (In fact, I think it has one right now, and I've proposed at least another in the past.) |
Provisionally putting on the 1.1.0 milestone. This may get bumped off if we take too much time on persistent collections, but let's assume otherwise for now. |
Bumping to 1.2.0 acknowledging just how much work the rest of the new features in 1.1.0 is taking. |
It would be nice if For example, if you wanted a sorted list of people ( Before:

class Person: Equatable, Hashable {
    let givenName: String
    let familyName: String

    init(givenName: String, familyName: String) {
        self.givenName = givenName
        self.familyName = familyName
    }

    static func == (lhs: Person, rhs: Person) -> Bool {
        return (lhs.givenName == rhs.givenName) &&
            (lhs.familyName == rhs.familyName)
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(self.givenName)
        hasher.combine(self.familyName)
    }
}

class GivenNameSortedFirstPerson: Person, Comparable {
    static func < (lhs: GivenNameSortedFirstPerson, rhs: GivenNameSortedFirstPerson) -> Bool {
        if lhs.givenName < rhs.givenName {
            return true
        } else if lhs.givenName > rhs.givenName {
            return false
        } else {
            return lhs.familyName < rhs.familyName
        }
    }
}

let givenNameSortedPeople = SortedSet<GivenNameSortedFirstPerson>()

class FamilyNameSortedFirstPerson: Person, Comparable {
    static func < (lhs: FamilyNameSortedFirstPerson, rhs: FamilyNameSortedFirstPerson) -> Bool {
        if lhs.familyName < rhs.familyName {
            return true
        } else if lhs.familyName > rhs.familyName {
            return false
        } else {
            return lhs.givenName < rhs.givenName
        }
    }
}

let familyNameSortedPeople = SortedSet<FamilyNameSortedFirstPerson>()

After:

struct Person: Equatable, Hashable {
    let givenName: String
    let familyName: String
}

extension Person {
    static func compareByGivenNameFirst(_ lhs: Self, _ rhs: Self) -> Bool {
        if lhs.givenName < rhs.givenName {
            return true
        } else if lhs.givenName > rhs.givenName {
            return false
        } else {
            return lhs.familyName < rhs.familyName
        }
    }

    static func compareByFamilyNameFirst(_ lhs: Self, _ rhs: Self) -> Bool {
        if lhs.familyName < rhs.familyName {
            return true
        } else if lhs.familyName > rhs.familyName {
            return false
        } else {
            return lhs.givenName < rhs.givenName
        }
    }
}

let givenNameSortedPeople = SortedSet<Person>(comparator: Person.compareByGivenNameFirst(_:_:))
let familyNameSortedPeople = SortedSet<Person>(comparator: Person.compareByFamilyNameFirst(_:_:)) |
This would have too many undesirable consequences to accept. The core sorted collection types will definitely not work this way. Closure values are inherently not Equatable. Therefore, there is no way to prove that a collection that's sorted by a closure is using a particular ordering -- the order is neither reflected in the type system, nor is it verifiable at runtime without manually scanning the entire contents of the collection. This makes closure-based collection designs inherently weaker than ones that make use of protocol conformances. In my (very, very strongly held) opinion, two A function that takes a Beyond the type-safety/expressivity issue of not actually expressing the ordering in the type system, using a closure-based ordering would also have dire technical consequences.
Having said all this, though, I would also be open to shipping additional collection types that take closures -- once we have shipped the core sorted collection types. I would very strongly object to shipping closure-based collections before we ship the type-safe ones, or to implementing type-safe sorted collections on top of lesser, closure-based variants.

Instead of designing closure-taking APIs, we also have the option of using mixin type parameters:

enum ComparisonResult {
    case orderedAscending
    case orderedSame
    case orderedDescending
}

protocol ComparatorProtocol {
    associatedtype Value
    static func compare(_ first: Value, _ second: Value) -> ComparisonResult
}

struct GeneralizedSortedSet<Element, Comparator: ComparatorProtocol> {
    ...
}

struct Person: Equatable, Hashable {
    let givenName: String
    let familyName: String

    struct ComparatorByGivenNameFirst: ComparatorProtocol {
        typealias Value = Person
        static func compare(_ first: Value, _ second: Value) -> ComparisonResult {
            ...
        }
    }

    struct ComparatorByFamilyNameFirst: ComparatorProtocol {
        typealias Value = Person
        static func compare(_ first: Value, _ second: Value) -> ComparisonResult {
            ...
        }
    }
}

let givenNameSortedPeople = GeneralizedSortedSet<Person, Person.ComparatorByGivenNameFirst>()
let familyNameSortedPeople = GeneralizedSortedSet<Person, Person.ComparatorByFamilyNameFirst>()

However, this is by no means necessary -- it's simply a mechanical transformation of the far more pedestrian option of creating trivial wrapper types over

struct PersonOrderedByGivenName: Comparable {
    var value: Person
    // Operator requirements must be declared static inside a type.
    static func ==(left: Self, right: Self) -> Bool { left.value == right.value }
    static func <(left: Self, right: Self) -> Bool { ... }
}

struct PersonOrderedByFamilyName: Comparable {
    var value: Person
    static func ==(left: Self, right: Self) -> Bool { left.value == right.value }
    static func <(left: Self, right: Self) -> Bool { ... }
}

let givenNameSortedPeople = SortedSet<PersonOrderedByGivenName>()
let familyNameSortedPeople = SortedSet<PersonOrderedByFamilyName>()
Note that (Like closure-based data structures, |
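As a rough, runnable illustration of the wrapper-type approach described in the comment above, the sketch below fills in one possible body for the elided `<` operators. The tuple-based comparison is an assumption made for this example, not the commenter's implementation, and the standard library's `sorted()` stands in for `SortedSet`, which does not exist yet.

```swift
struct Person: Equatable, Hashable {
    let givenName: String
    let familyName: String
}

// The ordering lives in the type, so a [PersonOrderedByGivenName] can never be
// mistaken for a collection that uses the family-name ordering.
struct PersonOrderedByGivenName: Comparable {
    var value: Person
    static func == (left: Self, right: Self) -> Bool { left.value == right.value }
    static func < (left: Self, right: Self) -> Bool {
        // Assumed body: tuples of Comparable values compare lexicographically.
        (left.value.givenName, left.value.familyName)
            < (right.value.givenName, right.value.familyName)
    }
}

struct PersonOrderedByFamilyName: Comparable {
    var value: Person
    static func == (left: Self, right: Self) -> Bool { left.value == right.value }
    static func < (left: Self, right: Self) -> Bool {
        (left.value.familyName, left.value.givenName)
            < (right.value.familyName, right.value.givenName)
    }
}

let people = [
    Person(givenName: "Grace", familyName: "Hopper"),
    Person(givenName: "Ada", familyName: "Lovelace"),
    Person(givenName: "Alan", familyName: "Turing"),
]

// Stand-ins for SortedSet<PersonOrderedByGivenName> and
// SortedSet<PersonOrderedByFamilyName>.
let byGivenName = people.map(PersonOrderedByGivenName.init(value:)).sorted()
let byFamilyName = people.map(PersonOrderedByFamilyName.init(value:)).sorted()
```

Because the two results have different element types, passing one where the other is expected is a compile-time error, which is the type-safety property the closure-based design cannot offer.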
In the current implementation, it is not possible to look up a key that does not exist. It would be great if

extension SortedDictionary {
    // Returns the index of the key if it exists; otherwise returns the index of a key larger than the given key.
    func ceilingIndex(for key: Key) -> Index
    // Returns the index of the key if it exists; otherwise returns the index of a key smaller than the given key.
    func floorIndex(for key: Key) -> Index
}

extension SortedSet {
    func ceilingIndex(for item: Element) -> Index
    func floorIndex(for item: Element) -> Index
}

I hope it won't be too hard, since |
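The requested lookups can be sketched with a binary search over any already-sorted random-access collection. This is only an illustration of the semantics, not the package's API: the choice to return `endIndex` from `ceilingIndex` and an optional from `floorIndex` when no suitable element exists is an assumption of this sketch, since the request above does not say what should happen in those cases.

```swift
extension RandomAccessCollection where Element: Comparable {
    /// Assumes `self` is sorted in ascending order.
    /// Returns the index of the first element that is >= `key`,
    /// or `endIndex` if every element is smaller.
    func ceilingIndex(for key: Element) -> Index {
        var low = startIndex
        var high = endIndex
        while low < high {
            let mid = index(low, offsetBy: distance(from: low, to: high) / 2)
            if self[mid] < key {
                low = index(after: mid)
            } else {
                high = mid
            }
        }
        return low
    }

    /// Assumes `self` is sorted in ascending order.
    /// Returns the index of the last element that is <= `key`,
    /// or `nil` if every element is larger.
    func floorIndex(for key: Element) -> Index? {
        var low = startIndex
        var high = endIndex
        // Find the first element that is strictly greater than `key`.
        while low < high {
            let mid = index(low, offsetBy: distance(from: low, to: high) / 2)
            if self[mid] <= key {
                low = index(after: mid)
            } else {
                high = mid
            }
        }
        // The floor, if any, sits immediately before that position.
        return low == startIndex ? nil : index(before: low)
    }
}

let sortedKeys = [10, 20, 30, 40]
let ceilingOfMissingKey = sortedKeys.ceilingIndex(for: 25)
let floorOfMissingKey = sortedKeys.floorIndex(for: 25)
```

A tree-based collection would perform the same comparison-driven descent through its nodes instead of halving an index range.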
OrderedSet and OrderedDictionary work great when we need to keep their elements in the order they were inserted, or if we only need to infrequently reorder/sort them. However, inserting (or removing) an element from the middle of the collection takes linear time in both cases, which makes these generic types less suitable for maintaining a sorted collection. It would be very much desirable for Swift developers to have access to efficient sorted collection types.

Self-balancing search trees naturally keep their elements sorted, and they implement efficient (O(log(n))) insertions and removals from anywhere in the collection -- so they seem an ideal choice for the implementation of a standard suite of sorted collections.
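The linear-time cost of maintaining sorted order in a contiguous buffer shows up even in a small sketch. `insertSorted` below is a hypothetical helper written for this illustration: locating the slot takes O(log n) comparisons, but `Array.insert(_:at:)` still has to shift every later element.

```swift
/// Inserts `element` into an already-sorted array, keeping it sorted.
func insertSorted<Element: Comparable>(_ element: Element, into array: inout [Element]) {
    // Binary search for the first position whose element is not less than `element`.
    var low = 0
    var high = array.count
    while low < high {
        let mid = low + (high - low) / 2
        if array[mid] < element {
            low = mid + 1
        } else {
            high = mid
        }
    }
    // This step is O(n): all elements from `low` onward move one slot to the right.
    array.insert(element, at: low)
}

var sortedNumbers = [1, 3, 5, 7]
insertSorted(4, into: &sortedNumbers)
insertSorted(0, into: &sortedNumbers)
```

Building a sorted array of n elements this way costs O(n^2) element moves in the worst case, which is the behavior a tree-based collection is meant to avoid.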
Binary search trees and similar low-fanout search tree variants (such as AVL trees or red-black trees) need to maintain a multitude of internal references, so they come with an inappropriately large memory overhead. High fanout search trees such as in-memory B-trees organize their elements into a tree of small contiguous buffers (up to a couple thousand items or so), and this makes them far more efficient in practice -- in terms of both space and time.
Unlike collections that store their elements in a single, contiguous buffer, tree-based collections also allow different versions of the same tree to share some of their storage, by simply referring to the same nodes within the tree. This makes them potentially more efficient than the existing standard collection types when implementing the copy-on-write optimization. (In B-tree's case, we can maintain a nice balance between lookup and CoW performance by having the inner nodes of the tree have a lower maximum fanout number than leaves.)
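The storage sharing described above can be illustrated with path copying. To keep the example short, this sketch uses a plain, unbalanced binary search tree with immutable nodes rather than a B-tree, so it shows only the sharing idea and not the proposed implementation; `Node` and `inserting(_:into:)` are hypothetical names.

```swift
/// An immutable tree node; because nodes never change, versions can share them freely.
final class Node<Element: Comparable> {
    let value: Element
    let left: Node?
    let right: Node?

    init(_ value: Element, left: Node? = nil, right: Node? = nil) {
        self.value = value
        self.left = left
        self.right = right
    }
}

/// Returns the root of a new tree that also contains `element`.
/// Only the nodes on the path from the root to the insertion point are copied;
/// every other subtree is shared with the original tree.
func inserting<Element: Comparable>(
    _ element: Element,
    into node: Node<Element>?
) -> Node<Element> {
    guard let node = node else { return Node(element) }
    if element < node.value {
        return Node(node.value,
                    left: inserting(element, into: node.left),
                    right: node.right)
    } else if node.value < element {
        return Node(node.value,
                    left: node.left,
                    right: inserting(element, into: node.right))
    } else {
        return node // Already present: share the entire subtree.
    }
}

var version1: Node<Int>? = nil
for value in [50, 30, 70, 20, 40] {
    version1 = inserting(value, into: version1)
}
// Creating a second version leaves `version1` untouched.
let version2 = inserting(80, into: version1)
```

Here `version2` copies only the nodes holding 50 and 70, while the whole left subtree is the same object in both versions. A contiguous buffer would have to be copied in full to achieve the same result.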
We'd like this package to implement a great set of sorted collection types based on an in-memory B-tree implementation.