Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Counted Set #20

Open
ladd opened this issue Apr 12, 2021 · 4 comments
Open

Counted Set #20

ladd opened this issue Apr 12, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@ladd
Copy link

ladd commented Apr 12, 2021

How about a counted set/multiset type? These are pretty commonly used, and can be quite handy:

https://developer.apple.com/documentation/foundation/nscountedset
https://developer.apple.com/documentation/corefoundation/cfbag-s1l

@ladd ladd added the enhancement New feature or request label Apr 12, 2021
@lorentey
Copy link
Member

Yep -- counted multisets are useful, and they would be an interesting API design exercise!

I think an unordered counted multiset that stores its contents in a Dictionary might be a relatively straightforward addition. (I wonder if they can be implemented better than just a dictionary mapping set elements to their multiplicities -- perhaps there is some optimization opportunity there, too.)

An ordered multiset would likely not be counting -- it would need to preserve duplicate items so that it can keep them in the desired order. This would be much like OrderedSet except without the uniqueness requirement. (I.e., the benefit over Array would be that it provided a fast way to find all instances of particular items.) Unfortunately, our existing hash table implementations can't handle duplicate keys well, so we'd need to switch away from open addressing with linear probing. (Which isn't necessarily a big deal, but it does lead to concerns about memory allocation overhead & general performance.)

We already have plans to use the B-tree implementation that we develop in #1 to add a sorted multiset type. This too would keep duplicates as is, rather than counting them.

@lorentey
Copy link
Member

Part of the API design challenge is to figure out how to expose the multiplicities while also keeping the interface familiar to Swift users. (SetAlgebra definitely can't handle multisets, but its high-level operations translate pretty well, I think.)

API design-wise, I expect a counted multiset will have a very different interface than a dupe-preserving one. For example, would CountedUnorderedMultiset be a Collection? If so, what would be its Element type?

@kylemacomber
Copy link

As CountedSet (aka CountedUnorderedMultiset) is conceptually a Dictionary<Key, Int>, I think it would make sense to have an equivalent Element type: (Key, Int). Similarly, I think it would be OK for CountedSet to conform to Collection like Dictionary. However, arguably Dictionary's direct collection conformance is a mistake, so maybe CountedSet shouldn't follow its precedent there (we could vend a collection view instead).

@amomchilov
Copy link

Hey all, if a CountedSet fruition, could we expose a chainable API for constructing it?

Name TBD, but something like a tallied() or counted() method on Sequence would be nice.

This is more chainable, and substantially improves the code clarity. See this motivation for why input.grouped(by: ...) is nicer than Dictionary(grouping: input, by: ...). (Implemented in apple/swift-algorithms#197)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants