Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] boost::unordered_flat_map variant with key and value stored separately #209

Open
holenno1 opened this issue Aug 28, 2023 · 5 comments

Comments

@holenno1
Copy link

Hello!

It would be great if we could have another variant of boost::unordered_flat_map where the keys and values are stored in separate arrays - "split" storage.
The main motivation for this is to minimise memory wastage due to padding.

eg: key=uint64_t, value=float would mean 4 bytes of padding per key-value pair.

Obviously this will be a further divergence from std::unordered_map's interface.
The main difference being that iterators will return std::pair<const Key&, Value&> instead of std::pair<const Key, Value>&.

However, we believe this will provide a tonne of value to fans of boost::unordered_flat_map.

Thanks!

@cmazakas
Copy link
Member

However, we believe this will provide a tonne of value to fans of boost::unordered_flat_map.

Hmm, do you have any concrete use-cases or data for this?

We'd like to consider the idea but before dedicating the resources, we'd like to get a gauge on the effectiveness of including such a map.

@ktprime
Copy link

ktprime commented Sep 2, 2023

@holenno1 This design needs at least three arrays. Even it can save some memory in use-cases, but it's performance is not as good as you think so in most cases.

@holenno1
Copy link
Author

holenno1 commented Sep 7, 2023

Hi, sorry was busy all week.

I've collected some data,


A bit of context:

We have an in-house open-addressing hash map implementation that is heavily based on LLVM's llvm::DenseMap.
That means

  • Power-of-2 buffer sizes
  • Quadratic probing (triangular number sequence)
  • 75% maximum load factor
  • 12.5% maximum tombstone ratio

llvm::DenseMap relies on 2 predefined sentinel values to indicate if a slot is "empty" or "erased/tombstone" .
Our implementation instead stores an extra bitset, where each slot in the buffer is represented by 2 bits for the aforementioned tags.

In addition, we have another hash map variant that stores the keys and values in separate buffers - "split storage".
As mentioned in our feature request, this is a primarily a memory optimisation.


Here's a memory usage experiment:

  • key_type = const A* (alignof(A) = 1)
  • mapped_type = uint32_t
  • number of entries = 100,000

A real world usage scenario for this setup is object-to-id mapping - we have a large number of heap allocated objects, and we want to assign an ID to each of them.

In fact, 100,000 entries is a little on the low side.
We typically deal with 100s of millions or low-billions of entries, which is the typical size of a large netlist or graph problem.

Here are the results:

boost::unordered_flat_map                    : 4,194,304 Bytes
llvm::DenseMap derivative                    : 8,388,608 Bytes
llvm::DenseMap derivative w/ split key-value : 6,291,456 Bytes

*Those numbers were measured by querying /proc/self/stat - Virtual memory size in bytes

The "split storage" variant sees a 25% reduction in memory usage compared to the "pair storage" variant.


Let's look at runtime.
For this, I extended the Unordered benchmark suite

Yes @ktprime you are right, the "split storage" variant will have a performance disadvantage for insertion and erasure workloads.

However, "split storage" does have a slight edge when it comes to unsuccessful lookups, as we don't have to visit the value buffer.

image
*Those numbers were measured with an isolated Xeon W-2155 (Skylake) server with gcc 12.2, boost 1.81 at -O3

@martinus
Copy link

martinus commented Sep 8, 2023

Couldn't you just create a custom key type (say with std::array<std::byte, 8>, or two uint32_t types) so that the alignment doesn't cause padding? then access the data with memcpy. I don't think it would cause much of a slowdown on most systems.

@holenno1
Copy link
Author

holenno1 commented Sep 8, 2023

I think you mean use unordered_flat_set (not map) + type punning + const_cast?
EDIT: Ok I see you meant

struct key {  std::byte raw[8]; };
struct value {  std::byte raw[4]; };

using my_map = boost::unordered_flat_map<key, value, key_hash, key_eq>;

We can certainly do that, but I think the spirit of the feature request is so we don't have to actually do that :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants