Direct intersections with serialized RoaringBitmaps #281

Kerollmops · 2024-06-05T15:46:16Z

This PR is related to #263 and implements methods to compute intersections directly against serialized data. This avoids deserializing the whole bitmap into memory only to shrink it immediately afterwards, which drastically reduces both the amount of allocated memory and the cost of the intersection operations.
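To illustrate the idea, here is a minimal, std-only sketch. It is not the code from this PR and does not use the real Roaring serialization format: the byte layout, the `Containers` alias, and the `serialize` and `intersection_with_serialized` functions below are all hypothetical names made up for this example. What it tries to show is the core trick: walk the serialized containers one by one and seek past every container whose key is absent from the in-memory bitmap, so that only the potentially overlapping containers are ever decoded.

```rust
use std::collections::BTreeMap;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical in-memory stand-in for a bitmap:
/// container key (high 16 bits) -> sorted values (low 16 bits).
type Containers = BTreeMap<u16, Vec<u16>>;

/// Serializes containers in a simplified, made-up layout (NOT the real
/// Roaring format): for each container, the key (u16 LE), the number of
/// values (u32 LE), then the sorted values (u16 LE each).
fn serialize(containers: &Containers) -> Vec<u8> {
    let mut bytes = Vec::new();
    for (key, values) in containers {
        bytes.extend_from_slice(&key.to_le_bytes());
        bytes.extend_from_slice(&(values.len() as u32).to_le_bytes());
        for value in values {
            bytes.extend_from_slice(&value.to_le_bytes());
        }
    }
    bytes
}

/// Intersects `local` with a serialized bitmap, seeking past every
/// serialized container whose key is absent from `local`.
fn intersection_with_serialized<R: Read + Seek>(
    local: &Containers,
    mut reader: R,
) -> io::Result<Containers> {
    let mut result = Containers::new();
    let mut header = [0u8; 6];
    loop {
        // Hitting EOF while reading a header ends the loop. This sketch
        // does not distinguish a clean end from a truncated header.
        match reader.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let key = u16::from_le_bytes([header[0], header[1]]);
        let len = u32::from_le_bytes([header[2], header[3], header[4], header[5]]);
        let payload_bytes = i64::from(len) * 2;

        let local_values = match local.get(&key) {
            Some(values) => values,
            None => {
                // Nothing in common: skip the payload without reading it.
                reader.seek(SeekFrom::Current(payload_bytes))?;
                continue;
            }
        };

        // Only now decode the container and keep the shared values.
        let mut payload = vec![0u8; payload_bytes as usize];
        reader.read_exact(&mut payload)?;
        let common: Vec<u16> = payload
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .filter(|value| local_values.binary_search(value).is_ok())
            .collect();
        if !common.is_empty() {
            result.insert(key, common);
        }
    }
    Ok(result)
}
```

Passing the bytes through a `std::io::Cursor` is enough to try it out. In this sketch the skipped containers cost one `seek` each, which is the kind of saving the description above refers to; the real implementation additionally has to deal with the different store types and the optional container offsets mentioned in the To Do list.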

To Do

Improve function documentation.
Use the containers offsets when available.
~~Convert operation methods to use the BitAndAssign/BitAnd traits~~ (too much work for now, let's ship)

src/bitmap/ops_with_serialized.rs

Co-authored-by: Tamo <[email protected]>

4682: Speed Up Filter ANDs operations r=Kerollmops a=Kerollmops

This PR fixes #4659 and improves the way we do AND operations by using the latest [RoaringBitmap feature to do intersections with serialized bitmaps](RoaringBitmap/roaring-rs#281). Doing so drastically reduces the time spent reading and copying bytes in memory only to keep a subset of the containers in the bitmap.

### Some Example Results

With a 45M documents dataset running on a good NVMe, this example filter was taking 77ms and now takes only 13ms with this PR (6x speedup):

```sql
artist = 'The Beatles' AND (duration 150 TO 500 OR duration NOT EXISTS) AND genres IN [Rock, 'Rock and Roll'] AND rating > 4 AND released_year 1960 TO 1990
```

By reordering the filter AND clauses we can reach a constant 8ms execution time. However, note that this is a manual operation. By comparison, the previous filter pipeline stays at a constant 45ms execution time with this filter (6x speedup):

```sql
artist = 'The Beatles' AND genres IN [Rock, 'Rock and Roll'] AND released_year 1960 TO 1990 AND (duration 150 TO 500 OR duration NOT EXISTS)
```

### To Do

- [x] Rebase on `release-v1.9.0`.
- [ ] ~Skip branches of the facet/filter tree when nothing is in common with the universe~ slower this way.
- [x] When the universe is required use the universe given in parameter if possible.

Co-authored-by: Clément Renault <[email protected]>

Kerollmops added 3 commits June 5, 2024 11:33

First impl trying to iter on local containers

9be1e4b

Prefer reading the content store by store

9ba1ca1

Do the intersections on containers to ensure correct store type

95432ec

Kerollmops added the enhancement label Jun 5, 2024

Kerollmops added 2 commits June 5, 2024 13:50

Use the container and store is_empty methods

77166fa

Fix building with no-std

4466ae0

Kerollmops mentioned this pull request Jun 5, 2024

Speed Up Filter ANDs operations meilisearch/meilisearch#4682

Merged

3 tasks

Kerollmops added 5 commits June 5, 2024 22:09

Avoid allocating too much containers

add2002

Skip offsets instead of copying and dropping them

7b4df74

Improve the documentation of the function

88ceb7d

Deserialize the description info at the beginning

b92f110

Use containers offsets when available

88b848b

Kerollmops force-pushed the intersection-with-serialized branch from 0a11dd0 to 88b848b Compare June 6, 2024 14:41

Always use container offsets when available

e417314

irevoire reviewed Jun 7, 2024

src/bitmap/ops_with_serialized.rs (outdated)

Remove useless std feature gate

457b2c9

Co-authored-by: Tamo <[email protected]>

Kerollmops marked this pull request as ready for review June 7, 2024 22:12

Kerollmops merged commit 6391a97 into main Jun 8, 2024
4 checks passed

Kerollmops deleted the intersection-with-serialized branch June 8, 2024 01:46
