
faster hnsw CPU index training #3822

Conversation

alexanderguzhva
Contributor

@alexanderguzhva commented Sep 3, 2024

This change decreases the training time for a 1M x 768 dataset from 13 minutes down to 10 minutes in our experiments.

Please verify and benchmark.

Signed-off-by: Alexandr Guzhva <[email protected]>
@facebook-github-bot
Contributor

@kuarora has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mengdilin
Contributor

I microbenchmarked IndexHNSWFlat::add with Google Benchmark on this change and compared it against the master commit. The results look good (a 7-11% CPU time improvement across the microbenchmarks) with reasonably small p-values from the U tests.

With parameters:

- d=64, n=1000, M=32, threads=1: a 7% CPU time improvement on average (P1567300587)
- d=128, n=1000, M=16, threads=1: a 7% CPU time improvement on average (P1567316788)
- d=128, n=1000, M=32, threads=5: an 11% CPU time improvement on average (P1567363053)
- d=64, n=2000, M=16, threads=5: a 10% CPU time improvement on average (P1567491190)
- d=128, n=1000, M=16, threads=5: a 5% CPU time improvement on average (P1567371095), but this result is not statistically significant, so we do not count it

I think we can verify the actual impact in production with @mnorris11's work on observability for the internal customers.

@kuarora and @mdouze: are there other parameters we would want to try out here?

@mdouze
Contributor

mdouze commented Sep 5, 2024

@alexanderguzhva you usually leave the "simple" version of the code in comments, which is better than nothing.

Would you mind instead using a local boolean variable reference_code (or something similar), set to false, and doing:

bool reference_code = false;

if (reference_code) {
    ... the old short code
} else {
    ... your optimized long code
}

In this way the compiler will compile the old code but optimize it away.
The reasons are that (1) it should be easy to switch back to the reference code, and (2) if the interface changes somehow, we don't want the two branches to diverge.

@alexanderguzhva
Contributor Author

@mdouze done

@facebook-github-bot
Contributor

@kuarora has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@kuarora merged this pull request in e261725.

aalekhpatel07 pushed a commit to aalekhpatel07/faiss that referenced this pull request Oct 17, 2024
Summary:
This change decreases the training time for a 1M x 768 dataset from 13 minutes down to 10 minutes in our experiments.

Please verify and benchmark.

Pull Request resolved: facebookresearch#3822

Reviewed By: mdouze

Differential Revision: D62151489

Pulled By: kuarora

fbshipit-source-id: b29ffd0db615bd52187464b4665c31fc9d3b8d0a