
faster hnsw CPU index training #3822

Conversation

alexanderguzhva
Contributor

@alexanderguzhva commented Sep 3, 2024

This change decreases the training time for a 1M x 768 dataset from 13 minutes down to 10 minutes in our experiments.

Please verify and benchmark.

Signed-off-by: Alexandr Guzhva <[email protected]>
@facebook-github-bot
Contributor

@kuarora has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mengdilin
Contributor

I microbenchmarked IndexHNSWFlat::add with Google Benchmark on this change and compared it against the master commit. The results look good (a 7-11% CPU time improvement across the microbenchmarks) with reasonably small p-values from the U tests.

With parameters:

- d=64, n=1000, M=32, threads=1: a 7% CPU time improvement on average (P1567300587)
- d=128, n=1000, M=16, threads=1: a 7% CPU time improvement on average (P1567316788)
- d=128, n=1000, M=32, threads=5: an 11% CPU time improvement on average (P1567363053)
- d=64, n=2000, M=16, threads=5: a 10% CPU time improvement on average (P1567491190)
- d=128, n=1000, M=16, threads=5: a 5% CPU time improvement on average (P1567371095), but this result is not statistically significant, so we do not count it

I think we can verify the actual impact in production with @mnorris11's work on observability for the internal customers.

@kuarora and @mdouze: are there other parameters we would want to try out here?

@mdouze
Contributor

mdouze commented Sep 5, 2024

@alexanderguzhva you usually leave the "simple" version of the code in comments, which is better than nothing.

Would you mind instead using a local boolean variable reference_code (or something similar), set to false, and doing:

bool reference_code = false;

if (reference_code) {
    ... the old short code
} else {
    ... your optimized long code
}

In this way the compiler will compile the old code but optimize it away.
The reasons are that (1) it should be easy to switch back to the reference code, and (2) if the interface changes somehow, we don't want the two branches to diverge.

@alexanderguzhva
Contributor Author

@mdouze done

@facebook-github-bot
Contributor

@kuarora has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@kuarora merged this pull request in e261725.

aalekhpatel07 pushed a commit to aalekhpatel07/faiss that referenced this pull request Oct 17, 2024
Summary:
This change decreases the training time for a 1M x 768 dataset from 13 minutes down to 10 minutes in our experiments.

Please verify and benchmark.

Pull Request resolved: facebookresearch#3822

Reviewed By: mdouze

Differential Revision: D62151489

Pulled By: kuarora

fbshipit-source-id: b29ffd0db615bd52187464b4665c31fc9d3b8d0a