Add AMX support to speed up Faiss Inner-Product #535

mellonyou · 2024-04-28T09:13:11Z

Use Intel AMX to speed up the inner-product computation in knowhere::BruteForce::Search(), which can bring a performance boost of more than 10x.

Build parameter: use "-o with_dnnl=True/False" to enable or disable the AMX feature.
This feature depends on libdnnl.so.3, which you can install by running scripts/install_deps.sh.

Runtime parameter: to use the AMX feature, you must first set the environment variable "DNNL_ENABLE=1"; otherwise the AMX feature will not be used.
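
For readers unfamiliar with this kind of runtime switch, here is a minimal sketch of how an environment-variable gate could look. The helper name dnnl_enabled_from_env is hypothetical, and the exact parsing rule (only the literal value "1" enables the feature) is an assumption; the PR's own is_dnnl_enabled() is not shown in this thread.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not the PR's code): returns true only when the
// environment variable DNNL_ENABLE is set to exactly "1".
inline bool dnnl_enabled_from_env() {
    const char* value = std::getenv("DNNL_ENABLE");
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

With a gate like this, the AMX path stays off unless the process is started with DNNL_ENABLE=1 in its environment, regardless of how the library was built.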

mergify · 2024-04-28T09:13:49Z

@mellonyou 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

If you're fixing a bug, label it as kind/bug.
For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

Signed-off-by: Fangzheng Zhang <[email protected]>

mellonyou · 2024-05-06T02:48:20Z

issue: #541

mellonyou · 2024-05-06T03:03:02Z

I can't edit the labels. Do I need additional access permissions?

liliu-z · 2024-05-06T03:17:59Z

/kind enhancement

codecov · 2024-05-06T04:16:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.59%. Comparing base (3c46f4c) to head (7b6f49a).
Report is 179 commits behind head on main.

❗ Current head 7b6f49a differs from pull request most recent head ff5c7cd

Please upload reports for the commit ff5c7cd to get more accurate results.

Additional details and impacted files

@@            Coverage Diff            @@
##           main     #535       +/-   ##
=========================================
+ Coverage      0   71.59%   +71.59%     
=========================================
  Files         0       67       +67     
  Lines         0     4446     +4446     
=========================================
+ Hits          0     3183     +3183     
- Misses        0     1263     +1263

see 67 files with indirect coverage changes

liliu-z · 2024-05-06T07:52:01Z

thirdparty/faiss/faiss/utils/onednn_utils.h

+            BaseData::getState().store(BASE_DATA_STATE::MODIFIED);
+    }
+
+    void execut(float** out_f32) {


nit: execute?

yes, it's a typo

liliu-z · 2024-05-06T08:05:39Z

thirdparty/faiss/faiss/utils/onednn_utils.h

+        // inner memory bf16
+        bf16_md1 = dnnl::memory::desc({xrow, xcol}, dnnl::memory::data_type::bf16, dnnl::memory::format_tag::any);
+        bf16_md2 = dnnl::memory::desc({yrow, ycol}, dnnl::memory::data_type::bf16, dnnl::memory::format_tag::any);


Noob question: why do we use bf16 here?

Because AMX natively supports bf16/int8 computation, which can significantly improve performance. We have tested this, and it has little impact on accuracy.

liliu-z · 2024-05-06T08:19:28Z

thirdparty/faiss/faiss/utils/onednn_utils.h

+        BASE_DATA_STATE expected = BASE_DATA_STATE::MODIFIED;
+
+        if (BaseData::getState().compare_exchange_strong(expected, BASE_DATA_STATE::PREPARE)) {
+            pthread_rwlock_wrlock(&rwlock);


Noob question: why do we need a lock here? Is it because only one AMX instruction can run at a time?

The lock is designed for the multi-threaded scenario: if two threads operate on the same base dataset with different query datasets, the lock prevents the base dataset from being modified by one thread while the other is working on it.
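
The reader/writer pattern described in this reply can be sketched as follows. This uses std::shared_mutex, the C++17 standard-library counterpart of pthread_rwlock, rather than the pthread calls in the diff. The class SharedBase and its members are hypothetical and only illustrate the locking discipline.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical holder for a base dataset shared by several query threads.
class SharedBase {
public:
    // Writer path: takes the exclusive lock while the base data is replaced,
    // so no reader can observe a half-updated dataset.
    void update(const std::vector<float>& data) {
        std::unique_lock<std::shared_mutex> guard(lock_);
        base_ = data;
    }

    // Reader path: any number of queries may hold the shared lock at once.
    float dot(const std::vector<float>& query) const {
        std::shared_lock<std::shared_mutex> guard(lock_);
        std::size_t n = std::min(base_.size(), query.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            sum += base_[i] * query[i];
        }
        return sum;
    }

private:
    mutable std::shared_mutex lock_;
    std::vector<float> base_;
};
```

The lock therefore protects the shared data, not the AMX unit itself: concurrent readers proceed in parallel, and only an update serializes them.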

liliu-z · 2024-05-06T08:23:59Z

thirdparty/faiss/faiss/utils/onednn_utils.h

+        dnnl::reorder(f32_mem1, bf16_mem1).execute(engine_stream, f32_mem1, bf16_mem1);
+        BASE_DATA_STATE expected = BASE_DATA_STATE::MODIFIED;
+
+        if (BaseData::getState().compare_exchange_strong(expected, BASE_DATA_STATE::PREPARE)) {


Please correct me if I'm wrong. In the first call, expected will be BASE_DATA_STATE::MODIFIED, it will be changed into BASE_DATA_STATE::PREPARE on this line, and the call will return false. Then it will loop at line 196 forever.

The state is also designed for the multi-threaded scenario. The state transitions are INIT -> MODIFIED -> PREPARE -> READY. Once the first thread has finished the initialization, the other threads will see that the state is READY and skip line 196.
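
Since the exchange above hinges on what compare_exchange_strong returns, a small sketch of its standard semantics may help. The enum mirrors the state names from the reply, but the helper functions are hypothetical, and the waiting logic at line 196 is not shown in this thread, so it is not reproduced here.

```cpp
#include <atomic>

// State names taken from the discussion; the enum itself is a stand-in.
enum class BaseDataState { INIT, MODIFIED, PREPARE, READY };

// Returns true only for the one thread that moves MODIFIED -> PREPARE.
inline bool try_begin_prepare(std::atomic<BaseDataState>& state) {
    BaseDataState expected = BaseDataState::MODIFIED;
    // Success: state == expected, so state becomes PREPARE and true is returned.
    // Failure: state is left unchanged, `expected` is overwritten with the
    // current value of state, and false is returned.
    return state.compare_exchange_strong(expected, BaseDataState::PREPARE);
}

// Called by the winning thread once the base data has been prepared.
inline void finish_prepare(std::atomic<BaseDataState>& state) {
    state.store(BaseDataState::READY);
}
```

Under these semantics, the first caller that finds the state at MODIFIED gets true and does the preparation; later callers get false and can inspect the state to decide whether to wait or proceed.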

liliu-z · 2024-05-06T08:28:10Z

thirdparty/faiss/faiss/utils/distances.cpp

+    if (is_dnnl_enabled()) {
+        float *res_arr = NULL;
+
+        comput_f32bf16f32_inner_product(nx, d, ny, d, const_cast<float*>(x), const_cast<float*>(y), &res_arr);


Can we implement this as a dynamic hook, like all the other SIMD paths in Knowhere?

We also considered following the other SIMD interfaces, but because of how AMX works, it may be somewhat incompatible with the current interface:

AMX prefers batched computation, and its library schedules multiple threads on its own.

The return value is an array, since the operation works on a batch of data.
So if we use a dynamic hook, we may need to add a new interface for batch operations and call that interface when AMX is available.

@liliu-z We are planning to port the code to the dynamic hook. Do you have any other suggestions?
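
The interface mismatch described above can be sketched with two function-pointer types: a per-pair hook that returns one scalar, and a batch hook that fills an array. All names here (PairIpFn, BatchIpFn, IpHooks, and the reference implementations) are hypothetical and are not Knowhere's actual hook declarations.

```cpp
#include <cstddef>

// Per-pair style hook: one query vector, one base vector, one scalar result.
using PairIpFn = float (*)(const float* x, const float* y, std::size_t d);

// Hypothetical batch hook: nx queries against ny base vectors, writing
// nx * ny results into `out` in row-major order.
using BatchIpFn = void (*)(const float* x, const float* y, std::size_t nx,
                           std::size_t ny, std::size_t d, float* out);

// Portable reference implementation of the per-pair hook.
inline float ip_ref(const float* x, const float* y, std::size_t d) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < d; ++i) sum += x[i] * y[i];
    return sum;
}

// Portable reference implementation of the batch hook.
inline void batch_ip_ref(const float* x, const float* y, std::size_t nx,
                         std::size_t ny, std::size_t d, float* out) {
    for (std::size_t i = 0; i < nx; ++i)
        for (std::size_t j = 0; j < ny; ++j)
            out[i * ny + j] = ip_ref(x + i * d, y + j * d, d);
}

// Hypothetical hook table: an AMX-capable build could replace `batch`
// at startup while leaving `pair` untouched.
struct IpHooks {
    PairIpFn pair = ip_ref;
    BatchIpFn batch = batch_ip_ref;
};
```

Callers that already work on blocks of queries would go through the batch pointer, which is where a library that schedules its own threads fits more naturally than a per-pair call.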

alexanderguzhva · 2024-05-06T11:32:56Z

thirdparty/faiss/faiss/utils/distances.cpp

@@ -211,30 +214,59 @@ void exhaustive_inner_product_seq_impl(
    using SingleResultHandler = typename BlockResultHandler::SingleResultHandler;
    int nt = std::min(int(nx), omp_get_max_threads());

+#ifdef FAISS_WITH_DNNL


The problem here is that this code is inserted into the function that computes inner products subject to a filter. So if the filter removes 90% of the samples, 9 out of 10 computed distances will not be used, which costs considerable extra memory bandwidth.
Benchmarks are needed for this PR.

@alexanderguzhva Is the filter inside Knowhere or in Milvus?

@xtangxtang An external filter (in the form of a bitset), provided by Milvus.

mellonyou · 2024-06-05T03:11:33Z

Added searchwithbuf and rangesearch interface implementations with AMX oneDNN. We will submit the related build config to Milvus later.

mellonyou · 2024-06-18T03:40:35Z

I am trying to apply the filter manually, with multiple threads, before the AMX inner product.
@liliu-z @alexanderguzhva @godchen0212 Do you have any other opinions on the current interface implementation?

sre-ci-robot · 2024-07-01T08:12:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mellonyou
To complete the pull request process, please assign zhengbuqian after the PR has been reviewed.
You can assign the PR to them by writing /assign @zhengbuqian in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mellonyou · 2024-07-01T08:20:15Z

We tried filtering manually with multiple threads before the AMX inner product, and it had a significant impact on performance. So we now filter only the results to keep them correct, which has a relatively small impact on performance.
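
The "filter the results" approach can be sketched as follows: scores are computed for every base vector in the batch, and filtered-out ids are skipped only when the best hit is selected. The function names are hypothetical, and the bitset convention (a set bit means the row is filtered out) is an assumption made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Assumed convention for this sketch: bit i set => row i is filtered out.
inline bool is_filtered(const std::vector<std::uint8_t>& bitset, std::size_t i) {
    return ((bitset[i >> 3] >> (i & 7)) & 1u) != 0;
}

// Given precomputed inner products for one query, return the id and score
// of the best row that passes the filter, or id -1 if every row is filtered.
inline std::pair<long, float> best_unfiltered(
        const std::vector<float>& scores,
        const std::vector<std::uint8_t>& bitset) {
    long best_id = -1;
    float best_score = std::numeric_limits<float>::lowest();
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (is_filtered(bitset, i)) continue;  // score was computed but is dropped
        if (scores[i] > best_score) {
            best_score = scores[i];
            best_id = static_cast<long>(i);
        }
    }
    return {best_id, best_score};
}
```

This keeps the batched computation intact at the cost of computing scores that are later discarded, which is the trade-off raised earlier in the review.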

mellonyou added 2 commits April 26, 2024 16:43

Add AMX support to speed up Faiss Inner-Product

a28d310

Merge branch 'zilliztech:main' into main

5d2afb3

sre-ci-robot requested review from cqy123456 and Presburger April 28, 2024 09:13

sre-ci-robot added the size/L label Apr 28, 2024

mergify bot added the dco-passed label Apr 28, 2024

mergify bot added the do-not-merge/missing-related-issue label Apr 28, 2024

Add AMX support to speed up Faiss Inner-Product

7b6f49a

Signed-off-by: Fangzheng Zhang <[email protected]>

mellonyou marked this pull request as draft May 6, 2024 02:16

sre-ci-robot added the do-not-merge/work-in-progress label May 6, 2024

mellonyou marked this pull request as ready for review May 6, 2024 02:58

sre-ci-robot removed the do-not-merge/work-in-progress label May 6, 2024

sre-ci-robot added the kind/enhancement label May 6, 2024

mergify bot added the ci-passed label May 6, 2024

liliu-z reviewed May 6, 2024

alexanderguzhva reviewed May 9, 2024

mellonyou added 3 commits May 15, 2024 15:46

Port the onednn code to knowhere, and modify it to follow dynamic hoo…

64a8804

…k interface.

Merge branch 'zilliztech:main' into main

71fd0cf

Merge branch 'zilliztech:main' into amx_ip

420b8c2

mergify bot removed the ci-passed label May 15, 2024

Merge branch 'main' into amx_ip

64bd1b9

sre-ci-robot added size/XL and removed size/L labels May 15, 2024

mergify bot added needs-dco and removed dco-passed labels May 15, 2024

mellonyou added 2 commits June 5, 2024 10:37

Merge branch 'zilliztech:main' into amx_ip

62eba85

Add searchwithbuf and rangesearch interface implementation with AMX o…

417601b

…nednn. Signed-off-by: Eric Zhang <[email protected]>

sre-ci-robot added size/XL and removed size/L labels Jun 5, 2024

This was referenced Jun 5, 2024

enhance: Add WITH_DNNL build config for knowhere. milvus-io/milvus#33628

Closed

enhance: Add WITH_DNNL build config for knowhere. milvus-io/milvus#33630

Closed

Add result filter after AMX Inner Product.

76cc32d

mellonyou closed this Jul 1, 2024

mellonyou force-pushed the amx_ip branch from b420761 to f0c16f4 Compare July 1, 2024 06:49

sre-ci-robot added size/XS and removed size/XL labels Jul 1, 2024

mergify bot added dco-passed and removed needs-dco labels Jul 1, 2024

Merge branch 'amx_ip' of https://github.com/mellonyou/knowhere into a…

ff5c7cd

…mx_ip

mellonyou reopened this Jul 1, 2024

sre-ci-robot added size/XL and removed size/XS labels Jul 1, 2024

mergify bot added needs-dco and removed dco-passed labels Jul 1, 2024

github-actions bot added stale and removed stale labels Aug 1, 2024

github-actions bot added the stale label Sep 1, 2024

github-actions bot closed this Sep 9, 2024
