[API] Export MLP LLaMA api. #64
Conversation
changqi1 commented Nov 20, 2023 • edited
Force-pushed from 6bb26c1 to 8ce973b
void invokeMLPLLaMA(DataType dt, int numTokens, int hiddenSize, int intermediateSize, void *output, int outputStride,
        const void *input, int inputStride, const void *gateWeight, const void *upWeight, const void *downWeight) {
    static std::mutex mutex;
    std::lock_guard<std::mutex> lock(mutex);
Why do we need a lock here? Is it to protect the unordered_map access?
If so, I suggest narrowing down the scope.
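For illustration, a minimal sketch of the narrowed scope being suggested: the guard is held only while the shared map is searched or modified, not for the whole call. MLPEntry, hubMutex, and getOrCreateMLP are placeholder names, not the code in this PR.

#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the cached per-weight-set MLP object; illustration only.
struct MLPEntry {};

static std::unordered_map<std::string, std::shared_ptr<MLPEntry>> llama_mlp_hub;
static std::mutex hubMutex;

// Narrowed scope: the lock protects only the map lookup/insert,
// not the forward computation that follows.
std::shared_ptr<MLPEntry> getOrCreateMLP(const std::string &key) {
    std::lock_guard<std::mutex> lock(hubMutex);
    auto it = llama_mlp_hub.find(key);
    if (it != llama_mlp_hub.end()) return it->second;
    auto entry = std::make_shared<MLPEntry>();
    llama_mlp_hub.emplace(key, entry);
    return entry;
}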
Narrowed it down in forward.
src/layers/mlp_llama.cpp
Outdated
// create hash key
std::stringstream weights_addr;
weights_addr << gateWeight << "_" << upWeight << "_" << downWeight;
std::string llama_mlp_key |
I think the address is enough; do we really need to add the size info to the key?
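A sketch of what an address-only key could look like; makeMLPKey is a hypothetical helper, and the stream-based formatting just follows the snippet above:

#include <sstream>
#include <string>

// Build the cache key from the weight addresses only; the pointers already
// identify a unique weight set, so no size information is appended.
std::string makeMLPKey(const void *gateWeight, const void *upWeight, const void *downWeight) {
    std::stringstream weights_addr;
    weights_addr << gateWeight << "_" << upWeight << "_" << downWeight;
    return weights_addr.str();
}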
Done
src/layers/mlp_llama.cpp
Outdated
auto it_created = llama_mlp_hub.find(llama_mlp_key);
if (it_created == llama_mlp_hub.end()) {
    // LlamaMLP<bfloat16_t> &llama_mlp = LlamaMLP<bfloat16_t>::getInstance();
    ctx = new DecoderContext(1, hiddenSize, 1, 1, intermediateSize, "silu", 1e-6, 0, 0, 0, 0, 0, 1);
How do we reuse the context? To ensure better performance, we need to reuse the context for every layer.
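One common way to reuse the context across layers is to cache a single instance and rebuild it only when the shapes change. A sketch under that assumption; Context and getSharedContext are hypothetical stand-ins for the real DecoderContext, and thread safety is left to the lock discussed below:

#include <memory>

// Hypothetical stand-in for the real DecoderContext; illustration only.
struct Context {
    int hiddenSize;
    int intermediateSize;
};

// Reuse pattern: one shared context is created lazily, handed to every layer,
// and rebuilt only when the requested shapes differ from the cached ones.
Context *getSharedContext(int hiddenSize, int intermediateSize) {
    static std::unique_ptr<Context> ctx;
    if (!ctx || ctx->hiddenSize != hiddenSize || ctx->intermediateSize != intermediateSize) {
        ctx.reset(new Context{hiddenSize, intermediateSize});
    }
    return ctx.get();
}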
Reused.
src/layers/mlp_llama.cpp
Outdated
}

// Unsupported: serving models of different data types simultaneously, because they share the same DecoderContext
std::lock_guard<std::mutex> lock(mutex);
Since we are now using a static variable ("static DecoderContext *ctx;"), if multiple threads call into invokeMLPLLaMA, they could modify the value of 'ctx' at the same time, or use the same intermediate buffer simultaneously, which would cause problems.
So this time I think we need to expand the lock scope. :)
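A sketch of the expanded scope being requested: take the guard at the top of the call so the shared static state and its buffers are never touched by two threads at once. SharedState and invokeMLPExample are illustrative stand-ins, not the merged code:

#include <mutex>

// Stand-in for the shared static context and its scratch buffers; illustration only.
struct SharedState {
    float *scratch = nullptr;
    int capacity = 0;
};

static SharedState state;
static std::mutex mlpMutex;

// Expanded scope: the guard covers both the (re)allocation of the shared state and
// the compute that uses it, so concurrent callers cannot race on the static context.
void invokeMLPExample(int numTokens, int hiddenSize) {
    std::lock_guard<std::mutex> lock(mlpMutex);
    int needed = numTokens * hiddenSize;
    if (needed > state.capacity) {
        delete[] state.scratch;
        state.scratch = new float[needed];
        state.capacity = needed;
    }
    // ... run the forward pass with state.scratch while still holding the lock ...
}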
OK, expanded the lock scope.