
[API] Export MLP LLaMA api. #64

Merged
merged 11 commits into intel:main from changqing/feature/export_mlp_llama
Nov 26, 2023

Conversation

changqi1 (Contributor) commented on Nov 20, 2023

$ cmake -DXFT_BUILD_TESTS=ON ../
$ make -j
$ numactl -N 1 -m 3  ./ut/layers_mlp_test
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MLPLLaMA
[ RUN      ] MLPLLaMA.bfloat16_t
create llama_mlp_key: 4096_11008_0x7f3977604040_0x7f396ca03040_0x7f3961e02040
[ RUNTIME  ] XFT::invokeMLPLLaMA 0.088813 sec
[ RUNTIME  ] XFT::invokeMLPLLaMA 0.006947 sec
[ RUNTIME  ] XFT::invokeMLPLLaMA 0.001165 sec
[ RUNTIME  ] XFT::invokeMLPLLaMA 0.001286 sec
[ RUNTIME  ] XFT::invokeMLPLLaMA 0.036273 sec
[ RUNTIME  ] XFT::invokeMLPLLaMA 0.006276 sec
[ RUNTIME  ] XFT::invokeMLPLLaMA 0.001244 sec
[       OK ] MLPLLaMA.bfloat16_t (6935 ms)
[----------] 1 test from MLPLLaMA (6935 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6935 ms total)
[  PASSED  ] 1 test.
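
For context, a minimal caller sketch of the exported API. The invokeMLPLLaMA signature is the one quoted in the review thread below; the header name, the DataType::bf16 enumerator, the namespace qualification, and the buffer shapes are assumptions for illustration, not taken from this PR:

// Minimal caller sketch (assumed header name and DataType enumerator).
#include <cstdint>
#include <vector>
#include "mlp_llama_api.h"  // assumed header declaring XFT::invokeMLPLLaMA

int main() {
    const int numTokens = 128, hiddenSize = 4096, intermediateSize = 11008;

    // bf16 elements are 2 bytes each; uint16_t vectors stand in for bf16 storage here.
    std::vector<uint16_t> input(numTokens * hiddenSize);
    std::vector<uint16_t> output(numTokens * hiddenSize);
    std::vector<uint16_t> gateWeight(hiddenSize * intermediateSize);
    std::vector<uint16_t> upWeight(hiddenSize * intermediateSize);
    std::vector<uint16_t> downWeight(intermediateSize * hiddenSize);

    // Strides equal hiddenSize because the activation rows are contiguous.
    XFT::invokeMLPLLaMA(XFT::DataType::bf16 /* assumed enumerator name */, numTokens, hiddenSize,
            intermediateSize, output.data(), hiddenSize, input.data(), hiddenSize,
            gateWeight.data(), upWeight.data(), downWeight.data());
    return 0;
}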

changqi1 marked this pull request as draft on November 21, 2023, 01:42
changqi1 force-pushed the changqing/feature/export_mlp_llama branch from 6bb26c1 to 8ce973b on November 21, 2023, 05:11
changqi1 marked this pull request as ready for review on November 21, 2023, 05:21
void invokeMLPLLaMA(DataType dt, int numTokens, int hiddenSize, int intermediateSize, void *output, int outputStride,
        const void *input, int inputStride, const void *gateWeight, const void *upWeight, const void *downWeight) {
    static std::mutex mutex;
    std::lock_guard<std::mutex> lock(mutex);
Contributor:

Why do we need a lock here? Is it to protect the unordered_map access?
If so, I suggest narrowing down the scope.

Contributor Author:

Narrowed the scope down in forward.
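
For illustration, the reviewer's suggestion (lock only the shared-map access rather than the whole call) could look roughly like the sketch below; llama_mlp_hub and LlamaMLP are names taken from the hunks quoted in this PR, while the surrounding structure is an assumption, not the merged code:

static std::unordered_map<std::string, LlamaMLP<bfloat16_t> *> llama_mlp_hub;
static std::mutex mutex;

LlamaMLP<bfloat16_t> *mlp = nullptr;
{
    // Critical section covers only the lookup/insert into the shared map.
    std::lock_guard<std::mutex> lock(mutex);
    auto it = llama_mlp_hub.find(llama_mlp_key);
    if (it == llama_mlp_hub.end()) {
        mlp = new LlamaMLP<bfloat16_t>();
        llama_mlp_hub[llama_mlp_key] = mlp;
    } else {
        mlp = it->second;
    }
}
// The forward pass itself would run outside the lock.

As the later thread shows, this was eventually revisited: the shared static DecoderContext forced the lock back to function scope.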

    // create hash key
    std::stringstream weights_addr;
    weights_addr << gateWeight << "_" << upWeight << "_" << downWeight;
    std::string llama_mlp_key
Contributor:

I think the address is enough; do we really need to add the size info to the key?

Contributor Author:

Done

    auto it_created = llama_mlp_hub.find(llama_mlp_key);
    if (it_created == llama_mlp_hub.end()) {
        // LlamaMLP<bfloat16_t> &llama_mlp = LlamaMLP<bfloat16_t>::getInstance();
        ctx = new DecoderContext(1, hiddenSize, 1, 1, intermediateSize, "silu", 1e-6, 0, 0, 0, 0, 0, 1);
Contributor:

How do we reuse the context? To get better performance, we need to reuse the context for every layer.

Contributor Author:

Reused.
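
The reuse pattern implied by the quoted hunk is a lazily created context shared by every layer and call, roughly like the sketch below; the constructor arguments are copied from the hunk above, everything else is an assumption:

// One shared context, created on the first call and reused afterwards.
static DecoderContext *ctx = nullptr;

if (ctx == nullptr) {
    // Arguments as in the quoted hunk: SiLU activation, eps = 1e-6.
    ctx = new DecoderContext(1, hiddenSize, 1, 1, intermediateSize, "silu", 1e-6, 0, 0, 0, 0, 0, 1);
}
// ...look up / create the LlamaMLP object in llama_mlp_hub, then run forward with ctx...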

    }

    // Serving models of different data types at the same time is unsupported, because they share the same DecoderContext
    std::lock_guard<std::mutex> lock(mutex);
Contributor:

Since we are now using a static variable ("static DecoderContext *ctx;"), if multiple threads call into invokeMLPLLaMA they could modify the value of 'ctx' at the same time, or use the same intermediate buffer simultaneously, which would cause problems.
So this time I think we need to expand the lock scope. :)

Contributor Author:

OK, expanded the lock scope.
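
The expanded scope, roughly: take the lock at the top of invokeMLPLLaMA so that the map lookup, the lazy context creation, and the forward pass are all serialized. The structure below is inferred from the hunks quoted above, not the exact merged code:

void invokeMLPLLaMA(DataType dt, int numTokens, int hiddenSize, int intermediateSize, void *output, int outputStride,
        const void *input, int inputStride, const void *gateWeight, const void *upWeight, const void *downWeight) {
    static std::mutex mutex;
    // Function-wide critical section: the static DecoderContext and its intermediate
    // buffers are shared, so concurrent callers must not run the MLP at the same time.
    std::lock_guard<std::mutex> lock(mutex);

    // ...build llama_mlp_key from the weight addresses, find or create the LlamaMLP
    //    object and the shared DecoderContext, then run the forward pass...
}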

pujiang2018 merged commit b5a5fc0 into intel:main on Nov 26, 2023 (1 check passed).