cann: add Ascend NPU support #2336
Conversation
ggml/include/ggml.h
            abort(); \
        } \
    } while (0)
#define GGML_ABORT(...) ggml_abort(__FILE__, __LINE__, __VA_ARGS__)
Please describe the reason why this part was modified.
This follows the refactor of GGML_ASSERT and the addition of GGML_ABORT in ggerganov/llama.cpp#8698. GGML_ABORT is utilized in CANN related code to abort the process with a message.
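For illustration, a minimal standalone sketch of how the macro is typically invoked (it only assumes ggml.h from this repository; the status-check helper below is hypothetical, not taken from the actual CANN backend sources):

```cpp
#include "ggml.h"   // provides GGML_ABORT, which expands to ggml_abort(__FILE__, __LINE__, ...)

// Hypothetical helper: abort with a formatted message when a device call fails.
static void check_status(int status) {
    if (status != 0) {
        GGML_ABORT("CANN operation failed with status %d", status);
    }
}

int main(void) {
    check_status(0);   // ok, execution continues
    check_status(-1);  // prints file/line plus the message, then aborts the process
    return 0;
}
```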
@@ -760,7 +762,7 @@ struct test_dup : public test_case {
     }

     test_dup(ggml_type type = GGML_TYPE_F32,
-            std::array<int64_t, 4> ne = {10, 10, 10, 1},
+            std::array<int64_t, 4> ne = {10, 10, 20, 1},
Make sure no existing test scenarios are missed.
This modification causes ne and nb to change after permute, thus covering test_dup for non-contiguous data. It keeps the test consistent with https://github.com/ggerganov/llama.cpp/blob/master/tests/test-backend-ops.cpp#L770
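To make the effect concrete, here is a small standalone sketch (it only assumes ggml.h from this repository and is not part of the test suite) showing that with the non-cubic shape both ne and nb of the permuted view differ from the contiguous source, so the dup operator is also exercised on non-contiguous data:

```cpp
#include "ggml.h"
#include <cstdio>

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // shape {10, 10, 20, 1} as in the updated default for test_dup
    struct ggml_tensor * src  = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 10, 10, 20, 1);
    // swap dims 1 and 2, as the permuted test_dup variants do
    struct ggml_tensor * view = ggml_permute(ctx, src, 0, 2, 1, 3);

    // With a cubic 10x10x10 shape the permuted ne would be unchanged;
    // with 10x10x20 both ne and nb differ and the view is non-contiguous.
    printf("ne = [%lld, %lld, %lld, %lld], contiguous = %d\n",
           (long long) view->ne[0], (long long) view->ne[1],
           (long long) view->ne[2], (long long) view->ne[3],
           (int) ggml_is_contiguous(view));

    ggml_free(ctx);
    return 0;
}
```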
@ggerganov Could you please review this PR? The main code comes from the Ascend NPU implementation in llama.cpp.
Thanks for the PR. I need to first sync the latest ggml repository into whisper.cpp.
Thanks! That's great! Please @ me after the sync of ggml is done, and I'll rebase the commit then.
@MengqingCao The sync is now done - please update as necessary
* enable Ascend NPU in src/whisper.cpp
* sync test-backend-ops with llama.cpp
Hi @ggerganov, thanks for your work! This PR is updated now, please review it. BTW, I synced test-backend-ops with llama.cpp.
Thanks!
Consider adding instructions to the readme for using this backend in follow-up PRs
Sure. We will.
* ggerganov/master: (118 commits)
  cann : add Ascend NPU support (ggerganov#2336)
  whisper : fix compile warning (#0)
  sync : ggml
  ggml : add CANN backend (llama/0)
  scripts : sync cann
  ci : disable ruby workflow (#0)
  ci : try to fix FreeBSD (#0)
  build : fix aarch64 (#0)
  talk-llama : sync llama.cpp
  sync : ggml
  ggml-backend : fix async copy from CPU (llama/8897)
  Updated SYCL device filtering (llama/8901)
  CUDA/HIP: fix tests/test-backend-ops (llama/8896)
  CUDA: fix padding logic for FP16/FP32 (llama/8884)
  ggml : add epsilon as a parameter for group_norm (llama/8818)
  ggml : fix overflows in elu function (llama/8866)
  ggml : reading the runtime sve config of the cpu (llama/8709)
  Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
  Fixing wrong VDR iq4nl value (llama/8812)
  ggml-cuda: Adding support for unified memory (llama/8035)
  ...
* master: (119 commits)
  cann : add Ascend NPU support (ggerganov#2336)
  whisper : fix compile warning (#0)
  sync : ggml
  ggml : add CANN backend (llama/0)
  scripts : sync cann
  ci : disable ruby workflow (#0)
  ci : try to fix FreeBSD (#0)
  build : fix aarch64 (#0)
  talk-llama : sync llama.cpp
  sync : ggml
  ggml-backend : fix async copy from CPU (llama/8897)
  Updated SYCL device filtering (llama/8901)
  CUDA/HIP: fix tests/test-backend-ops (llama/8896)
  CUDA: fix padding logic for FP16/FP32 (llama/8884)
  ggml : add epsilon as a parameter for group_norm (llama/8818)
  ggml : fix overflows in elu function (llama/8866)
  ggml : reading the runtime sve config of the cpu (llama/8709)
  Fix conversion of unnormalized BF16->BF16 weights (llama/7843)
  Fixing wrong VDR iq4nl value (llama/8812)
  ggml-cuda: Adding support for unified memory (llama/8035)
  ...
Great work. I have a question: does it support the Ascend 310P3 chip now? @MengqingCao @hipudding
The 310 is not supported now, but I think it's easy to support the 310P by making some small changes. If you are interested in this project, please open a Pull Request.
I attempted to run on the 310P3 chip and encountered an issue with error messages. I've opened a new issue; could you help me identify where the problem lies? #2372
This PR enables users to leverage the Ascend NPU for inferencing the whisper model on whisper.cpp.

Mainly changes:
* The main code comes from the Ascend NPU implementation in llama.cpp; CANN related code in llama.cpp is migrated to this project by this PR.
* Enable Ascend NPU in src/whisper.cpp, ggml/CMakeLists.txt, etc.

Build with CANN
Use the following command to build whisper.cpp with CANN:
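A plausible invocation, assuming the GGML_CANN CMake option wired into ggml/CMakeLists.txt by this PR and a standard CANN toolkit installation (the exact flags and paths may differ):

```bash
# Sketch: build whisper.cpp with the CANN backend enabled.
# The toolkit path below is the usual default and is an assumption.
source /usr/local/Ascend/ascend-toolkit/set_env.sh

cmake -B build -DGGML_CANN=1
cmake --build build -j --config Release
```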
ASR Inference
Inference test on the whisper base model (ggml-base.en.bin, downloaded from https://huggingface.co/ggerganov/whisper.cpp/tree/main):
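A representative run of the example binary might look like the following sketch (model and audio paths are placeholders, not the exact command used for the reported result):

```bash
# Sketch: transcribe a sample on the CANN-enabled build; paths are placeholders.
./build/bin/main -m models/ggml-base.en.bin -f samples/jfk.wav
```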
Inference result:
ASR inference for longer speech (https://upload.wikimedia.org/wikipedia/en/d/d4/En.henryfphillips.ogg):