
Bug: gemma-2-9b-it inference speed very slow 1.73 tokens per second #9906

Open
ninth99 opened this issue Oct 16, 2024 · 0 comments
Labels
bug-unconfirmed low severity Used to report low severity bugs in llama.cpp (e.g. cosmetic issues, non critical UI glitches)


ninth99 commented Oct 16, 2024

What happened?

System Info
Device: Ascend 910B3
OS: Ubuntu 20.04.6 LTS
Arch: aarch64

Command:
./build/bin/llama-cli -m ./models/gemma-2-9b-it.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
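For reference, the offload-related flags in that command (as described by `llama-cli --help`; the per-flag notes below are annotations, not part of the original report):

```shell
# -ngl 33   : offload up to 33 model layers to the accelerator device
# -sm none  : split mode "none" — keep the model on a single device
# -mg 0     : use device index 0 as the main device
./build/bin/llama-cli -m ./models/gemma-2-9b-it.gguf \
    -p "Building a website can be done in 10 simple steps:" \
    -n 400 -e -ngl 33 -sm none -mg 0
```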

Log:
llama_perf_sampler_print: sampling time = 152.66 ms / 414 runs ( 0.37 ms per token, 2711.82 tokens per second)
llama_perf_context_print: load time = 7152.40 ms
llama_perf_context_print: prompt eval time = 619.28 ms / 14 tokens ( 44.23 ms per token, 22.61 tokens per second)
llama_perf_context_print: eval time = 230735.63 ms / 399 runs ( 578.28 ms per token, 1.73 tokens per second)
llama_perf_context_print: total time = 231911.05 ms / 413 tokens

Name and Version

./build/bin/llama-cli --version
version: 3923 (becfd38)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for aarch64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

CPU usage is very high while NPU usage stays low, which suggests the model is running on the CPU rather than being offloaded to the NPU during inference.
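One likely cause of this symptom is a build without the CANN (Ascend NPU) backend enabled, in which case `-ngl 33` silently falls back to the CPU. A sketch of how to rebuild and verify, assuming a CMake build tree (`GGML_CANN` is the CANN backend toggle in current llama.cpp; the exact log wording and layer count may vary by version and model):

```shell
# Rebuild with the CANN backend enabled
cmake -B build -DGGML_CANN=on -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# On startup, the log should show layers being offloaded, e.g. a line like
#   llm_load_tensors: offloaded 33/43 layers to GPU
# "offloaded 0/... layers" means the NPU backend is not active.
./build/bin/llama-cli -m ./models/gemma-2-9b-it.gguf -ngl 33 -sm none -mg 0 \
    -p "Building a website can be done in 10 simple steps:" -n 400 2>&1 | grep -i offload
```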