Habana-LLM-Viewer is a tool that provides Roofline modeling, LLM performance prediction, and memory analysis for the Intel Gaudi platform. Inspired by LLM-Viewer, Habana-LLM-Viewer can be used to estimate the performance of models such as Llama2-13B, Qwen-7B, and Mixtral-8x7B on Intel Gaudi.
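The core idea behind a Roofline-based projection can be sketched in a few lines: a kernel's time is bounded by either peak compute or peak memory bandwidth, whichever is slower. The peak numbers below are placeholder assumptions for illustration, not official Intel Gaudi 2 specifications, and the tool's actual cost model may be more detailed.

```python
# Minimal Roofline sketch. PEAK_* values are illustrative assumptions,
# not official Intel Gaudi 2 specs.
PEAK_COMPUTE_FLOPS = 432.0e12   # assumed peak BF16 compute, FLOP/s
PEAK_HBM_BW_BPS = 2.45e12       # assumed HBM bandwidth, bytes/s

def roofline_time_s(flops, bytes_moved):
    """Projected kernel time: the max of compute-bound and memory-bound time."""
    t_compute = flops / PEAK_COMPUTE_FLOPS
    t_memory = bytes_moved / PEAK_HBM_BW_BPS
    return max(t_compute, t_memory)
```

A kernel with high arithmetic intensity (FLOPs per byte) lands on the compute roof; one that mostly streams data from HBM lands on the bandwidth roof.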
- Simply run habana_viewer.py and the results will be served on localhost.

```shell
python habana_viewer.py
```
- Simply run run_model_projection.py and the results will be saved to the folder "data/model".

```shell
python run_model_projection.py \
    --device IntelGaudi2 \
    --device-type B \
    --model Llama2-7B \
    --data-type BF16 \
    --batch-size BATCH_SIZE \
    --context-input CONTEXT_INPUT \
    --context-output CONTEXT_OUTPUT \
    --kvcache-bucket 256 \
    --vec-bmm
```
| Model Name | Projected Data |
|---|---|
| Llama2-7B | Link |
| Llama2-13B | Link |
| Llama3-8B | Link |
| Qwen-7B | Link |
| Qwen-14B | Link |
| Mixtral-8x7B | Link |
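The `--kvcache-bucket 256` option suggests that KV-cache lengths are bucketed so projections run over a small set of discrete shapes. A minimal sketch of such rounding (my interpretation, not necessarily the tool's exact logic):

```python
def round_to_bucket(context_len, bucket=256):
    # Round the KV-cache length up to the next bucket multiple, so decode
    # steps with nearby context lengths share one projected shape.
    # (Hypothetical helper; the tool's internal bucketing may differ.)
    return ((context_len + bucket - 1) // bucket) * bucket
```

For example, context lengths 1 through 256 would all be projected at 256, and 257 would move to the 512 bucket.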
- Simply run run_op_projection.py and the results will be saved to the folder "data/operation". As with model projection, one can modify proj_cfg in main.

```shell
python run_op_projection.py \
    --device IntelGaudi2 \
    --device-type B \
    --op Matmul \
    --data-type BF16 \
    --m-list m1 m2 ... \
    --n-list n1 n2 ... \
    --k-list k1 k2 ...
```
| Op Name | Projected Data |
|---|---|
| Matmul | Link |
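For a Matmul of shape (m, k) x (k, n), the quantities a projection like this typically starts from are the FLOP count and the minimum memory traffic. A hedged sketch (illustrative only; the tool's actual cost model may account for tiling, reuse, and other effects):

```python
def matmul_stats(m, n, k, dtype_bytes=2):
    """Ideal FLOPs, bytes moved, and arithmetic intensity for an
    (m, k) x (k, n) matmul. dtype_bytes=2 corresponds to BF16."""
    flops = 2 * m * n * k                                # one multiply + one add per MAC
    bytes_moved = dtype_bytes * (m * k + k * n + m * n)  # read A and B, write C, once each
    return flops, bytes_moved, flops / bytes_moved       # intensity in FLOPs/byte
```

Feeding the resulting intensity into a Roofline model tells you whether a given (m, n, k) point is expected to be compute-bound or bandwidth-bound on the device.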
- Currently only single-card performance projection is covered; multi-card / multi-node support is planned.
- More models / operations will be covered.