
Habana-LLM-Viewer

Habana-LLM-Viewer is a tool that provides Roofline modeling, LLM performance prediction, and memory analysis for the Intel Gaudi platform. Inspired by LLM-Viewer, Habana-LLM-Viewer can be used to estimate the performance of models such as Llama2-13B, Qwen-7B, and Mixtral-8x7B on the Intel Gaudi platform.
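The core idea behind a Roofline-style prediction can be sketched in a few lines: an operation's latency is bounded by either compute throughput or memory bandwidth, and the larger bound dominates. The function below is a minimal illustration, not code from this repo; the peak-TFLOPS and bandwidth defaults are rough, assumed Gaudi2-class numbers, not official specs.

```python
# Minimal roofline sketch. The peak_tflops and hbm_gbps defaults are
# ASSUMED, illustrative Gaudi2-class numbers -- substitute real device
# specs for meaningful projections.

def roofline_time_us(flops, bytes_moved, peak_tflops=432.0, hbm_gbps=2450.0):
    """Estimated op latency (microseconds): max of compute and memory bounds."""
    compute_us = flops / (peak_tflops * 1e12) * 1e6   # time if compute-bound
    memory_us = bytes_moved / (hbm_gbps * 1e9) * 1e6  # time if bandwidth-bound
    return max(compute_us, memory_us)

# Example: a 4096x4096x4096 BF16 matmul has 2*m*n*k FLOPs and moves
# (m*k + k*n + m*n) * 2 bytes (2 bytes per BF16 element).
m = n = k = 4096
t = roofline_time_us(2 * m * n * k, (m * k + k * n + m * n) * 2)
```

At this shape the compute bound dominates, so the matmul is compute-bound under this model; small-batch decode steps in LLM inference typically land on the memory-bound side instead.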

Dashboard screenshots: graph view, table view, and memory view.

Model Projection

Command

  1. Simply run habana_viewer.py and the results will show up on localhost.
    python habana_viewer.py
  2. Simply run run_model_projection.py and the results will be saved to the folder "data/model".
    python run_model_projection.py \
    --device IntelGaudi2 \
    --device-type B \
    --model Llama2-7B \
    --data-type BF16 \
    --batch-size BATCH_SIZE \
    --context-input CONTEXT_INPUT \
    --context-output CONTEXT_OUTPUT \
    --kvcache-bucket 256 \
    --vec-bmm
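To sweep several configurations in one go, the command above can be driven from a small Python script. This is a hedged sketch: the flag names are taken from the command shown here, while the batch-size and context values are arbitrary examples.

```python
# Sketch: build run_model_projection.py command lines for a batch-size
# sweep. Flag names mirror the README command; the swept values are
# arbitrary examples, not recommended settings.
import subprocess

def projection_cmd(batch_size, context_input=1024, context_output=512):
    """Return the argv list for one projection run."""
    return [
        "python", "run_model_projection.py",
        "--device", "IntelGaudi2",
        "--device-type", "B",
        "--model", "Llama2-7B",
        "--data-type", "BF16",
        "--batch-size", str(batch_size),
        "--context-input", str(context_input),
        "--context-output", str(context_output),
        "--kvcache-bucket", "256",
        "--vec-bmm",
    ]

for bs in (1, 8, 32):
    cmd = projection_cmd(bs)
    # subprocess.run(cmd, check=True)  # uncomment inside the repo checkout
```

Each run writes its results under "data/model", so a sweep like this produces one projection per configuration.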

Example

| Model Name | Projected Data |
| --- | --- |
| Llama2-7B | Link |
| Llama2-13B | Link |
| Llama3-8B | Link |
| Qwen-7B | Link |
| Qwen-14B | Link |
| Mixtral-8x7B | Link |

Operation Projection

Command

  1. Simply run run_op_projection.py and the results will be saved to the folder "data/operation". As with model projection, one can modify proj_cfg in main.
    python run_op_projection.py \
    --device IntelGaudi2 \
    --device-type B \
    --op Matmul \
    --data-type BF16 \
    --m-list m1 m2 ... \
    --n-list n1 n2 ... \
    --k-list k1 k2 ...
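For each (m, n, k) point in the lists above, an op projection boils down to counting FLOPs and bytes moved, which together give the arithmetic intensity that places the Matmul on the roofline. The helper below is an illustrative sketch of that bookkeeping, not the repo's implementation; it assumes BF16 (2 bytes per element).

```python
# Sketch of per-point Matmul bookkeeping: FLOPs, bytes moved, and
# arithmetic intensity (FLOPs per byte). Assumes BF16 operands
# (2 bytes/element); illustrative only.

def matmul_point(m, n, k, dtype_bytes=2):
    flops = 2 * m * n * k                               # multiply-accumulate count
    bytes_moved = (m * k + k * n + m * n) * dtype_bytes  # A, B, and C tensors
    return flops, bytes_moved, flops / bytes_moved       # arithmetic intensity

points = [matmul_point(m, n, k)
          for (m, n, k) in [(1024, 1024, 1024), (4096, 4096, 4096)]]
```

Comparing a point's arithmetic intensity against the device's compute-to-bandwidth ratio tells you whether that Matmul shape is compute- or memory-bound.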

Example

| Op Name | Projected Data |
| --- | --- |
| Matmul | Link |

Todo

  1. Currently only single-card performance projection is covered; multi-card / multi-node support is planned.
  2. More models / operations will be covered.
