NPU Baichuan2 Multi-Process example #11928
Conversation
parser.add_argument(
    "--repo-id-or-model-path",
    type=str,
    default="meta-llama/Llama-2-7b-chat-hf",
Please modify the default to a Baichuan model.
    help='Prompt to infer')
parser.add_argument("--n-predict", type=int, default=32, help="Max tokens to predict")
parser.add_argument("--max-output-len", type=int, default=1024)
parser.add_argument("--max-prompt-len", type=int, default=768)
`max-prompt-len` is better set to 512 by default.
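Putting both suggestions together, the argument parser could look like the sketch below. The Baichuan2 repo id (`baichuan-inc/Baichuan2-7B-Chat`) is an assumption standing in for whatever default the PR settles on, and `max-prompt-len` is lowered to the suggested 512:

```python
import argparse

# Sketch of the review suggestions applied: a Baichuan default repo id
# (assumed: "baichuan-inc/Baichuan2-7B-Chat") and max-prompt-len of 512.
parser = argparse.ArgumentParser(description="Baichuan2 NPU multi-process example")
parser.add_argument("--repo-id-or-model-path", type=str,
                    default="baichuan-inc/Baichuan2-7B-Chat")
parser.add_argument("--prompt", type=str, default="What is AI?",
                    help="Prompt to infer")
parser.add_argument("--n-predict", type=int, default=32,
                    help="Max tokens to predict")
parser.add_argument("--max-output-len", type=int, default=1024)
parser.add_argument("--max-prompt-len", type=int, default=512)

args = parser.parse_args([])  # parse with defaults only
print(args.max_prompt_len)  # 512
```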
    trust_remote_code=True,
    attn_implementation="eager",
    load_in_low_bit="sym_int4",
    enable_mp=True,
We have just updated the API; please change `enable_mp` to `optimize_model`.
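The rename is a straight keyword swap in the `from_pretrained` call: every other loading option stays the same. A minimal self-contained sketch of the change (using a plain dict in place of the real call, so no ipex-llm install is needed to follow it):

```python
# Hedged sketch of the reviewer's API rename: the keyword argument passed
# when loading the model changes from enable_mp=True to optimize_model=True,
# with all other options unchanged.
old_kwargs = {
    "trust_remote_code": True,
    "attn_implementation": "eager",
    "load_in_low_bit": "sym_int4",
    "enable_mp": True,
}

# Rename the deprecated key while preserving every other option.
new_kwargs = {("optimize_model" if k == "enable_mp" else k): v
              for k, v in old_kwargs.items()}
print(new_kwargs)
```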
@@ -108,3 +108,25 @@ def optimize_llm(
        prefill_runner=prefill_runner, decode_runner=decode_runner
    )
    convert_forward(model, module.MiniCPMModel, minicpm_model_forward)
elif model.config.model_type == "baichuan":
Maybe we need to make the check stricter, to avoid applying the optimization to Baichuan-13B.
Merge it first as initial support; will open another PR to fix.
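One way the stricter guard could look: since both Baichuan2 variants report `model_type == "baichuan"`, the check could also inspect the config's `hidden_size`. The sizes used here (4096 for 7B, 5120 for 13B, which also uses ALiBi rather than RoPE attention) are assumptions drawn from the published Baichuan2 configs, not from this PR:

```python
# Hedged sketch of a stricter guard: only apply the NPU optimization to
# 7B-sized Baichuan models. The hidden_size values (4096 for Baichuan2-7B,
# 5120 for Baichuan2-13B) are assumptions, not taken from this PR.
class DummyConfig:
    """Stand-in for a transformers model config."""
    def __init__(self, model_type, hidden_size):
        self.model_type = model_type
        self.hidden_size = hidden_size

def should_optimize(config):
    # Exclude Baichuan2-13B (larger hidden size, ALiBi attention) until it
    # is supported in a follow-up PR.
    return config.model_type == "baichuan" and config.hidden_size == 4096

print(should_optimize(DummyConfig("baichuan", 4096)))  # True  (7B-sized)
print(should_optimize(DummyConfig("baichuan", 5120)))  # False (13B-sized)
```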
NPU Baichuan2 Multi-Process example.
How to test?