Hi, may I know why the hyperparameters of the training command in Llama-X (this repo) and Alpaca are different? For example, the effective batch size is 128 vs. 512 (64*8), and the warmup is 0.03 (a ratio) vs. 2 (steps).
Which set of hyperparameters should we adopt?
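For concreteness, here is how I am reading the two settings (a minimal sketch; the per-device/accumulation/GPU split for the 128 case and the ~52k-example, 3-epoch schedule are my assumptions, not values I re-checked against either README):

```python
# Sanity check of the two settings quoted above. The numbers marked as
# "my reading" are assumptions, not re-verified against either repo.

def warmup_from_ratio(total_steps: int, ratio: float) -> int:
    # warmup_ratio style: warmup is a fraction of all optimizer steps.
    return int(total_steps * ratio)

# Effective batch size = per-device batch * gradient accumulation * #GPUs.
effective_a = 4 * 8 * 4    # = 128  (my reading of one command)
effective_b = 64 * 1 * 8   # = 512  (the "64*8" quoted above)

# Assuming roughly 52k training examples and 3 epochs at effective batch 128:
total_steps = 52_000 * 3 // 128                 # ~1218 optimizer steps
print(warmup_from_ratio(total_steps, 0.03))     # ~36 warmup steps (ratio 0.03)
print(2)                                        # vs. a fixed warmup_steps=2
```

So the two commands differ not only in batch size but also by roughly an order of magnitude in warmup length, which is why I am unsure which one to follow.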
Another question: what is the Llama-i (7B) model in the Llama-X Evaluation section? Its GSM8K result is 18.8%, while my own Llama-X model (trained with the hyperparameters in this repo) only reaches 10%. I am not sure why the gap is so large. Would you mind sharing the GSM8K evaluation script used in Llama-X? Thank you.
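For reference, this is roughly how I score GSM8K on my side (a minimal sketch of answer extraction and exact-match accuracy; the last-number heuristic and the regex are my own choices, not taken from this repo, so a difference here could already explain part of the gap):

```python
import re

def _to_float(s):
    try:
        return float(s)
    except ValueError:
        return None

def extract_gold(answer: str):
    # GSM8K ground-truth answers end with "#### <final number>".
    return _to_float(answer.split("####")[-1].strip().replace(",", ""))

def extract_pred(generation: str):
    # Heuristic: take the last number that appears in the generation.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return _to_float(numbers[-1]) if numbers else None

def gsm8k_accuracy(predictions, references) -> float:
    # Exact match on the extracted final numbers.
    correct = sum(
        extract_pred(p) is not None and extract_pred(p) == extract_gold(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)
```

If your script uses a different few-shot prompt format or answer-parsing rule, that alone can shift the GSM8K number by several points.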