Hi, may I know why the hyperparameters of the training command in Llama-X (this repo) and Alpaca are different? For example, the effective batch size is 128 vs. 512 (64*8), and the warmup is 0.03 (a ratio) vs. 2 (steps).
Which set of hyperparameters should we adopt?
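For concreteness, here is how I am reading the two settings (a minimal sketch; the per-device/accumulation/GPU split for the 128 case and the ~52k-example, 3-epoch schedule are my assumptions, not values I re-checked against either README):

```python
# Sanity check of the two settings quoted above. The numbers marked as
# "my reading" are assumptions, not re-verified against either repo.

def warmup_from_ratio(total_steps: int, ratio: float) -> int:
    # warmup_ratio style: warmup is a fraction of all optimizer steps.
    return int(total_steps * ratio)

# Effective batch size = per-device batch * gradient accumulation * #GPUs.
effective_a = 4 * 8 * 4    # = 128  (my reading of one command)
effective_b = 64 * 1 * 8   # = 512  (the "64*8" quoted above)

# Assuming roughly 52k training examples and 3 epochs at effective batch 128:
total_steps = 52_000 * 3 // 128                 # ~1218 optimizer steps
print(warmup_from_ratio(total_steps, 0.03))     # ~36 warmup steps (ratio 0.03)
print(2)                                        # vs. a fixed warmup_steps=2
```

So the two commands differ not only in batch size but also by roughly an order of magnitude in warmup length, which is why I am unsure which one to follow.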
Another question: what is the Llama-i (7B) model in the Llama-X Evaluation section? Its GSM8K result is 18.8%, while my own Llama-X model (trained with the hyperparameters in this repo) only reaches 10%. I am not sure why the gap is so large. Would you mind sharing the GSM8K evaluation script used in Llama-X? Thank you.
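For reference, this is roughly how I score GSM8K on my side (a minimal sketch of answer extraction and exact-match accuracy; the last-number heuristic and the regex are my own choices, not taken from this repo, so a difference here could already explain part of the gap):

```python
import re

def _to_float(s):
    try:
        return float(s)
    except ValueError:
        return None

def extract_gold(answer: str):
    # GSM8K ground-truth answers end with "#### <final number>".
    return _to_float(answer.split("####")[-1].strip().replace(",", ""))

def extract_pred(generation: str):
    # Heuristic: take the last number that appears in the generation.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", generation.replace(",", ""))
    return _to_float(numbers[-1]) if numbers else None

def gsm8k_accuracy(predictions, references) -> float:
    # Exact match on the extracted final numbers.
    correct = sum(
        extract_pred(p) is not None and extract_pred(p) == extract_gold(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)
```

If your script uses a different few-shot prompt format or answer-parsing rule, that alone can shift the GSM8K number by several points.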