# Remove early stopping from LLaMA end-to-end benchmarking (#20033)
### Description
This PR removes early stopping from the end-to-end LLaMA-2 benchmark
script.

### Motivation and Context
With the early exit removed, the benchmark always generates the requested
number of new tokens, even when every batch entry has already reached the
EOS token, so the measured generation length is consistent across runs.
kunal-vaishnavi authored and rachguo committed Mar 25, 2024
1 parent e6c3d56 commit c9ebded
Showing 1 changed file with 0 additions and 4 deletions.
```diff
@@ -400,11 +400,7 @@ def main():
         sampling_times.append(sampling_end_time - sampling_start_time)
 
         all_token_ids = torch.cat([all_token_ids, tokens_to_add], dim=-1)
 
-        # Return early if all batch entries have reached EOS token id
-        current_length += 1
-        if torch.all(has_eos) or current_length > max_length:
-            break
 
         # Update inputs for next inference run
         inputs["input_ids"] = tokens_to_add
```
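For context, here is a minimal sketch of what a fixed-length decoding loop looks like once the early-stopping check is removed. This is not the benchmark script itself: `generate_fixed_length`, `model`, and `num_new_tokens` are illustrative names, the model is assumed to be a Hugging Face-style causal LM exposing `.logits`, and the sketch recomputes over the full sequence rather than managing the KV cache that the real script updates via `inputs["input_ids"] = tokens_to_add`.

```python
import torch


def generate_fixed_length(model, input_ids: torch.Tensor, num_new_tokens: int) -> torch.Tensor:
    """Greedy-decode exactly `num_new_tokens` tokens for every batch entry."""
    all_token_ids = input_ids
    for _ in range(num_new_tokens):
        # Recompute logits over the full sequence (no KV cache, for simplicity).
        logits = model(input_ids=all_token_ids).logits
        # Pick the most likely next token for each batch entry.
        tokens_to_add = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        all_token_ids = torch.cat([all_token_ids, tokens_to_add], dim=-1)
        # Deliberately no `if torch.all(has_eos): break` here: decoding
        # continues past EOS so every entry produces the same token count.
    return all_token_ids
```

Because the loop runs a fixed number of iterations, per-token latency statistics are computed over the same number of decoding steps on every run, which is the point of the change.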
