That's a great observation, and I agree that this could be an issue. However, the data loader is set up so that it pads all sequences to equal length, even for the validation and test loaders:

    val_dataset = SpamDataset(
        csv_file="validation.csv",
        max_length=train_dataset.max_length,  # <-------
        tokenizer=tokenizer
    )
    test_dataset = SpamDataset(
        csv_file="test.csv",
        max_length=train_dataset.max_length,  # <-------
        tokenizer=tokenizer
    )

So the -1 token is always at the same position. I've run some experiments without padding (see row 15 here: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch06/02_bonus_additional-experiments), and yes, it can indeed perform better. (This is somewhat analogous to your suggestion.)
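The padding behavior described above can be illustrated with a simplified, pure-Python sketch. It is not the actual SpamDataset code: `pad_to_length` is a hypothetical helper, the token IDs are made up, and it assumes right padding with the GPT-2 `<|endoftext|>` ID (50256) as the pad token. Because the training set determines `max_length` and the validation data reuses it, every sequence ends up with the same length, so index -1 always points to the same position; for shorter texts, however, that position holds a padding token.

```python
PAD_TOKEN_ID = 50256  # assumption: GPT-2 <|endoftext|> ID reused as the pad token

def pad_to_length(token_ids, max_length, pad_token_id=PAD_TOKEN_ID):
    """Hypothetical helper: truncate to max_length, then right-pad with pad_token_id."""
    token_ids = token_ids[:max_length]
    return token_ids + [pad_token_id] * (max_length - len(token_ids))

# Made-up token IDs standing in for encoded texts
train_encoded = [[101, 102, 103], [201, 202, 203, 204, 205]]
val_encoded = [[301], [401, 402, 403, 404, 405, 406]]

# The training set determines max_length; validation reuses it,
# mirroring max_length=train_dataset.max_length in the snippet above
max_length = max(len(ids) for ids in train_encoded)

train_padded = [pad_to_length(ids, max_length) for ids in train_encoded]
val_padded = [pad_to_length(ids, max_length) for ids in val_encoded]

all_padded = train_padded + val_padded
print([len(row) for row in all_padded])  # [5, 5, 5, 5]
# Position -1 is always index max_length - 1, but for shorter texts it is a pad token
print([row[-1] for row in all_padded])   # [50256, 205, 50256, 405]
```

Note that the longer validation text is truncated to the training `max_length`, so its last token (406) is dropped.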
I have a question about the implementation of calc_loss_batch (this applies to calc_accuracy_loader as well). The current implementation always takes the output at the last token position (-1), regardless of the actual length of the input text. Could this potentially be an issue? With the current implementation, we might be using the representation of a padding token for classification. Wouldn't it be more accurate to use the last non-padding token of each sequence instead?

When we actually make predictions for a single sequence, we always use the last (non-padding) token. I see this as a mismatch between training and test time. Can someone shed some light on this?
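The alternative raised in the question can be sketched as follows: instead of always taking position -1, find each sequence's last non-padding token and take the logits at that position. This is a minimal pure-Python illustration, not code from the repository. The function names are hypothetical, the logits are made up, and it assumes right padding with 50256 as the pad token (a text that genuinely ends in token 50256 would be misread). With a causal attention mask and right padding, the output at the last real token cannot attend to the padding that follows it, so it matches what an unpadded forward pass would produce at that position.

```python
PAD_TOKEN_ID = 50256  # assumption: right padding with the GPT-2 <|endoftext|> ID

def last_real_token_index(token_ids, pad_token_id=PAD_TOKEN_ID):
    """Return the index of the last non-padding token (assumes right padding)."""
    for i in range(len(token_ids) - 1, -1, -1):
        if token_ids[i] != pad_token_id:
            return i
    raise ValueError("sequence contains only padding tokens")

def select_classification_logits(batch_logits, batch_token_ids, pad_token_id=PAD_TOKEN_ID):
    """Pick, for each sequence, the logits at its last non-padding position
    instead of always using position -1.

    batch_logits: nested list of shape [batch, seq_len, num_classes]
    batch_token_ids: nested list of shape [batch, seq_len]
    """
    selected = []
    for logits, token_ids in zip(batch_logits, batch_token_ids):
        idx = last_real_token_index(token_ids, pad_token_id)
        selected.append(logits[idx])
    return selected

batch_token_ids = [
    [101, 102, 103, PAD_TOKEN_ID, PAD_TOKEN_ID],  # 3 real tokens + 2 pads
    [201, 202, 203, 204, 205],                    # no padding
]
# Made-up per-position logits for 2 classes
batch_logits = [
    [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]],
    [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]],
]
print(select_classification_logits(batch_logits, batch_token_ids))
# [[0.3, 0.7], [0.6, 0.4]]
```

For the first sequence this selects position 2 rather than a padding position; for the unpadded sequence it coincides with position -1. In PyTorch, the same selection could be done for a whole batch by building a tensor of these indices and using advanced indexing such as `logits[torch.arange(batch_size), last_idx]`.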