
Propose validation loss calculation to improve accuracy by reducing floating-point errors #19

Open
wants to merge 1 commit into master
Conversation

@aakashapoorv

I propose shifting from the current approach, where the loss is normalized immediately for each batch:

$$\text{Average Loss} = \sum_i \frac{\text{Loss}_i}{\text{Number of Batches}}$$

to a cumulative method followed by a single normalization step:

$$\text{Average Loss} = \frac{\text{Total Accumulated Loss}}{\text{Number of Batches}}$$

This aims to reduce floating-point errors and increase the accuracy of the reported validation loss.
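
In code, the change amounts to moving the division out of the evaluation loop. A minimal sketch, assuming a PyTorch-style loop where `model(x, y)` returns `(logits, loss)`; the names `estimate_val_loss` and `val_batches` are hypothetical, not this repo's actual identifiers:

```python
import torch

@torch.no_grad()
def estimate_val_loss(model, val_batches):
    # Old approach: normalize each batch's loss as it is accumulated.
    # avg = 0.0
    # for x, y in val_batches:
    #     _, loss = model(x, y)
    #     avg += loss.item() / len(val_batches)  # one division per batch

    # Proposed approach: accumulate raw losses, normalize once at the end.
    total = 0.0
    for x, y in val_batches:
        _, loss = model(x, y)
        total += loss.item()
    return total / len(val_batches)  # single division
```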

Changes Made

| Aspect | Old Approach | New Approach | Reason for Change |
| --- | --- | --- | --- |
| Calculation | Loss divided by the number of batches before accumulating. | Accumulate all losses, then divide by the batch count. | Reduces rounding errors and floating-point imprecision. |
| Precision | Potential for early precision loss due to division. | Division occurs only once, preserving precision. | Enhances the reliability of loss metrics. |
| Error potential | Higher, due to repeated operations on each batch. | Lower, with fewer operations on critical data. | Minimizes the accumulation of computational errors. |
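
Whether the single division measurably helps can be checked empirically. A self-contained float32 demo (synthetic losses, not code from this repo) comparing both accumulation orders against a float64 reference:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.uniform(2.0, 3.0, size=100_000).astype(np.float32)
n = np.float32(len(losses))

# Old approach: divide each loss before accumulating (N divisions).
avg_early = np.float32(0.0)
for loss in losses:
    avg_early += loss / n

# New approach: accumulate first, divide once (1 division).
total = np.float32(0.0)
for loss in losses:
    total += loss
avg_late = total / n

reference = losses.astype(np.float64).mean()  # high-precision baseline
print(f"divide-early error: {abs(avg_early - reference):.3e}")
print(f"divide-once error:  {abs(avg_late - reference):.3e}")
```

Note that if the actual script accumulates Python floats via `.item()` (float64), any difference between the two orders is expected to be tiny, which matches the point raised in the discussion below.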

@aakashapoorv changed the title from "Propose validation loss calculation to accumulate before normalizing" to "Propose validation loss calculation to improve accuracy by reducing floating-point errors" on Jun 13, 2024
@karpathy (Owner)

You're not wrong... I did it mostly that way because I thought it was cognitively simpler to understand. Possibly it wasn't a great idea. I'll think it through. At this scale of the project it probably doesn't actually make a difference?

@aakashapoorv (Author)

I agree 😊, simplicity has its appeal. At this scale it might make only a small difference, if any. One thought, though: if someone modifies things to scale up, this approach may help reduce floating-point errors.
