First, thank you for the solid work and for making this code public -- the paper makes some great insights and the code is very clean! I'm using this codebase as a reference for implementing a variant of Group DRO, and I had a clarification question on the loss computation.
The Group DRO loss stated in the DoReMi paper is (Eq. 1):
However, it looks like the code is actually optimizing
In particular, it looks like the loss is a reweighted average over all samples across all domains rather than a reweighted sum of averages over domain-specific losses.
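To make the distinction concrete, here is a simplified sketch of the two forms I have in mind. The notation is my own assumption: $\alpha_i$ is the weight of domain $i$ (with $\sum_i \alpha_i = 1$), $D_i$ is the set of samples from domain $i$ in the batch, and $\ell_\theta(x)$ is the per-sample loss. I am leaving out the reference-model term and any token-level normalization, so this is not a verbatim copy of Eq. 1 or of the code.

```math
\begin{aligned}
L_{\text{sum-of-avgs}}(\theta, \alpha) &= \sum_{i=1}^{k} \alpha_i \cdot \frac{1}{|D_i|} \sum_{x \in D_i} \ell_\theta(x) \\
L_{\text{avg-over-all}}(\theta, \alpha) &= \frac{\sum_{i=1}^{k} \sum_{x \in D_i} \alpha_i \, \ell_\theta(x)}{\sum_{i=1}^{k} \sum_{x \in D_i} \alpha_i}
= \sum_{i=1}^{k} \frac{\alpha_i |D_i|}{\sum_{j=1}^{k} \alpha_j |D_j|} \cdot \frac{1}{|D_i|} \sum_{x \in D_i} \ell_\theta(x)
\end{aligned}
```

Under these assumptions the two agree when every $|D_i|$ is equal, but otherwise the second form effectively weights domain $i$ by $\alpha_i |D_i|$ (renormalized) rather than by $\alpha_i$.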
The domain weight update computes the average domain-specific losses here: https://github.com/sangmichaelxie/doremi/blob/7cde52d1848737aa967ecbdb9e643cf334de160d/doremi/trainer.py#L252C22-L252C110
I would expect to see a similar computation for the model parameter updates, but it looks like the code computes the total loss across all domains, reweights it by the domain weights, and then normalizes by a constant `normalizer` (a reweighted average loss over all samples in all domains): https://github.com/sangmichaelxie/doremi/blob/7cde52d1848737aa967ecbdb9e643cf334de160d/doremi/trainer.py#L363C17-L363C89
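Here is a minimal toy sketch of why I think the two computations differ. It is not taken from the repository: the function names, the per-sample (rather than per-token) losses, and the example numbers are all my own assumptions, and it ignores the reference-model term.

```python
def sum_of_domain_averages(losses, domain_ids, alpha):
    """Reweighted sum of per-domain mean losses: sum_i alpha[i] * mean(losses in domain i)."""
    total = 0.0
    for i, a in enumerate(alpha):
        domain_losses = [l for l, d in zip(losses, domain_ids) if d == i]
        if domain_losses:  # skip domains that are absent from the batch
            total += a * sum(domain_losses) / len(domain_losses)
    return total


def weighted_average_over_samples(losses, domain_ids, alpha):
    """Reweighted average over all samples, divided by one shared normalizer."""
    weights = [alpha[d] for d in domain_ids]  # each sample gets its domain's weight
    normalizer = sum(weights)                 # a single constant for the whole batch
    return sum(w * l for w, l in zip(weights, losses)) / normalizer


# Hypothetical imbalanced batch: three samples from domain 0, one from domain 1.
losses = [1.0, 1.0, 1.0, 4.0]
domain_ids = [0, 0, 0, 1]
alpha = [0.5, 0.5]

print(sum_of_domain_averages(losses, domain_ids, alpha))         # 2.5
print(weighted_average_over_samples(losses, domain_ids, alpha))  # 1.75
```

With a balanced batch (for example `losses = [1.0, 1.0, 4.0, 4.0]` and `domain_ids = [0, 0, 1, 1]`) both functions return 2.5, so the gap only shows up when the per-domain sample counts differ.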
Could you please clarify whether the Group DRO loss stated in Eq. 1 is indeed implemented in the code, or if it is summing domain-specific averages? Thank you!