
Some confusions about AUCM loss #67

Closed
RickeyBorges opened this issue Jan 3, 2025 · 1 comment
Dear developers,

After learning about your work, I have the following confusions:

  1. Why do we need to convert the square loss into a saddle-point problem (SPP)? What problems arise if we directly minimize $E[(f(x_i)-a)^2]+E[(f(x_j)-b)^2]+(m+E[f(x_j)]-E[f(x_i)])^2$, as in:

loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + (self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask))**2

  2. I noticed that in your code, $a$ in $E[(f(x_i)-a)^2]$ and $b$ in $E[(f(x_j)-b)^2]$ directly use self.a and self.b, while $a$ and $b$ in $(m+b-a)$ are replaced by the sample means self.mean(y_pred*pos_mask) and self.mean(y_pred*neg_mask). I would like to know the reason for this.

loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + 2*self.alpha*(self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask)) - self.alpha**2

  3. Can I interpret the design of the margin loss as transforming the square loss $(m+f(x_j)-f(x_i))^2$ into $\max(0, m+f(x_j)-f(x_i))^2$, which allows $m+f(x_j)$ to be equal to or less than $f(x_i)$, whereas the square loss only drives them toward equality? When the loss value is 0, is there a potential problem that the gradient is zero and the model cannot be updated?

  4. In the demo you provided, the test AUC of AUCM trained with PESG on CIFAR10 reaches 0.9245, while the value quoted in your paper, Large-scale Robust Deep AUC Maximization, is 0.715±0.008. Is this because some of the content has been updated?

I would be grateful if you could reply as soon as possible. Wish you a happy new year.
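For reference, the min-max loss in the second code snippet above can be written as a standalone function. Below is a minimal NumPy sketch; the class-wise mean convention and the toy inputs are assumptions for illustration, not the library's exact implementation:

```python
import numpy as np

def aucm_minmax_loss(y_pred, y_true, a, b, alpha, margin=1.0):
    # Hypothetical standalone paraphrase of the min-max AUCM loss quoted above.
    # a, b, alpha are the auxiliary variables updated alongside the model.
    pos = y_true == 1
    neg = ~pos
    loss = ((y_pred[pos] - a) ** 2).mean()    # E[(f(x_i) - a)^2]
    loss += ((y_pred[neg] - b) ** 2).mean()   # E[(f(x_j) - b)^2]
    # linearized last term: 2*alpha*(m + E[f(x_j)] - E[f(x_i)]) - alpha^2
    loss += 2 * alpha * (margin + y_pred[neg].mean() - y_pred[pos].mean())
    loss -= alpha ** 2
    return loss

y_pred = np.array([0.9, 0.8, 0.3, 0.2])
y_true = np.array([1, 1, 0, 0])
loss = aucm_minmax_loss(y_pred, y_true, a=0.85, b=0.25, alpha=0.5)
print(loss)  # 0.155 for these toy inputs
```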

optmai (Collaborator) commented Jan 7, 2025

Thank you for your interest in our library. Let me answer your questions below.

  1. You cannot simply do that, because it provides no convergence guarantee. Since we use mini-batches to estimate the gradient, the loss you defined, loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + (self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask))**2, only uses the data in the mini-batch, and the gradient of this mini-batch loss is not an unbiased estimator of the true gradient (the last term squares a sample mean). Converting the problem into an SPP addresses this issue. Additionally, the SPP formulation allows more efficient optimization in online learning and distributed learning.

  2. Indeed, $a$ and $b$ are the mean scores of the positive and negative samples. We make them free variables in the first two terms so that their optimal solutions are exactly those mean scores. We cannot make $a$ and $b$ free variables in the last term, because then the resulting solutions would no longer be the mean scores.

  3. I do not think you can interpret it that way; that formulation is not equivalent to the SPP.

  4. I think it is because we make the data more imbalanced in the paper.
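The biased-gradient point in the first answer can be illustrated numerically: squaring a mini-batch mean overestimates the square of the full-data mean by roughly Var(score)/batch_size, so the gradient of the squared-mean term is biased too. A small sketch with synthetic scores (all numbers here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # synthetic model scores, for illustration

# Full-data value of the problematic term (mean of scores)**2.
full_mean_sq = scores.mean() ** 2

# Average of the mini-batch estimates batch_mean**2 over many batches:
# systematically larger than full_mean_sq, by about Var(score)/batch_size.
batch_size = 10
batch_mean_sq = np.mean([
    rng.choice(scores, batch_size, replace=False).mean() ** 2
    for _ in range(20000)
])
print(full_mean_sq, batch_mean_sq)  # the mini-batch estimate is systematically larger
```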

Please let us know if you have any additional questions.
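The second answer's claim, that the optimal free variables recover the mean scores, can also be checked numerically. A brief sketch with synthetic positive scores (the score distribution and grid are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
pos_scores = rng.normal(loc=1.0, scale=0.5, size=500)  # synthetic positive scores

# For fixed scores, d/da mean((f(x_i) - a)**2) = -2 * (mean(f) - a),
# which vanishes at a* = mean(f): the mean positive score.
a_grid = np.linspace(-1.0, 3.0, 4001)  # grid step 0.001
losses = [((pos_scores - a) ** 2).mean() for a in a_grid]
a_star = a_grid[int(np.argmin(losses))]
print(a_star, pos_scores.mean())  # a_star matches the mean score to grid precision
```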

optmai closed this as completed Jan 7, 2025