
Some confusions about AUCM loss #67

Closed
RickeyBorges opened this issue Jan 3, 2025 · 1 comment
Dear developers,

After learning about your work, I have the following confusions:

  1. Why do we need to convert the square loss into a saddle-point problem (SPP)? What problems arise if we directly minimize $E[(f(x_i)-a)^2]+E[(f(x_j)-b)^2]+(m+E[f(x_j)]-E[f(x_i)])^2$, as in:

loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + (self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask))**2

  2. I noticed that in your code, $a$ in $E[(f(x_i)-a)^2]$ and $b$ in $E[(f(x_j)-b)^2]$ directly use self.a and self.b, while $a$ and $b$ in $(m+b-a)$ are replaced by the sample means self.mean(y_pred*pos_mask) and self.mean(y_pred*neg_mask). I would like to know the reason for this.

loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + 2*self.alpha*(self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask)) - self.alpha**2

  3. Can I interpret the design of the margin loss as transforming the square loss $(m+f(x_j)-f(x_i))^2$ into $\max(0, m+f(x_j)-f(x_i))^2$, which allows $m+f(x_j)$ to be equal to or less than $f(x_i)$, whereas the square loss only drives them toward equality? When the loss value is 0, is there a potential problem that the gradient is zero and the model cannot be updated?

  4. In the demo you provided, the test AUC of AUCM trained with PESG on CIFAR10 reaches 0.9245, while the value quoted in your paper, Large-scale Robust Deep AUC Maximization, is 0.715±0.008. Is this because some of the content has been updated?

I would be grateful if you could reply as soon as possible. Wish you a happy new year.
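For reference, the min-max loss in the second code snippet above can be written as a standalone function. Below is a minimal NumPy sketch; the class-wise mean convention and the toy inputs are assumptions for illustration, not the library's exact implementation:

```python
import numpy as np

def aucm_minmax_loss(y_pred, y_true, a, b, alpha, margin=1.0):
    # Hypothetical standalone paraphrase of the min-max AUCM loss quoted above.
    # a, b, alpha are the auxiliary variables updated alongside the model.
    pos = y_true == 1
    neg = ~pos
    loss = ((y_pred[pos] - a) ** 2).mean()    # E[(f(x_i) - a)^2]
    loss += ((y_pred[neg] - b) ** 2).mean()   # E[(f(x_j) - b)^2]
    # linearized last term: 2*alpha*(m + E[f(x_j)] - E[f(x_i)]) - alpha^2
    loss += 2 * alpha * (margin + y_pred[neg].mean() - y_pred[pos].mean())
    loss -= alpha ** 2
    return loss

y_pred = np.array([0.9, 0.8, 0.3, 0.2])
y_true = np.array([1, 1, 0, 0])
loss = aucm_minmax_loss(y_pred, y_true, a=0.85, b=0.25, alpha=0.5)
print(loss)  # 0.155 for these toy inputs
```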

optmai (Collaborator) commented Jan 7, 2025

Thank you for your interest in our library. Let me answer your questions below.

  1. You cannot simply do that, because it provides no convergence guarantee. Since we use mini-batches to estimate the gradient, the loss you defined, loss = self.mean((y_pred - self.a)**2*pos_mask) + self.mean((y_pred - self.b)**2*neg_mask) + (self.margin + self.mean(y_pred*neg_mask) - self.mean(y_pred*pos_mask))**2, only uses the data in the mini-batch, and the gradient of this mini-batch loss is not an unbiased estimator of the true gradient (the last term squares a sample mean). Converting the problem into an SPP addresses this issue. Additionally, the SPP formulation allows more efficient optimization in online learning and distributed learning.

  2. Indeed, $a$ and $b$ are the mean scores of the positive and negative samples. We make them free variables in the first two terms so that their optimal solutions are exactly those mean scores. We cannot make $a$ and $b$ free variables in the last term, because then the resulting solutions would no longer be the mean scores.

  3. I do not think you can interpret it that way; that formulation is not equivalent to the SPP.

  4. I think it is because we make the data more imbalanced in the paper.
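The biased-gradient point in the first answer can be illustrated numerically: squaring a mini-batch mean overestimates the square of the full-data mean by roughly Var(score)/batch_size, so the gradient of the squared-mean term is biased too. A small sketch with synthetic scores (all numbers here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # synthetic model scores, for illustration

# Full-data value of the problematic term (mean of scores)**2.
full_mean_sq = scores.mean() ** 2

# Average of the mini-batch estimates batch_mean**2 over many batches:
# systematically larger than full_mean_sq, by about Var(score)/batch_size.
batch_size = 10
batch_mean_sq = np.mean([
    rng.choice(scores, batch_size, replace=False).mean() ** 2
    for _ in range(20000)
])
print(full_mean_sq, batch_mean_sq)  # the mini-batch estimate is systematically larger
```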

Please let us know if you have any additional questions.
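The second answer's claim, that the optimal free variables recover the mean scores, can also be checked numerically. A brief sketch with synthetic positive scores (the score distribution and grid are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
pos_scores = rng.normal(loc=1.0, scale=0.5, size=500)  # synthetic positive scores

# For fixed scores, d/da mean((f(x_i) - a)**2) = -2 * (mean(f) - a),
# which vanishes at a* = mean(f): the mean positive score.
a_grid = np.linspace(-1.0, 3.0, 4001)  # grid step 0.001
losses = [((pos_scores - a) ** 2).mean() for a in a_grid]
a_star = a_grid[int(np.argmin(losses))]
print(a_star, pos_scores.mean())  # a_star matches the mean score to grid precision
```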

optmai closed this as completed Jan 7, 2025