Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The "Binary" in "LogisticRegressionBInaryTrainer" Looks Redundant #3016

Closed
wschin opened this issue Mar 19, 2019 · 11 comments
Closed

The "Binary" in "LogisticRegressionBInaryTrainer" Looks Redundant #3016

wschin opened this issue Mar 19, 2019 · 11 comments
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@wschin
Copy link
Member

wschin commented Mar 19, 2019

As title. Logistic regression is clearly a binary classifier, so we can consider dropping "Binary" among "LogisticRegressionBinaryTrainer". In addition, as L-BFGS is outperformed by SDCA and SymSGD, I feel we should not call this class LogisticRegressionTrainer. A suggested solution is to prepend Lbfgs to this class' name.

@singlis singlis added the API Issues pertaining the friendly API label Mar 19, 2019
@sfilipi
Copy link
Member

sfilipi commented Mar 19, 2019

I would suggest to make an exception and leave the Binary in the name of LR for two reasons:
1- The userbase is not proficient in ML. I think the "regression" on the trainer name might easily confuse people.
2- We have MulticlassLR

@wschin
Copy link
Member Author

wschin commented Mar 20, 2019

I would suggest to make an exception and leave the Binary in the name of LR for two reasons:
1- The userbase is not proficient in ML. I think the "regression" on the trainer name might easily confuse people.

If someone doesn't know LR is a binary classifier, I don't think they can use LogisticRegressionBinaryTrainer properly.

2- We have MulticlassLR

Not anymore. We have LbfgsMaximumEntropyTrainer now.

@rogancarr
Copy link
Contributor

rogancarr commented Mar 20, 2019

So what do we have?

binary
LogisticRegressionBinaryTrainer
multiclass
LogisticRegression
MulticlassLogisticRegression
(yes, two!)
regression
PoissonRegression

First, we should only have one name for multiclass.

Second, I think that LogisticRegression is clearly binary classification. It's one of the most standard names out there. Let's rename LogisticRegressionBinaryTrainer => LogisticRegression.

Third, I'm even okay with LogisticRegression as the name for the multiclass version.

Fourth, I think that PoissonRegression should stay as-is.

@rogancarr
Copy link
Contributor

LbfgsMaximumEntropyTrainer is fine for the multiclass version, but it does seem a bit technical. Are we sure the trainer catalog has this name, and not just the internal classes?

@artidoro
Copy link
Contributor

I would prefer that we stick to the name LogisticRegression.

I also agree with @sfilipi that probably it's a good idea to keep Binary in the end of LogisticRegressionBinaryTrainer so that people don't get confused on whether it is a regression task.

I would settle for LbfgsLogisticRegressionBinaryTrainer. We have longer names in other parts of the code base and this has all the characteristics that we need if I am not wrong.

@eerhardt
Copy link
Member

Adding to Project 13 to track as this is an API breaking request.

@TomFinley
Copy link
Contributor

TomFinley commented Mar 22, 2019

To be honest I am somewhat skeptical about renaming it to omit binary. When I was a new PhD student just learning about machine learning for the first time, I was earnestly confused about the fact the logistic regression had nothing to do with regression (as elsewhere defined), but binary classification. Now, I understand why it is named what it is named now, but that understanding rests on comprehending some fairly non-obvious mathematics. For a while, I remember being genuinely confused.

This was all very long ago, of course -- almost two decades, as depressing as it is to remember now -- and I can't claim that I was much of an ML researcher, but I still remember what it was like when I was a bit of a novice, and we must always keep the novice in mind.

Just to be explicit, I agree with @sfilipi and @artidoro and disagree with @wschin and @rogancarr...

@glebuk
Copy link
Contributor

glebuk commented Mar 25, 2019

Note that if we rename LR to LbfgsLogisticRegression, then perhaps we should consider also SymSGD and SDCA as they also can do LR, I believe.

@wschin
Copy link
Member Author

wschin commented Mar 25, 2019

@TomFinley, I am ok with LbfgsLogisticRegressionBinary the part I want to most is Lbfgs because L-BFGS doesn't look like the default trainer of logistic regression (we should use SDCA or SymSGD).

@glebuk, I will try to rename them and possibly drop Calibrated or NonCalibrated in trainer APIs. But if I feel it causes more confusions, I might skip this.

@rogancarr
Copy link
Contributor

@wschin @abgoswam Would one of you mind putting the results of the offline conversation in this Issue? Then we can close on the PR.

@wschin
Copy link
Member Author

wschin commented Mar 25, 2019

@rogancarr, sure!

API name changes:
LogisticRegression --> LbfgsLogisticRegression
SymbolicSgd --> SymbolicSgdLogisticRegression
LogisticRegression --> LbfgsLogisticRegression
PoissonRegression --> LbfgsPoissonRegression
SdcaCalibrated --> SdcaLogisticRegression
SdcaCalibrated ---> SdcaMaximumEntropy

Trainer classes' name changes:
LbfgsLogisticRegressionTrainer ---> LbfgsLogisticRegressionBinaryTrainer
LbfgsMaximumEntropyTrainer ---> LbfgsMaximumEntropyMulticlassTrainer
SymbolicSgdTrainer --> SymbolicSgdLogisticRegressionBinaryTrainer
SdcaCalibratedBinaryTrainer => SdcaLogisticRegressionBinaryTrainer
SdcaCalibratedMulticlassTrainer => SdcaMaximumEntropyMulticlassTrainer
SdcaMulticlassClassificationTrainerBase => SdcaMulticlassTrainerBase
MetaMulticlassClassificationTrainer => MetaMulticlassTrainer

@shauheen shauheen added this to the 0319 milestone Mar 26, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

No branches or pull requests

9 participants