Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WeightColumn name defaults to "Weights" in one of the constructors of the estimators #1065

Closed
artidoro opened this issue Sep 26, 2018 · 4 comments
Assignees
Labels
API Issues pertaining the friendly API

Comments

@artidoro
Copy link
Contributor

artidoro commented Sep 26, 2018

Estimators that are instantiated through MAML without specifying a weightcolumn are instead created with default weightcolumn named "Weights".

More details:
The constructor for estimator objects that is used by MAML takes an Arguments object, which has a weight column name that defaults to "Weights". To be precise it defaults to an Optional with implicit value of "Weights".

When we instantiate the estimator through that constructor, we need to pass a SchemaShape.Column object to the base class TrainerEstimatorBase. This column is in most cases constructed with a method that takes the column name as specified in the Arguments object. As the default value in the Arguments object is not null, it is "Weights", this method does not return null, but returns a new SchemaShape.Column named Weights.

The rational:
The new API's rational is as follows:

  1. the user needs to specify a weight column name, if the name is not null we check the presence of the column
  2. if the user does not define a weight column name (it is null), we assume there is no weight column

The old MAML's rational:

  1. same as above
  2. if the user does not specify a weight column name:
    • if there is a column named "Weights" it will be taken as the weights column automatically
    • if there is no column name "Weights", we assume that there is no weight column

What we have to do:
We have to fix the current instantiation of estimators through MAML, and decide if we want to continue using a different behavior for the command line and the C# API.

@artidoro artidoro added the API Issues pertaining the friendly API label Sep 27, 2018
@TomFinley
Copy link
Contributor

I just realized that we have not done this and it has implications for our public surface. This must be done soon. We currently have a situation where if there happens to be a column named Weight, there is no way to have unweighted training, which is really bad!!

@eerhardt and @ganik have context on this. It also reflects something I was talking about with @vinodshanbhag earlier today where I (wrongly) claimed that it was possible to not set weighted training in this scenario of the "incidental" `Weight column. I realized when reviewing #2665 that we are still doing this wrong thing.

The behavior should be as @artidoro describes above, that if the weight column is left null, you simply do not have weighted training. If you want weighted training, you must provide the column name to a non-null value.

@eerhardt
Copy link
Member

eerhardt commented Mar 2, 2019

@ganik - This was fixed with #2674, correct? Can it be closed?

@ganik
Copy link
Member

ganik commented Mar 8, 2019

@eerhardt I believe so, but let me double check with Artidoro

@ganik
Copy link
Member

ganik commented Mar 8, 2019

Fixed with #2674

@ganik ganik closed this as completed Mar 8, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

No branches or pull requests

5 participants