Need to make consistent the parameters across the API #1521

CESARDELATORRE · 2018-11-03T18:24:39Z

Issue: Similar classes/estimators methods have different parameters' order for the same concepts.

It looks like a "silly" error, but not straightforward to catch because the column's name provided as parameters are not typed (just strings, logically, since column names could vary) you get an error in execution time but in addition to that because the pipeline works as lazy loading, you find the error later on when fitting the model, so you initially don't know what API is incorrectly configured making it hard to find these kind of "silly" errors which are prone to happen because of different parameter's order in similar APIs (such as Estimators and Transformers).

For instance, when migrating to the estimator by using mlContext, the following "silly" case happened to me:

// This line works
var trainer = new SdcaMultiClassTrainer(mlContext, "Features", "Label");
// This line doesn't work, but you'll find out later on when Fitting the model
var trainer = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent("Features", "Label");

Error you get when fitting:

System.ArgumentOutOfRangeException
  HResult=0x80131502
  **Message=Training label column 'Features' type is not valid for multi-class: Vec<R4, 69981>. Type must be R4 or R8.**
  Source=Microsoft.ML.Data

//This line DOES work when changing to the right order of parameters accepted by the new API:
var trainer = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscent("Label", "Features");

But you can see that is not consistent in order with this very similar line, the same trainer:

// This line works
var trainer = new SdcaMultiClassTrainer(mlContext, "Features", "Label");

I initially thought that the trainer in the mlcontext catalog was not working for any reason as a very similar API was working by directly creating the class.. But it was simply because the order is vice-versa in the case of the new trainer surfaced on the mlContext. ;)

Improvement

Obvious and already "work in progress": It will help if we are consistent in the order of the parameters across all the APIs. "Label" first, "Features" later, or vice-versa, but consistent.
Not sure if somehow we could use strongly-typed types for the column names or some kind of check/control.. like if for the column names we were able to use the fields in the schema classes and the "Label" field was "marked" as "Label" with an attribute, that could also help, I mean something like the following:

var trainer = mlContext.MulticlassClassification.Trainers.StochasticDualCoordinateAscentnameof(
                                                        nameof(MyObservation.MyLabel), 
                                                        "Features");

Then, the Label field is "marked" as Label:

    public class MyObservation
    {
        [Column(ordinal: "0")]
        public string ID;

        [Label]
        [Column(ordinal: "1")]
        public string MyLabel; // This is an issue label, for example "area-System.Threading"

        [Column(ordinal: "2")]
        public string Title;

        [Column(ordinal: "3")]
        public string Description;
    }

Not sure though..., this last approach might be too contrived and not realistic because in almost all the cases the Label is also a generated/Dictionarized field, as well as the "Features" column, both created/generated as new columns out of the original observation schema class, so this last attempt might not be feasible...

Any other possible check/control so errors when using strings for the column names could be less prone to happen?...

The text was updated successfully, but these errors were encountered:

removing the default name for Features and Label colum.

CESARDELATORRE assigned sfilipi and Zruty0 Nov 3, 2018

sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Nov 4, 2018

fixing dotnet#1521

d10ea0a

removing the default name for Features and Label colum.

sfilipi mentioned this issue Nov 4, 2018

More trainer extensions, bug fixes and consistency across trainer extensions #1524

Merged

sfilipi closed this as completed in #1524 Nov 10, 2018

ghost locked as resolved and limited conversation to collaborators Mar 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Need to make consistent the parameters across the API #1521

Need to make consistent the parameters across the API #1521

CESARDELATORRE commented Nov 3, 2018

Need to make consistent the parameters across the API #1521

Need to make consistent the parameters across the API #1521

Comments

CESARDELATORRE commented Nov 3, 2018