Creation of components through MLContext: advanced options and other feedback #1798

TomFinley · 2018-11-30T23:53:47Z

Our estimators and other similar components often have advanced settings, because sometimes people have unusual circumstances. At the same time, there is a 95% or 99% scenario of "simple" usage that most people will be happy with. For this reason we have often made a distinction between common and advanced settings, as we see here.

machinelearning/src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs, lines 29 to 38 in cb37c7e:

    public static FastTreeRegressionTrainer FastTree(this RegressionContext.RegressionTrainers ctx,
        string labelColumn = DefaultColumnNames.Label,
        string featureColumn = DefaultColumnNames.Features,
        string weights = null,
        int numLeaves = Defaults.NumLeaves,
        int numTrees = Defaults.NumTrees,
        int minDatapointsInLeaves = Defaults.MinDocumentsInLeaves,
        double learningRate = Defaults.LearningRates,
        Action<FastTreeRegressionTrainer.Arguments> advancedSettings = null)
    {

Several aspects of this drew feedback:

- Echoing the feedback in "Rename mlContext.Data.TextReader() to mlContext.Data.CreateTextLoader()" #1690, methods that make something should have the prefix Create, even when they live on a catalog where every method creates something. Note: Create is preferred to Make.
- The value of the ASP.NET-style configuration (seen above as Action<FastTreeRegressionTrainer.Arguments> advancedSettings) was questioned; there may not be much purpose in having a delegate. The older style, where the method simply takes the Arguments object, was preferred.
- Having this Arguments object as a nested class of the component being created was viewed as positive, but it would more idiomatically be called Options. Arguments is a holdover name from when these classes existed exclusively for command-line arguments, and it is not a great name for the API. So, while keeping the general structure of where these classes are placed, they should probably be renamed to Options.
- It is good to have the convenience of the simple arguments. However, if we have both simple and advanced settings, we should not mix them in one signature, but should instead have two distinct constructors/extension methods. (E.g., in the above, there would be two methods, one of which takes the advanced options.) Doing otherwise invites confusion about which value "wins" when a setting is set in both.
  - Note the phrase "set in both": it implies that the settings object should also retain the "simpler" settings. This reinforces feedback given elsewhere.
- If the simple arguments are totally sufficient, there is no need to expose the Arguments class in the public API. (For practical reasons relating to command-line and entry-point usage, we still always need these Arguments objects, but if they serve no purpose for the API, the class can simply be made internal.)
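A minimal sketch of the shape these points suggest, in C#. All names here (WidgetTrainer, TrainerCatalog, CreateWidget, and the properties on Options) are hypothetical and do not correspond to ML.NET's actual API; the point is only the structure: a nested Options class that also carries the simple settings, an internal constructor, and two non-overlapping Create-style extension methods that take an options object rather than a delegate, so there is never a question of which value wins.

```csharp
using System;

// Hypothetical trainer illustrating the pattern; not ML.NET's actual API.
public sealed class WidgetTrainer
{
    // Nested options class, named Options rather than Arguments.
    // It retains the "simple" settings as well as the advanced ones,
    // so one object fully describes the configuration.
    public sealed class Options
    {
        public string LabelColumn { get; set; } = "Label";
        public int NumLeaves { get; set; } = 20;
        public double LearningRate { get; set; } = 0.2;

        // An advanced setting reachable only through the Options overload.
        public int MaxBins { get; set; } = 255;
    }

    public Options Settings { get; }

    // Constructor is internal: users go through the catalog methods.
    internal WidgetTrainer(Options options)
    {
        Settings = options ?? throw new ArgumentNullException(nameof(options));
    }
}

// Hypothetical stand-in for a trainer catalog.
public sealed class TrainerCatalog { }

public static class WidgetCatalogExtensions
{
    // Simple overload: only the common settings, all defaulted.
    public static WidgetTrainer CreateWidget(this TrainerCatalog catalog,
        string labelColumn = "Label", int numLeaves = 20, double learningRate = 0.2)
        => new WidgetTrainer(new WidgetTrainer.Options
        {
            LabelColumn = labelColumn,
            NumLeaves = numLeaves,
            LearningRate = learningRate,
        });

    // Advanced overload: takes the options object directly (no delegate)
    // and no loose parameters that could conflict with it.
    public static WidgetTrainer CreateWidget(this TrainerCatalog catalog,
        WidgetTrainer.Options options)
        => new WidgetTrainer(options);
}
```

Usage would look like `catalog.CreateWidget(numLeaves: 50)` for the common case and `catalog.CreateWidget(new WidgetTrainer.Options { NumLeaves = 10, MaxBins = 64 })` when advanced settings are needed. Because the simple overload just builds an Options instance, both paths end up in the same constructor.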

/cc @KrzysztofCwalina, @terrajobst, on whose feedback this list is primarily based, and who can correct me and provide clarification in case I misspoke.


TomFinley · 2019-01-02T19:00:09Z

One point that @glebuk brought up, which I had not considered before and want to make explicit: by doing this, we remain free to add more options to the "regular" (simple) method later. With the existing arrangement, adding more regular default parameters while maintaining the API would be difficult. E.g., if we have this:

Create(int a, int b, int c, Action<Arguments> advancedSettings = null)

We could not change it to this, since doing so would break the signature for existing callers:

Create(int a, int b, int c, int d, Action<Arguments> advancedSettings = null)

The only way we see to avoid breaking the signature would be to append the new parameter at the end, which is very awkward and strange-looking:

Create(int a, int b, int c, Action<Arguments> advancedSettings = null, int d = 1)

But if we had this:

Create(int a, int b, int c)
Create(Arguments advancedSettings)

Then we're free to do this:

Create(int a, int b, int c, int d = 1)
Create(Arguments advancedSettings)
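To make the compatibility argument concrete, here is a hedged C# sketch of the second arrangement after parameter d has been added. Component, Options, and its properties are hypothetical names chosen to mirror the a/b/c/d example above. A call written against the original Create(a, b, c) still compiles unchanged, and the advanced overload only gains a property.

```csharp
using System;

// Hypothetical component, "version 2" of the API: parameter d was added
// to the simple overload after callers had already written Create(a, b, c).
public sealed class Component
{
    public sealed class Options
    {
        public int A { get; set; }
        public int B { get; set; }
        public int C { get; set; }
        public int D { get; set; } = 1; // new setting, defaulted
    }

    public Options Settings { get; }

    private Component(Options options) => Settings = options;

    // Simple overload: new optional parameters can go at the end, because
    // no trailing delegate parameter occupies that position.
    public static Component Create(int a, int b, int c, int d = 1)
        => new Component(new Options { A = a, B = b, C = c, D = d });

    // Advanced overload: its signature is unchanged; the new setting is just a property.
    public static Component Create(Options options)
        => new Component(options ?? throw new ArgumentNullException(nameof(options)));
}
```

This is a source-compatibility argument: adding an optional parameter still changes the compiled method signature, so already-built callers would need to recompile (or the old three-parameter overload could be kept alongside the new one).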


* Towards #1798. This PR addresses the estimators inside HalLearners: two public extension methods, one for simple arguments and the other for advanced options; delete unnecessary constructors; pass Options objects as arguments instead of an Action delegate; rename Arguments to Options; name Options objects options (instead of the args or advancedSettings used so far).

abgoswam · 2019-01-29T18:00:22Z

Marking as done and closing this issue. All learners have been taken care of.


Ivanidzo4ka · 2019-02-06T03:43:30Z

Forgive my intrusion into this wonderful conversation, but I have a question from a user that I don't know how to answer.
We have issue #2165, and for now the user can simply call the trivial transformer constructors and avoid estimators and the Fit method.

Item 1 in Senja's list states:

For the transform estimators (former transforms) the following needs to happen:
1- Internalize the ctors

How did we come to this decision, and what is the root cause behind hiding the constructors of trivial transforms? /cc @JakeRadMSFT

TomFinley added the API Issues pertaining the friendly API label Nov 30, 2018

TomFinley mentioned this issue Nov 30, 2018

Add ML.NET notes from session 5 dotnet/apireviews#81

Merged

glebuk assigned abgoswam Jan 2, 2019

This was referenced Jan 2, 2019

WIP [Please don't review] : Arguments, Options #2000

Closed

Modify API for advanced settings. (FastTree, RandomForest) #2047

Merged

This was referenced Jan 9, 2019

Modify API for advanced settings. (SDCA) #2093

Merged

Refactoring of Constructors #2100

Open

TomFinley mentioned this issue Jan 10, 2019

Precedence between main arguments and advancedSettings #1639

Closed

shauheen added this to the 0119 milestone Jan 10, 2019

sfilipi mentioned this issue Jan 16, 2019

KMeans and Implicit weight cleanup #2158

Merged

abgoswam mentioned this issue Jan 16, 2019

Modify API for advanced settings (several learners) #2163

Merged

sfilipi mentioned this issue Jan 17, 2019

The trainer name types should follow the names used in the contexts #2172

Closed

justinormont mentioned this issue Jan 17, 2019

Towards #1798 . #2170

Merged

abgoswam mentioned this issue Jan 18, 2019

Number of feature columns #2179

Closed

abgoswam closed this as completed Jan 29, 2019

sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Feb 1, 2019

adressing issue dotnet#1798 for the Image Analytics project.

958d05a

sfilipi mentioned this issue Feb 1, 2019

Image analytics documentation, samples, internalization #2372

Merged

sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Feb 1, 2019

adressing issue dotnet#1798 for the Image Analytics project.

1c85070

sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Feb 4, 2019

adressing issue dotnet#1798 for the Image Analytics project.

9804817

artidoro mentioned this issue Feb 5, 2019

Textloader internalizations, documentation, and Arguments refactoring #2417

Merged

sfilipi added a commit that referenced this issue Feb 5, 2019

Image analytics documentation, samples, internalization (#2372)

410a296

* samples and documentation * addressing issue #1798 for the Image Analytics project.

Ivanidzo4ka mentioned this issue Feb 6, 2019

Using ScoreTensorFlow model is a bit confusing #2165

Closed

shauheen removed this from the 0119 milestone Feb 6, 2019

artidoro mentioned this issue Feb 12, 2019

Creation of components through MLContext, internalization, and renaming #2510

Merged

artidoro closed this as completed in #2510 Feb 13, 2019

abgoswam mentioned this issue Feb 14, 2019

Renmants of Arguments keyword in public API #2557

Closed

This was referenced Feb 21, 2019

Mark EntryPoints classes and APIs as internal #2674

Merged

One of the FeaturizeText extensions has the inputColumnNames as required #2768

Closed

eerhardt mentioned this issue Oct 7, 2019

DnnCatalog methods should use a public Options class #4307

Closed

eerhardt mentioned this issue Nov 6, 2019

Hash Transform API that takes in advanced options. #4443

Merged

eerhardt mentioned this issue Oct 8, 2021

Expose the Onnx runtime option for setting the number of threads #5962

Merged

ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
