Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creation of components through MLContext: advanced options and other feedback #1798

Closed
TomFinley opened this issue Nov 30, 2018 · 9 comments · Fixed by #2510
Closed

Creation of components through MLContext: advanced options and other feedback #1798

TomFinley opened this issue Nov 30, 2018 · 9 comments · Fixed by #2510
Assignees
Labels
API Issues pertaining the friendly API

Comments

@TomFinley
Copy link
Contributor

TomFinley commented Nov 30, 2018

In our estimators and other similar components often have advanced settings, because, sometimes people have unusual circumstances. At the same time, there is a 95% or 99% scenario for "simple" usage that most people will be happy with. For this reason we have often made a distinction between common and advanced settings, as we see here.

public static FastTreeRegressionTrainer FastTree(this RegressionContext.RegressionTrainers ctx,
string labelColumn = DefaultColumnNames.Label,
string featureColumn = DefaultColumnNames.Features,
string weights = null,
int numLeaves = Defaults.NumLeaves,
int numTrees = Defaults.NumTrees,
int minDatapointsInLeaves = Defaults.MinDocumentsInLeaves,
double learningRate = Defaults.LearningRates,
Action<FastTreeRegressionTrainer.Arguments> advancedSettings = null)
{

There are some possible things that excited feedback:

  1. Echoing feedback seen in Rename mlContext.Data.TextReader() to mlContext.Data.CreateTextLoader() #1690, these things where we're making something should have the prefix Create, even in situations where this a catalog where we are always creating. Note: Create preferred to Make.

  2. The worth of ASP.NET style configuration was questioned (seen above as Action<FastTreeRegressionTrainer.Arguments> advancedSettings), e.g., there may not be much purpose in having a delegate. The older style where it just takes the Arguments period was preferred.

  3. Having this Arguments object as a nested class the component being created was viewed as positive, but this would be more idiomatically called Options -- Arguments was a holdover name from when these were exclusively for command line arguments, but for the API this is not a great name. So while keeping the general structure of how they are placed currently, they should probably be renamed to Options.

  4. It is good to have the convenience for the simple arguments, however, if we have both simple and advanced settings, we should not mix them but have instead two distinct constructors/extension methods. (E.g., in the above, we would have two methods, one that took the advanced options.) To do otherwise is to invite confusion about which "wins" if we have the setting set in both.

    • Note that phase setting "set in both," which suggests that these settings object should retain the "simpler" settings in them. This reinforces feedback elsewhere as seen here.
  5. If the simple arguments are totally sufficient, then there is no need to expose this Arguments class in hte public API. (For practical reasons relating to command line and entry-point usage, we still need to always have these Arguments objects, but if they serve no purpose for the API the class can simply be made internal.)

/cc @KrzysztofCwalina, @terrajobst , on whose feedback this list is primarily based, and who can correct me and provide clarification in case I misspoke.

@TomFinley TomFinley added the API Issues pertaining the friendly API label Nov 30, 2018
@TomFinley
Copy link
Contributor Author

One point that @glebuk brought up I wanted to make explicit, since I did not consider it before, is that by doing this, if we decide down the line to add more options to the "regular" options, we are free to do so. If we had the existing arrangement, then in order to maintain the API, if we made more regular default parameters. E.g., if we have this:

Create(int a, int b, int c, Action<Arguments> advancedSettings = null)

We could not have this, since this could break the signature.

Create(int a, int b, int c, int d, Action<Arguments> advancedSettings = null)

And the only way we see of not breaking the signature would be appending it to the end, which is very awkward and strange looking.

Create(int a, int b, int c, Action<Arguments> advancedSettings = null, int d = 1)

But if we had this:

Create(int a, int b, int c)
Create(Arguments advancedSettings)

Then we're free to do this:

Create(int a, int b, int c, int d = 1)
Create(Arguments advancedSettings)

@shauheen shauheen added this to the 0119 milestone Jan 10, 2019
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Jan 17, 2019
This PR addresses the estimators inside HalLearners:

Two public extension methods, one for simple arguments and the other for advanced options
Delete unecessary constructors
Pass Options objects as arguments instead of Action delegate
Rename Arguments to Options
Rename Options objects as options (instead of args or advancedSettings used so far)
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Jan 17, 2019
This PR addresses the estimators inside HalLearners:

Two public extension methods, one for simple arguments and the other for advanced options
Delete unecessary constructors
Pass Options objects as arguments instead of Action delegate
Rename Arguments to Options
Rename Options objects as options (instead of args or advancedSettings used so far)
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Jan 18, 2019
This PR addresses the estimators inside HalLearners:

Two public extension methods, one for simple arguments and the other for advanced options
Delete unecessary constructors
Pass Options objects as arguments instead of Action delegate
Rename Arguments to Options
Rename Options objects as options (instead of args or advancedSettings used so far)
sfilipi added a commit that referenced this issue Jan 18, 2019
* Towards #1798 .

This PR addresses the estimators inside HalLearners:

Two public extension methods, one for simple arguments and the other for advanced options
Delete unecessary constructors
Pass Options objects as arguments instead of Action delegate
Rename Arguments to Options
Rename Options objects as options (instead of args or advancedSettings used so far)
@abgoswam
Copy link
Member

abgoswam commented Jan 29, 2019

marking as Done, and closing this isssue. All learners have been taken care of.

sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Feb 1, 2019
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Feb 1, 2019
sfilipi added a commit to sfilipi/machinelearning-1 that referenced this issue Feb 4, 2019
sfilipi added a commit that referenced this issue Feb 5, 2019
* samples and documentation

* addressing issue #1798 for the Image Analytics project.
@Ivanidzo4ka
Copy link
Contributor

Forgive me my intrusion into this wonderful conversation, but I have question from user, which I'm don't know how to answer.
So we have this issue #2165 and for now user can just call trival transformer constructors and avoid estimators and Fit method.

number 1 in Senja list states

For the transform estimators (former transforms) the following needs to happen:
1- Internalize the ctors

How we come up to this decision, and what is root cause behind hiding constructors for trivial transforms? /cc @JakeRadMSFT

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants