Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mass rename of transformers and MLContext extensions for them #1318

Closed
Zruty0 opened this issue Oct 19, 2018 · 12 comments
Closed

Mass rename of transformers and MLContext extensions for them #1318

Zruty0 opened this issue Oct 19, 2018 · 12 comments
Assignees
Labels
API Issues pertaining the friendly API usability Smoothing user interaction or experience

Comments

@Zruty0
Copy link
Contributor

Zruty0 commented Oct 19, 2018

Let's go over all the existing transforms and make sure that they share the same naming conventions:

  • Estimators

    • Should be named ActionPerformingEstimator, or AlgorithmNameTrainer
    • Should be placed in Microsoft.ML.Trainers, Transforms or sub-namespaces if applicable
    • If they take input or output colunms, they should be 'inputColumn' and 'outputColumn', and the outputColumn should be nullable.
  • Trainers

    • If they take features/label columns, they should be in the same order (label, features, weights, other), and have defaults.
    • Important parameters should be listed as ctor arguments.
    • Other parameters should be a delegate over advanced settings
      • Generally, trainers should have exactly one constructor. Overload only if necessary.
  • Transformers

    • Should be named ActionPerformingTransformer
    • Should be placed in Microsoft.Ml.Transforms or sub-namespaces
    • If they are trainable, they should NOT have a public constructor.
    • If they are NOT trainable, they SHOULD have a public constructor.
@justinormont justinormont added API Issues pertaining the friendly API usability Smoothing user interaction or experience labels Oct 19, 2018
@justinormont
Copy link
Contributor

What types of sub-namespaces are we looking for?

For example would we want categories like { Text, Image Normalize, etc }? For example, Microsoft.Ml.Transforms.Text.WordEmbeddingsTransform.

Have we publicly defined the term trainable? Tersely, trainable components take a pass of the data and learns from it, then after being trained it can used. Examples of this include: text featurization (where we take a pass to index the words), label encoding (where we take a pass to index the labels), normalizers (to learn the range of the numbers), and feature selection (to count/learn which features to keep).

@sfilipi
Copy link
Member

sfilipi commented Oct 19, 2018

@Zruty0

If they are trainable, they should NOT have a public constructor.
If they are trainable, they SHOULD have a public constructor.

¯_(ツ)_/¯

@Zruty0
Copy link
Contributor Author

Zruty0 commented Oct 19, 2018

I updated the comment. @justinormont , yes on the namespaces.

And the explanation of trainable is also accurate, thanks.

@sfilipi
Copy link
Member

sfilipi commented Oct 20, 2018

@Zruty0 one more question, should the transforms currently living in Microsoft.ML.Runtime.Data move to Microsoft.ML.Transforms?

@Zruty0
Copy link
Contributor Author

Zruty0 commented Oct 21, 2018

They should move out of 'Runtime'. If there is some form of inherent grouping, it may or may not be reflected in sub-namespace, but the root should be Microsoft.ML.Transforms.

@TomFinley
Copy link
Contributor

One thing I wonder if we should make explicit, is if a Transformer is produced by exactly one Estimator, whether it should tend to be named the same as the estimator? That is FooEstimator should tend to produce FooTransformer? Obviously not applicable in the case where a transformer is produced by multiple estimators (e.g., a drop slots transformer can be produced by multiple estimators, a linear model can be produced many ways, etc.)

@Zruty0
Copy link
Contributor Author

Zruty0 commented Oct 22, 2018

That was the intent, yes.

@justinormont
Copy link
Contributor

Do we have candidate list of the new names?

Current name New Name
x x'

Is there a public place where this list can be grown? A wiki like interface could be suitable.

@Zruty0
Copy link
Contributor Author

Zruty0 commented Oct 23, 2018

@justinormont , I don't think we need to produce a separate list. The old names are sort of incidental, so there is no value in preserving them for posterity. The new names will be reflected in the documentation. The renaming itself is abundantly visible on the pull request.

@sfilipi
Copy link
Member

sfilipi commented Oct 29, 2018

@eerhardt suggested that we also put the transformers and estimators in subfolders based on sub-categories.

@artidoro
Copy link
Contributor

artidoro commented Oct 31, 2018

We should not forget to rename the transformers. For example in v0.7, ValueToKeyMappingEstimator returns a TermTransform on .fit(). And when you need to provide arguments as ColumInfo[], it looks like:

dataView = new ValueToKeyMappingEstimator(Env, new[]{
new TermTransform.ColumnInfo("A", "TermA"),
new TermTransform.ColumnInfo("B", "TermB"),
new TermTransform.ColumnInfo("C", "TermC", textKeyValues:true)
}).Fit(dataView).Transform(dataView);

        dataView = new ValueToKeyMappingEstimator(Env, new[]{
                new TermTransform.ColumnInfo("A", "TermA"),
                new TermTransform.ColumnInfo("B", "TermB"),
                new TermTransform.ColumnInfo("C", "TermC", textKeyValues:true)
            }).Fit(dataView).Transform(dataView);

Which is not ideal, since estimator name does not match transformer name.

@CESARDELATORRE
Copy link
Contributor

In addition to the internal estimator classes, for the MLContext catalog, I'd like to highlight that many methods creating estimators in the new MLContext catalog are named in a way that look like properties, with a noun instead of having a verb, since they are methods.

According to C# conventions (and most languages), a method's name should have a verb describing the action being performed by that method:

https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-type-members#names-of-methods

For example, the following are current methods creating objects:

mlContext.Data.TextReader(new TextLoader.Arguments()...)
mlContext.Transforms.Categorical.OneHotEncoding("VendorId", "VendorIdEncoded")

(In particular, the "TextReader" method since it is creating a TextLoader, it should also be renamed to TextLoader as part of the method's name, but that is a different/particular issue).

And all the methods for creating trainers from the MLContext catalog, such as:

mlContext.BinaryClassification.Trainers.FastTree(label: "Label", features: "Features");
mlContext.Regression.Trainers.StochasticDualCoordinateAscent(label: "Label", features: "Features");

When you see that code the first time, due to the fact that the method's name is a noun, it feels like a Property object, but it is not, they are methods.

I think those methods should be named as something like:

mlContext.Data.CreateTextLoader(new TextLoader.Arguments() ...)
mlContext.Transforms.Categorical.CreateOneHotEncodingEstimator("VendorId", "VendorIdEncoded")
mlContext.BinaryClassification.Trainers.CreateFastTreeTrainer(label: "Label", features: "Features");
mlContext.Regression.Trainers.CreateSDCATrainer(label: "Label", features: "Features");

So it feels like methods not as properties.
Another related issue is with methods that currently have a verb describing an action, but in reality they are not performing that action but creating an object/estimator, such as:

mlContext.Transforms.Normalize(inputName: "PassengerCount", mode: NormalizerMode.MeanVariance)
mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded")

Those objects are not normalizing or concatenating something in that moment within the object owning the method. In reality, they are creating an object, so again they probably should be named with a verb related to that object "Creation" of a specific type. Something like:

mlContext.Transforms.CreateNormalizingEstimator(inputName: "PassengerCount", mode: NormalizerMode.MeanVariance)
mlContext.Transforms.CreateConcatenatingEstimator("Features", "VendorIdEncoded", "RateCodeEncoded")

What I'm proposing is what is aligned to standard C# naming conventions and what C# developers are used to.
I recognize that once you know the framework API, it might be shorter not to have the "Create" verb, but the current approaches feel confusing for me and could also feel confusing for any C# developer seeing that code for the first time.

Especially having methods with just a noun feels like a Property object instead of a method..

Thoughts?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API usability Smoothing user interaction or experience
Projects
None yet
Development

No branches or pull requests

6 participants