Mass rename of transformers and MLContext extensions for them #1318

Zruty0 · 2018-10-19T17:49:32Z

Let's go over all the existing transforms and make sure that they share the same naming conventions:

Estimators
- Should be named ActionPerformingEstimator, or AlgorithmNameTrainer
- Should be placed in Microsoft.ML.Trainers, Transforms or sub-namespaces if applicable
- If they take input or output columns, the parameters should be named 'inputColumn' and 'outputColumn', and outputColumn should be nullable.
Trainers
- If they take features/label columns, they should be in the same order (label, features, weights, other), and have defaults.
- Important parameters should be listed as ctor arguments.
- Other parameters should be exposed through a delegate over the advanced settings.
  - Generally, trainers should have exactly one constructor. Overload only if necessary.
Transformers
- Should be named ActionPerformingTransformer
- Should be placed in Microsoft.ML.Transforms or sub-namespaces
- If they are trainable, they should NOT have a public constructor.
- If they are NOT trainable, they SHOULD have a public constructor.
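A minimal sketch of how these conventions could look in code. All type names below (ValueIndexingEstimator, ValueIndexingTransformer, TextLowercasingTransformer) are hypothetical stand-ins and not ML.NET types, and treating a null outputColumn as "reuse the input column name" is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins that only illustrate the naming rules; not ML.NET code.
namespace Sketch.Transforms
{
    // Trainable component: named ActionPerformingEstimator.
    public sealed class ValueIndexingEstimator
    {
        private readonly string _inputColumn;
        private readonly string _outputColumn;

        // outputColumn is nullable. Assumption: null means "reuse the input column name".
        public ValueIndexingEstimator(string inputColumn, string outputColumn = null)
        {
            _inputColumn = inputColumn ?? throw new ArgumentNullException(nameof(inputColumn));
            _outputColumn = outputColumn ?? inputColumn;
        }

        // Takes one pass over the data and learns a value-to-key map.
        public ValueIndexingTransformer Fit(IEnumerable<string> values)
        {
            var keys = new Dictionary<string, int>();
            foreach (var value in values)
            {
                if (!keys.ContainsKey(value))
                    keys[value] = keys.Count + 1; // keys start at 1; 0 is reserved for "unseen"
            }
            return new ValueIndexingTransformer(_inputColumn, _outputColumn, keys);
        }
    }

    // Produced by exactly one estimator, so it shares the estimator's name stem.
    public sealed class ValueIndexingTransformer
    {
        private readonly Dictionary<string, int> _keys;

        public string InputColumn { get; }
        public string OutputColumn { get; }

        // Trainable, so there is no public constructor: Fit is the way to obtain one.
        internal ValueIndexingTransformer(string inputColumn, string outputColumn, Dictionary<string, int> keys)
        {
            InputColumn = inputColumn;
            OutputColumn = outputColumn;
            _keys = keys;
        }

        // Returns the learned key, or 0 for a value not seen during Fit.
        public int Transform(string value) => _keys.TryGetValue(value, out var key) ? key : 0;
    }

    // Not trainable, so it has a public constructor and needs no estimator pass.
    public sealed class TextLowercasingTransformer
    {
        public string InputColumn { get; }
        public string OutputColumn { get; }

        public TextLowercasingTransformer(string inputColumn, string outputColumn = null)
        {
            InputColumn = inputColumn ?? throw new ArgumentNullException(nameof(inputColumn));
            OutputColumn = outputColumn ?? inputColumn;
        }

        public string Transform(string value) => value.ToLowerInvariant();
    }
}
```

In this sketch the internal constructor is what stops user code outside the assembly from constructing a trained transformer directly, while the non-trainable transformer can be created directly.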

justinormont · 2018-10-19T21:45:10Z

What types of sub-namespaces are we looking for?

For example, would we want categories like { Text, Image, Normalize, etc. }? For instance, Microsoft.ML.Transforms.Text.WordEmbeddingsTransform.

Have we publicly defined the term trainable? Tersely, a trainable component takes a pass over the data and learns from it; after being trained, it can be used. Examples include: text featurization (where we take a pass to index the words), label encoding (where we take a pass to index the labels), normalizers (to learn the range of the numbers), and feature selection (to count/learn which features to keep).

sfilipi · 2018-10-19T23:02:37Z

@Zruty0

If they are trainable, they should NOT have a public constructor.
If they are trainable, they SHOULD have a public constructor.

¯_(ツ)_/¯

Zruty0 · 2018-10-19T23:34:27Z

I updated the comment. @justinormont, yes on the namespaces.

And the explanation of trainable is also accurate, thanks.

sfilipi · 2018-10-20T03:28:52Z

@Zruty0 one more question, should the transforms currently living in Microsoft.ML.Runtime.Data move to Microsoft.ML.Transforms?

Zruty0 · 2018-10-21T21:36:08Z

They should move out of 'Runtime'. If there is some form of inherent grouping, it may or may not be reflected in sub-namespace, but the root should be Microsoft.ML.Transforms.

TomFinley · 2018-10-22T16:51:29Z

One thing I wonder if we should make explicit: if a Transformer is produced by exactly one Estimator, should it tend to be named the same as the estimator? That is, should FooEstimator tend to produce FooTransformer? Obviously this is not applicable where a transformer is produced by multiple estimators (e.g., a drop slots transformer can be produced by multiple estimators, a linear model can be produced many ways, etc.).

Zruty0 · 2018-10-22T17:12:11Z

That was the intent, yes.

justinormont · 2018-10-23T05:56:49Z

Do we have a candidate list of the new names?

Current name	New Name
x	x'

Is there a public place where this list can be grown? A wiki-like interface could be suitable.

Zruty0 · 2018-10-23T17:53:03Z

@justinormont, I don't think we need to produce a separate list. The old names are sort of incidental, so there is no value in preserving them for posterity. The new names will be reflected in the documentation. The renaming itself is abundantly visible on the pull request.

sfilipi · 2018-10-29T18:30:42Z

@eerhardt suggested that we also put the transformers and estimators in subfolders based on sub-categories.

artidoro · 2018-10-31T22:16:06Z

We should not forget to rename the transformers. For example, in v0.7, ValueToKeyMappingEstimator returns a TermTransform on .Fit(). And when you need to provide arguments as ColumnInfo[], it looks like:

machinelearning/test/Microsoft.ML.Tests/Transformers/KeyToBinaryVectorEstimatorTest.cs

Lines 50 to 54 in 9d33efe

    
    dataView = new ValueToKeyMappingEstimator(Env, new[]{
        new TermTransform.ColumnInfo("A", "TermA"),
        new TermTransform.ColumnInfo("B", "TermB"),
        new TermTransform.ColumnInfo("C", "TermC", textKeyValues:true)
    }).Fit(dataView).Transform(dataView);

This is not ideal, since the estimator name does not match the transformer name.

CESARDELATORRE · 2018-11-21T00:04:22Z

In addition to the internal estimator classes, I'd like to highlight that many of the methods that create estimators in the new MLContext catalog are named in a way that makes them look like properties: they use a noun rather than a verb, even though they are methods.

According to C# conventions (and most languages), a method's name should have a verb describing the action being performed by that method:

https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-type-members#names-of-methods

For example, the following are current methods creating objects:

mlContext.Data.TextReader(new TextLoader.Arguments()...)
mlContext.Transforms.Categorical.OneHotEncoding("VendorId", "VendorIdEncoded")

(In particular, since the "TextReader" method creates a TextLoader, it should also use TextLoader in the method's name, but that is a separate, specific issue.)

And all the methods for creating trainers from the MLContext catalog, such as:

mlContext.BinaryClassification.Trainers.FastTree(label: "Label", features: "Features");
mlContext.Regression.Trainers.StochasticDualCoordinateAscent(label: "Label", features: "Features");

When you see that code for the first time, the noun-based names make these members look like properties, but they are not; they are methods.

I think those methods should be named as something like:

mlContext.Data.CreateTextLoader(new TextLoader.Arguments() ...)
mlContext.Transforms.Categorical.CreateOneHotEncodingEstimator("VendorId", "VendorIdEncoded")
mlContext.BinaryClassification.Trainers.CreateFastTreeTrainer(label: "Label", features: "Features");
mlContext.Regression.Trainers.CreateSDCATrainer(label: "Label", features: "Features");

That way they read as methods, not as properties.
Another related issue is with methods that currently have a verb describing an action, but in reality they are not performing that action but creating an object/estimator, such as:

mlContext.Transforms.Normalize(inputName: "PassengerCount", mode: NormalizerMode.MeanVariance)
mlContext.Transforms.Concatenate("Features", "VendorIdEncoded", "RateCodeEncoded")

Those methods are not normalizing or concatenating anything at the moment they are called. In reality, they are creating an object, so again they should probably be named with a verb that reflects the creation of a specific type. Something like:

mlContext.Transforms.CreateNormalizingEstimator(inputName: "PassengerCount", mode: NormalizerMode.MeanVariance)
mlContext.Transforms.CreateConcatenatingEstimator("Features", "VendorIdEncoded", "RateCodeEncoded")
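To illustrate the distinction being drawn here, the following is a rough sketch in which a catalog method only constructs an estimator and no data is touched until Fit is called. TransformsCatalogSketch, NormalizingEstimator, and NormalizingTransformer are hypothetical stand-ins rather than ML.NET types, and min-max scaling is used purely as a simple placeholder for whatever the real normalizer does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; not ML.NET types.
public sealed class NormalizingEstimator
{
    public string InputName { get; }

    public NormalizingEstimator(string inputName) => InputName = inputName;

    // The data is read only here, when Fit is called.
    public NormalizingTransformer Fit(IReadOnlyList<double> column)
    {
        double min = column.Min();
        double max = column.Max();
        return new NormalizingTransformer(min, max);
    }
}

public sealed class NormalizingTransformer
{
    private readonly double _min;
    private readonly double _max;

    internal NormalizingTransformer(double min, double max)
    {
        _min = min;
        _max = max;
    }

    // Min-max scaling to [0, 1]; a constant column maps to 0.
    public double Transform(double x) => _max == _min ? 0 : (x - _min) / (_max - _min);
}

public sealed class TransformsCatalogSketch
{
    // Current style: the verb suggests normalization happens here,
    // but the method only constructs an estimator.
    public NormalizingEstimator Normalize(string inputName) =>
        new NormalizingEstimator(inputName);

    // Proposed style: the name states that an object is being created.
    public NormalizingEstimator CreateNormalizingEstimator(string inputName) =>
        new NormalizingEstimator(inputName);
}
```

Both catalog methods do exactly the same work in this sketch; only the name differs in how clearly it signals that the return value is an estimator still waiting to be fitted.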

What I'm proposing is what is aligned to standard C# naming conventions and what C# developers are used to.
I recognize that once you know the framework API, it might be shorter not to have the "Create" verb, but the current approach feels confusing to me and could also be confusing for any C# developer seeing that code for the first time.

In particular, a method named with just a noun reads like a property rather than a method.

Thoughts?

Zruty0 assigned sfilipi Oct 19, 2018

Zruty0 mentioned this issue Oct 19, 2018

MLContext extensions for trainers #1319

Closed

justinormont added the API (Issues pertaining the friendly API) and usability (Smoothing user interaction or experience) labels Oct 19, 2018

sfilipi mentioned this issue Oct 20, 2018

Renaming some transforms to follow the estimator naming convention. #1328

Merged

sfilipi mentioned this issue Oct 23, 2018

Learners live on Microsoft.ML.Trainers #1340

Closed

This was referenced Oct 23, 2018

Moving FastTree from Runtime to Trainers. #1347

Merged

Trainer estimator cleanup for FastTrees and LightGBM #1352

Merged

Adding extensions for Hal learners. More namespace re-ogr. #1370

Merged

This was referenced Oct 30, 2018

Adding transform extensions #1448

Merged

namespace moves for more transforms #1453

Merged

more namespace move for transforms #1457

Merged

Last namespace re-org #1458

Merged

Adding transform extensions #1460

Merged

artidoro mentioned this issue Oct 31, 2018

MatrixFactorizationTrainer should include Estimator in its name #1484

Closed

This was referenced Nov 1, 2018

Adding another handful transform's extensions #1494

Merged

More trainer extensions, bug fixes and consistency across trainer extensions #1524

Merged

This was referenced Nov 9, 2018

Renaming transforms to transformers Part 1 #1588

Merged

more transform => transformer renaming #1590

Merged

renaming transforms -> transformers #1606

Merged

new names, per 1318 description. #1607

Merged

rogancarr mentioned this issue Nov 19, 2018

How to use LinearSvm? #1673

Closed

sfilipi mentioned this issue Nov 20, 2018

adding some trainer extensions on the StandardLearners catalog. Correcting namespace, and names #1682

Merged

Zruty0 mentioned this issue Nov 21, 2018

Rename mlContext.Data.TextReader() to mlContext.Data.CreateTextLoader() #1690

Closed

sfilipi closed this as completed in #1682 Nov 27, 2018

sfilipi mentioned this issue Dec 4, 2018

Public API for KMeansPredictor #1739

Merged

TomFinley mentioned this issue Jan 19, 2019

Estimator arguments should take output column name as first parameter, any inputs as subsequent parameters #2064

Closed

eerhardt mentioned this issue Mar 11, 2019

TrainersName pattern (Discussion) #2762

Closed

ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
