-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding transform extensions #1460
Conversation
/// <param name="columnsToDrop">The array of column names to drop.</param> | ||
public static ColumnSelectingEstimator DropColumns(this TransformsCatalog catalog, params string[] columnsToDrop) | ||
{ | ||
return new ColumnSelectingEstimator(CatalogUtils.GetEnvironment(catalog), null, columnsToDrop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null, columnsToDrop [](start = 86, length = 19)
it would be more readable to use named arguments here and above, b/c ColumnSelectingEstimator takes so many optional arguments. #Resolved
/// Converts the images to grayscale. | ||
/// </summary> | ||
/// <param name="catalog">The transform's catalog.</param> | ||
/// <param name="columns">The name of the columns containing the image paths(first item of the tuple), and the name of the resulting output column (second item of the tuple).</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sulting output column (second item of the tuple). [](start = 133, length = 57)
nit: What is our max line length guideline? I keep scrolling left/right to see the whole thing :) #Resolved
/// </summary> | ||
/// <param name="catalog">The transform's catalog.</param> | ||
/// <param name="columns">The name of the columns containing the image paths(first item of the tuple), and the name of the resulting output column (second item of the tuple).</param> | ||
public static ImageGrayscalingEstimator Grayscale(this TransformsCatalog catalog, params (string input, string output)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(string input, string output)[] columns [](start = 96, length = 40)
how come some extensions are multi input/output with tuples, some are single input/output, and some are multi input/output with ColumnInfo? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, one per constructor; they follow the ctor signatures.
In reply to: 229707525 [](ancestors = 229707525)
public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, params KeyToBinaryVectorTransform.ColumnInfo[] columns) | ||
=> new KeyToBinaryVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns); | ||
|
||
public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, string name, string source = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string name, string source = null [](start = 125, length = 33)
Did we change the naming convention from name/source to inputColumn/outputColumn? Or name/source is still valid?
Also they're sometimes called input/output.
Also I noticed, that some some of the extensions have output = null, as optional and some don't.
If it's too much for this PR, please add an issue for reviewing the full API surface for parameter consistency and mention all the different examples.
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are going with inputColumn outputColumn. I'll scan for inconsistencies. Thanks for pointing out.
In reply to: 229710031 [](ancestors = 229710031)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are a number of places where we have source->name usage (outside of this PR). The issue should address all instances right?
@@ -82,7 +82,7 @@ void TestSimpleCase() | |||
var xyData = new List<TestDataXY> { new TestDataXY() { A = new float[inputSize] } }; | |||
var stringData = new List<TestDataDifferntType> { new TestDataDifferntType() { data_0 = new string[inputSize] } }; | |||
var sizeData = new List<TestDataSize> { new TestDataSize() { data_0 = new float[2] } }; | |||
var pipe = new OnnxEstimator(Env, modelFile, "data_0", "softmaxout_1"); | |||
var pipe = new OnnxConvertingEstimator(Env, modelFile, "data_0", "softmaxout_1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting [](start = 31, length = 10)
OnnxScoringEstimator could be a better name. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
/// <summary> | ||
/// Constructs the Select Columns Estimator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Columns Selecting Estimator #Resolved
/// </summary> | ||
/// <param name="catalog">The categorical transform's catalog.</param> | ||
/// <param name="columns">The input column to map back to vectors.</param> | ||
public static KeyToVectorMappingEstimator ToVector(this TransformsCatalog.CategoricalTransforms catalog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CategoricalTransforms [](start = 82, length = 21)
Actually I was expecting ToKey
, ToVector
, ToBinaryVector
and ToValue
to be in Conversions
, not Categorical
. In any case, let's not shorten it too much: MapKeyToVector
, MapValueToKey
etc. would be better I think #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would the respective Estimator live in the Conversions namespace?
In reply to: 229773742 [](ancestors = 229773742)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would vote to keep them in the Categorical, if the opposite operation (Term) is there.
In reply to: 229779270 [](ancestors = 229779270,229773742)
/// </summary> | ||
/// <param name="catalog">The categorical transform's catalog.</param> | ||
/// <param name="columns">The names of the input columns of the transformation and the corresponding names for the output columns.</param> | ||
public static MissingValueIndicatorEstimator IndicateMissingValues(this TransformsCatalog catalog, params (string inputColumn, string outputColumn)[] columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indicate [](start = 53, length = 8)
'indicate' seems weird. I don't have a better suggestion though... #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that should be fine. sklearn uses the same terminology: http://scikit-learn.org/stable/modules/generated/sklearn.impute.MissingIndicator.html#sklearn.impute.MissingIndicator
In reply to: 229778482 [](ancestors = 229778482)
public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, params KeyToBinaryVectorTransform.ColumnInfo[] columns) | ||
=> new KeyToBinaryVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns); | ||
|
||
public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, string name, string source = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToBinaryVector [](start = 56, length = 14)
Summary #Resolved
/// <param name="catalog">The text-related transform's catalog.</param> | ||
/// <param name="columns">Pairs of columns to run the tokenization on.</param> | ||
/// <param name="separators">The separators to use (uses space character by default).</param> | ||
public static WordTokenizingEstimator TokenizeWord(this TransformsCatalog.TextTransforms catalog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TokenizeWord [](start = 46, length = 12)
TokenizeWords
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did one per category in the two assemblies that gather the most of the transforms. That ok? In reply to: 434763291 [](ancestors = 434763291) Refers to: src/Microsoft.ML.Data/Transforms/TransformsExtensionsCatalog.cs:14 in 0b736b2. [](commit_id = 0b736b2, deletion_comment = False) |
/// <param name="colors">The color schema as defined in <see cref="ImagePixelExtractorTransform.ColorBits"/>.</param> | ||
/// <param name="interleave"></param> | ||
public static ImagePixelExtractingEstimator ExtractPixels(this TransformsCatalog catalog, string inputColumn, string outputColumn, | ||
ImagePixelExtractorTransform.ColorBits colors = ImagePixelExtractorTransform.ColorBits.Rgb, bool interleave = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tabbing?
/// <param name="cropAnchor">Where to place the anchor, to start cropping. Options defined in <see cref="ImageResizerTransform.Anchor"/></param> | ||
/// <returns></returns> | ||
public static ImageResizingEstimator Resize(this TransformsCatalog catalog, string inputColumn, string outputColumn, | ||
int imageWidth, int imageHeight, ImageResizerTransform.ResizingKind resizing = ImageResizerTransform.ResizingKind.IsoCrop, ImageResizerTransform.Anchor cropAnchor = ImageResizerTransform.Anchor.Center) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tabbing here as well?
/// </summary> | ||
/// <param name="env">The estimator's local <see cref="IHostEnvironment"/>.</param> | ||
/// <param name="columns">The name of the columns containing the image paths(first item of the tuple), and the name of the resulting output column (second item of the tuple).</param> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete line
public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, params KeyToBinaryVectorTransform.ColumnInfo[] columns) | ||
=> new KeyToBinaryVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns); | ||
|
||
public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, string name, string source = null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are a number of places where we have source->name usage (outside of this PR). The issue should address all instances right?
@@ -84,7 +84,7 @@ public sealed class Arguments | |||
|
|||
[Argument(ArgumentType.AtMostOnce, | |||
HelpText = "Whether to combine multiple indicator vectors into a single bag vector instead of concatenating them. This is only relevant when the input is a vector.")] | |||
public bool Bag = KeyToVectorEstimator.Defaults.Bag; | |||
public bool Bag = KeyToVectorMappingEstimator.Defaults.Bag; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only part of the work right? I.e. the transforms will also be updated to KeyToVectorMappingTransform?
Addresses part of #1318 adding xtensions for all text related estimators.