Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding transform extensions #1460

Merged
merged 9 commits into from
Oct 31, 2018
Merged

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Oct 30, 2018

Addresses part of #1318 adding xtensions for all text related estimators.

@sfilipi sfilipi changed the title Adding all extensions for the Text related transformation Adding transform extensions Oct 31, 2018
@sfilipi sfilipi self-assigned this Oct 31, 2018
@sfilipi sfilipi added the API Issues pertaining the friendly API label Oct 31, 2018
@sfilipi sfilipi requested a review from singlis October 31, 2018 07:33
/// <param name="columnsToDrop">The array of column names to drop.</param>
public static ColumnSelectingEstimator DropColumns(this TransformsCatalog catalog, params string[] columnsToDrop)
{
return new ColumnSelectingEstimator(CatalogUtils.GetEnvironment(catalog), null, columnsToDrop);
Copy link

@shmoradims shmoradims Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null, columnsToDrop [](start = 86, length = 19)

it would be more readable to use named arguments here and above, b/c ColumnSelectingEstimator takes so many optional arguments. #Resolved

/// Converts the images to grayscale.
/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="columns">The name of the columns containing the image paths(first item of the tuple), and the name of the resulting output column (second item of the tuple).</param>
Copy link

@shmoradims shmoradims Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sulting output column (second item of the tuple). [](start = 133, length = 57)

nit: What is our max line length guideline? I keep scrolling left/right to see the whole thing :) #Resolved

/// </summary>
/// <param name="catalog">The transform's catalog.</param>
/// <param name="columns">The name of the columns containing the image paths(first item of the tuple), and the name of the resulting output column (second item of the tuple).</param>
public static ImageGrayscalingEstimator Grayscale(this TransformsCatalog catalog, params (string input, string output)[] columns)
Copy link

@shmoradims shmoradims Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(string input, string output)[] columns [](start = 96, length = 40)

how come some extensions are multi input/output with tuples, some are single input/output, and some are multi input/output with ColumnInfo? #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, one per constructor; they follow the ctor signatures.


In reply to: 229707525 [](ancestors = 229707525)

public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, params KeyToBinaryVectorTransform.ColumnInfo[] columns)
=> new KeyToBinaryVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns);

public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, string name, string source = null)
Copy link

@shmoradims shmoradims Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string name, string source = null [](start = 125, length = 33)

Did we change the naming convention from name/source to inputColumn/outputColumn? Or name/source is still valid?

Also they're sometimes called input/output.

Also I noticed, that some some of the extensions have output = null, as optional and some don't.

If it's too much for this PR, please add an issue for reviewing the full API surface for parameter consistency and mention all the different examples.
#Resolved

Copy link
Member Author

@sfilipi sfilipi Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going with inputColumn outputColumn. I'll scan for inconsistencies. Thanks for pointing out.


In reply to: 229710031 [](ancestors = 229710031)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are a number of places where we have source->name usage (outside of this PR). The issue should address all instances right?

@@ -82,7 +82,7 @@ void TestSimpleCase()
var xyData = new List<TestDataXY> { new TestDataXY() { A = new float[inputSize] } };
var stringData = new List<TestDataDifferntType> { new TestDataDifferntType() { data_0 = new string[inputSize] } };
var sizeData = new List<TestDataSize> { new TestDataSize() { data_0 = new float[2] } };
var pipe = new OnnxEstimator(Env, modelFile, "data_0", "softmaxout_1");
var pipe = new OnnxConvertingEstimator(Env, modelFile, "data_0", "softmaxout_1");
Copy link

@shmoradims shmoradims Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting [](start = 31, length = 10)

OnnxScoringEstimator could be a better name. #Resolved

Copy link

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

}

/// <summary>
/// Constructs the Select Columns Estimator.
Copy link
Member

@singlis singlis Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Columns Selecting Estimator #Resolved

/// </summary>
/// <param name="catalog">The categorical transform's catalog.</param>
/// <param name="columns">The input column to map back to vectors.</param>
public static KeyToVectorMappingEstimator ToVector(this TransformsCatalog.CategoricalTransforms catalog,
Copy link
Contributor

@Zruty0 Zruty0 Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CategoricalTransforms [](start = 82, length = 21)

Actually I was expecting ToKey, ToVector, ToBinaryVector and ToValue to be in Conversions, not Categorical. In any case, let's not shorten it too much: MapKeyToVector, MapValueToKey etc. would be better I think #Pending

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the respective Estimator live in the Conversions namespace?


In reply to: 229773742 [](ancestors = 229773742)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would vote to keep them in the Categorical, if the opposite operation (Term) is there.


In reply to: 229779270 [](ancestors = 229779270,229773742)

/// </summary>
/// <param name="catalog">The categorical transform's catalog.</param>
/// <param name="columns">The names of the input columns of the transformation and the corresponding names for the output columns.</param>
public static MissingValueIndicatorEstimator IndicateMissingValues(this TransformsCatalog catalog, params (string inputColumn, string outputColumn)[] columns)
Copy link
Contributor

@Zruty0 Zruty0 Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indicate [](start = 53, length = 8)

'indicate' seems weird. I don't have a better suggestion though... #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should be fine. sklearn uses the same terminology: http://scikit-learn.org/stable/modules/generated/sklearn.impute.MissingIndicator.html#sklearn.impute.MissingIndicator


In reply to: 229778482 [](ancestors = 229778482)

public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, params KeyToBinaryVectorTransform.ColumnInfo[] columns)
=> new KeyToBinaryVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns);

public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, string name, string source = null)
Copy link
Contributor

@Zruty0 Zruty0 Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToBinaryVector [](start = 56, length = 14)

Summary #Resolved

/// <param name="catalog">The text-related transform's catalog.</param>
/// <param name="columns">Pairs of columns to run the tokenization on.</param>
/// <param name="separators">The separators to use (uses space character by default).</param>
public static WordTokenizingEstimator TokenizeWord(this TransformsCatalog.TextTransforms catalog,
Copy link
Contributor

@Zruty0 Zruty0 Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TokenizeWord [](start = 46, length = 12)

TokenizeWords #Resolved

@Zruty0
Copy link
Contributor

Zruty0 commented Oct 31, 2018

public static class ColumnCopyingCatalog

why not merge all the static catalog classes into one? I think we agreed to have one per assembly #Pending


Refers to: src/Microsoft.ML.Data/Transforms/TransformsExtensionsCatalog.cs:14 in 0b736b2. [](commit_id = 0b736b2, deletion_comment = False)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi
Copy link
Member Author

sfilipi commented Oct 31, 2018

public static class ColumnCopyingCatalog

I did one per category in the two assemblies that gather the most of the transforms. That ok?


In reply to: 434763291 [](ancestors = 434763291)


Refers to: src/Microsoft.ML.Data/Transforms/TransformsExtensionsCatalog.cs:14 in 0b736b2. [](commit_id = 0b736b2, deletion_comment = False)

/// <param name="colors">The color schema as defined in <see cref="ImagePixelExtractorTransform.ColorBits"/>.</param>
/// <param name="interleave"></param>
public static ImagePixelExtractingEstimator ExtractPixels(this TransformsCatalog catalog, string inputColumn, string outputColumn,
ImagePixelExtractorTransform.ColorBits colors = ImagePixelExtractorTransform.ColorBits.Rgb, bool interleave = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tabbing?

/// <param name="cropAnchor">Where to place the anchor, to start cropping. Options defined in <see cref="ImageResizerTransform.Anchor"/></param>
/// <returns></returns>
public static ImageResizingEstimator Resize(this TransformsCatalog catalog, string inputColumn, string outputColumn,
int imageWidth, int imageHeight, ImageResizerTransform.ResizingKind resizing = ImageResizerTransform.ResizingKind.IsoCrop, ImageResizerTransform.Anchor cropAnchor = ImageResizerTransform.Anchor.Center)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tabbing here as well?

/// </summary>
/// <param name="env">The estimator's local <see cref="IHostEnvironment"/>.</param>
/// <param name="columns">The name of the columns containing the image paths(first item of the tuple), and the name of the resulting output column (second item of the tuple).</param>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete line

public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, params KeyToBinaryVectorTransform.ColumnInfo[] columns)
=> new KeyToBinaryVectorMappingEstimator(CatalogUtils.GetEnvironment(catalog), columns);

public static KeyToBinaryVectorMappingEstimator ToBinaryVector(this TransformsCatalog.CategoricalTransforms catalog, string name, string source = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are a number of places where we have source->name usage (outside of this PR). The issue should address all instances right?

@@ -84,7 +84,7 @@ public sealed class Arguments

[Argument(ArgumentType.AtMostOnce,
HelpText = "Whether to combine multiple indicator vectors into a single bag vector instead of concatenating them. This is only relevant when the input is a vector.")]
public bool Bag = KeyToVectorEstimator.Defaults.Bag;
public bool Bag = KeyToVectorMappingEstimator.Defaults.Bag;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only part of the work right? I.e. the transforms will also be updated to KeyToVectorMappingTransform?

@sfilipi sfilipi merged commit 9d33efe into dotnet:master Oct 31, 2018
@sfilipi sfilipi deleted the addingTransformXtensions branch October 31, 2018 18:05
@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants