Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Refactoring of ColumnOptions for ImagePixelExtractor #2893

Closed
wants to merge 2 commits into from

Conversation

artidoro
Copy link
Contributor

@artidoro artidoro commented Mar 8, 2019

Related to #2884.

The change does the following things:

  1. Internalizes the immutable class ColumnInfos and moves it to the transformer
  2. Exposes the previously named Column class and renames it ColumnOptions
  3. Internalizes entrypoint only fields in ColumnOptions, renames those that did not match the public API
  4. Refactored the constructor so that it takes the new ColumnOptions class

The overall behavior of the Estimator/Transformer API is unchanged (besides the use of the new mutable class).
However, this change alters the entrypoints API behavior in the following way.
Past behavior:

  • Can set column specific options.
  • Can set overall options for all columns
  • Can set overall options for some column and column specific options for others.

After this change:

  • Can set column specific options.
  • Can set overall options for all columns

I would also like to change the name of the input and output columns (in SourceNameColumnBase) to something more appropriate than Name and Source, but since that will touch a lot of files, I would like some feedback on the approach first.

@artidoro artidoro self-assigned this Mar 8, 2019
@artidoro artidoro added the API Issues pertaining the friendly API label Mar 8, 2019
@artidoro artidoro added this to the 0319 milestone Mar 8, 2019
@codecov
Copy link

codecov bot commented Mar 8, 2019

Codecov Report

Merging #2893 into master will decrease coverage by <.01%.
The diff coverage is 80.81%.

@@            Coverage Diff             @@
##           master    #2893      +/-   ##
==========================================
- Coverage   71.81%    71.8%   -0.01%     
==========================================
  Files         812      812              
  Lines      142644   142635       -9     
  Branches    16090    16089       -1     
==========================================
- Hits       102433   102424       -9     
- Misses      35828    35840      +12     
+ Partials     4383     4371      -12
Flag Coverage Δ
#Debug 71.8% <80.81%> (-0.01%) ⬇️
#production 67.95% <64.13%> (-0.01%) ⬇️
#test 86.24% <100%> (ø) ⬆️
Impacted Files Coverage Δ
...Microsoft.ML.Data/Transforms/ColumnBindingsBase.cs 72.76% <ø> (ø) ⬆️
...st/Microsoft.ML.Tests/Scenarios/TensorflowTests.cs 100% <100%> (ø) ⬆️
...cenariosWithDirectInstantiation/TensorflowTests.cs 92.51% <100%> (ø) ⬆️
test/Microsoft.ML.Tests/ImagesTests.cs 98.81% <100%> (ø) ⬆️
...c/Microsoft.ML.StaticPipe/ImageTransformsStatic.cs 95.45% <100%> (-0.06%) ⬇️
...c/Microsoft.ML.ImageAnalytics/ExtensionsCatalog.cs 80.95% <100%> (+43.45%) ⬆️
src/Microsoft.ML.StaticPipe/ImageStaticPipe.cs 54.54% <50%> (ø) ⬆️
...Microsoft.ML.ImageAnalytics/ImagePixelExtractor.cs 80.31% <60.31%> (-2.25%) ⬇️
src/Microsoft.ML.Transforms/Text/LdaTransform.cs 90.22% <0%> (+0.63%) ⬆️

@artidoro artidoro requested a review from wschin March 9, 2019 00:07
@TomFinley
Copy link
Contributor

    public bool TrySanitize()

I feel we should make this internal.


Refers to: src/Microsoft.ML.Data/Transforms/ColumnBindingsBase.cs:97 in be097f0. [](commit_id = be097f0, deletion_comment = False)

@TomFinley
Copy link
Contributor

    protected virtual bool TryUnparseCore(StringBuilder sb)

If this is now to be in the public surface then we'll have to private protected these guys.


Refers to: src/Microsoft.ML.Data/Transforms/ColumnBindingsBase.cs:197 in be097f0. [](commit_id = be097f0, deletion_comment = False)

@TomFinley
Copy link
Contributor

    protected virtual bool TryUnparseCore(StringBuilder sb, string extra)

This too.


Refers to: src/Microsoft.ML.Data/Transforms/ColumnBindingsBase.cs:222 in be097f0. [](commit_id = be097f0, deletion_comment = False)

@@ -15,7 +15,7 @@

namespace Microsoft.ML.Data
{
internal abstract class SourceNameColumnBase
public abstract class SourceNameColumnBase
{
[Argument(ArgumentType.AtMostOnce, HelpText = "Name of the new column", ShortName = "name")]
public string Name;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name [](start = 22, length = 4)

The names will have to change in accordance with the conclusion of #2064, but let's have agreement on the general approach first. (If we don't also want the entry-points to change yet, remember the Name property on the attribute.)

{
[Argument(ArgumentType.Multiple | ArgumentType.Required, HelpText = "New column definition(s) (optional form: name:src)", Name = "Column", ShortName = "col", SortOrder = 1)]
public ImagePixelExtractingEstimator.ColumnOptions[] Columns;

[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to use alpha channel", ShortName = "alpha")]
public bool? UseAlpha;
Copy link
Contributor

@TomFinley TomFinley Mar 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UseAlpha [](start = 25, length = 8)

These are a little funky. In entry-point land, and command line land, there would be the arguments to the transform, and then the arguments to each column that could potentially override it, which is why they are nullable. But, in this land, there's no particular reason for them to be nullable. So, I kind of feel like this is not really well suited to the purpose. Indeed I might wonder if it really is worth it?

@TomFinley
Copy link
Contributor

TomFinley commented Mar 9, 2019

So I'm not sure about this. It seems like the existing column structure (being nullable to allow overrides of more global options which don't exist in our interface) don't quite work. Also this particular example brings up the case that not all transformers practically benefit from this "multi-column mapping" idiom that we felt like we had to obey in the command line (since a universal interface there was more essential). Also the size of the change makes me worried this is way too much to take on prior to v1, even if we were to take this on prior to v1.

What you think about just hiding the things for now, including the ColumnOptions extension methods and the classes, and just calling it a day, at least for now? Maybe one or two isolated enhancements of new column options -- I guess type conversion could benefit -- but otherwise the urgency isn't quite clear to me.

@artidoro
Copy link
Contributor Author

I agree that changing the behavior of the command line and the entrypoint API is such a fashion is too significant to be a valid option before V1. I will close this PR.

@artidoro artidoro closed this Mar 11, 2019
@artidoro artidoro deleted the columnoptions branch March 13, 2019 17:54
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants