Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One of the FeaturizeText extensions has the inputColumnNames as required #2768

Closed
sfilipi opened this issue Feb 27, 2019 · 2 comments
Closed

Comments

@sfilipi
Copy link
Member

sfilipi commented Feb 27, 2019

The are two extensions for Featurize text.

    public static TextFeaturizingEstimator FeaturizeText(this TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = null);

    public static TextFeaturizingEstimator FeaturizeText(this TransformsCatalog.TextTransforms catalog, string outputColumnName, IEnumerable<string> inputColumnNames, TextFeaturizingEstimator.Options options);

IMO,

  • On the first one, TextFeaturizingEstimator.Options options should exist as optional.
  • On the second one, TextFeaturizingEstimator.Options options should exist as optional.

ATM, if i want to use the TextFeaturizingEstimator.Options, i have to create an IEnumerable for inputColumnNames and give it the output column name.

mlContext.Transforms.Text.FeaturizeText("CarrierName",
    Enumerable.Repeat("CarrierName", 1),
    new TextFeaturizingEstimator.Options
    {
              KeepDiacritics = false,
              KeepPunctuations = false,
              TextCase =TextNormalizingEstimator.CaseNormalizationMode.Lower,
              OutputTokens = true,
              VectorNormalizer = TextFeaturizingEstimator.TextNormKind.L2
     }))
@TomFinley
Copy link
Contributor

Interesting find, thank you @sfilipi. As far as I can see, elsewhere when we have options, we have only those options, and they are non-optional. This pursuant to #1798. So it seems like the second signature should not take inputColumnNames, but it should instead have an input column names on its options. (Which as far as I can tell it does.) By having both, we are inviting the potential of the question "who wins" on these two settings. (See again #1798.)

So I would prefer to solve this in another way:

  1. The first one stays as is.
  2. The second one does not accept IEnumerable, but instead takes an option.

If we find that some advanced options are in fact valuable, we should consider adding them as (probably) optional parameters to the first one.

@sfilipi
Copy link
Member Author

sfilipi commented Apr 23, 2019

Completed in #2815

@sfilipi sfilipi closed this as completed Apr 23, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants