Make TextFeaturizer's extractors etc. configurable again #838

Zruty0 · 2018-09-05T19:55:20Z · edited by zeahmed

#801 hardcodes the following TextFeaturizer options to their default values:

Stop word remover (defaults to none): fixed in #2962, "Made 'StopWordsRemover' in TextFeaturizer configurable again."
Custom term dictionary (defaults to none): this parameter is hidden.
Word feature extractor (defaults to unigrams): fixed in #2911, "Exposed ngram extraction options in TextFeaturizer".
Char feature extractor (defaults to 3-char): fixed in #2911, "Exposed ngram extraction options in TextFeaturizer".

Once the individual building blocks become estimators, we should bring these parameters back (in the form of estimators for the word/char extractors, etc.).

Or maybe we shouldn't, and should instead just demonstrate how to compose your own version of the text transform from the individual building blocks?

justinormont · 2018-09-13T04:00:04Z

The text transform is an incredible time saver and lowers the barrier to entry for making a good NLP model.

Building from the individual blocks is a rough road to travel, and doesn't add much extra power. The only case I recall needing to use the individual blocks was when using the lemmatizer, which I don't think is available in ML.NET.

Also, I'd recommend Bigrams+Trichar as the defaults, which matches our default text recipe.
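
To make the suggested defaults concrete, here is a minimal sketch in plain Python (not ML.NET code) of what a bigram plus tri-character feature set could look like. The function names, the `w:`/`c:` key prefixes, whitespace tokenization, lowercasing, and the choice to count unigrams alongside bigrams are all assumptions made for illustration; the actual text recipe may differ on each of these points.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams of exactly length n, tokens joined by '|'."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenizer
    return ["|".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams of exactly length n (spaces included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def featurize(text, word_n=2, char_n=3):
    """Count word n-grams of lengths 1..word_n and char n-grams of length char_n."""
    counts = Counter()
    # Assumption: "bigrams" here also keeps the shorter unigrams.
    for k in range(1, word_n + 1):
        counts.update("w:" + g for g in word_ngrams(text, k))
    counts.update("c:" + g for g in char_ngrams(text, char_n))
    return counts


features = featurize("the cat sat")
```

With `word_n=1` the same sketch reduces to the unigram plus 3-char setup described at the top of the issue, which is why exposing the two extractor settings is enough to switch between the two defaults.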

eerhardt · 2019-03-02T20:57:41Z

Is this strictly adding new API? Can this be done without a public API breaking change? If so, I think we can remove it from Project 13, and it can be added after v1.0.

But if this requires a public API breaking change, then it can be left in Project 13.

Zruty0 added the API label (Issues pertaining the friendly API) Sep 5, 2018

shauheen added this to the 0918 milestone Sep 5, 2018

shauheen removed this from the 0918 milestone Sep 25, 2018

zeahmed self-assigned this Jan 4, 2019

zeahmed removed their assignment Feb 12, 2019

shauheen mentioned this issue Feb 22, 2019

Modify API for FeaturizeText ? #2460

Closed

najeeb-kazmi mentioned this issue Feb 28, 2019

TextFeaturizer cannot specify n-grams for words or characters #2802

Closed

shauheen closed this as completed Mar 1, 2019

shauheen reopened this Mar 1, 2019

zeahmed self-assigned this Mar 4, 2019

shauheen added this to the 0319 milestone Mar 5, 2019

This was referenced Mar 11, 2019

Exposed ngram extraction options in TextFeaturizer #2911

Merged

Made 'StopWordsRemover' in TextFeaturizer configurable again. #2962

Merged

zeahmed closed this as completed Mar 21, 2019

ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
