The text transform is an incredible time saver and lowers the barrier to entry for building a good NLP model.
Building from the individual blocks is a rough road to travel, and doesn't add much extra power. The only case I recall needing to use the individual blocks was when using the lemmatizer, which I don't think is available in ML.NET.
Also, I'd recommend Bigrams+Trichar as the defaults, which matches our default text recipe.
Is this strictly adding new API? Can this be done without a public API breaking change? If so, I think we can remove it from Project 13, and it can be added after v1.0.
But if this requires a public API breaking change, then it can be left in Project 13.
#801 is making the parameters default and hardcoded for the following options of Text featurizer:
Once the individual building blocks become estimators, we should bring these parameters back (in the form of estimators for the word/char extractors, etc.).
Or maybe we shouldn't, and instead just demonstrate how to compose your version of text transform from the individual building blocks?
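To make the "compose it yourself" option concrete, here is a rough, language-agnostic sketch in plain Python of what chaining the building blocks could look like for the Bigrams+Trichar combination suggested in this thread. It is not ML.NET code: the function names (`tokenize`, `word_ngrams`, `char_ngrams`, `featurize`) are hypothetical, and it assumes that "bigrams" means unigrams plus bigrams and that character trigrams run over the lowercased raw text, which may differ from what the text transform actually does.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(tokens, max_n=2):
    """Return all word n-grams for n = 1..max_n (assumes shorter lengths are included)."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text, n=3):
    """Return all character n-grams of the lowercased raw text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def featurize(text):
    """Compose the blocks: word uni/bigram counts plus char trigram counts."""
    feats = Counter()
    # Prefixes keep the word and character feature spaces from colliding.
    feats.update("w:" + g for g in word_ngrams(tokenize(text), max_n=2))
    feats.update("c:" + g for g in char_ngrams(text, n=3))
    return feats


if __name__ == "__main__":
    for name, count in sorted(featurize("The cat sat").items()):
        print(name, count)
```

Even in this toy form, the caller has to wire up tokenization, two extractors, and the merge step by hand, which is the "rough road" trade-off described above compared with a single text transform that has sensible defaults.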