What is char-level tokenization? My impression is that it is the process that generates ['a', 'b', 'c'] from "abc". I also personally consider ['a', 'b', 'c'] to be 1-grams. Therefore, char-level tokenization is valid only if NgramLength is at least 1 (plain tokenization being exactly NgramLength=1), and we had better throw when NgramLength=0. Unfortunately, I don't have another solution that makes disabling char-level tokenization easier. @zeahmed, any comment?
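To make the 1-gram point concrete, here is a minimal Python sketch of character n-gram extraction. It is not ML.NET code: `char_ngrams` is a hypothetical helper written only for illustration, and raising on `n < 1` reflects the behavior proposed in this comment, not anything the library is confirmed to do.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous character n-grams of `text`.

    Hypothetical helper for illustration; not part of any library.
    """
    # Assumption from the comment above: a length below 1 is meaningless, so reject it.
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n over the string; yields nothing if n > len(text).
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("abc", 1))  # ['a', 'b', 'c']
print(char_ngrams("abc", 2))  # ['ab', 'bc']
```

With `n = 1` the output is exactly the per-character token list, which is why plain char-level tokenization can be read as the 1-gram case.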
`FeaturizeText` was upgraded to allow specification of n-grams for words and characters. However, it is now awkward to use `FeaturizeText` without specifying n-grams. It is now necessary to explicitly set `CharFeatureExtractor` to `null`.

This is how to compose a bag-of-words with the current API:
I would expect to be able to do something like
But this throws an error that `Skipgrams` is not less than `NgramLength`, and `Skipgrams` must be positive.

Overall, it is a bit awkward and not obvious that you have to manually null an option. Is this the API we want to ship in v1.0?