fixes error of get_feature_names removal #235
Merged
Error when using scikit-learn >= 1.2.0
pyLDAvis.sklearn.prepare raises an error because the object passed as the vectorizer argument no longer has a get_feature_names() method.
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'
Taking the documentation of sklearn.feature_extraction.text.CountVectorizer as an example, this method is marked as deprecated in the 1.0 docs and is removed in the 1.2 docs. The same is true of TfidfVectorizer, the other vectorizer that can be used.
The recommendation in those docs is to use get_feature_names_out() as a replacement.
Instead of returning a list of feature names, this now returns an ndarray of them. Since both types are iterable, this makes no difference for this use case, which only requires an array-like.
This fix is also backwards compatible to at least scikit-learn 1.0, where get_feature_names_out() is already available.
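If releases older than 1.0 ever needed to be supported as well, one option is a small fallback. This is only a sketch and not part of this change: the helper name feature_names and the two stub classes are hypothetical and stand in for fitted scikit-learn vectorizers so the snippet runs without scikit-learn installed.

```python
def feature_names(vectorizer):
    """Return the vocabulary terms of a fitted vectorizer as a plain list.

    Hypothetical helper: prefers get_feature_names_out() and falls back to
    the older get_feature_names() when the newer method is missing.
    """
    getter = getattr(vectorizer, "get_feature_names_out", None)
    if getter is None:
        # Older vectorizers only expose get_feature_names().
        getter = vectorizer.get_feature_names
    # list() accepts either return type (list or ndarray), since both are iterable.
    return list(getter())


class OldStyleVectorizer:
    """Stub mimicking a vectorizer that only has the old method."""

    def get_feature_names(self):
        return ["cat", "dog"]


class NewStyleVectorizer:
    """Stub mimicking a vectorizer that only has the new method."""

    def get_feature_names_out(self):
        # A tuple stands in for the ndarray a real vectorizer would return.
        return ("cat", "dog")


old_names = feature_names(OldStyleVectorizer())
new_names = feature_names(NewStyleVectorizer())
```

Both calls produce the same list, which illustrates why the change in return type does not matter to a caller that only iterates over the names.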
Tested in a fresh conda environment with Python==3.10.8; it gives the expected behaviour.