Adding common terms to phrases model #1568
@gojomo did you want to review this?
@piskvorky you once offered me your help on #1263. This is the corresponding new PR. It has been stalled so far.
Thanks for the PR @alexgarel! @menshikh-iv will review soon, sorry for the long wait, we were busy refactoring.
Looks very good to me, big thanks @alexgarel 🔥, please add explicit inheritance and I'll merge your PR.
gensim/models/phrases.py (outdated)
@@ -98,7 +114,53 @@ def _is_single(obj):
         return False, obj_iter


-class Phrases(interfaces.TransformationABC):
+class SentenceAnalyzer:
Inherit from object explicitly.
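For readers wondering why this matters: on Python 2 a bare `class SentenceAnalyzer:` declares an old-style class, while listing `object` as the base gives a new-style class on both Python 2 and Python 3. That is general Python behaviour rather than something stated in this thread, and the assumption here is that the code base still had to run on Python 2. A minimal sketch with placeholder class names:

```python
class ImplicitAnalyzer:
    """Bare declaration: old-style on Python 2, new-style on Python 3."""


class ExplicitAnalyzer(object):
    """Explicit base: a new-style class on both Python 2 and Python 3."""


# With the explicit base, `object` is always part of the method resolution
# order, so super(), properties and descriptors behave the same everywhere.
explicit_mro = [cls.__name__ for cls in ExplicitAnalyzer.__mro__]
print(explicit_mro)
```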
gensim/models/phrases.py (outdated)
-class Phrases(interfaces.TransformationABC):
+class SentenceAnalyzer:
+
+    def analyze_sentence(self, sentence, threshold, common_terms, scoring):
You created a new class to avoid copy-paste in the two child classes, am I correct?
Yes that's it.
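The pattern being confirmed here can be sketched as follows. This is an illustration only, not the code from the PR: the class names, the `score_pair` hook and the simplified `analyze_sentence(sentence, threshold)` signature are all hypothetical (the real method in the diff also takes `common_terms` and `scoring`). The traversal is written once in the base class and each child supplies only its own scoring.

```python
class SharedAnalyzer(object):
    """Hypothetical base class: the traversal logic lives here once."""

    def score_pair(self, first, second):
        # Each subclass decides how a pair of adjacent words is scored.
        raise NotImplementedError

    def analyze_sentence(self, sentence, threshold):
        """Return (pair, score) for adjacent pairs scoring above threshold."""
        found = []
        for first, second in zip(sentence, sentence[1:]):
            score = self.score_pair(first, second)
            if score > threshold:
                found.append(((first, second), score))
        return found


class CountingModel(SharedAnalyzer):
    """Scores pairs from raw counts (stand-in for a trainable model)."""

    def __init__(self, pair_counts):
        self.pair_counts = pair_counts

    def score_pair(self, first, second):
        return self.pair_counts.get((first, second), 0)


class FrozenModel(SharedAnalyzer):
    """Scores pairs from a precomputed table (stand-in for an exported model)."""

    def __init__(self, scores):
        self.scores = scores

    def score_pair(self, first, second):
        return self.scores.get((first, second), float("-inf"))


sentence = ["new", "york", "is", "big"]
counting_result = CountingModel({("new", "york"): 5}).analyze_sentence(sentence, threshold=1)
frozen_result = FrozenModel({("new", "york"): 12.0}).analyze_sentence(sentence, threshold=10.0)
print(counting_result, frozen_result)
```

Both children reuse the same loop, so a change to how sentences are walked only has to be made in one place.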
@menshikh-iv I made the requested changes. AppVeyor did not pass, but for a strange reason that seems unrelated…
Nice @alexgarel, please resolve the merge conflict and we'll merge this PR.
@alexgarel please ping me when it's ready for review.
@menshikh-iv I finally did it :-)
Very nice feature, big thanks @alexgarel 🔥 🥇 Congrats on your first contribution 👍
Hi @alexgarel, we think this feature is very nice, so we want to write a blog post about it. Can you help with the blog post (or even write it yourself)?
@alexgarel even a short summary would be useful, to start with: what is this new functionality for, what can be done now that couldn't be done before, how to use it properly. We'll include your description in the release notes, to give users motivation to try it out and use it :)
Hello, thanks a lot for the encouragement. Time is quite a problem for me at the moment; what is the schedule?
Summary and motivation "for dummies": 1-3 paragraphs, as soon as you can. Blog post: at your leisure; do you think you could have a draft by the end of November?
Ok, I'll do my best :-)
@piskvorky I thought the summary and motivation "for dummies" was meant to be in the release notes… but the release notes are already published, aren't they? However, here is a short description: Phrases now has a common_terms parameter. It offers an alternative to removing common terms (aka stop words) before bigram detection, so the information they carry is kept. It may for example capture « eye of the beholder » or « code of conduct » and also distinguish « car with driver » from « car without driver ».
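To make the description above concrete, here is a small pure-Python sketch of the idea. It is not gensim's implementation and does no scoring: `candidate_phrases` and `drop_stop_words` are hypothetical helpers that only show which candidate phrases exist when common terms are allowed to sit between two ordinary words, compared with stripping them first.

```python
def candidate_phrases(sentence, common_terms):
    """Collect word sequences of the form: word, common terms..., word."""
    candidates = []
    start = None    # last ordinary (non-common) word seen so far
    between = []    # common terms seen since `start`
    for word in sentence:
        if word in common_terms:
            # A common term can only bridge two words, never open a phrase.
            if start is not None:
                between.append(word)
        else:
            if start is not None:
                candidates.append(tuple([start] + between + [word]))
            start, between = word, []
    return candidates


common = frozenset(["of", "the", "with", "without"])


def drop_stop_words(sentence):
    """Classic preprocessing: remove common terms before bigram detection."""
    return [word for word in sentence if word not in common]


# Keeping common terms: the two phrases stay distinct.
with_driver = candidate_phrases(["car", "with", "driver"], common)
without_driver = candidate_phrases(["car", "without", "driver"], common)
beholder = candidate_phrases(["the", "eye", "of", "the", "beholder"], common)

# Removing them first: both collapse to the same bigram.
stripped_with = candidate_phrases(drop_stop_words(["car", "with", "driver"]), common)
stripped_without = candidate_phrases(drop_stop_words(["car", "without", "driver"]), common)

print(with_driver, without_driver, beholder, stripped_with, stripped_without)
```

In a real model each candidate would still have to pass a score threshold before being treated as a phrase; the sketch stops at candidate generation.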
It is already published, but can be edited :) The summary is also great for the module docstrings. Thanks a lot!
I have a Phrases model that was computed using Gensim 2.2.0 but because of these changes, I cannot load it anymore. This is the error I get:
Any suggestions on what I can do to fix this? :) Cheers!
@menshikh-iv was this tested for backward compatibility? I think this could be easy (using empty [...]). Sorry @dldx, bug fix coming soon :) Can you open a new issue for this?
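The error itself is not shown in the thread, and the fix that was eventually applied is not described here either. One common way to keep old saved objects loadable, assuming the model is stored with pickle and that the failure comes from an attribute that did not exist when it was saved, is to fill in an empty default during restoration. The class below is a hypothetical stand-in, not gensim's `Phrases`:

```python
class TinyPhrases(object):
    """Hypothetical model class that gained a `common_terms` attribute."""

    def __init__(self, threshold=10.0, common_terms=frozenset()):
        self.threshold = threshold
        self.common_terms = frozenset(common_terms)

    def __setstate__(self, state):
        # pickle calls this with the saved attribute dict when loading.
        self.__dict__.update(state)
        # Objects saved by an older version lack the new attribute,
        # so fall back to an empty set of common terms.
        self.__dict__.setdefault("common_terms", frozenset())


# Simulate what unpickling does with state saved by an older version:
# create the instance without calling __init__, then restore its state.
old_state = {"threshold": 10.0}
restored = TinyPhrases.__new__(TinyPhrases)
restored.__setstate__(old_state)
print(restored.threshold, restored.common_terms)
```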
Thanks for developing Gensim! It's great!
This is an implementation for #1258 (and a replacement for the aborted PR #1263).
I've tested different implementations. This one, in my view, is effective and helps code maintainability.