Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding common terms to phrases model #1568

Merged
merged 7 commits into from
Nov 1, 2017

Conversation

alexgarel
Copy link
Contributor

This is an implementation for #1258 (and a replacement to aborted PR #1263).

I've tested different implementation. This one in my view is effective and helps code maintenability.

@alexgarel
Copy link
Contributor Author

@gojomo did you want to review this ?

@alexgarel
Copy link
Contributor Author

@piskvorky you proposed me once your help on #1263. This is the corresponding new PR. It has been stalled yet.
No hurry, you maybe focused on other topics, but just wanted you to know it exists :-)

@piskvorky
Copy link
Owner

Thanks for the PR @alexgarel ! @menshikh-iv will review soon, sorry for the long wait, we were busy refactoring.

Copy link
Contributor

@menshikh-iv menshikh-iv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good for me, big thanks @alexgarel 🔥, please add explicit inheritance and I'll merge your PR

@@ -98,7 +114,53 @@ def _is_single(obj):
return False, obj_iter


class Phrases(interfaces.TransformationABC):
class SentenceAnalyzer:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inherit object explicitly

class Phrases(interfaces.TransformationABC):
class SentenceAnalyzer:

def analyze_sentence(self, sentence, threshold, common_terms, scoring):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You create a new class for avoiding copy-paste in two child classes, I'm correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's it.

@alexgarel
Copy link
Contributor Author

@menshikh-iv I did the requested changes.

Appveyor did not pass, but for a strange reason that seems uncorrelated…

@menshikh-iv
Copy link
Contributor

Nice @alexgarel, please resolve merge conflict and we'll merge this PR.

@menshikh-iv
Copy link
Contributor

@alexgarel please ping me when you'll be ready to review.

@alexgarel
Copy link
Contributor Author

@menshikh-iv I finally did it :-)

@menshikh-iv
Copy link
Contributor

Very nice feature, big thanks @alexgarel 🔥 🥇 Contgrazt with first contribution 👍

@menshikh-iv menshikh-iv merged commit b4515e0 into piskvorky:develop Nov 1, 2017
@menshikh-iv
Copy link
Contributor

Hi @alexgarel, we think that this feature is very nice, for this reason, we want to write the blogpost about it. Can you help with blogpost (or even write yourself)?

@piskvorky
Copy link
Owner

piskvorky commented Nov 13, 2017

@alexgarel even a short summary would be useful, to start with: what is this new functionality for, what can be done now that couldn't be done before, how to use it properly.

We'll include your description in the release notes, to give users motivation to try it out and use it :)

@alexgarel
Copy link
Contributor Author

Hello, thanks a lot for the encouragement. Time is quite a problem for me in this very moment, what is the schedule ?

@piskvorky
Copy link
Owner

piskvorky commented Nov 13, 2017

Summary and motivation "for dummies": 1-3 paragraphs, as soon as you can.

Blog post: at your leisure; do you think you could have a draft by the end of November?

@alexgarel
Copy link
Contributor Author

Ok, I'll try to do my best :-)

@alexgarel
Copy link
Contributor Author

@piskvorky I though the Summary and motivation "for dummies" was meant to be in release notes… but release note is already published, isn't it ?

However here is a short description:

Phrases now as a common_terms optional keyword arguments. You can pass it a list of words. Those words will have a special behaviour : their presence between two words won't prevent bigram detection but they will be kept if the resulting expression has a high enough score. Frequency of those common terms alone are not taken into account when computing the score of the expression.

It offers an alternative to common terms (aka stop words) removal before bigram detection, allowing to keep the information they carry. It may for example capture « eye of the beholder » or « code of conduct » and also distinguish « car with driver » from « car without driver ».

@piskvorky
Copy link
Owner

It is already published, but can be edited :) The summary is also great for the module docstrings.

Thanks a lot!

@dldx
Copy link

dldx commented Dec 2, 2017

I have a Phrases model that was computed using Gensim 2.2.0 but because of these changes, I cannot load it anymore. This is the error I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>()

~/Stuff/Sources/anaconda3/envs/nlp/lib/python3.6/site-packages/gensim/models/phrases.py in __init__(self, phrases_model)
    546         self.delimiter = phrases_model.delimiter
    547         self.scoring = phrases_model.scoring
--> 548         self.common_terms = phrases_model.common_terms
    549         corpus = self.pseudocorpus(phrases_model)
    550         self.phrasegrams = {}

AttributeError: 'Phrases' object has no attribute 'common_terms'

Any suggestions on what I can do to fix this? :)

Cheers!

@piskvorky
Copy link
Owner

piskvorky commented Dec 2, 2017

@menshikh-iv was this tested for backward compatibility?

I think this could be easy (using empty common_terms on load if it's missing, or something like that), but we definitely need that.

Sorry @dldx , bug fix coming soon :) Can you open a new issue for this?

@dldx
Copy link

dldx commented Dec 2, 2017

@piskvorky #1751

Thanks for developing Gensim! It's great!

@gojomo gojomo mentioned this pull request Oct 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants