Refactor documentation for `gensim.models.phrases` #1950

CLearERR · 2018-03-04T18:38:50Z

menshikh-iv · 2018-03-14T04:19:18Z

gensim/models/phrases.py

+
+        Parameters
+        ----------
+        worda : str


don't forget about descriptions

menshikh-iv · 2018-03-14T04:19:50Z

gensim/models/phrases.py

+        Parameters
+        ----------
+        args : object
+            Sequence of arguments, see :meth:`...` for more information.


...? you should link to SaveLoad.load I think

menshikh-iv · 2018-03-14T04:21:13Z

gensim/models/phrases.py

-    and `phrases[corpus]` syntax.
-
+    """Detect phrases, based on collected collocation counts. Adjacent words that appear together more frequently than
+    expected are joined together with the `_` character. It can be used to generate phrases on the fly,


_ - this can be changed

menshikh-iv · 2018-03-14T04:21:33Z

gensim/models/phrases.py

-        setting. `scoring` can be set with either a string that refers to a built-in scoring function,
-        or with a function with the expected parameter names. Two built-in scoring functions are available
-        by setting `scoring` to a string:
+        sentences : list of str, optional


iterable of list of str
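
To illustrate the suggested type: any iterable that yields one tokenized sentence (a list of str) at a time fits the description. A minimal sketch, assuming whitespace tokenization; `iter_sentences` is a hypothetical helper, not part of gensim:

```python
def iter_sentences(lines):
    """Yield each non-empty line as a list of str tokens (hypothetical helper)."""
    for line in lines:
        tokens = line.lower().split()
        if tokens:  # skip blank lines
            yield tokens


raw_lines = ["The mayor of New York was there", "", "New York is large"]
# Materialized here only so the result can be inspected; a generator is also an iterable.
sentences = list(iter_sentences(raw_lines))
```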

menshikh-iv · 2018-03-14T04:21:49Z

gensim/models/phrases.py

+        min_count : int, optional
+            Ignore all words and bigrams with total collected count lower
+            than this.
+        threshold : int, optional


menshikh-iv · 2018-03-14T04:22:56Z

gensim/models/phrases.py

+            available memory you have.
+        delimiter : str, optional
+            Glue character used to join collocation tokens, should be a byte string (e.g. b'_').
+        scoring : {'default', 'npmi'} http://www.sphinx-doc.org/en/master/rest.html


What is this link for?

menshikh-iv · 2018-03-14T04:23:15Z

gensim/models/phrases.py

+            Specify how potential phrases are scored for comparison to the `threshold` setting.
+            `scoring` can be set with either a string that refers to a built-in scoring function, or with a function
+            with the expected parameter names. Two built-in scoring functions are available by setting `scoring` to a
+            string:

        'default': from "Efficient Estimaton of Word Representations in Vector Space" by


Part of the description is missing (don't forget to use an enumerated list).
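
For readers following this thread, the two built-in scoring options can be sketched in plain Python. This is an illustration only: the function and argument names are hypothetical, and the formulas are written under the assumption that `default` is the count-based score and `npmi` is normalized pointwise mutual information; check the module itself before relying on them.

```python
from math import log


def default_score(worda_count, wordb_count, bigram_count, len_vocab, min_count):
    """Count-based score: higher when the pair co-occurs more often than its parts suggest."""
    return float(bigram_count - min_count) / worda_count / wordb_count * len_vocab


def npmi_score(worda_count, wordb_count, bigram_count, corpus_word_count):
    """Normalized pointwise mutual information, which lies between -1 and 1."""
    pa = float(worda_count) / corpus_word_count
    pb = float(wordb_count) / corpus_word_count
    pab = float(bigram_count) / corpus_word_count
    return log(pab / (pa * pb)) / -log(pab)


# Hypothetical counts, chosen only for illustration.
score_default = default_score(10, 10, 8, len_vocab=100, min_count=5)
score_npmi = npmi_score(10, 10, 10, corpus_word_count=1000)
```

The two scores live on different scales, so a `threshold` chosen for one scorer does not carry over to the other.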

menshikh-iv · 2018-03-14T04:23:42Z

gensim/models/phrases.py

+        Parameters
+        ----------
+        args : object
+            Sequence of arguments, see :meth:`...` for more information.


same comments as for previous load

menshikh-iv · 2018-03-14T04:24:06Z

gensim/models/phrases.py

@@ -373,7 +408,17 @@ def __str__(self):
    @staticmethod
    def learn_vocab(sentences, max_vocab_size, delimiter=b'_', progress_per=10000,
                    common_terms=frozenset()):
-        """Collect unigram/bigram counts from the `sentences` iterable."""
+        """Collect unigram/bigram counts from the `sentences` iterable. #TODO: Через пустой Phrasers


#TODO: Через пустой Phrasers ("via an empty Phrasers") - so Russian :D

menshikh-iv · 2018-03-14T04:24:54Z

gensim/models/phrases.py

        try:
            return self.phrasegrams[tuple(components)][1]
        except KeyError:
            return -1

    def __getitem__(self, sentence):
-        """
-        Convert the input tokens `sentence` (=list of unicode strings) into phrase
+        """Convert the input tokens `sentence` (=list of unicode strings) into phrase


don't use this (=list of unicode strings), better to write concrete types for arguments.
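
A sketch of what concrete argument types look like in a numpy-style docstring. `join_tokens` is a hypothetical function used only to show the layout; it is not taken from the PR:

```python
def join_tokens(sentence, delimiter="_"):
    """Join all tokens of `sentence` into a single phrase token.

    Parameters
    ----------
    sentence : list of str
        Tokens of one sentence.
    delimiter : str, optional
        Glue string placed between the joined tokens.

    Returns
    -------
    str
        The joined phrase token.

    """
    return delimiter.join(sentence)


phrase = join_tokens(["new", "york"])
```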

menshikh-iv · 2018-03-21T06:08:00Z

gensim/models/phrases.py

@@ -68,11 +70,6 @@
 >>> print(bigram[sent])
 [u'the', u'mayor', u'shows', u'his', u'lack_of_interest']


You should fix this example

menshikh-iv · 2018-03-21T06:08:43Z

gensim/models/phrases.py

+            with the expected parameter names. Two built-in scoring functions are available by setting `scoring` to a
+            string:
+
+            1. `default` - :meth:`~gensim.models.phrases.original_scorer`.


this is :func:, not :meth:

menshikh-iv · 2018-03-21T06:09:07Z

gensim/models/phrases.py

+        >>> from gensim.models.phrases import Phrases
+        >>> sentences = Text8Corpus(datapath('testcorpus.txt'))
+        >>> bigram = Phrases(sentences, min_count=5, threshold=100)
+        >>> print bigram


That's a bad example; you should show bigram extraction here.

menshikh-iv · 2018-03-21T06:10:14Z

gensim/models/phrases.py

+        Parameters
+        ----------
+        args : object
+            Sequence of arguments, see :class:`~gensim.models.phrases.Phrases` for more information.


Incorrect links, you should refer to parent class (i.e. SaveLoad.load)
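
A sketch of the kind of cross-reference being asked for, using stand-in classes so the snippet runs without gensim. The class names here are hypothetical, and the target `gensim.utils.SaveLoad.load` is an assumption about where the parent method lives:

```python
class SaveLoadSketch(object):
    """Stand-in for the parent class that owns the real `load` logic."""

    @classmethod
    def load(cls, *args, **kwargs):
        """Load a previously saved object."""
        return cls()


class PhrasesSketch(SaveLoadSketch):
    """Stand-in for a model class that overrides `load`."""

    @classmethod
    def load(cls, *args, **kwargs):
        """Load a previously saved model.

        Parameters
        ----------
        args : object
            Sequence of arguments, see :meth:`~gensim.utils.SaveLoad.load` for more information.
        kwargs : object
            Sequence of arguments, see :meth:`~gensim.utils.SaveLoad.load` for more information.

        """
        return super(PhrasesSketch, cls).load(*args, **kwargs)


model = PhrasesSketch.load("some_path")
```

Methods take the `:meth:` role; module-level functions such as scorers take `:func:`.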

menshikh-iv · 2018-03-21T06:10:46Z

gensim/models/phrases.py

+        >>> from gensim.models.phrases import Phrases
+        >>> sentences = Text8Corpus(datapath('testcorpus.txt'))
+        >>> learned = Phrases.learn_vocab(sentences,40000)
+        >>> print learned


what is it?

menshikh-iv · 2018-03-21T06:15:24Z

gensim/models/phrases.py

+        >>> from gensim.models.phrases import Phrases, Phraser
+        >>> sentences = Text8Corpus(datapath('testcorpus.txt'))
+        >>> phrases_model = Phrases(sentences, min_count=5, threshold=100)
+        >>> phraser_model = Phraser(phrases_model)


And then what? How do you use these classes?

menshikh-iv · 2018-03-21T06:15:37Z

gensim/models/phrases.py

+        >>> phrases_model = Phrases(sentences, min_count=5, threshold=100)
+        >>> phraser_model = Phraser(phrases_model)
+        >>> pseudo = phraser_model.pseudocorpus(phrases_model)
+        //>>> phraser_model.score_item("tree","human",pseudo,'default')


menshikh-iv · 2018-03-21T06:16:09Z

gensim/models/phrases.py

+        >>> phraser_model = Phraser(phrases_model)
+        >>> pseudo = phraser_model.pseudocorpus(phrases_model)
+        //>>> phraser_model.score_item("tree","human",pseudo,'default')
+        >>> phraser_model.score_item(u"tree",u"human",pseudo,'default')


I'm not sure about demonstration of this function, this isn't really needed

menshikh-iv · 2018-03-21T06:16:41Z

gensim/models/phrases.py

+        >>> sentences = Text8Corpus(datapath('testcorpus.txt'))
+        >>> phrases_model = Phrases(sentences, min_count=5, threshold=100)
+        >>> phraser_model = Phraser(phrases_model)
+        >>> pseudo = phraser_model.pseudocorpus(phrases_model)


Why is this needed?

menshikh-iv · 2018-03-21T06:17:01Z

gensim/models/phrases.py

+        >>> phrases_model = Phrases(sentences, min_count=5, threshold=100)
+        >>> phraser_model = Phraser(phrases_model)
+        >>> pseudo = phraser_model.pseudocorpus(phrases_model)
+        >>> phraser_model["tree", "human"]


incorrect, please use phraser_model[["tree", "human"]]
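
The difference between the two spellings is plain Python indexing: `obj["tree", "human"]` passes a single tuple to `__getitem__`, while `obj[["tree", "human"]]` passes a list, i.e. one tokenized sentence. A small sketch with a hypothetical stand-in class (not the gensim `Phraser`):

```python
class ShowKey(object):
    """Stand-in object that reports what `__getitem__` actually receives."""

    def __getitem__(self, key):
        return type(key).__name__, key


probe = ShowKey()
as_tuple = probe["tree", "human"]    # key arrives as the tuple ("tree", "human")
as_list = probe[["tree", "human"]]   # key arrives as the list ["tree", "human"]
```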

menshikh-iv · 2018-04-03T10:49:47Z

Good work @CLearERR 👍

piskvorky

Minor code style fixes needed.

piskvorky · 2018-04-04T11:42:19Z

gensim/models/phrases.py

@@ -933,12 +956,40 @@ def score_item(self, worda, wordb, components, scorer):
        >>> from gensim.models.word2vec import Text8Corpus
        >>> from gensim.models.phrases import Phrases, Phraser
        >>> sentences = Text8Corpus(datapath('testcorpus.txt'))
+        >>> #train the detector with


PEP8: # followed by one space (here and elsewhere).

piskvorky · 2018-04-04T11:42:40Z

gensim/models/phrases.py

+        >>> #So we get 2 phrases
+        >>> res = phraser_model[sent]
+        >>> for phrase in res:
+        >>>     print phrase


Best use brackets, for py3k compatibility.
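
A sketch of the two style points applied to a docstring example: comments start with `# ` (hash plus one space), `print(...)` is called with brackets, and the loop body uses the `...` continuation prompt so that doctest can execute it. `show_phrases` and its sample tokens are hypothetical:

```python
import doctest


def show_phrases(phrases):
    """Print each phrase on its own line.

    Examples
    --------
    >>> # Print every detected phrase.
    >>> for phrase in ['new_york', 'machine_learning']:
    ...     print(phrase)
    new_york
    machine_learning

    """
    for phrase in phrases:
        print(phrase)


# Run the docstring example to confirm that it is valid doctest syntax.
parser = doctest.DocTestParser()
test = parser.get_doctest(show_phrases.__doc__, {}, "show_phrases", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
```

Calling `print` with a single bracketed argument behaves the same under Python 2 and Python 3, which is why it is the safer form for documentation examples.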

Create PR

f0722da

menshikh-iv added the incubator project label Mar 5, 2018

CLearERR added 5 commits March 6, 2018 00:40

Small changes

58aef2e

More small changes

7937e5e

Merge remote-tracking branch 'upstream/develop' into modelphrases

866d650

Additions

acce08a

Additions II

78dd276

menshikh-iv suggested changes Mar 14, 2018

menshikh-iv changed the title ~~Create PR~~ ~~Refactor documentation for gensim.models.phrases.~~ Refactor documentation for gensim.models.phrases Mar 14, 2018

CLearERR added 4 commits March 16, 2018 01:31

Additions III

87e571f

Updates & Example I

8f28017

Updates N

38422e0

Added examples

2e2d0a1

menshikh-iv suggested changes Mar 21, 2018

CLearERR added 7 commits March 22, 2018 01:33

Partial fix

0f3a972

Merge remote-tracking branch 'upstream/develop' into modelphrases

76462be

Fixed phrases example

c644a80

Improved examples(beta)

ac39eb4

Fixed links

4aa2e29

More examples II

f9a34f0

Final checks

12a3bb5

menshikh-iv added 4 commits April 3, 2018 11:52

fix phrases[1]

aa9c535

fix phrases[2]

c955e7d

fix phrases[3]

3844e4f

fix phrases[4]

0e3bb9e

menshikh-iv merged commit 5677ab3 into piskvorky:develop Apr 3, 2018

piskvorky reviewed Apr 4, 2018

menshikh-iv added the style checking label Apr 4, 2018
