Fix documentation for gensim.corpora. Partial fix #1671 #1729
Please continue your work, what a voluminous PR 👍
gensim/corpora/bleicorpus.py
Outdated
Parameters
----------
fname : str
    Serialized corpus's filename
Dot at the end of the sentence (everywhere)
gensim/corpora/bleicorpus.py
Outdated
corpus : iterable
    Iterable of documents
id2word : dict of (str, str), optional
    Transforms id to word (Default value = None)
no default values in docstrings (everywhere)
gensim/corpora/bleicorpus.py
Outdated
----------
fname : str
    Serialized corpus's filename
fname_vocab : str or None, optional
Need to understand how to:
- Document multiple types of argument (i.e. when the parameter can be type X or Y)
- Document multiple types for "Return" section
- Correctly specify the parent class (if there are many heirs)
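For the first two points, one widely used numpy-style convention is to join the alternatives with "or" on the type line, both under Parameters and under Returns. The sketch below shows that layout on a hypothetical helper, `read_vocab`, which is not part of gensim; the third point (parent classes) is not covered here.

```python
def read_vocab(fname_vocab=None):
    """Read a vocabulary file into an id-to-word mapping.

    Hypothetical helper, used only to illustrate the docstring layout.

    Parameters
    ----------
    fname_vocab : str or None, optional
        Path to vocabulary file, one word per line.

    Returns
    -------
    dict of (int, str) or None
        Mapping from word id to word, or None if `fname_vocab` is None.

    """
    if fname_vocab is None:
        return None
    # Assign ids in file order: the first line gets id 0, the next id 1, and so on.
    with open(fname_vocab, encoding="utf-8") as fin:
        return {word_id: line.strip() for word_id, line in enumerate(fin)}
```

Note that the default value is conveyed by the signature and the "optional" marker, not repeated in the description.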
gensim/corpora/bleicorpus.py
Outdated
----------
fname : str
    Filename
corpus : iterable
iterable of ... ? (here and everywhere)
gensim/corpora/bleicorpus.py
Outdated
Returns
-------
list of (int, float)
Missing parameter description (here and everywhere)
gensim/corpora/indexedcorpus.py
Outdated
>>> corpus_with_random_access = gensim.corpora.SvmLightCorpus('tstfile.svmlight')
>>> print(corpus_with_random_access[1])
[(0, 1.0), (1, 2.0)]
>>> corpus = [[(1, 0.5)], [(0, 1.0), (1, 2.0)]]
Examples should be executable and split into 3 sections: imports, data preparation, direct functionality
>>> from .. import ...
>>> import ...
>>>
>>> data = ...
>>> makesomething(data)
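A minimal sketch of what such an executable example could look like, using only the standard library so that it runs anywhere. The function `doc_weights` and the runner `run_examples` are hypothetical names invented for this illustration; in gensim itself the imports section would import from the package instead.

```python
import doctest


def doc_weights(corpus):
    """Sum the weights of each document (hypothetical helper for illustration).

    Parameters
    ----------
    corpus : iterable of list of (int, float)
        Documents in bag-of-words format.

    Returns
    -------
    list of float
        Total weight per document.

    Examples
    --------
    >>> from pprint import pprint
    >>>
    >>> corpus = [[(1, 0.5)], [(0, 1.0), (1, 2.0)]]
    >>> pprint(doc_weights(corpus))
    [0.5, 3.0]

    """
    return [sum(weight for _, weight in doc) for doc in corpus]


def run_examples(func):
    """Run the doctest examples found in `func`'s docstring and return the results."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False skips module lookup; the function is passed in via globs instead.
    for test in finder.find(func, name=func.__name__, module=False, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)
```

Calling `run_examples(doc_weights)` returns a result object whose `failed` count should be zero when the example is executable as written.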
gensim/corpora/lowcorpus.py
Outdated
    return [word for word in utils.to_unicode(s).strip().split(' ') if word]


class LowCorpus(IndexedCorpus):
    """
    List_Of_Words corpus handles input in GibbsLda++ format.
    """List_Of_Words corpus handles input in GibbsLda++ format.

    Quoting http://gibbslda.sourceforge.net/#3.2_Input_Data_Format::
Use a different format for the link
gensim/corpora/lowcorpus.py
Outdated
Parameters
----------
s :
???? (empty descriptions here and everywhere)
Docs formally comply with numpy style now but not all type annotations and descriptions are there.
:)
You're right, it was one of the first files in corpora, I didn't know about some of the specification features.
gensim/corpora/bleicorpus.py
Outdated
    vocab/vocab.txt file.
    File path to Serialized corpus.
fname_vocab : str, optional
    Vocabulary file. If `fname_vocab` is None, searching for the vocab.txt or `fname_vocab`.vocab file.
Are you sure it's `fname_vocab`.vocab? `fname_vocab` is None, isn't it?
Yep
Not quite, I added correct description
Still don't get it. It should be `fname`.vocab, `fname_vocab`.vocab is undefined!
Not quite :) I stepped through the code with ipdb for this case; the behavior is significantly "wider" than what we discussed here (I've already fixed it).
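To make the disagreement concrete, here is a rough sketch of a fallback lookup of the kind being discussed: when `fname_vocab` is None, several candidate paths derived from `fname` are tried in turn. The function name `find_vocab_file`, the exact candidate list, and its order are assumptions made for illustration; the thread does not spell out what the real code does.

```python
import os


def find_vocab_file(fname, fname_vocab=None):
    """Return the vocabulary path for the corpus at `fname` (illustrative sketch only)."""
    if fname_vocab is not None:
        # An explicitly given vocabulary file always wins.
        return fname_vocab

    base, _ = os.path.splitext(fname)
    # Assumed candidates, derived from `fname` because `fname_vocab` is None here.
    candidates = [
        fname + ".vocab",
        os.path.join(os.path.dirname(fname), "vocab.txt"),
        base + ".vocab",
    ]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    raise IOError("could not find vocabulary file for %s" % fname)
```

Under this reading, the docstring has to describe the candidates in terms of `fname`, since `fname_vocab` carries no information when it is None.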
gensim/corpora/bleicorpus.py
Outdated
    Filename.
corpus : iterable
    Iterable of documents.
    Path to output filename.
To output file
gensim/corpora/bleicorpus.py
Outdated
    Iterable of documents.
    Path to output filename.
corpus : iterable of iterable of (int, float)
    Input corpus
Obvious, no additional information provided. There's no need to have descriptions for all arguments. :)
gensim/corpora/bleicorpus.py
Outdated
@@ -153,16 +160,18 @@ def save_corpus(fname, corpus, id2word=None, metadata=False):
        return offsets

    def docbyoffset(self, offset):
        """Return document corresponding to `offset`.
        """Get document corresponding to `offset`,
First line of docstring should always end with a dot.
gensim/corpora/bleicorpus.py
Outdated
Parameters
----------
fname : str
    File path to Serialized corpus.
Path to corpus here and in other corpora maybe?
gensim/corpora/bleicorpus.py
Outdated
fname : str
    File path to Serialized corpus.
fname_vocab : str, optional
    Vocabulary file. If `fname_vocab` is None, searching for the vocab.txt or `fname_vocab`.vocab file.
Vocabulary file. If `fname_vocab` is None, searching for the vocab.txt or `fname`.vocab file.
gensim/corpora/bleicorpus.py
Outdated
fname : str
    Path to output filename.
corpus : iterable of iterable of (int, float)
    Input corpus
Still think that it's not necessary. Also, there's a dot missing at the end of the line.
gensim/corpora/bleicorpus.py
Outdated
@@ -121,8 +160,19 @@ def save_corpus(fname, corpus, id2word=None, metadata=False):
        return offsets

    def docbyoffset(self, offset):
        """
        Return the document stored at file position `offset`.
        """Get document corresponding to `offset`,
The first line should end with a dot.
gensim/corpora/indexedcorpus.py
Outdated
Parameters
----------
fname : str
    Path to output filename
Dots at the end of the line. Did I miss these? O_o
gensim/corpora/svmlightcorpus.py
Outdated
    def line2doc(self, line):
        """
        Create a document from a single line (string) in SVMlight format
        """Get a document from a single line in SVMlight format,
The first line should end with a dot.
gensim/corpora/wikicorpus.py
Outdated
Parameters
----------
s : str
    String containing markup template
A dot at the EOL.
gensim/corpora/wikicorpus.py
Outdated
token_min_len : int
    Minimal token length.
token_max_len : int
    Maximal token length
Dot
gensim/corpora/wikicorpus.py
Outdated
f : file
    File-like object.
filter_namespaces : list of str or bool
    Namespaces that will be extracted
Dot
the standard corpus interface instead of this function::
Notes
-----
This iterates over the **texts**. If you want vectors, just use the standard corpus interface
Dot
…iskvorky#1729)
* Fix typo
* Make `save_corpus` private
* Annotate `bleicorpus.py`
* Make __save_corpus weakly private
* Fix _save_corpus in tests
* Fix _save_corpus[2]
* Document bleicorpus in Numpy style
* Document indexedcorpus
* Annotate csvcorpus
* Add "Yields" section
* Make `_save_corpus` public
* Annotate bleicorpus
* Fix indentation in bleicorpus
* `_save_corpus` -> `save_corpus`
* Annotate bleicorpus
* Convert dictionary docs to numpy style
* Convert hashdictionary docs to numpy style
* Convert indexedcorpus docs to numpy style
* Convert lowcorpus docs to numpy style
* Convert malletcorpus docs to numpy style
* Convert mmcorpus docs to numpy style
* Convert sharded_corpus docs to numpy style
* Convert svmlightcorpus docs to numpy style
* Convert textcorpus docs to numpy style
* Convert ucicorpus docs to numpy style
* Convert wikicorpus docs to numpy style
* Add sphinx tweaks
* Remove trailing whitespaces
* Annotate wikicorpus
* SVMLight Corpus annotated
* Fix TODO
* Fix grammar mistake
* Undo changes to dictionary
* Undo changes to hashdictionary
* Document indexedcorpus
* Document indexedcorpus[2] Fix identation
* Remove redundant files
* Add more dots. :)
* Fix monospace
* remove useless method
* fix bleicorpus
* fix csvcorpus
* fix indexedcorpus
* fix svmlightcorpus
* fix wikicorpus[1]
* fix wikicorpus[2]
* fix wikicorpus[3]
* fix review comments
Fix #1671