Wrong return types in docstrings for wv.evaluate_word_analogies and wv.evaluate_word_pairs #2203

Stigjb · 2018-10-01T12:48:01Z

This is the documented return type for KeyedVectors.evaluate_word_analogies:
https://github.com/RaRe-Technologies/gensim/blob/6a4424d3e465dea0afcddc4f4e2410cf0f52dba1/gensim/models/keyedvectors.py#L1070

The dicts in the second element of the return tuple have as values either a string or a list of 4-tuples. For instance:

{
    'section': 'family',
    'correct': [('he', 'she', 'dad', 'mom'), ...],
    'incorrect': [('man', 'woman', 'groom', 'bride'), ...]
}

So I believe the return value of evaluate_word_analogies is actually Tuple[float, List[Dict[str, Union[str, List[Tuple[str, str, str, str]]]]] (in PEP-484 style annotation).

Then, in KeyedVectors.evaluate_word_pairs, the documented return type is:
https://github.com/RaRe-Technologies/gensim/blob/6a4424d3e465dea0afcddc4f4e2410cf0f52dba1/gensim/models/keyedvectors.py#L1284

Here, the second and third elements of the return tuple are actually pairs of floats, since they are the values returned from scipy.stats.pearsonr and spearmanr, which return both the correlation and a p-value. I don't know whether the p-value should be discarded or the documented return type be updated to (float, (float, float), (float, float)).

The text was updated successfully, but these errors were encountered:

menshikh-iv · 2018-10-02T04:28:49Z

Hello @Stigjb, thanks for the report

So I believe the return value of evaluate_word_analogies is actually Tuple[float, List[Dict[str, Union[str, List[Tuple[str, str, str, str]]]]] (in PEP-484 style annotation).

I would be happy to use PEP484 annotation style, but I'm not sure how to use it in docstring (we use sphinx & napoleon plugin for numpy-style docstrings), also, we can't annotate arguments directly in the definition using typing backport lib (because of no new syntax support). Have you any thought about it?

Of course, this is a bug in current return type and should be fixed, but maybe you have some good ideas how to apply PEP484 in our case?

Stigjb · 2018-10-02T11:58:29Z

I looked at the documentation for Numpy-style docstrings as well as examples from Numpy and Pandas^[1] with multiple return values, and think I found a way to describe these complex types better. See the PR.

The reason that I used PEP484 style in the original issue was just that it is the type annotation style I'm most fluent in, not a suggestion to switch over.

[1] pandas docstring guide # Parameter types

Stigjb · 2018-10-03T09:22:12Z

I found out that I got a detail wrong, please look at #2207 @menshikh-iv

menshikh-iv added the documentation Current issue related to documentation label Oct 2, 2018

menshikh-iv added bug Issue described a bug difficulty easy Easy issue: required small fix Hacktoberfest Issues marked for hacktoberfest labels Oct 2, 2018

Stigjb mentioned this issue Oct 2, 2018

Update documented return types of evaluation funcs #2205

Merged

menshikh-iv closed this as completed in #2205 Oct 3, 2018

Stigjb mentioned this issue Oct 3, 2018

Fix return type in KeyedVector.evaluate_word_analogies #2207

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong return types in docstrings for wv.evaluate_word_analogies and wv.evaluate_word_pairs #2203

Wrong return types in docstrings for wv.evaluate_word_analogies and wv.evaluate_word_pairs #2203

Stigjb commented Oct 1, 2018

menshikh-iv commented Oct 2, 2018

Stigjb commented Oct 2, 2018

Stigjb commented Oct 3, 2018

Wrong return types in docstrings for wv.evaluate_word_analogies and wv.evaluate_word_pairs #2203

Wrong return types in docstrings for wv.evaluate_word_analogies and wv.evaluate_word_pairs #2203

Comments

Stigjb commented Oct 1, 2018

menshikh-iv commented Oct 2, 2018

Stigjb commented Oct 2, 2018

Stigjb commented Oct 3, 2018