Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong return types in docstrings for wv.evaluate_word_analogies and wv.evaluate_word_pairs #2203

Closed
Stigjb opened this issue Oct 1, 2018 · 3 comments · Fixed by #2207
Closed
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix documentation Current issue related to documentation Hacktoberfest Issues marked for hacktoberfest

Comments

@Stigjb
Copy link
Contributor

Stigjb commented Oct 1, 2018

This is the documented return type for KeyedVectors.evaluate_word_analogies:
https://github.com/RaRe-Technologies/gensim/blob/6a4424d3e465dea0afcddc4f4e2410cf0f52dba1/gensim/models/keyedvectors.py#L1070

The dicts in the second element of the return tuple have as values either a string or a list of 4-tuples. For instance:

{
    'section': 'family',
    'correct': [('he', 'she', 'dad', 'mom'), ...],
    'incorrect': [('man', 'woman', 'groom', 'bride'), ...]
}

So I believe the return value of evaluate_word_analogies is actually Tuple[float, List[Dict[str, Union[str, List[Tuple[str, str, str, str]]]]] (in PEP-484 style annotation).

Then, in KeyedVectors.evaluate_word_pairs, the documented return type is:
https://github.com/RaRe-Technologies/gensim/blob/6a4424d3e465dea0afcddc4f4e2410cf0f52dba1/gensim/models/keyedvectors.py#L1284

Here, the second and third elements of the return tuple are actually pairs of floats, since they are the values returned from scipy.stats.pearsonr and spearmanr, which return both the correlation and a p-value. I don't know whether the p-value should be discarded or the documented return type be updated to (float, (float, float), (float, float)).

@menshikh-iv menshikh-iv added the documentation Current issue related to documentation label Oct 2, 2018
@menshikh-iv
Copy link
Contributor

Hello @Stigjb, thanks for the report

So I believe the return value of evaluate_word_analogies is actually Tuple[float, List[Dict[str, Union[str, List[Tuple[str, str, str, str]]]]] (in PEP-484 style annotation).

I would be happy to use PEP484 annotation style, but I'm not sure how to use it in docstring (we use sphinx & napoleon plugin for numpy-style docstrings), also, we can't annotate arguments directly in the definition using typing backport lib (because of no new syntax support). Have you any thought about it?

Of course, this is a bug in current return type and should be fixed, but maybe you have some good ideas how to apply PEP484 in our case?

@menshikh-iv menshikh-iv added bug Issue described a bug difficulty easy Easy issue: required small fix Hacktoberfest Issues marked for hacktoberfest labels Oct 2, 2018
@Stigjb
Copy link
Contributor Author

Stigjb commented Oct 2, 2018

I looked at the documentation for Numpy-style docstrings as well as examples from Numpy and Pandas[1] with multiple return values, and think I found a way to describe these complex types better. See the PR.

The reason that I used PEP484 style in the original issue was just that it is the type annotation style I'm most fluent in, not a suggestion to switch over.

[1] pandas docstring guide # Parameter types

@Stigjb
Copy link
Contributor Author

Stigjb commented Oct 3, 2018

I found out that I got a detail wrong, please look at #2207 @menshikh-iv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue described a bug difficulty easy Easy issue: required small fix documentation Current issue related to documentation Hacktoberfest Issues marked for hacktoberfest
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants