Doc2Vec docstring updated to reflect actual default value of sample parameter. Fix #1302. #1307

Merged: 1 commit, May 7, 2017
5 changes: 3 additions & 2 deletions gensim/models/doc2vec.py
@@ -580,7 +580,7 @@ def __init__(self, documents=None, dm_mean=None,
need about 1GB of RAM. Set to `None` for no limit (default).

`sample` = threshold for configuring which higher-frequency words are randomly downsampled;
-        default is 0 (off), useful value is 1e-5.
+        default is 1e-3, useful value is 1e-5.
Owner commented:
Thanks for the fix.

This doc reads very strangely now: the default value is X, but the useful value is Y. Why not use the useful value? 🤔
@gojomo (Collaborator) commented:
I think the 'useful value' remark dates back to something in word2vec.c code, before it had a (modest) non-zero default. I suppose we could stop at just mentioning the default, or hint that more aggressive values are also worth trying with a comment like: "default is 1e-3; values to 1e-5 (or lower) may also be useful". (On large datasets, I've seen ok results with 1e-06 and 1e-07, which each offer additional training speedups.)
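
For readers skimming the thread, here is a minimal sketch (not part of this PR) of how the `sample` threshold under discussion is passed when constructing a Doc2Vec model. It assumes the gensim API of this era (`size` rather than the later `vector_size`) and a toy placeholder corpus:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy placeholder corpus, just enough to build a vocabulary and train.
docs = [
    TaggedDocument(words=["some", "example", "words"], tags=["doc_0"]),
    TaggedDocument(words=["another", "tiny", "document"], tags=["doc_1"]),
]

# `sample` defaults to 1e-3 (the value this PR documents); smaller values such
# as 1e-5, or 1e-6 / 1e-7 on very large corpora, downsample frequent words more
# aggressively and can speed up training.
model = Doc2Vec(docs, size=100, min_count=1, sample=1e-5, workers=2)
```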

Owner commented:
Agreed. @menshikh-iv @datapythonista can you do this tiny fix?

Contributor commented:
@piskvorky Yes, of course.


`workers` = use this many worker threads to train the model (=faster training with multicore machines).

@@ -616,7 +616,8 @@ def __init__(self, documents=None, dm_mean=None,

super(Doc2Vec, self).__init__(
sg=(1 + dm) % 2,
-            null_word=dm_concat, **kwargs)
+            null_word=dm_concat,
+            **kwargs)

self.load = call_on_class_only
