File-based fast training for Any2Vec models #2127
Merged: menshikh-iv merged 133 commits into piskvorky:develop from persiyanov:feature/multistream-training on Sep 14, 2018
39a2c11
CythonLineSentence
20c22f7
fix
dd0e9ca
fix setup.py
6203c77
fixes
03bf799
some refactoring
660493f
remove printf
1aedfe8
compiled
9ff0bb1
second branch for pystreams
9e498b7
fix
1d4a2a8
learning rate decay in Cython + _do_train_epoch + _train_epoch_multis…
97bac7e
add train_epoch_sg function
4de3a84
call _train_epoch_multistream from train()
36d1412
add word2vec_inner.cpp
625025b
remove pragma from .cpp
8173da8
Merge branch 'develop' into feature/multistream-training
bd0a0e0
fix doc
63663fa
fix pip
2ee2405
add __reduce__ to CythonLineSentence for proper pickling
8f8e817
remove printf
ac28bbb
add 1 test for CythonLineSentence
942a12f
no vocab copying
2a44fbc
fixed
e4a8ba0
Revert "fixed"
394a417
Revert "no vocab copying"
9ab6b1b
remove input_streams, add corpus_file
5d2e2cf
fix
0489561
fix replacing input_streams -> corpus_file in Word2Vec class
901cad4
upd .cpp
c09035c
add C++11 compiler flags
1e3c314
pep8
d6755be
add link args too
cc4680c
upd FastLineSentence
9978f6b
fix signatures in doc2vec/fasttext + removed tests on multistream
35333dd
fix flake
86b91ac
clean up base_any2vec.py
fca6f50
fix
45ca084
fix CythonLineSentence ctor
16bb386
fix py3 type error
c83b96f
fix again
1a21b0b
try again
dd83a3e
new error
c72f0b6
fix test
74e51b3
add unordered_map wrapper
58fc112
upd
5e70184
fix cython compiling errors
9727782
upd word2vec_inner.cpp
d97ac0c
add some tests
b6d7bb3
more tests for corpus_file
0c1fc5f
fix docstrings
fd66e34
addressing comments
da9f3da
fix tests skipIf
81329d6
add persistence test
f2ba633
online learning tests
51cec43
fix save_as_line_sentence
a72ddf1
fix again
aba7682
address new comments
03d44b2
fix test
e4e8cb2
move multistream functions from word2vec_inner to word2vec_multistream
3e989de
fix tests
d8c5cdc
add .c file
2a42b85
fix test
002a60c
fix tests skipIf and setup.py
3850f49
fix mac os compatibility
c1e8a9b
add tutorial on w2v multistream
7b7195b
300% -> 200% in notebook
3a8a915
add MULTISTREAM_VERSION global constant
6beb96a
first move towards multistream FastText
a2eb5fc
move MULTISTREAM_VERSION
57f7b66
fix error
83ce7c2
fix CythonVocab
a3ede08
regenerated .c & .cpp files
d38463e
resolve ambiguous fast_sentence_* declarations
ec4c677
add test_training_multistream for fasttext
a5311d2
add skipif
f499d5b
add more tests
645499c
fix flake8
dc1b98d
add short example
b9564e9
upd jupyter notebook
eefdd65
fix docstrings in doc2vec
f669979
add d2v_train_epoch_dbow for from-file training
e80189f
add missing parts of from-file doc2vec
cf6b032
refactored a bit
87d8ea7
add total_corpus_count calculation in doc2vec
e2851b4
Merge branch 'develop' into feature/multistream-training
persiyanov 1fdaa43
add tests for doc2vec file-based + rename MULTISTREAM -> CORPUSFILE e…
c2fa0d8
regenerated .c + .cpp files
5427416
add Word2VecConfig in order to remove repeating parts of code
7f7760b
make shared initialization
926fd5e
use init_config from word2vec_corpusfile
df47983
add FastTextConfig
0df7f6f
init_config -> init_w2v_config, init_ft_config
5fd1c99
regenerated .c & .cpp files
d9257be
using FastTextConfig in fasttext_corpusfile.pyx
67c572c
fix
8e82b9f
fix
db2a77f
fix next_random in w2v
a96bc6d
introduce Doc2VecConfig
3b4da64
fix init_d2v_config
53b967c
use Doc2VecConfig in doc2vec_corpusfile.pyx
f57d1cb
removed unused vars
b652afe
fix docstrings
260cfb5
fix more docstrings
a433018
test old model for doc2vec & fasttext
20ec49b
fix loading old models
1ced17d
fix fasttext model checking
0731449
merge fast_line_sentence.cpp and fast_line_sentence.h
35f0ab4
fix word2vec test
49905f0
fix syntax error
95c6ec9
remove redundant seekg call
aed2b6b
fix example notebook
c1af621
add initial doc_tags computation
33bf97a
fix test
e592b6a
fix test for windows
d08e4c1
add one more test on offsets
468a000
get rid of subword_arrays in fasttext
f71e1f8
make hanging indents everywhere
811388b
open file in byte mode
ddd5901
fix pep
a3490c7
fix tests
a28ff0d
fix again
b2996f0
final fix?
64bb617
regenerated .c & .cpp files
816f63f
fix test_persistence_fromfile for FastText
abad1b8
add fasttext & doc2vec to notebook
0b03839
add short examples
6217c73
update file-based tutorial notebook
piskvorky f70d159
work credit + minor nb fixes
piskvorky 9593d5f
remove FIXMEs from file-based *2vec notebook
piskvorky 7b714b2
remove warnings in corpus_file mode
persiyanov b833f0f
fix deprecation warning
menshikh-iv bcc0fb9
regenerate .ipynb
persiyanov 384e0b1
upd plot
persiyanov 527266f
upd plot
persiyanov
Is this needed only for w2v? Why not the same change for other models?
fixed