Poor reproducibility of out-of-vocab word vectors after loading native model #2315
Partial fix - 07f34e2
I've previously wondered if our code is in fact calculating FastText vectors properly in accordance with the original implementation. To copy my comments from another chat:
From the descriptions of #2313/#2160, they seem focused on other related functionality, so it's not clear they'd necessarily include tests that verify identical-word-vectors from a loaded Facebook-FT-trained model. Though of course, it'd be great to have such tests, because it's unclear gensim really supports FT unless it matches Facebook's library's output from loaded models.
See this unit test for reproduction:
https://github.com/mpenkov/gensim/blob/0d30caeb8c6d165d63c050de4bf32a0eab241d48/gensim/test/test_fasttext.py#L891
The test passes only if the tolerance is very high (0.1). For lower tolerance values (e.g. 0.01 and below), the test fails.
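To make the tolerance point concrete, here is a minimal sketch of an absolute-tolerance comparison between two vectors. It is not the linked unit test: the helper names `max_abs_diff` and `vectors_match` are hypothetical, and the vector values are made up for illustration rather than taken from any real model.

```python
def max_abs_diff(expected, actual):
    """Return the largest element-wise absolute difference between two vectors."""
    if len(expected) != len(actual):
        raise ValueError("vectors must have the same length")
    return max(abs(e - a) for e, a in zip(expected, actual))


def vectors_match(expected, actual, atol):
    """True if every component of `actual` is within `atol` of `expected`."""
    return max_abs_diff(expected, actual) <= atol


# Hypothetical values: `native` stands in for a vector from the reference
# implementation, `loaded` for the same out-of-vocab word's vector after
# loading the model elsewhere. Neither comes from a real model.
native = [0.120, -0.340, 0.560]
loaded = [0.150, -0.300, 0.510]

for atol in (0.1, 0.01):
    print(f"atol={atol}: match={vectors_match(native, loaded, atol)}")
```

With these illustrative numbers the largest component difference is about 0.05, so the check passes at a tolerance of 0.1 and fails at 0.01, which is the same pattern described above.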