Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating WordEmbeddingSimilarityIndex imports #3076

Closed
wants to merge 1 commit into from
Closed

Updating WordEmbeddingSimilarityIndex imports #3076

wants to merge 1 commit into from

Conversation

magic-lantern
Copy link

Current version of gensim library (3.8.3) has changed importing of WordEmbeddingSimilarityIndex from:

from gensim.similarities import WordEmbeddingSimilarityIndex

to:

from gensim.models import WordEmbeddingSimilarityIndex

Current version of gensim library (3.8.3) has changed importing of WordEmbeddingSimilarityIndex from:

`from gensim.similarities import WordEmbeddingSimilarityIndex`

to:

`from gensim.models import WordEmbeddingSimilarityIndex`
@magic-lantern
Copy link
Author

Also, as an FYI I've noticed that there are duplicate imports - not sure if that is intentional.

@piskvorky
Copy link
Owner

piskvorky commented Mar 15, 2021

Thanks, but this change doesn't seem correct.

from gensim.similarities import WordEmbeddingSimilarityIndex is indeed correct, as of the latest release (Gensim 4.0.0beta). Importing from gensim.models would result in an ImportError.

If you're using an older version of Gensim, please follow its (older) documentation. See the big red banner on top of https://radimrehurek.com/gensim/.

@piskvorky piskvorky closed this Mar 15, 2021
@magic-lantern
Copy link
Author

No problem - I thought since version 4 is still beta that the example notebooks should run with the current non-beta version (3.8.3 according to this github site).

@piskvorky
Copy link
Owner

piskvorky commented Mar 15, 2021

Aah, I see what you mean.

You're referring to the notebooks directory, not the standard documentation or tutorials from the Gensim website.

@mpenkov do we want to keep that notebooks directory? It's unmaintained, untested. Not sure how people find it, but clearly it leads to some confusion.

@magic-lantern how did you arrive at that notebook?

@magic-lantern
Copy link
Author

The docs/notebooks folder has a lot of great tutorials in notebook form.

I found this stuff a few months back, so don't remember the specific link that took me there - probably found it via some search engine. Here are a few of the places that link to these notebooks:

Note the 4.0.0 beta documentation site and the 3.8.3 documentation site both link to the same notebook.

@magic-lantern
Copy link
Author

@piskvorky - One more comment - since the current documentation does link to the notebooks and they are very helpful, at least for me - I recommend keeping them. These notebooks are great as they provide numerous end to end examples with code, sample data, explanations, etc.

@piskvorky
Copy link
Owner

piskvorky commented Mar 15, 2021

OK, thanks for your input.

Which means I shouldn't have closed this – the notebook(s) ought to be updated as well. It's just that they're not under automated testing, so who knows which ones still work… The ideal solution would be to transform notebooks that are still relevant into proper tutorials & how-to guides in the Gallery section. Because these do get tested and maintained. The notebooks dir just rots.

@magic-lantern any appetite for transforming the soft_cosine_tutorial.ipynb notebook and then removing it from docs/notebooks/? :)
CC @Witiko

@piskvorky piskvorky reopened this Mar 15, 2021
@mpenkov
Copy link
Collaborator

mpenkov commented Mar 15, 2021

do we want to keep that notebooks directory? It's unmaintained, untested. Not sure how people find it, but clearly it leads to some confusion.

My personal opinion is that we should transform as much of the notebooks as possible into runnable documentation to avoid the problem that you've described. This opinion isn't universally accepted (as I recall, @gojomo favors the notebooks over the Sphinx gallery). Furthermore, there'll always be some documentation that is a poor fit for the gallery (e.g. examples that take hours to run). I'm not sure what the best way to proceed there is. Perhaps we can add "re-run all notebooks and update their output" to the release checklist? Not ideal, because it's a PITA for the maintainer (me) and will become yet another sticking point in our release procedures.

@piskvorky
Copy link
Owner

piskvorky commented Mar 15, 2021

Nah, re-running all notebooks manually on each release is not a viable option. -1 on that, we'd go insane :)

We do have a script to auto-convert .ipynb into .py though. So I hope the transformation is easy.

there'll always be some documentation that is a poor fit for the gallery (e.g. examples that take hours to run)

But the tutorials are not run in CI always, right? Only when the tutorial file changes. So is this really an issue?

Here's what I'm thinking, please correct if wrong:

  1. Contributor modifies tutorial.py.
  2. Contributor runs the sphinx suite, which will automatically regenerate the gallery files: .ipynb, .rst, .html, whatever (yes, can take hours).
  3. Contributor commits the modified files.
  4. CI is triggered. No changes detected, all up to date => CI runs quickly.
  5. We merge PR and release => tutorial update is visible to all users.

@Witiko
Copy link
Contributor

Witiko commented Mar 15, 2021

@piskvorky We have already converted parts of the soft_cosine_tutorial.ipynb notebook to an autoexample in #3003. However, the notebook is much more extensive, whereas the autoexample is a very short and concise introduction. I am not sure we want to merge the two, since the resulting autoexample would be quite cluttered as a result. I don't mind if the notebook is deleted, but there are existing links that would break as a result. If we do delete the notebook, I suggest edits to the stackexchange posts, so that they link to the autoexample.

@magic-lantern
Copy link
Author

I think the Gensim library is great and am just trying to help improve things.

I looked a little more into the code base for and I see that you actually have a branch for the 3.8.3 release that has soft_cosine_tutorial.ipynb. I see now that the default branch is 'develop' is clearly meant for the 4.0.0 beta. In the 3.8.3 branch the notebook already has the correct import statement.

At this point, it seems that the most appropriate fix is for the links from the 3.8.3 documentation to go to the 3.8.3 branch instead of the default develop branch. My pull request should be cancelled - I don't want to break the tutorials that are meant to work in the 4.0.0 beta.

I'm not sure how the documentation available https://radimrehurek.com/gensim_3.8.3/similarities/docsim.html is generated, but I can investigate and create a different pull request so that the 3.8.3 documentation links to the right version of the notebook.

I support the inclusion of testing the notebooks and keeping them current!

Thoughts?

@mpenkov
Copy link
Collaborator

mpenkov commented Mar 16, 2021

I don't think there's a point in maintaining separate documentation for 3.8.3, or any version of gensim other than the most recent one. It would require maintainer resources that we don't have at the moment.

From what I understand, the 3.8.3 docs are still out there separately because:

  • 4.0.0 hasn't been officially released yet; and
  • 4.0.0 introduces a significantly different documentation layout and design, and we want to give people a chance to get used to it

@piskvorky Let me know if my understanding is incomplete.

@piskvorky
Copy link
Owner

Correct!

@piskvorky piskvorky closed this Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants