Update numpy -> 2.0 and removing deprecated scipy #3595

Closed
29 commits
09b18d0 dynamic number of features for tf-idf (Utopiah, Aug 1, 2022)
a2c43ab Merge remote-tracking branch 'upstream/develop' into Utopiah_patch-1 (mpenkov, Aug 22, 2022)
35a3a94 Issue-3528: Corrected file argument name in KeyedVectors.save (hammad7, May 27, 2024)
26a4a8d use newer unittest.mock everwhere (#3529) (a-detiste, Jun 11, 2024)
795c7e0 Merge pull request #3373 from Utopiah/patch-1 (piskvorky, Jun 11, 2024)
caeca0c Merge pull request #3532 from hammad7/feature/doc_update_issue_3528 (piskvorky, Jun 11, 2024)
c95bdb4 Handle optional parameters without implicit bool cast (#3502) (nk-fouque, Jun 11, 2024)
bea670a Bump codecov/codecov-action from 3 to 4 (#3514) (dependabot[bot], Jun 11, 2024)
dc5b5c4 pin numpy < 2.0 (mpenkov, Jun 12, 2024)
afbb82e pin scipy, need deprecated sparsetools (mpenkov, Jul 18, 2024)
79f982b removed scipy from build-only dependencies (#3538) (filip-komarzyniec, Jul 18, 2024)
7321346 bumped version to 4.3.3 (mpenkov, Jul 19, 2024)
ea7b162 updated CHANGELOG.md for version 4.3.3 (mpenkov, Jul 19, 2024)
e9ee434 Merge branch 'release-4.3.3' (mpenkov, Jul 19, 2024)
54dfec9 Merge branch 'master' into develop (mpenkov, Jul 19, 2024)
bff9abc adjust update_index.py (mpenkov, Aug 9, 2024)
2468417 add route4me.com as bronze sponsor (#3558) (mpenkov, Aug 10, 2024)
f34a077 test numpy==2.0.0rc2 (gojomo, Jun 4, 2024)
3847b73 numpy==2.0.0.rc2 & python>=3.9 in pyproject.toml (gojomo, Jun 4, 2024)
712780d use legal version id: `4.4.0a1.dev0` (gojomo, Jun 4, 2024)
b9119e0 changed numpy pin (hechth, Nov 6, 2024)
fe730c2 removed scipy sparsetools functions (hechth, Nov 7, 2024)
546657e lint (hechth, Nov 7, 2024)
2176511 numpy string (hechth, Nov 7, 2024)
6c43d1b removed python 3.8 from supported versions (hechth, Nov 7, 2024)
ca32ff0 added item to get explicit python scalar type (hechth, Nov 7, 2024)
4816645 Fix Keyvector stored as str of np.float32 (julianpollmann, Dec 6, 2024)
3657d30 Changed np.alltrue() -> np.all() (julianpollmann, Dec 9, 2024)
122f4ae Fix dtype float32/64 mismatch (julianpollmann, Dec 9, 2024)
4 changes: 2 additions & 2 deletions .github/workflows/build-docs.yml
@@ -25,7 +25,7 @@ jobs:
#
# We use Py3.8 here for historical reasons.
#
python-version: "3.8"
python-version: "3.9"

- name: Update pip
run: python -m pip install -U pip
@@ -35,7 +35,7 @@ jobs:
sudo apt-get -yq update
sudo apt-get -yq remove texlive-binaries --purge
sudo apt-get -yq --no-install-suggests --no-install-recommends --force-yes install dvipng texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended latexmk
sudo apt-get -yq install build-essential python3.8-dev
sudo apt-get -yq install build-essential python3.9-dev
- name: Install gensim and its dependencies
run: pip install -e .[docs]

14 changes: 5 additions & 9 deletions .github/workflows/build-wheels.yml
@@ -23,7 +23,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-20.04, windows-2019, macos-11]
os: [ubuntu-20.04, windows-2019, macos-12]
steps:

- name: Checkout
@@ -61,21 +61,17 @@ jobs:
fail-fast: false
matrix:
include:
- {python: '3.8', os: macos-11}
- {python: '3.9', os: macos-11}
- {python: '3.10', os: macos-11}
- {python: '3.11', os: macos-11}
- {python: '3.12', os: macos-11}
- {python: '3.9', os: macos-12}
- {python: '3.10', os: macos-12}
- {python: '3.11', os: macos-12}
- {python: '3.12', os: macos-12}

- {python: '3.8', os: ubuntu-20.04}
- {python: '3.9', os: ubuntu-20.04}
- {python: '3.10', os: ubuntu-20.04}
- {python: '3.11', os: ubuntu-20.04}
- {python: '3.12', os: ubuntu-20.04}

- {python: '3.8', os: windows-2019}
- {python: '3.9', os: windows-2019}

- {python: '3.10', os: windows-2019}
- {python: '3.11', os: windows-2019}
- {python: '3.12', os: windows-2019}
8 changes: 3 additions & 5 deletions .github/workflows/tests.yml
@@ -33,7 +33,7 @@ jobs:
#
# We use Py3.8 here for historical reasons.
#
python-version: "3.8"
python-version: "3.9"

- name: Update pip
run: python -m pip install -U pip
@@ -43,7 +43,7 @@ jobs:
sudo apt-get -yq update
sudo apt-get -yq remove texlive-binaries --purge
sudo apt-get -yq --no-install-suggests --no-install-recommends --force-yes install dvipng texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended latexmk
sudo apt-get -yq install build-essential python3.8-dev
sudo apt-get -yq install build-essential python3.9-dev
- name: Install gensim and its dependencies
run: pip install -e .[docs]

@@ -63,13 +63,11 @@ jobs:
fail-fast: false
matrix:
include:
- {python: '3.8', os: ubuntu-20.04}
- {python: '3.9', os: ubuntu-20.04}
- {python: '3.10', os: ubuntu-20.04}
- {python: '3.11', os: ubuntu-20.04}
- {python: '3.12', os: ubuntu-20.04}

- {python: '3.8', os: windows-2019}
- {python: '3.9', os: windows-2019}
- {python: '3.10', os: windows-2019}
- {python: '3.11', os: windows-2019}
@@ -161,7 +159,7 @@ jobs:

- name: Upload coverage to Codecov
if: matrix.coverage == true
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4
with:
fail_ci_if_error: true
files: ./coverage.xml
8 changes: 7 additions & 1 deletion .github/workflows/update_index.py
@@ -21,7 +21,13 @@ def main():
for page in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=prefix):
for content in page.get('Contents', []):
key = content['Key']
print(f"<li><a href='{key}'>{key}</a></li>")
#
# NB. use double quotes in href because that's what
# wheelhouse_uploader expects.
#
# https://github.com/ogrisel/wheelhouse-uploader/blob/eb32a7bb410769bb4212a9aa7fb3bfa3cef1aaec/wheelhouse_uploader/fetch.py#L15
#
print(f"""<li><a href="{key}">{key}</a></li>""")
print("</ul></body></html>")
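The comment above says the index page must use double-quoted `href` attributes because that is what wheelhouse_uploader expects. A minimal sketch of why the quote style matters, assuming a consumer that scrapes links with a double-quote-only pattern; `LINK_PATTERN`, `extract_links`, and the wheel file name are hypothetical and are not wheelhouse_uploader's actual code:

```python
import re

# Hypothetical scraper pattern: like the consumer described above, it only
# recognizes double-quoted href attributes (an assumption for illustration).
LINK_PATTERN = re.compile(r'<a href="([^"]+)"')


def extract_links(html):
    """Return every href value the double-quote-only pattern can see."""
    return LINK_PATTERN.findall(html)


key = "gensim-4.3.3-cp39-cp39-manylinux_x86_64.whl"  # hypothetical S3 key
old_line = f"<li><a href='{key}'>{key}</a></li>"      # old single-quoted output
new_line = f"""<li><a href="{key}">{key}</a></li>"""  # new double-quoted output

print(extract_links(old_line))  # []: the single-quoted link is invisible
print(extract_links(new_line))  # a one-element list containing key
```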


21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,26 @@
Changes
=======

## 4.3.3, 2024-07-19

### :star2: New Features

### :red_circle: Bug fixes

* Correct file argument name in KeyedVectors.save docstring (__[hammad7](https://github.com/hammad7)__, [#3532](https://github.com/piskvorky/gensim/pull/3532))
* Import deprecated scipy.linalg.triu from numpy.triu instead (__[Luffy610](https://github.com/Luffy610)__, [#3524](https://github.com/piskvorky/gensim/pull/3524))

### :books: Tutorial and doc improvements

* Updated the broken Documentation Link on the README.md (__[wittyicon29](https://github.com/wittyicon29)__, [#3505](https://github.com/piskvorky/gensim/pull/3505))

### :+1: Improvements

* Add support for python3.12 wheels (__[YoungMind1](https://github.com/YoungMind1)__, [#3531](https://github.com/piskvorky/gensim/pull/3531))
* Removed scipy from build-only dependencies (__[filip-komarzyniec](https://github.com/filip-komarzyniec)__, [#3538](https://github.com/piskvorky/gensim/pull/3538))
* Use newer unittest.mock everywhere (__[a-detiste](https://github.com/a-detiste)__, [#3529](https://github.com/piskvorky/gensim/pull/3529))
* Handle optional parameters without implicit bool cast (__[nk-fouque](https://github.com/nk-fouque)__, [#3502](https://github.com/piskvorky/gensim/pull/3502))

## 4.3.2, 2023-08-23

### :red_circle: Bug fixes
21 changes: 12 additions & 9 deletions README.md
@@ -49,13 +49,12 @@ on Wikipedia.
Installation
------------

This software depends on [NumPy and Scipy], two Python packages for
scientific computing. You must have them installed prior to installing
gensim.

It is also recommended you install a fast BLAS library before installing
NumPy. This is optional, but using an optimized BLAS such as MKL, [ATLAS] or
[OpenBLAS] is known to improve performance by as much as an order of
This software depends on [NumPy], a Python package for
scientific computing. Please bear in mind that building NumPy from source
(e.g. by installing gensim on a platform which lacks NumPy .whl distribution)
is a non-trivial task involving [linking NumPy to a BLAS library].
It is recommended to provide a fast one (such as MKL, [ATLAS] or
[OpenBLAS]) which can improve performance by as much as an order of
magnitude. On OSX, NumPy picks up its vecLib BLAS automatically,
so you don’t need to do anything special.
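Since the new README text stresses which BLAS library NumPy is linked against, one way to check an existing installation is NumPy's own `numpy.show_config()`, which prints the build configuration. This is a minimal check, not part of gensim; the exact section names in the output vary between NumPy versions:

```python
import numpy as np

# Print NumPy's build configuration, including the BLAS/LAPACK libraries it
# was linked against (for example OpenBLAS or MKL). Section names and layout
# differ between NumPy releases.
np.show_config()
```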

@@ -69,7 +68,9 @@ Or, if you have instead downloaded and unzipped the [source tar.gz]
package:

```bash
python setup.py install
tar -xvzf gensim-X.X.X.tar.gz
cd gensim-X.X.X/
pip install .
```

For alternative modes of installation, see the [documentation].
@@ -172,8 +173,10 @@ BibTeX entry:
[documentation and Jupyter Notebook tutorials]: https://github.com/RaRe-Technologies/gensim/#documentation
[Vector Space Model]: https://en.wikipedia.org/wiki/Vector_space_model
[unsupervised document analysis]: https://en.wikipedia.org/wiki/Latent_semantic_indexing
[NumPy and Scipy]: https://scipy.org/install/
[NumPy]: https://numpy.org/install/
[linking NumPy to a BLAS library]: https://numpy.org/devdocs/building/blas_lapack.html
[ATLAS]: https://math-atlas.sourceforge.net/
[OpenBLAS]: https://xianyi.github.io/OpenBLAS/
[source tar.gz]: https://pypi.org/project/gensim/
[documentation]: https://radimrehurek.com/gensim/#install

Binary file added docs/src/_static/images/route4me-logo.png
4 changes: 2 additions & 2 deletions docs/src/auto_examples/core/run_core_concepts.ipynb
@@ -195,7 +195,7 @@
},
"outputs": [],
"source": [
"from gensim import similarities\n\nindex = similarities.SparseMatrixSimilarity(tfidf[bow_corpus], num_features=12)"
"from gensim import similarities\n\nindex = similarities.SparseMatrixSimilarity(tfidf[bow_corpus], num_features=max(tfidf.dfs) + 1)"
]
},
{
@@ -274,4 +274,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
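The notebook change sizes the similarity index from the TF-IDF model instead of the hard-coded `num_features=12`. `tfidf.dfs` maps each term id to its document frequency, so `max(tfidf.dfs) + 1` is the smallest vector length that can hold every term id. A minimal sketch of that arithmetic, with a plain dict standing in for `tfidf.dfs` (the ids and counts are made up):

```python
# Stand-in for tfidf.dfs: {term_id: document_frequency}. Values are made up.
dfs = {0: 2, 3: 1, 7: 4, 11: 2}

# max() over a dict iterates its keys; term ids are 0-based,
# so the vector length must be the largest id plus one.
num_features = max(dfs) + 1
print(num_features)  # 12

# A sparse bag-of-words vector can now be densified without an IndexError.
bow = [(3, 0.5), (11, 0.8)]
dense = [0.0] * num_features
for term_id, weight in bow:
    dense[term_id] = weight
print(dense[11])  # 0.8
```

The design point is that the dimensionality now follows the model's vocabulary, so the tutorial keeps working if the toy corpus changes.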
4 changes: 2 additions & 2 deletions docs/src/conf.py
@@ -61,9 +61,9 @@
# built documents.
#
# The short X.Y version.
version = '4.3'
version = '4.3.3'
# The full version, including alpha/beta/rc tags.
release = '4.3.2.dev0'
release = '4.3.3'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
5 changes: 5 additions & 0 deletions docs/src/people.rst
@@ -48,6 +48,11 @@ Silver Sponsors
Bronze Sponsors
---------------

.. figure:: _static/images/route4me-logo.png
:target: https://route4me.com
:width: 50%
:alt: Route Optimizer and Route Planner Software

.. figure:: _static/images/eaccidents-logo.png
:target: https://eaccidents.com/
:width: 50%
2 changes: 1 addition & 1 deletion gensim/__init__.py
@@ -4,7 +4,7 @@

"""

__version__ = '4.3.2.dev0'
__version__ = '4.3.3'

import logging

2 changes: 0 additions & 2 deletions gensim/corpora/sharded_corpus.py
@@ -17,8 +17,6 @@

"""

from __future__ import print_function

import logging
import os
import math
10 changes: 5 additions & 5 deletions gensim/models/keyedvectors.py
@@ -495,9 +495,9 @@ def get_mean_vector(self, keys, weights=None, pre_normalize=True, post_normalize
if len(keys) == 0:
raise ValueError("cannot compute mean with no input")
if isinstance(weights, list):
weights = np.array(weights)
weights = np.array(weights, dtype=self.vectors.dtype)
if weights is None:
weights = np.ones(len(keys))
weights = np.ones(len(keys), dtype=self.vectors.dtype)
if len(keys) != weights.shape[0]: # weights is a 1-D numpy array
raise ValueError(
"keys and weights array must have same number of elements"
@@ -762,7 +762,7 @@ def save(self, *args, **kwargs):

Parameters
----------
fname : str
fname_or_handle : str
Path to the output file.

See Also
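In the `get_mean_vector` hunk of `keyedvectors.py`, `weights` is now created with `self.vectors.dtype`. Without it, `np.ones(len(keys))` defaults to float64, and mixing float64 weights with float32 vectors can promote intermediate results to float64. A minimal NumPy sketch, assuming float32 vectors; it uses a plain weighted sum rather than gensim's exact loop:

```python
import numpy as np

# Stand-in for self.vectors: three 4-dimensional float32 vectors.
vectors = np.arange(12, dtype=np.float32).reshape(3, 4)

# Old behaviour: default float64 weights promote the weighted mean to float64.
weights_old = np.ones(len(vectors))
mean_old = weights_old @ vectors / weights_old.sum()
print(mean_old.dtype)  # float64

# New behaviour: weights share the vectors' dtype, so the result stays float32.
weights_new = np.ones(len(vectors), dtype=vectors.dtype)
mean_new = weights_new @ vectors / weights_new.sum()
print(mean_new.dtype)  # float32
```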
@@ -1667,7 +1667,7 @@ def save_word2vec_format(
if binary:
fout.write(f"{prefix}{key} ".encode('utf8') + key_vector.astype(REAL).tobytes())
else:
fout.write(f"{prefix}{key} {' '.join(repr(val) for val in key_vector)}\n".encode('utf8'))
fout.write(f"{prefix}{key} {' '.join(repr(val) for val in key_vector.tolist())}\n".encode('utf8'))

@classmethod
def load_word2vec_format(
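The `save_word2vec_format` fix is needed because NumPy 2 changed how scalars are printed (NEP 51): `repr(np.float32(0.5))` is now `'np.float32(0.5)'` instead of `'0.5'`, so iterating a float32 array directly wrote the type wrapper into the text file. Calling `.tolist()` first yields plain Python floats. A minimal sketch with an illustrative key and vector:

```python
import numpy as np

key, prefix = "king", ""  # illustrative key; prefix is empty here
key_vector = np.array([0.5, -1.25, 2.0], dtype=np.float32)

# Iterating the array yields NumPy scalars; under NumPy >= 2 their repr()
# includes the type, e.g. "np.float32(0.5)".
old_line = f"{prefix}{key} {' '.join(repr(val) for val in key_vector)}\n"

# .tolist() converts to Python floats, whose repr() is just the number.
new_line = f"{prefix}{key} {' '.join(repr(val) for val in key_vector.tolist())}\n"
print(new_line, end="")  # king 0.5 -1.25 2.0
```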
@@ -1977,7 +1977,7 @@ def _word2vec_read_text(fin, kv, counts, vocab_size, vector_size, datatype, unic

def _word2vec_line_to_vector(line, datatype, unicode_errors, encoding):
parts = utils.to_unicode(line.rstrip(), encoding=encoding, errors=unicode_errors).split(" ")
word, weights = parts[0], [datatype(x) for x in parts[1:]]
word, weights = parts[0], [datatype(x).item() for x in parts[1:]]
return word, weights
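On the reading side, `_word2vec_line_to_vector` now calls `.item()` on each parsed value, so the weights list holds Python scalars rather than NumPy scalars. A minimal sketch, assuming the caller passes `np.float32` as `datatype` and using a made-up input line:

```python
import numpy as np

line = "queen 0.5 -0.75 1.0"  # made-up word2vec text line
datatype = np.float32          # assumed value of the datatype argument
parts = line.rstrip().split(" ")

# Parse each token as float32 (rounding to float32 precision), then
# .item() converts the NumPy scalar into a plain Python float.
word, weights = parts[0], [datatype(x).item() for x in parts[1:]]
print(word, weights)      # queen [0.5, -0.75, 1.0]
print(type(weights[0]))   # <class 'float'>
```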


5 changes: 3 additions & 2 deletions gensim/models/ldamodel.py
@@ -133,10 +133,11 @@ def update_dir_prior(prior, N, logphat, rho):
The updated prior.

"""
dtype = logphat.dtype
gradf = N * (psi(np.sum(prior)) - psi(prior) + logphat)

c = N * polygamma(1, np.sum(prior))
q = -N * polygamma(1, prior)
c = N * polygamma(1, np.sum(prior)).astype(dtype)
q = -N * polygamma(1, prior).astype(dtype)

b = np.sum(gradf / q) / (1 / c + np.sum(1 / q))
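In `update_dir_prior`, SciPy's `polygamma` can return float64 even when the prior is float32, and NumPy 2's promotion rules (NEP 50) no longer demote a NumPy float64 scalar or 0-d array when it meets a float32 array. The `.astype(dtype)` calls pin the intermediates to `logphat.dtype`. A NumPy-only sketch; `trigamma_like` is a hypothetical float64 stand-in for the `polygamma` output, with made-up values:

```python
import numpy as np

N = 10  # illustrative number of documents
prior = np.array([0.1, 0.2, 0.7], dtype=np.float32)
dtype = prior.dtype

# Hypothetical stand-in for a float64 result such as polygamma(1, prior).
trigamma_like = np.array([101.4, 26.3, 2.8], dtype=np.float64)

q_unpinned = -N * trigamma_like              # float64: precision silently widened
q_pinned = -N * trigamma_like.astype(dtype)  # cast back to the prior's dtype first

# Under NEP 50 (NumPy >= 2) a NumPy float64 scalar also promotes a float32
# array; NumPy 1.x value-based casting would have kept this float32.
scaled = prior * np.float64(2.0)

update = prior + 1.0 / q_pinned  # a Python float is "weak", so this stays float32
print(q_unpinned.dtype, q_pinned.dtype, update.dtype)  # float64 float32 float32
```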

12 changes: 3 additions & 9 deletions gensim/models/lsimodel.py
@@ -66,7 +66,6 @@
import numpy as np
import scipy.linalg
import scipy.sparse
from scipy.sparse import sparsetools

from gensim import interfaces, matutils, utils
from gensim.models import basemodel
@@ -960,10 +959,8 @@ def stochastic_svd(
m, n = corpus.shape
assert num_terms == m, f"mismatch in number of features: {m} in sparse matrix vs. {num_terms} parameter"
o = random_state.normal(0.0, 1.0, (n, samples)).astype(y.dtype) # draw a random gaussian matrix
sparsetools.csc_matvecs(
m, n, samples, corpus.indptr, corpus.indices,
corpus.data, o.ravel(), y.ravel(),
) # y = corpus * o
y = corpus.dot(o) # y = corpus * o

del o

# unlike np, scipy.sparse `astype()` copies everything, even if there is no change to dtype!
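The removed `sparsetools.csc_matvecs` call came from a deprecated SciPy internal that computes `Y += A * X` in place, directly on a CSC matrix's `indptr`, `indices` and `data` arrays. Because `y` starts out as zeros, the public sparse product `corpus.dot(o)` gives the same result. To illustrate what is being replaced, here is a pure-NumPy reference of that CSC accumulate, checked against a dense product; `csc_matvecs_reference` is a hypothetical helper, not SciPy's code:

```python
import numpy as np


def csc_matvecs_reference(n_row, n_col, indptr, indices, data, x, y):
    """Hypothetical pure-NumPy stand-in for Y += A @ X with A stored in CSC form.

    x has shape (n_col, n_vecs); y has shape (n_row, n_vecs) and is updated in place.
    """
    assert x.shape[0] == n_col and y.shape[0] == n_row
    for j in range(n_col):                         # walk the columns of A
        for p in range(indptr[j], indptr[j + 1]):  # stored entries of column j
            i = indices[p]                         # row index of entry A[i, j]
            y[i, :] += data[p] * x[j, :]
    return y


# Build the CSC arrays (indptr, indices, data) of a small term-document matrix.
dense = np.array([[1.0, 0.0, 2.0, 0.0],
                  [0.0, 3.0, 0.0, 4.0],
                  [5.0, 0.0, 0.0, 6.0]])
indptr, indices, data = [0], [], []
for j in range(dense.shape[1]):
    rows = np.nonzero(dense[:, j])[0]
    indices.extend(rows.tolist())
    data.extend(dense[rows, j].tolist())
    indptr.append(len(indices))

rng = np.random.default_rng(0)
o = rng.normal(0.0, 1.0, (dense.shape[1], 2))  # random Gaussian test matrix
y = np.zeros((dense.shape[0], 2))

csc_matvecs_reference(dense.shape[0], dense.shape[1], indptr, indices, data, o, y)
print(np.allclose(y, dense @ o))  # True: same result as the plain matrix product
```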
@@ -994,10 +991,7 @@ def stochastic_svd(
num_docs += n
logger.debug("multiplying chunk * gauss")
o = random_state.normal(0.0, 1.0, (n, samples), ).astype(dtype) # draw a random gaussian matrix
sparsetools.csc_matvecs(
m, n, samples, chunk.indptr, chunk.indices, # y = y + chunk * o
chunk.data, o.ravel(), y.ravel(),
)
y = y + chunk * o
del chunk, o
y = [y]
q, _ = matutils.qr_destroy(y) # orthonormalize the range
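In the streaming branch, `y = y + chunk * o` accumulates one chunk of documents at a time. Each chunk covers a different block of columns (documents) and gets its own block of Gaussian rows, so the running sum equals the full corpus times the stacked random matrix. This relies on `chunk` being a `scipy.sparse` *matrix* (presumably a `csc_matrix`, as built by `matutils.corpus2csc`), for which `*` means matrix multiplication; SciPy's newer sparse *array* classes such as `csc_array` treat `*` as element-wise, so `@` is the less ambiguous spelling. A sketch with dense NumPy stand-ins for the sparse chunks:

```python
import numpy as np

rng = np.random.default_rng(42)
num_terms, samples = 5, 3

# Two chunks of documents (columns); dense stand-ins for sparse CSC chunks.
chunks = [rng.random((num_terms, 4)), rng.random((num_terms, 2))]

y = np.zeros((num_terms, samples))
gaussians = []
for chunk in chunks:
    n = chunk.shape[1]                      # number of documents in this chunk
    o = rng.normal(0.0, 1.0, (n, samples))  # fresh Gaussian block for these docs
    gaussians.append(o)
    y = y + chunk @ o                       # same accumulation as the streaming loop

# The running sum equals the full term-document matrix times the stacked Gaussians.
full = np.hstack(chunks) @ np.vstack(gaussians)
print(np.allclose(y, full))  # True
```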
12 changes: 6 additions & 6 deletions gensim/models/tfidfmodel.py
@@ -360,16 +360,16 @@ def __init__(self, corpus=None, id2word=None, dictionary=None, wlocal=utils.iden
self.pivot = pivot
self.eps = 1e-12

if smartirs:
if smartirs is not None:
n_tf, n_df, n_n = self.smartirs
self.wlocal = partial(smartirs_wlocal, local_scheme=n_tf)
self.wglobal = partial(smartirs_wglobal, global_scheme=n_df)

if dictionary:
if dictionary is not None:
# user supplied a Dictionary object, which already contains all the
# statistics we need to construct the IDF mapping. we can skip the
# step that goes through the corpus (= an optimization).
if corpus:
if corpus is not None:
logger.warning(
"constructor received both corpus and explicit inverse document frequencies; ignoring the corpus"
)
@@ -378,17 +378,17 @@ def __init__(self, corpus=None, id2word=None, dictionary=None, wlocal=utils.iden
self.dfs = dictionary.dfs.copy()
self.term_lens = {termid: len(term) for termid, term in dictionary.items()}
self.idfs = precompute_idfs(self.wglobal, self.dfs, self.num_docs)
if not id2word:
if id2word is None:
self.id2word = dictionary
elif corpus:
elif corpus is not None:
self.initialize(corpus)
else:
# NOTE: everything is left uninitialized; presumably the model will
# be initialized in some other way
pass

# If smartirs is not None, override pivot and normalize
if not smartirs:
if smartirs is None:
return
if self.pivot is not None:
if n_n in 'ub':
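The `TfidfModel` changes replace truthiness checks such as `if corpus:` with explicit `is not None` checks. Truthiness is a poor test for "was this argument supplied": any empty container (anything whose `__len__` returns 0, such as an empty `Dictionary`) is falsy, and a NumPy array with more than one element refuses to be cast to `bool` at all. A minimal sketch with two hypothetical helpers; the array argument is used only to show the failure mode:

```python
import numpy as np


def supplied_truthy(corpus=None):
    # Old style: relies on an implicit bool cast.
    return bool(corpus)


def supplied_explicit(corpus=None):
    # New style: only None means "not supplied".
    return corpus is not None


corpus = np.array([[0, 1], [1, 2]])

print(supplied_explicit(corpus))  # True
print(supplied_explicit([]))      # True: an empty corpus was still supplied
print(supplied_truthy([]))        # False: the old check treats it as missing
try:
    supplied_truthy(corpus)
except ValueError as err:
    print("ValueError:", err)     # ambiguous truth value of a multi-element array
```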
2 changes: 1 addition & 1 deletion gensim/test/test_parsing.py
@@ -7,8 +7,8 @@

import logging
import unittest
from unittest import mock

import mock
import numpy as np

from gensim.parsing.preprocessing import (
5 changes: 1 addition & 4 deletions gensim/test/test_poincare.py
@@ -13,10 +13,7 @@
import os
import tempfile
import unittest
try:
from mock import Mock
except ImportError:
from unittest.mock import Mock
from unittest.mock import Mock

import numpy as np
try:
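The test modules now import from the standard library's `unittest.mock` instead of the third-party `mock` package, which is a backport of that module for older Python versions. A minimal usage sketch; `fake_model.most_similar` is a hypothetical attribute configured on the mock, not a call into gensim:

```python
import os
from unittest import mock
from unittest.mock import Mock

# A Mock records how it is called and returns whatever we configure.
fake_model = Mock()
fake_model.most_similar.return_value = [("queen", 0.9)]  # hypothetical API surface
result = fake_model.most_similar("king")

fake_model.most_similar.assert_called_once_with("king")
print(result)  # [('queen', 0.9)]

# mock.patch temporarily replaces an attribute for the duration of a block.
with mock.patch("os.getcwd", return_value="/tmp/fake"):
    print(os.getcwd())  # /tmp/fake
```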