Releases: anhaidgroup/py_stringmatching
v0.4.6 - 7/5/2024
- Limited NumPy to <2.0 in setup.py, due to compatibility issues
- Added preliminary testing of pip install to the GitHub Actions workflow
v0.4.5 - 2/1/2024
- Discontinued usage of cythonize.py during setup due to Python 3.12 compatibility issues
v0.4.4 - 1/26/2024
- Dropped support for Python 2
- Added support for Python 3.12
- Adjusted setuptools.setup project name to match name on PyPI
v0.4.3 - 2/8/2023
- Dropped support for Python 3.6.
- Added support for Python 3.10 and 3.11.
- Replaced aliases removed in NumPy 1.24.
- Switched from Nose to plain unittest.
- Replaced Travis and AppVeyor CI testing with GitHub Actions.
v0.4.2 - 10/23/2020
- Bug fix: Made PartialRatio importable from py_stringmatching.
- Dropped support for Python 3.4.
- This is the last version of py_stringmatching that will support Python 2 and Python 3.5.
v0.4.1 - 02/22/19
- Updated Cython. The package is now built with Cython >= 0.27.3.
- Added support for Python 3.7 and dropped testing support for Python 3.3.
v0.4.0 - 07/18/2017
- Five similarity measures written in Python have been Cythonized to run much faster. These are Affine, Jaro, Jaro Winkler, Needleman Wunsch, and Smith Waterman.
- We have also empirically evaluated the runtime of Jaccard (written in Python) and found that it is already very fast. Thus, Cythonizing it is unlikely to yield much of a speedup.
- Note that in Version 0.3.x (and earlier versions), edit distance has been Cythonized. Thus, the set of all Cythonized similarity measures consists of edit distance, Affine, Jaro, Jaro Winkler, Needleman Wunsch, and Smith Waterman.
- In subsequent versions, it would be highly desirable to Cythonize remaining similarity measures, including Dice, cosine, etc.
- For this package, we add a runtime benchmark (consisting of a script and several datasets) to measure the runtime performance of similarity measures. This benchmark can be used by users to judge whether similarity measures are fast enough for their purposes, and used by developers to speed up the measures.
Contributors:
Srujith Poondla, Phil Martinkus, Pradap Konda, Paul Suganthan G.C., AnHai Doan
v0.3.0 - 05/29/2017
- Added nine new string similarity measures - Bag Distance, Editex, Generalized Jaccard, Partial Ratio, Partial Token Sort, Ratio, Soundex, Token Sort, and Tversky Index.
Contributors:
Rishab Kalra, Pradap Konda, Paul Suganthan G.C., AnHai Doan
v0.2.1 - 08/05/2016
- Removed the explicit installation of numpy using pip in setup.
- Added numpy to setup_requires; extensions are now compiled by including the numpy install path.
Contributors:
Pradap Konda, Paul Suganthan G.C., AnHai Doan
v0.2.0 - 07/06/2016
- Qgram tokenizers have been modified to take a flag called "padding". If this flag is True (the default), then a prefix and a suffix will be added to the input string before tokenizing (see the Tutorial for a reason for this).
- Version 0.1.0 does not handle Unicode strings correctly. Specifically, if an input string contains non-ASCII characters, a string similarity measure may interpret the string incorrectly and thus compute an incorrect similarity score. This version fixes the string similarity measures by converting the input strings to Unicode before computing similarity. NOTE: the tokenizers are not yet Unicode-aware.
- In Version 0.1.0, the flag "dampen" for TF/IDF similarity measure has the default value of False. In this version we have modified it to have the default value of True, which is the more common value for this flag in practice.
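To illustrate what the "padding" flag changes, here is a minimal standalone sketch of q-gram tokenization. It does not use py_stringmatching itself: the function name qgram_tokenize, its parameters, and the pad characters "#" and "$" are assumptions chosen for illustration, and the library's own tokenizer may differ in details.

```python
def qgram_tokenize(text, qval=2, padding=True, prefix_pad="#", suffix_pad="$"):
    """Return the list of q-grams of `text` (hypothetical helper, not the library API).

    With padding enabled, (qval - 1) pad characters are added on each side,
    so the first and last characters of the string each get their own q-gram.
    """
    if padding:
        text = prefix_pad * (qval - 1) + text + suffix_pad * (qval - 1)
    if len(text) < qval:
        return []
    # Slide a window of width qval over the (possibly padded) string.
    return [text[i:i + qval] for i in range(len(text) - qval + 1)]


print(qgram_tokenize("data"))                 # ['#d', 'da', 'at', 'ta', 'a$']
print(qgram_tokenize("data", padding=False))  # ['da', 'at', 'ta']
```

Without padding, a string shorter than the q-gram size yields no tokens at all, and the first and last characters appear in fewer q-grams than the interior ones. Padding gives the boundary characters the same coverage as the rest.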
Contributors:
Pradap Konda, Paul Suganthan G.C., AnHai Doan
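The effect of the "dampen" flag mentioned in the v0.2.0 notes can be sketched with a small standalone TF/IDF cosine function. This is not the library's code. The function name tfidf_cosine, the corpus layout (a list of token lists), and the dampened weight log(N/df) * log(tf + 1) are assumptions for illustration; the exact formula used by py_stringmatching should be checked against its documentation.

```python
import math
from collections import Counter


def tfidf_cosine(bag1, bag2, corpus, dampen=True):
    """Cosine similarity of TF/IDF vectors (illustrative sketch, assumed formula)."""
    n = len(corpus)
    # Document frequency: number of corpus documents containing each token.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    tf1, tf2 = Counter(bag1), Counter(bag2)

    def weight(tf, token):
        count = tf.get(token, 0)
        if count == 0 or df[token] == 0:
            return 0.0
        idf = n / df[token]
        if dampen:
            # Assumed dampened form: logarithms shrink large tf and idf values.
            return math.log(idf) * math.log(count + 1)
        return idf * count

    dot = norm1 = norm2 = 0.0
    for token in set(tf1) | set(tf2):
        w1, w2 = weight(tf1, token), weight(tf2, token)
        dot += w1 * w2
        norm1 += w1 * w1
        norm2 += w2 * w2
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / math.sqrt(norm1 * norm2)


corpus = [["a", "b", "a"], ["a", "c"], ["a"]]
print(tfidf_cosine(["a", "b", "a"], ["a", "c"], corpus, dampen=False))
print(tfidf_cosine(["a", "b", "a"], ["a", "c"], corpus, dampen=True))
```

In this toy corpus the token "a" occurs in every document, so under the assumed dampened formula its weight is log(1) = 0 and the two bags share no weighted tokens. The undampened score stays positive because "a" still contributes.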