- Python implementations of Orengo, Porter, and Savoy stemmers
- Fast: can stem more than 1.5M words/second on a normal desktop
- Least Recently Used (LRU) stem cache
- Support for lists of words to ignore (useful for stopword and named entity removal)
The project was originally developed by Pedro Oliveira.
This is a fork automatically exported from the Google Code original repos that lived at code.google.com/p/ptstemmer.
The original codebase also contained Java and C# implementations of the stemmers,
but I removed since I had no interested in them. I have the original code tagged
under original-export
and can be retrieved with a simple checkout:
$ git checkout original-export
The original work, and therefore this fork, are licensed under the GNU Lesser General Public License, version 3.0 (LGPLv3).
A verbatim copy of the license can be found in the LICENSE file.