forked from wharris/esmre
Python extension module for accelerating regular expressions using libesm
zombie-guru/esmre
esmre - Efficient String Matching Regular Expressions
=====================================================

esmre is a Python module that can be used to speed up the execution of a large
collection of regular expressions. It works by building an index of compulsory
substrings from a collection of regular expressions, which it uses to quickly
exclude those expressions that trivially cannot match a given input.

Here is some example code that uses esmre:

    >>> import esmre
    >>> index = esmre.Index()
    >>> index.enter(r"Major-General\W*$", "savoy opera")
    >>> index.enter(r"\bway\W+haye?\b", "sea shanty")
    >>> index.query("I am the very model of a modern Major-General.")
    ['savoy opera']
    >>> index.query("Way, hay up she rises,")
    ['sea shanty']

The esmre module builds on the simpler string matching facilities of the esm
module, which wraps a C implementation of some of the algorithms described in
Aho and Corasick's paper on efficient string matching [Aho, A. V., and
Corasick, M. J. Efficient String Matching: An Aid to Bibliographic Search.
Comm. ACM 18:6 (June 1975), 333-340]. Some minor modifications have been made
to the algorithms in the paper and one algorithm is missing (for now), but
there is enough to implement a quick string matching index.

Here is some example code that uses esm directly:

    >>> import esm
    >>> index = esm.Index()
    >>> index.enter("he")
    >>> index.enter("she")
    >>> index.enter("his")
    >>> index.enter("hers")
    >>> index.fix()
    >>> index.query("this here is history")
    [((1, 4), 'his'), ((5, 7), 'he'), ((13, 16), 'his')]
    >>> index.query("Those are his sheep!")
    [((10, 13), 'his'), ((14, 17), 'she'), ((15, 17), 'he')]

You can see more usage examples in the tests.
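To make the compulsory-substring idea described above more concrete, here is a minimal pure-Python sketch. It is not how esmre is implemented: the class `SubstringPrefilterIndex` and its `hint` argument are hypothetical names invented for this illustration, the caller supplies the compulsory substring by hand (esmre derives it from the regular expression), and a plain `in` test stands in for the Aho-Corasick index that esm provides. Case-insensitive matching is also an assumption of the sketch.

```python
import re


class SubstringPrefilterIndex:
    """Toy illustration: each regex is guarded by a compulsory substring."""

    def __init__(self):
        # Each entry is (lowercased hint, compiled regex, associated object).
        self._entries = []

    def enter(self, pattern, hint, obj):
        # `hint` (hypothetical parameter) must be a literal that every match
        # of `pattern` contains. Here the caller provides it by hand.
        compiled = re.compile(pattern, re.IGNORECASE)
        self._entries.append((hint.lower(), compiled, obj))

    def query(self, text):
        lowered = text.lower()
        results = []
        for hint, regex, obj in self._entries:
            # Cheap test first: if the hint is absent, the regex cannot match,
            # so the expensive regex search is skipped entirely.
            if hint not in lowered:
                continue
            if regex.search(text):
                results.append(obj)
        return results


index = SubstringPrefilterIndex()
index.enter(r"Major-General\W*$", "major-general", "savoy opera")
index.enter(r"\bway\W+haye?\b", "way", "sea shanty")

print(index.query("I am the very model of a modern Major-General."))
print(index.query("Way, hay up she rises,"))
```

This sketch still checks every hint in turn, so its cost grows with the number of expressions. A multi-pattern matcher such as the one in esm can find all hints in a single pass over the input, which is where the speed-up for large collections would come from.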