seqmatcher

seqmatcher provides a DSL to match and edit sequences of events. Just as regular expressions match patterns in text (a stream of characters), the DSL matches patterns in a collection of sequences (streams of events). This is a total ripoff of the work done here by Mikhail Panko.
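To make the regex analogy concrete, here is a minimal sketch that borrows Python's built-in `re` module rather than seqmatcher's own DSL. The event names, the `EVENT_CODES` mapping, and the helpers `encode` and `find_matches` are all hypothetical and exist only for illustration: each event type is mapped to one character, so an ordinary regex can locate runs of events.

```python
import re

# Hypothetical event alphabet: one character per event type.
# This is NOT seqmatcher's syntax, only an illustration of the analogy.
EVENT_CODES = {"visit": "v", "add_to_cart": "a", "checkout": "c", "purchase": "p"}


def encode(sequence):
    """Turn a list of event dicts into a string, one character per event."""
    return "".join(EVENT_CODES[event["name"]] for event in sequence)


def find_matches(sequence, pattern):
    """Return (start, stop) index pairs of the event runs matching the pattern."""
    return [m.span() for m in re.finditer(pattern, encode(sequence))]


session = [
    {"name": n}
    for n in ["visit", "add_to_cart", "add_to_cart", "visit", "checkout", "purchase"]
]

# One or more add_to_cart events, then as few events as possible, then a purchase.
spans = find_matches(session, r"a+.*?p")
matched_events = [session[start:stop] for start, stop in spans]
```

Because every event is exactly one character, the spans returned by `re` can be used directly as indices into the original event list. A real event DSL also needs to match on event attributes, which this single-character encoding cannot express.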

The original notebooks introduce the semantics for the regex-like syntax and implement it as JavaScript code over lists of objects. Without a JIT like V8, it would be pretty slow to execute the same code in Python. So instead we:

persist the dataset in the Parquet format so it can be read quickly.
read it using the awkward library, which supports jagged arrays and optional datatypes.
compile the pattern matching routines at runtime using the numba library and run them against the awkward array data.

Performance wins:

Numba compiles Python functions to machine code through LLVM, so the compiled code runs pretty quick.
Awkward arrays are immutable and store all attributes, including nested ones, in contiguous buffers. So, matching and extracting subsequences copies very little data, and just records slices of the original arrays to use as output.
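The slice-based output described above can be sketched with plain Python lists standing in for the contiguous buffers. This is an assumption-laden illustration, not awkward's internal API or seqmatcher's actual matcher: the `names` buffer, the `offsets` list, and `match_cart_then_purchase` are hypothetical names.

```python
# All events from all sequences live in one flat buffer per attribute.
names = ["visit", "cart", "purchase", "visit", "visit", "cart", "purchase"]

# offsets[i]:offsets[i + 1] delimits sequence i inside the flat buffer,
# so three sequences of lengths 3, 1 and 3 need four offsets.
offsets = [0, 3, 4, 7]


def match_cart_then_purchase(names, offsets):
    """Find 'cart' immediately followed by 'purchase' within each sequence.

    Returns (sequence_index, start, stop) triples that index into the flat
    buffer, so the matcher itself copies no event data.
    """
    matches = []
    for seq in range(len(offsets) - 1):
        lo, hi = offsets[seq], offsets[seq + 1]
        # Stop one short of the end so names[i + 1] stays inside the sequence.
        for i in range(lo, hi - 1):
            if names[i] == "cart" and names[i + 1] == "purchase":
                matches.append((seq, i, i + 2))
    return matches


matches = match_cart_then_purchase(names, offsets)

# Materialize a match only when it is needed, by slicing the shared buffer.
_, start, stop = matches[0]
first = names[start:stop]
```

The matcher returns only integer positions, which is why extracting subsequences from a columnar layout is cheap. The same loop shape (flat buffers plus offsets, fixed element types) is also the kind of code a JIT compiler can handle well.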

Things that are tricky:

Numba requires static variable types for compilation, so that constrains us to a consistent schema across all sequences and events.
A columnar data layout also makes modifying the matched sequences tricky (TODO: still gotta implement it in jitted code).

Installation

Install this library using pip:

$ pip install seqmatcher

Usage

Please refer to the example notebook on how to specify and match patterns.

Development

To contribute to this library, check out the code in a new virtual environment.

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest
