A simple lexer based on regular expressions.
Inspired by https://eli.thegreenplace.net/2013/06/25/regex-based-lexical-analysis-in-python-and-javascript
You define the lexing rules and lexery matches them iteratively as a look-up:
>>> import lexery
>>> import re
>>> text = 'crop \t   ( 20, 30, 40, 10 ) ;'
>>>
>>> lexer = lexery.Lexer(
...     rules=[
...         lexery.Rule(identifier='identifier',
...             pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
...         lexery.Rule(identifier='lpar', pattern=re.compile(r'\(')),
...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
...         lexery.Rule(identifier='rpar', pattern=re.compile(r'\)')),
...         lexery.Rule(identifier='comma', pattern=re.compile(r',')),
...         lexery.Rule(identifier='semi', pattern=re.compile(r';'))
...     ],
...     skip_whitespace=True)
>>> tokens = lexer.lex(text=text)
>>> assert tokens == [[
...     lexery.Token('identifier', 'crop', 0, 0),
...     lexery.Token('lpar', '(', 9, 0),
...     lexery.Token('number', '20', 11, 0),
...     lexery.Token('comma', ',', 13, 0),
...     lexery.Token('number', '30', 15, 0),
...     lexery.Token('comma', ',', 17, 0),
...     lexery.Token('number', '40', 19, 0),
...     lexery.Token('comma', ',', 21, 0),
...     lexery.Token('number', '10', 23, 0),
...     lexery.Token('rpar', ')', 26, 0),
...     lexery.Token('semi', ';', 28, 0)]]
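Note that the tokens come back grouped by input line, so you always get a list of token lists (here a single line). A minimal sketch of flattening the nested result:

>>> flat_tokens = [token for line_tokens in tokens for token in line_tokens]
>>> assert len(flat_tokens) == 11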
Mind that if a part of the text cannot be matched, a lexery.Error is raised:
>>> import lexery
>>> import re
>>> text = 'some-identifier ( 23 )'
>>>
>>> lexer = lexery.Lexer(
...     rules=[
...         lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
...     ],
...     skip_whitespace=True)
>>> tokens = lexer.lex(text=text)
Traceback (most recent call last):
...
lexery.Error: Unmatched text at line 0 and position 4:
some-identifier ( 23 )
    ^
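A minimal sketch of handling the failure yourself instead of letting the error propagate (here we simply fall back to an empty result):

>>> try:
...     tokens = lexer.lex(text=text)
... except lexery.Error:
...     tokens = []
>>> assert tokens == []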
If you specify an unmatched_identifier, all the unmatched characters are accumulated in tokens with that identifier:
>>> import lexery
>>> import re
>>> text = 'some-identifier ( 23 )-'
>>>
>>> lexer = lexery.Lexer(
...     rules=[
...         lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
...     ],
...     skip_whitespace=True,
...     unmatched_identifier='unmatched')
>>> tokens = lexer.lex(text=text)
>>> assert tokens == [[
...     lexery.Token('identifier', 'some', 0, 0),
...     lexery.Token('unmatched', '-', 4, 0),
...     lexery.Token('identifier', 'identifier', 5, 0),
...     lexery.Token('unmatched', '(', 16, 0),
...     lexery.Token('number', '23', 18, 0),
...     lexery.Token('unmatched', ')-', 21, 0)]]
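This is handy when you want to report every problematic spot in one pass instead of stopping at the first. A minimal sketch of collecting the unmatched tokens, assuming the rule name is exposed as the Token attribute identifier (an assumption, not shown above):

>>> # Assumption: Token exposes its rule name as the attribute ``identifier``.
>>> unmatched = [token
...              for line_tokens in tokens
...              for token in line_tokens
...              if token.identifier == 'unmatched']
>>> assert len(unmatched) == 3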
- Install lexery with pip:

  pip3 install lexery

- Check out the repository.
- In the repository root, create the virtual environment:

  python3 -m venv venv3

- Activate the virtual environment:

  source venv3/bin/activate

- Install the development dependencies:

  pip3 install -e .[dev]
We provide a set of pre-commit checks that run the unit tests, lint the code and check its formatting.
Namely, we use:
- yapf to check the formatting,
- pydocstyle to check the style of the docstrings,
- mypy for static type analysis, and
- pylint for various linter checks.
Run the pre-commit checks locally from an activated virtual environment with the development dependencies installed:

  ./precommit.py

The pre-commit script can also automatically format the code:

  ./precommit.py --overwrite
We follow Semantic Versioning. The version X.Y.Z indicates:
- X is the major version (backward-incompatible),
- Y is the minor version (backward-compatible), and
- Z is the patch version (backward-compatible bug fix).
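In practice this means you can pin lexery to a major version and still receive backward-compatible updates; a hypothetical constraint (version numbers for illustration only):

  pip3 install 'lexery>=1.0,<2.0'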