Add doc on picking resolvers
Also bump the cache size up: on `bench`, the `basic` resolver measures
as follows (high water mark, average time per line):

- 40MB with no cache, averaging 455µs/line
- 40.7MB with a 200 entries s3fifo, averaging 324µs/line
- 42.4MB with a 2000 entries s3fifo, averaging 191µs/line
- 44.2MB with a 5000 entries s3fifo, averaging 155µs/line
- 47.2MB with a 10000 entries s3fifo, averaging 134µs/line
- 53MB with a 20000 entries s3fifo, averaging 123µs/line

Either 2000 or 5000 seems like a pretty good default: the gains taper
off afterwards while memory use increases sharply. Bump to 2000 to stay
on the conservative side.
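
To make the tapering concrete, the figures above can be turned into a marginal gain, i.e. µs/line saved per extra MB of high water mark. This is only a small sketch over the numbers listed in this message; the `marginal_gains` helper is illustrative and not part of the project:

```python
# (high water mark in MB, average µs/line), in the order listed above:
# no cache first, then increasingly large s3fifo caches.
MEASUREMENTS = [
    (40.0, 455),
    (40.7, 324),
    (42.4, 191),
    (44.2, 155),
    (47.2, 134),
    (53.0, 123),
]


def marginal_gains(rows):
    """Return the µs/line saved per additional MB between consecutive rows."""
    gains = []
    for (mem_a, time_a), (mem_b, time_b) in zip(rows, rows[1:]):
        gains.append((time_a - time_b) / (mem_b - mem_a))
    return gains


if __name__ == "__main__":
    for (mem, _), gain in zip(MEASUREMENTS[1:], marginal_gains(MEASUREMENTS)):
        print(f"up to {mem:.1f}MB: {gain:6.1f} µs/line saved per extra MB")
```

Each step buys less than the previous one: the first, 200-entry cache saves well over 100µs/line per extra MB, while the last step saves under 2.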
masklinn committed Oct 29, 2024
1 parent 4e07493 commit 4f3bcde
Showing 5 changed files with 126 additions and 7 deletions.
15 changes: 9 additions & 6 deletions README.rst
@@ -30,17 +30,20 @@ Just add ``ua-parser`` to your project's dependencies, or run
to install in the current environment.

-Installing `google-re2 <https://pypi.org/project/google-re2/>`_ is
-*strongly* recommended as it leads to *significantly* better
-performances. This can be done directly via the ``re2`` optional
-dependency:
+Installing `ua-parser-rs <https://pypi.org/project/ua-parser-rs>`_ or
+`google-re2 <https://pypi.org/project/google-re2/>`_ is *strongly*
+recommended as they yield *significantly* better performances. This
+can be done directly via the ``regex`` and ``re2`` optional
+dependencies respectively:

 .. code-block:: sh

+    $ pip install 'ua_parser[regex]'
     $ pip install 'ua_parser[re2]'

-If ``re2`` is available, ``ua-parser`` will simply use it by default
-instead of the pure-python resolver.
+If either dependency is already available (e.g. because the software
+makes use of re2 for other reasons) ``ua-parser`` will use the
+corresponding resolver automatically.

Quick Start
-----------
13 changes: 13 additions & 0 deletions doc/api.rst
@@ -75,6 +75,19 @@ from user agent strings.

.. warning:: Only available if |re2|_ is installed.

.. class:: ua_parser.regex.Resolver(Matchers)

    An advanced resolver based on |regex|_ and a bespoke implementation
    of regex prefiltering, by the sibling project `uap-rust
    <https://github.com/ua-parser/uap-rust>`_.

    Sufficiently fast that a cache may not be necessary, and may even
    be detrimental at smaller cache sizes.

    .. warning:: Only available if `ua-parser-rs
                 <https://pypi.org/project/ua-parser-rs/>`_ is
                 installed.

Eager Matchers
''''''''''''''

97 changes: 97 additions & 0 deletions doc/guides.rst
@@ -129,6 +129,103 @@ from here on::
:class:`~ua_parser.caching.Local`, which is also caching-related,
and serves to use thread-local caches rather than a shared cache.

Builtin Resolvers
=================

.. list-table::
   :header-rows: 1
   :stub-columns: 1

   * -
     - speed
     - portability
     - memory use
     - safety
   * - ``regex``
     - great
     - good
     - bad
     - great
   * - ``re2``
     - good
     - bad
     - good
     - good
   * - ``basic``
     - terrible
     - great
     - great
     - great

``regex``
---------

The ``regex`` resolver is a bespoke effort as part of the `uap-rust
<https://github.com/ua-parser/uap-rust>`_ sibling project, built on
`rust-regex <https://github.com/rust-lang/regex>`_ and `a bespoke
regex-prefiltering implementation
<https://github.com/ua-parser/uap-rust/tree/main/regex-filtered>`_.
It:

- Is the fastest available resolver, usually beating ``re2`` by a
  significant margin (when that is even available).
- Is fully controlled by the project, and thus can be built for all
  interpreters and platforms supported by pyo3 (currently: cpython,
  pypy, and graalpy, on linux, macos and windows, intel and arm). It is
  also built as a cpython abi3 wheel and should thus suffer from no
  compatibility issues with new releases.
- Is built entirely out of safe rust code, so its safety risks lie
  entirely in ``regex`` and ``pyo3``.
- Is a lot more memory intensive than the other resolvers, which is
  its biggest drawback, because ``regex`` tends to trade memory for
  speed (~155MB high water mark on a real-world dataset).

If available, it is the default resolver, without a cache.
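
The general idea behind regex prefiltering can be sketched in a few lines of pure Python. This is *not* how ``uap-rust`` implements it: the rules and their required literals below are hypothetical and hand-written, whereas a real prefilter would derive the literals from the regexes and look for all of them in a single pass. The point is only that a cheap substring test can rule out most regexes before any of them runs:

```python
import re

# Hypothetical rules: (required literal, compiled regex, family).
# A regex can only match if its required literal occurs in the input,
# so the literal acts as a cheap prefilter.
RULES = [
    ("Firefox/", re.compile(r"Firefox/(\d+)"), "Firefox"),
    ("Chrome/", re.compile(r"Chrome/(\d+)"), "Chrome"),
    ("Safari/", re.compile(r"Version/(\d+).*Safari/"), "Safari"),
]


def candidates(ua):
    """Indices of the rules whose required literal appears in ``ua``."""
    return [i for i, (literal, _, _) in enumerate(RULES) if literal in ua]


def resolve(ua):
    """Run only the candidate regexes, in rule order; first match wins."""
    for i in candidates(ua):
        _, pattern, family = RULES[i]
        match = pattern.search(ua)
        if match:
            return family, match.group(1)
    return None
```

For an input such as ``curl/8.0`` no literal is present, so no regex is executed at all.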

``re2``
-------

The ``re2`` resolver is built atop the widely used `google-re2
<https://github.com/google/re2>`_ via its built-in Python bindings.
It:

- Is extremely fast, though around 80% slower than ``regex`` on
  real-world data.
- Is only compatible with CPython, and uses pure API wheels, so needs
  a different release for each cpython version, for each OS, for each
  architecture.
- Is built entirely in C++, but by experienced Google developers.
- Is more memory intensive than the pure-python ``basic`` resolver,
  but quite slim all things considered (~55MB high water mark on a
  real-world dataset).

If available, it is the second-preferred resolver, without a cache.

``basic``
---------

The ``basic`` resolver is a naive linear traversal of all rules, using
the standard library's ``re``. It:

- Is *extremely* slow, about 10x slower than ``re2`` in cpython, and
  pypy and graal's regex implementations do *not* like the workload
  and are behind cpython by a factor of 3 to 4.
- Has perfect compatibility, with the caveat above, by virtue of being
  built entirely out of standard library code.
- Is basically as safe as Python software can be by virtue of being
  just Python, with the native code being the standard library's.
- Is the slimmest resolver at about 40MB.

This is caveated by a hard requirement to use caches, which makes it
workably faster on real-world datasets (if still nowhere near
*uncached* ``re2`` or ``regex``) but increases its memory requirement
significantly. For example, using "sieve" and a cache size of 20000 on
a real-world dataset, it is about 4x slower than ``re2`` for about the
same memory requirements.

It is the fallback and least preferred resolver, with a medium
(currently 2000 entries) cache by default.
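
Why a cache helps a linear traversal so much can be shown with a minimal, self-contained sketch. Everything here is hypothetical: the three rules stand in for the real (much larger) ruleset, and a plain LRU built on ``OrderedDict`` is swapped in for the library's own cache implementations to keep the example short:

```python
import re
from collections import OrderedDict
from typing import Callable, Optional, Tuple

Result = Optional[Tuple[str, str]]

# Hypothetical, heavily simplified rules: (pattern, family).
RULES = [
    (re.compile(r"Firefox/(\d+)"), "Firefox"),
    (re.compile(r"Chrome/(\d+)"), "Chrome"),
    (re.compile(r"Version/(\d+).*Safari/"), "Safari"),
]


def basic_resolve(ua: str) -> Result:
    """Linear traversal: try every rule in order, return the first match."""
    for pattern, family in RULES:
        match = pattern.search(ua)
        if match:
            return family, match.group(1)
    return None


class LruCachingResolver:
    """Wrap a resolver with a bounded LRU cache keyed on the raw string."""

    def __init__(self, resolver: Callable[[str], Result], maxsize: int) -> None:
        self._resolver = resolver
        self._maxsize = maxsize
        self._cache: "OrderedDict[str, Result]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __call__(self, ua: str) -> Result:
        if ua in self._cache:
            # Cache hit: no regex runs at all, just refresh recency.
            self._cache.move_to_end(ua)
            self.hits += 1
            return self._cache[ua]
        self.misses += 1
        result = self._resolver(ua)
        self._cache[ua] = result
        if len(self._cache) > self._maxsize:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return result
```

Real-world traffic repeats the same user agent strings often, so a hit replaces a scan over every rule with a single dictionary lookup, at the cost of keeping the cached entries in memory.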

Writing Custom Resolvers
========================

6 changes: 6 additions & 0 deletions doc/installation.rst
@@ -35,3 +35,9 @@ if installed, but can also be installed via and alongside ua-parser:
$ pip install 'ua-parser[yaml]'
$ pip install 'ua-parser[regex,yaml]'
``yaml`` simply enables the ability to :func:`load yaml rulesets
<ua_parser.loaders.load_yaml>`.

The other two dependencies enable more efficient resolvers. By
default, ``ua-parser`` will select the fastest resolver it finds out
of the available set. For more, see :ref:`builtin resolvers`.
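
The selection described above amounts to walking a preference list and keeping the first entry whose optional dependency can be imported. The sketch below illustrates that logic in isolation and is not the library's actual code: the ``PREFERENCE`` table, the helper names, and the importable module names (``ua_parser_rs`` and ``re2``) are assumptions made for the example:

```python
import importlib.util
from typing import Callable, Optional, Sequence, Tuple

# Assumed (resolver name, module that must be importable) pairs, in
# preference order. ``None`` means "needs nothing beyond the stdlib".
PREFERENCE: Sequence[Tuple[str, Optional[str]]] = [
    ("regex", "ua_parser_rs"),
    ("re2", "re2"),
    ("basic", None),
]


def is_available(module: Optional[str]) -> bool:
    """Report whether ``module`` can be imported, without importing it."""
    if module is None:
        return True
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        return False


def pick_resolver(
    preference: Sequence[Tuple[str, Optional[str]]] = PREFERENCE,
    available: Callable[[Optional[str]], bool] = is_available,
) -> str:
    """Return the name of the first resolver whose dependency is present."""
    for name, module in preference:
        if available(module):
            return name
    raise RuntimeError("no resolver available")


if __name__ == "__main__":
    print("selected resolver:", pick_resolver())
```

Because the last entry has no dependency, the walk always ends on ``basic`` when neither optional package is installed.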
2 changes: 1 addition & 1 deletion src/ua_parser/__init__.py
@@ -72,7 +72,7 @@
(
RegexResolver,
Re2Resolver,
-lambda m: CachingResolver(BasicResolver(m), Cache(200)),
+lambda m: CachingResolver(BasicResolver(m), Cache(2000)),
),
)
)