add developers documentation for query-side of tokenizer
lonvia committed Dec 13, 2024
1 parent fbb6edf commit 5b40aa5
Showing 1 changed file with 33 additions and 5 deletions.
docs/develop/Tokenizers.md
@@ -91,14 +91,19 @@
for a custom tokenizer implementation.

### Directory Structure

Nominatim expects two files containing the Python part of the implementation:

* `src/nominatim_db/tokenizer/<NAME>_tokenizer.py` contains the tokenizer
  code used during import and
* `src/nominatim_api/search/<NAME>_tokenizer.py` has the code used during
  query time.

`<NAME>` is a unique name for the tokenizer consisting of only lower-case
letters, digits and underscores. A tokenizer also needs to install some SQL
functions. By convention, these should be placed in `lib-sql/tokenizer`.

If the tokenizer has a default configuration file, this should be saved in
`settings/<NAME>_tokenizer.<SUFFIX>`.
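
As a concrete example, a hypothetical tokenizer called `mytok` that ships a
YAML default configuration would add roughly the following files (the name,
the SQL file name and the `.yaml` suffix are chosen purely for illustration):

```
src/nominatim_db/tokenizer/mytok_tokenizer.py
src/nominatim_api/search/mytok_tokenizer.py
lib-sql/tokenizer/mytok_tokenizer.sql
settings/mytok_tokenizer.yaml
```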

### Configuration and Persistence

@@ -110,9 +115,11 @@
are tied to a database installation and must only be read during installation
time. If they are needed at runtime, they must be saved into the
`nominatim_properties` table and later loaded from there.
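
For illustration, import code might persist such a setting along these lines.
This is a minimal sketch, assuming psycopg 3 and the conventional
`property`/`value` columns of `nominatim_properties` with `property` being
unique; the helper names are made up and not part of any documented interface:

```python
from typing import Optional

import psycopg


def save_tokenizer_property(dsn: str, name: str, value: str) -> None:
    """Persist a tokenizer setting during import so it is available at runtime."""
    with psycopg.connect(dsn) as conn:
        # The upsert relies on a unique constraint on 'property' (assumed here).
        conn.execute(
            """INSERT INTO nominatim_properties (property, value)
               VALUES (%s, %s)
               ON CONFLICT (property) DO UPDATE SET value = EXCLUDED.value""",
            (name, value))


def load_tokenizer_property(dsn: str, name: str) -> Optional[str]:
    """Read back a previously persisted tokenizer setting."""
    with psycopg.connect(dsn) as conn:
        row = conn.execute(
            "SELECT value FROM nominatim_properties WHERE property = %s",
            (name,)).fetchone()
        return row[0] if row else None
```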

### The Python modules

#### `src/nominatim_db/tokenizer/`

The import Python module is expected to export a single factory function:

```python
def create(dsn: str, data_dir: Path) -> AbstractTokenizer
```

The `dsn` parameter contains the DSN of the Nominatim database. The `data_dir`
is a directory in the project directory that the tokenizer may use to save
database-specific data. The function must return the instance of the tokenizer
class as defined below.
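
Put together, the import-side module of a hypothetical `mytok` tokenizer could
be sketched as follows. `MyTokenizer` and its constructor are illustrative
only; the abstract methods of `AbstractTokenizer` documented below still have
to be implemented before the class becomes usable:

```python
# src/nominatim_db/tokenizer/mytok_tokenizer.py -- illustrative sketch
from pathlib import Path

from .base import AbstractTokenizer


class MyTokenizer(AbstractTokenizer):
    """Import-side tokenizer.

       All abstract methods of AbstractTokenizer must be implemented here
       before the class can actually be instantiated.
    """

    def __init__(self, dsn: str, data_dir: Path) -> None:
        self.dsn = dsn
        self.data_dir = data_dir


def create(dsn: str, data_dir: Path) -> AbstractTokenizer:
    """Factory function called by Nominatim during import and setup."""
    return MyTokenizer(dsn, data_dir)
```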

#### `src/nominatim_api/search/`

The query-time Python module must also export a factory function:

```python
def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer
```

The `conn` parameter contains the current search connection. See the
[library documentation](../library/Low-Level-DB-Access.md#searchconnection-class)
for details on the class. The function must return the instance of the query
analyzer class as defined below.
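
The query-time module of the same hypothetical `mytok` tokenizer could be
sketched along the same lines. `MyQueryAnalyzer` is illustrative, the import
of `SearchConnection` from `nominatim_api.connection` is an assumption, and
the abstract methods of `AbstractQueryAnalyzer` documented below still have to
be implemented:

```python
# src/nominatim_api/search/mytok_tokenizer.py -- illustrative sketch
from ..connection import SearchConnection
from .query_analyzer_factory import AbstractQueryAnalyzer


class MyQueryAnalyzer(AbstractQueryAnalyzer):
    """Query-time analyzer that turns an incoming query into tokens.

       All abstract methods of AbstractQueryAnalyzer must be implemented
       here before the class can actually be instantiated.
    """

    def __init__(self, conn: SearchConnection) -> None:
        self.conn = conn


def create_query_analyzer(conn: SearchConnection) -> AbstractQueryAnalyzer:
    """Factory function called when Nominatim sets up a search connection."""
    return MyQueryAnalyzer(conn)
```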


### Python Tokenizer Class

All tokenizers must inherit from `nominatim_db.tokenizer.base.AbstractTokenizer`
and implement the abstract functions defined there.

::: nominatim_db.tokenizer.base.AbstractTokenizer
    options:
        heading_level: 6


### Python Query Analyzer Class

::: nominatim_api.search.query_analyzer_factory.AbstractQueryAnalyzer
    options:
        heading_level: 6

### PL/pgSQL Functions

The tokenizer must provide access functions for the `token_info` column