doc: describe schema reload implementation
This patch adds a small document describing how space and sharding
schemas work in the module. It covers what schemas are and how we store,
use, and reload them in requests.

Closes #253
DifferentialOrange committed Aug 3, 2022
doc/dev/schema.md (new file)
# Database schema information design document

Two types of schema are used in ``crud`` requests: the ``net.box`` space
schema and the ``ddl`` sharding schema. If a change occurs in one of
those, router instances should reload the schema and reevaluate
the request using the updated one. This document clarifies how each
schema is obtained, used, and reloaded.

## Space schema

Related links: [#98](https://github.com/tarantool/crud/issues/98),
[PR#111](https://github.com/tarantool/crud/pull/111).

### How schema is stored

Every ``crud`` router is a [``vshard``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/)
router, and the same applies to storages. In ``vshard`` clusters, spaces
are created on storages. Thus, each storage holds a schema (the space
list, space formats, and indexes).
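
For instance, on a storage the schema is simply the local ``box.space``
contents (a minimal illustration, not ``crud`` code):

```lua
-- Run on a storage: list user spaces with their formats and indexes.
for name, space in pairs(box.space) do
    -- box.space maps both ids and names; skip ids and system spaces
    if type(name) == 'string' and name:sub(1, 1) ~= '_' then
        print(name, space.id, #space:format(), space.index[0] ~= nil)
    end
end
```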

Every router has a [``net.box``](https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/)
connection to each storage it can interact with. (They can be
retrieved with a [``vshard.router.routeall``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/vshard_router/#lua-function.vshard.router.routeall)
call.) Each ``net.box`` connection holds the space schema of the instance
it is connected to. A router can access the space schema through the
contents of the ``net.box`` connection object.
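
A minimal sketch of how a router can look at that schema (the
``customers`` space name is a hypothetical example, and error handling
is omitted):

```lua
-- Run on a router: inspect the space schema seen through net.box
-- connections to each replicaset's master.
local vshard = require('vshard')

for _, replicaset in pairs(vshard.router.routeall()) do
    local conn = replicaset.master.conn     -- net.box connection object
    local space = conn.space.customers      -- remote space schema, if any
    if space ~= nil then
        print(('replicaset %s: space id %d'):format(
            replicaset.uuid, space.id))
    end
end
```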

### When schema is used

The space schema is used to flatten maps on `crud.*_object*` requests.
Together with sharding info, it is used to calculate the ``bucket_id``
that chooses a replicaset for a request.
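
A minimal sketch of the flattening step (the space format and the helper
are illustrative; ``crud``'s real implementation is more involved):

```lua
-- Illustrative flattening: order object fields by the space format.
local format = {
    {name = 'id', type = 'unsigned'},
    {name = 'bucket_id', type = 'unsigned', is_nullable = true},
    {name = 'name', type = 'string'},
}

local function flatten(object, format)
    local tuple = {}
    for i, field in ipairs(format) do
        tuple[i] = object[field.name]
    end
    return tuple
end

-- {id = 1, name = 'Elizabeth'} -> {1, nil, 'Elizabeth'};
-- bucket_id is filled in later using the sharding schema.
local tuple = flatten({id = 1, name = 'Elizabeth'}, format)
```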

### How schema is reloaded

``net.box`` itself does not support automatic space schema reload yet
(see [tarantool/tarantool#6169](https://github.com/tarantool/tarantool/issues/6169)).
``crud`` uses the connection ``reload_schema`` handle (a ``ping`` wrapper, see
[PR#111 comment](https://github.com/tarantool/crud/pull/111#issuecomment-765811556))
to reload the schema, and does so for each replicaset. If there are
several requests to reload the info, only one reload fiber is started,
and every request waits for its completion.
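
A minimal sketch of such a deduplicated reload, assuming a
``fiber.cond``-based implementation (names are illustrative, not ``crud``
internals):

```lua
local fiber = require('fiber')
local vshard = require('vshard')

local reload_in_progress = false
local reload_done = fiber.cond()

local function reload_space_schema()
    if reload_in_progress then
        reload_done:wait()  -- piggyback on the reload already running
        return
    end
    reload_in_progress = true
    for _, replicaset in pairs(vshard.router.routeall()) do
        -- per the text above, a ping wrapper under the hood
        replicaset.master.conn:reload_schema()
    end
    reload_in_progress = false
    reload_done:broadcast()
end
```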

### When schema is reloaded and operation is retried

The basic logic is as follows: if something fails and a space schema
mismatch could be the reason, reload the schema and retry. If that
doesn't help after ``N`` retries (currently ``N`` is ``1``), pass the
error to the user.

Retry with reload is triggered if
- a space is not found;
- ``bucket_id`` calculation fails for any reason;
- an operation fails on a storage, hash check is enabled, and the hashes
  mismatch.
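
A minimal sketch of this retry loop (both helpers are hypothetical
stand-ins for ``crud``'s actual wrappers):

```lua
local RELOAD_RETRIES = 1  -- the N from the text

-- Hypothetical predicate; the real checks cover the three cases above.
local function is_schema_mismatch(err)
    return err ~= nil and err.class_name == 'SchemaMismatchError'
end

local function with_schema_reload(func, ...)
    local res, err
    for attempt = 0, RELOAD_RETRIES do
        res, err = func(...)
        if err == nil or not is_schema_mismatch(err) then
            break
        end
        if attempt < RELOAD_RETRIES then
            reload_space_schema()  -- single shared reload fiber, see above
        end
    end
    return res, err
end
```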

Let's talk a bit more about the hash check. To enable it, the user
should pass the `add_space_schema_hash` option to a request. This option
is always enabled for `crud.*_object*` requests. If the hashes mismatch,
it means the router space schema is inconsistent with the storage space
schema, so we reload it. For ``*_many`` methods, reload and retry
happen only if all tuples have failed with a hash mismatch; otherwise,
errors are passed to the user.

Retries are counted per wrapped function, so in fact there can be
more than ``N`` reloads for a single request. For example, if a user
calls `insert_object`, the object flattening fails (and succeeds after
a reload), and then the storage insert fails with a hash mismatch,
the reload happens twice and there are two retries (though these are two
different blocks of code, each retried independently). Likewise, if a
user calls `insert_object_many`, each tuple flattening is wrapped
independently (though it is hard to imagine a real case of multiple
flattening retries: the user would need to pass a batch of tables with
different structures and change the schema several times to match each
in turn; in reality it is always a success after a single retry or a
fast fail).

### Alternative approaches

One of the alternatives considered was to ping a storage instance on
each request to refresh the schema (see [PR#111 comment](https://github.com/tarantool/crud/pull/111#discussion_r562757016)).


## Sharding schema

Related links: [#166](https://github.com/tarantool/crud/issues/166),
[PR#181](https://github.com/tarantool/crud/pull/181),
[#237](https://github.com/tarantool/crud/issues/237),
[PR#239](https://github.com/tarantool/crud/pull/239),
[#212](https://github.com/tarantool/crud/issues/212),
[PR#268](https://github.com/tarantool/crud/pull/268).

### How schema is stored

Again, a ``crud`` cluster is a [``vshard``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/)
cluster, so data is sharded based on some key. To extract the key
from a tuple and compute a [``bucket_id``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/vshard_architecture/),
the [``ddl``](https://github.com/tarantool/ddl) module is used. The
``ddl`` schema describes the sharding key (how to extract the data
required to compute ``bucket_id`` from a tuple, based on the space
schema) and the sharding function (how to compute ``bucket_id`` based on
that data). This information is stored on storages in the
``_ddl_sharding_key`` and ``_ddl_sharding_func`` Tarantool spaces. The
``crud`` module fetches the sharding schema from these spaces directly:
thus, you are not obliged to use the ``ddl`` module and can set up the
``_ddl_*`` spaces manually if you want. The ``crud`` module uses plain
Lua tables to store sharding info on routers and storages.
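
For example, a manual setup might look like this (a sketch run on a
storage; ``customers`` and the exact field layout are illustrative and
should be checked against the ``ddl`` documentation):

```lua
-- Describe sharding metadata for a space without the ddl module.
box.space._ddl_sharding_key:replace{
    'customers',        -- space name
    {'customer_id'},    -- field names forming the sharding key
}
box.space._ddl_sharding_func:replace{
    'customers',        -- space name
    'vshard.router.bucket_id_strcrc32',  -- sharding function name
}
```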

### When schema is used

The sharding schema is used to compute ``bucket_id``. Thus, it is used
each time we execute a non-map-reduce request (``insert_*``,
``replace_*``, ``update_*``, ``delete_*``, ``upsert_*``, and ``get_*``
requests, most ``select`` and ``count`` requests, some ``pairs``
requests) where ``bucket_id`` is not specified explicitly and the
``force_map_call`` option is not set. If no sharding schema is specified,
we use the defaults: the sharding key is the primary key, and the
sharding function is ``vshard.router.bucket_id_strcrc32``.
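
A minimal sketch of the default computation (the helper is
illustrative):

```lua
local vshard = require('vshard')

-- Extract the sharding key fields (primary key by default) and hash
-- them with the default sharding function.
local function default_bucket_id(tuple, key_fieldnos)
    local key = {}
    for i, fieldno in ipairs(key_fieldnos) do
        key[i] = tuple[fieldno]
    end
    return vshard.router.bucket_id_strcrc32(key)
end

-- E.g. with the primary key in field 1:
-- default_bucket_id({1, nil, 'Elizabeth'}, {1})
```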

### How schema is reloaded

The storage sharding schema (internal ``crud`` Lua tables) is updated on
initialization (the first time the info is requested) and each time
someone changes ``_ddl_sharding_key`` or ``_ddl_sharding_func`` data --
we use ``on_replace`` triggers for that.
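
A minimal sketch of such a trigger (the cache table and field positions
are illustrative):

```lua
local sharding_key_cache = {}

-- Keep the cache in sync with _ddl_sharding_key contents.
local function on_sharding_key_replace(old_tuple, new_tuple)
    if new_tuple ~= nil then
        sharding_key_cache[new_tuple[1]] = new_tuple[2]
    elseif old_tuple ~= nil then
        sharding_key_cache[old_tuple[1]] = nil
    end
end

box.space._ddl_sharding_key:on_replace(on_sharding_key_replace)
```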

Routers fetch the sharding schema if the cache hasn't been initialized
yet and each time a reload is requested. A reload can be requested by
``crud`` itself (see below) or by the user (with the ``require('crud.common.sharding_key').update_cache()``
and ``require('crud.common.sharding_func').update_cache()`` handles).
The latter were deprecated after automatic reload was introduced.
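
For reference, the (now deprecated) manual reload looks like this:

```lua
-- Force a sharding schema reload on a router (deprecated since
-- automatic reload was introduced).
require('crud.common.sharding_key').update_cache()
require('crud.common.sharding_func').update_cache()
```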

It is worth noting that a reload or initialization affects both the
sharding key and the sharding function caches at the same time, but the
fetch itself is triggered only if a cache requested by some function is
not initialized yet (i.e., if the function cache is fine and the key
cache is not, both will still be reloaded). As with the space schema, if
there are several requests to reload the info, only one reload fiber is
started, and every request waits for its completion.

### When schema is reloaded and operation is retried

Each request that uses sharding info sends the router's sharding hashes
together with the request. If they mismatch with the storage's hashes,
the storage returns a specific error, and the router reloads the
sharding schema and retries the request. If that doesn't help after
``N`` retries (currently ``N`` is ``1``), the error is passed to the
user. For ``*_many`` methods, reload and retry happen only if all tuples
have failed with a hash mismatch; otherwise, errors are passed to the
user.
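
A minimal sketch of the storage-side check (the cache, names, and error
shape are illustrative):

```lua
local sharding_hash_cache = {}  -- space name -> pre-computed hash

local function check_sharding_hash(space_name, router_hash)
    if router_hash ~= sharding_hash_cache[space_name] then
        -- the router catches this class and triggers reload + retry
        return nil, {
            class_name = 'ShardingHashMismatchError',
            err = 'sharding info is outdated on the router',
        }
    end
    return true
end
```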

### Alternative approaches

Different implementations of working with the hash were considered: we
tried to compute it on each request instead of storing pre-computed
values, but the pre-computed approach with triggers performed better.
Another option was to ping the storage and verify the relevance of the
sharding info with a separate call before sending the request, but it
was discarded.
