doc: describe schema reload implementation
This patch adds a small document describing how space and sharding
schemas work in the module. It covers what schemas are and how we store,
use, and reload them in requests.

Closes #253
DifferentialOrange committed Aug 3, 2022
doc/dev/schema.md (new file)
# Database schema information design document

Two types of schema are used in ``crud`` requests: the ``net.box`` space
schema and the ``ddl`` sharding schema. If a change occurs in one of
those, router instances should reload the schema and reevaluate
the request using the updated one. This document clarifies how each
schema is obtained, used, and reloaded.

## Space schema

Related links: [#98](https://github.com/tarantool/crud/issues/98),
[PR#111](https://github.com/tarantool/crud/pull/111).

### How schema is stored

Every ``crud`` router is a [``vshard``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/)
router, and the same applies to storages. In ``vshard`` clusters, spaces
are created on storages. Thus, each storage holds a schema (the space
list, space formats, and indexes).
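
For instance, on a storage the schema is simply the local ``box.space``
contents (a minimal illustration, not ``crud`` code):

```lua
-- Run on a storage: list user spaces with their formats and indexes.
for name, space in pairs(box.space) do
    -- box.space maps both ids and names; skip ids and system spaces
    if type(name) == 'string' and name:sub(1, 1) ~= '_' then
        print(name, space.id, #space:format(), space.index[0] ~= nil)
    end
end
```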

Every router has a [``net.box``](https://www.tarantool.io/en/doc/latest/reference/reference_lua/net_box/)
connection to each storage it can interact with. (They can be
retrieved with a [``vshard.router.routeall``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/vshard_router/#lua-function.vshard.router.routeall)
call.) Each ``net.box`` connection holds the space schema of the instance
it is connected to. A router can access the space schema through the
contents of the ``net.box`` connection object.
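
A minimal sketch of how a router can look at that schema (the
``customers`` space name is a hypothetical example, and error handling
is omitted):

```lua
-- Run on a router: inspect the space schema seen through net.box
-- connections to each replicaset's master.
local vshard = require('vshard')

for _, replicaset in pairs(vshard.router.routeall()) do
    local conn = replicaset.master.conn     -- net.box connection object
    local space = conn.space.customers      -- remote space schema, if any
    if space ~= nil then
        print(('replicaset %s: space id %d'):format(
            replicaset.uuid, space.id))
    end
end
```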

### When schema is used

The space schema is used to flatten maps on `crud.*_object*` requests.
Together with sharding info, it is used to calculate the ``bucket_id``
that chooses a replicaset for a request.
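
A minimal sketch of the flattening step (the space format and the helper
are illustrative; ``crud``'s real implementation is more involved):

```lua
-- Illustrative flattening: order object fields by the space format.
local format = {
    {name = 'id', type = 'unsigned'},
    {name = 'bucket_id', type = 'unsigned', is_nullable = true},
    {name = 'name', type = 'string'},
}

local function flatten(object, format)
    local tuple = {}
    for i, field in ipairs(format) do
        tuple[i] = object[field.name]
    end
    return tuple
end

-- {id = 1, name = 'Elizabeth'} -> {1, nil, 'Elizabeth'};
-- bucket_id is filled in later using the sharding schema.
local tuple = flatten({id = 1, name = 'Elizabeth'}, format)
```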

### How schema is reloaded

``net.box`` itself does not support automatic space schema reload yet
(see [tarantool/tarantool#6169](https://github.com/tarantool/tarantool/issues/6169)).
``crud`` uses the connection ``reload_schema`` handle (a ``ping`` wrapper, see
[PR#111 comment](https://github.com/tarantool/crud/pull/111#issuecomment-765811556))
to reload the schema, and does so for each replicaset. If there are
several requests to reload the info, only one reload fiber is started,
and every request waits for its completion.
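
A minimal sketch of such a deduplicated reload, assuming a
``fiber.cond``-based implementation (names are illustrative, not ``crud``
internals):

```lua
local fiber = require('fiber')
local vshard = require('vshard')

local reload_in_progress = false
local reload_done = fiber.cond()

local function reload_space_schema()
    if reload_in_progress then
        reload_done:wait()  -- piggyback on the reload already running
        return
    end
    reload_in_progress = true
    for _, replicaset in pairs(vshard.router.routeall()) do
        -- per the text above, a ping wrapper under the hood
        replicaset.master.conn:reload_schema()
    end
    reload_in_progress = false
    reload_done:broadcast()
end
```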

### When schema is reloaded and operation is retried

The basic logic is as follows: if something fails and a space schema
mismatch could be the reason, reload the schema and retry. If that
doesn't help after ``N`` retries (currently ``N`` is ``1``), pass the
error to the user.

Retry with reload is triggered if
- a space is not found;
- ``bucket_id`` calculation fails for any reason;
- an operation fails on a storage, hash check is enabled, and the hashes
  mismatch.
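
A minimal sketch of this retry loop (both helpers are hypothetical
stand-ins for ``crud``'s actual wrappers):

```lua
local RELOAD_RETRIES = 1  -- the N from the text

-- Hypothetical predicate; the real checks cover the three cases above.
local function is_schema_mismatch(err)
    return err ~= nil and err.class_name == 'SchemaMismatchError'
end

local function with_schema_reload(func, ...)
    local res, err
    for attempt = 0, RELOAD_RETRIES do
        res, err = func(...)
        if err == nil or not is_schema_mismatch(err) then
            break
        end
        if attempt < RELOAD_RETRIES then
            reload_space_schema()  -- single shared reload fiber, see above
        end
    end
    return res, err
end
```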

Let's talk a bit more about the hash check. To enable it, the user
should pass the `add_space_schema_hash` option to a request. This option
is always enabled for `crud.*_object*` requests. If the hashes mismatch,
it means the router space schema is inconsistent with the storage space
schema, so we reload it. For ``*_many`` methods, reload and retry
happen only if all tuples have failed with a hash mismatch; otherwise,
errors are passed to the user.

Retries are counted per wrapped function, so in fact there can be
more than ``N`` reloads for a single request. For example, if a user
calls `insert_object`, the object flattening fails (and succeeds after
a reload), and then the storage insert fails with a hash mismatch,
the reload happens twice and there are two retries (though these are two
different blocks of code, each retried independently). Likewise, if a
user calls `insert_object_many`, each tuple flattening is wrapped
independently (though it is hard to imagine a real case of multiple
flattening retries: the user would need to pass a batch of tables with
different structures and change the schema several times to match each
in turn; in reality it is always a success after a single retry or a
fast fail).

### Alternative approaches

One of the alternatives considered was to ping a storage instance on
each request to refresh the schema (see [PR#111 comment](https://github.com/tarantool/crud/pull/111#discussion_r562757016)).


## Sharding schema

Related links: [#166](https://github.com/tarantool/crud/issues/166),
[PR#181](https://github.com/tarantool/crud/pull/181),
[#237](https://github.com/tarantool/crud/issues/237),
[PR#239](https://github.com/tarantool/crud/pull/239),
[#212](https://github.com/tarantool/crud/issues/212),
[PR#268](https://github.com/tarantool/crud/pull/268).

### How schema is stored

Again, a ``crud`` cluster is a [``vshard``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/)
cluster, so data is sharded based on some key. To extract the key
from a tuple and compute a [``bucket_id``](https://www.tarantool.io/en/doc/latest/reference/reference_rock/vshard/vshard_architecture/),
the [``ddl``](https://github.com/tarantool/ddl) module is used. The
``ddl`` schema describes the sharding key (how to extract the data
required to compute ``bucket_id`` from a tuple, based on the space
schema) and the sharding function (how to compute ``bucket_id`` based on
that data). This information is stored on storages in the
``_ddl_sharding_key`` and ``_ddl_sharding_func`` Tarantool spaces. The
``crud`` module fetches the sharding schema from these spaces directly:
thus, you are not obliged to use the ``ddl`` module and can set up the
``_ddl_*`` spaces manually if you want. The ``crud`` module uses plain
Lua tables to store sharding info on routers and storages.
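
For example, a manual setup might look like this (a sketch run on a
storage; ``customers`` and the exact field layout are illustrative and
should be checked against the ``ddl`` documentation):

```lua
-- Describe sharding metadata for a space without the ddl module.
box.space._ddl_sharding_key:replace{
    'customers',        -- space name
    {'customer_id'},    -- field names forming the sharding key
}
box.space._ddl_sharding_func:replace{
    'customers',        -- space name
    'vshard.router.bucket_id_strcrc32',  -- sharding function name
}
```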

### When schema is used

The sharding schema is used to compute ``bucket_id``. Thus, it is used
each time we execute a non-map-reduce request (``insert_*``,
``replace_*``, ``update_*``, ``delete_*``, ``upsert_*``, and ``get_*``
requests, most ``select`` and ``count`` requests, some ``pairs``
requests) where ``bucket_id`` is not specified explicitly and the
``force_map_call`` option is not set. If no sharding schema is specified,
we use the defaults: the sharding key is the primary key, and the
sharding function is ``vshard.router.bucket_id_strcrc32``.
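
A minimal sketch of the default computation (the helper is
illustrative):

```lua
local vshard = require('vshard')

-- Extract the sharding key fields (primary key by default) and hash
-- them with the default sharding function.
local function default_bucket_id(tuple, key_fieldnos)
    local key = {}
    for i, fieldno in ipairs(key_fieldnos) do
        key[i] = tuple[fieldno]
    end
    return vshard.router.bucket_id_strcrc32(key)
end

-- E.g. with the primary key in field 1:
-- default_bucket_id({1, nil, 'Elizabeth'}, {1})
```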

### How schema is reloaded

The storage sharding schema (internal ``crud`` Lua tables) is updated on
initialization (the first time the info is requested) and each time
someone changes ``_ddl_sharding_key`` or ``_ddl_sharding_func`` data --
we use ``on_replace`` triggers for that.
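
A minimal sketch of such a trigger (the cache table and field positions
are illustrative):

```lua
local sharding_key_cache = {}

-- Keep the cache in sync with _ddl_sharding_key contents.
local function on_sharding_key_replace(old_tuple, new_tuple)
    if new_tuple ~= nil then
        sharding_key_cache[new_tuple[1]] = new_tuple[2]
    elseif old_tuple ~= nil then
        sharding_key_cache[old_tuple[1]] = nil
    end
end

box.space._ddl_sharding_key:on_replace(on_sharding_key_replace)
```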

Routers fetch the sharding schema if the cache hasn't been initialized
yet and each time a reload is requested. A reload can be requested by
``crud`` itself (see below) or by the user (with the ``require('crud.common.sharding_key').update_cache()``
and ``require('crud.common.sharding_func').update_cache()`` handles).
The latter were deprecated after automatic reload was introduced.
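
For reference, the (now deprecated) manual reload looks like this:

```lua
-- Force a sharding schema reload on a router (deprecated since
-- automatic reload was introduced).
require('crud.common.sharding_key').update_cache()
require('crud.common.sharding_func').update_cache()
```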

It is worth noting that a reload or initialization affects both the
sharding key and the sharding function caches at the same time, but the
fetch itself is triggered only if a cache requested by some function is
not initialized yet (i.e., if the function cache is fine and the key
cache is not, both will still be reloaded). As with the space schema, if
there are several requests to reload the info, only one reload fiber is
started, and every request waits for its completion.

### When schema is reloaded and operation is retried

Each request that uses sharding info sends the router's sharding hashes
together with the request. If they mismatch with the storage's hashes,
the storage returns a specific error, and the router reloads the
sharding schema and retries the request. If that doesn't help after
``N`` retries (currently ``N`` is ``1``), the error is passed to the
user. For ``*_many`` methods, reload and retry happen only if all tuples
have failed with a hash mismatch; otherwise, errors are passed to the
user.
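
A minimal sketch of the storage-side check (the cache, names, and error
shape are illustrative):

```lua
local sharding_hash_cache = {}  -- space name -> pre-computed hash

local function check_sharding_hash(space_name, router_hash)
    if router_hash ~= sharding_hash_cache[space_name] then
        -- the router catches this class and triggers reload + retry
        return nil, {
            class_name = 'ShardingHashMismatchError',
            err = 'sharding info is outdated on the router',
        }
    end
    return true
end
```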

### Alternative approaches

Different implementations of working with the hash were considered: we
tried to compute it on each request instead of storing pre-computed
values, but the pre-computed approach with triggers performed better.
Another option was to ping the storage and verify the relevance of the
sharding info with a separate call before sending the request, but it
was discarded.
