From 6aa2f635e07be9ecc7d39620545a041e592ec2f2 Mon Sep 17 00:00:00 2001
From: Sergey Bronnikov
Date: Thu, 16 Sep 2021 11:37:33 +0300
Subject: [PATCH] Doc: use custom sharding key to calculate bucket id

Describe functionality and current limitations (#212 and #213) with
custom sharding key in CHANGELOG and README.

Closes #166
---
 CHANGELOG.md |  4 ++++
 README.md    | 64 +++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 68c09aced..be1752bc1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,6 +24,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 * `crud.len()` function to calculate the number of tuples
   in the space for memtx engine and calculate the maximum
   approximate number of tuples in the space for vinyl engine.
+* CRUD operations calculate bucket id automatically using the sharding
+  key specified with DDL schema or in the `_ddl_sharding_key` space.
+  NOTE: CRUD methods delete(), get() and update() require the sharding key
+  to be a part of the primary key.
 
 ## [0.8.0] - 02-07-21
 
diff --git a/README.md b/README.md
index 29ae1145b..7e80f6389 100644
--- a/README.md
+++ b/README.md
@@ -53,11 +53,56 @@ crud.unflatten_rows(res.rows, res.metadata)
 **Notes:**
 
 * A space should have a format.
-* By default, `bucket_id` is computed as `vshard.router.bucket_id_strcrc32(key)`,
-  where `key` is the primary key value.
-  Custom bucket ID can be specified as `opts.bucket_id` for each operation.
-  For operations that accepts tuple/object bucket ID can be specified as
-  tuple/object field as well as `opts.bucket_id` value.
+
+**Sharding key**
+
+*Sharding key* is a set of tuple field values used to calculate the *bucket ID*.
+*Sharding key definition* is a set of tuple field names that describes which
+tuple fields are part of the sharding key. *Bucket ID* determines which
+replicaset stores certain data. The function used to calculate the bucket ID is
+named the *sharding function*.
+
+By default, CRUD calculates the bucket ID from the primary key using
+`vshard.router.bucket_id_strcrc32(key)`. This happens automatically and doesn't
+require any action from the user. A user can also calculate the bucket ID externally and
+pass it as an option to CRUD methods that accept a tuple or an object (see the
+`bucket_id` option below).
+
+In versions > 0.8.0, users who don't want to use the primary key as a sharding key
+may set a custom sharding key definition as a part of the [DDL
+schema](https://github.com/tarantool/ddl#input-data-format) or insert it manually
+into the space `_ddl_sharding_key` (for both cases, consult the DDL module
+documentation). As soon as a sharding key for a certain space is available in
+the `_ddl_sharding_key` space, CRUD will use it for bucket ID calculation
+automatically. Note that the CRUD methods `delete()`, `get()` and `update()`
+require the sharding key to be a part of the primary key.
+
+The table below describes which operations support a custom sharding key:
+
+| CRUD method                      | Added sharding key support |
+| -------------------------------- | -------------------------- |
+| `get()`                          | Yes                        |
+| `insert()` / `insert_object()`   | Yes                        |
+| `delete()`                       | Yes                        |
+| `replace()` / `replace_object()` | Yes                        |
+| `upsert()` / `upsert_object()`   | Yes                        |
+| `select()` / `pairs()`           | Yes                        |
+| `update()`                       | Yes                        |
+| `upsert()` / `upsert_object()`   | Yes                        |
+| `replace()` / `replace_object()` | Yes                        |
+| `min()` / `max()`                | No (not required)          |
+| `cut_rows()` / `cut_objects()`   | No (not required)          |
+| `truncate()`                     | No (not required)          |
+| `len()`                          | No (not required)          |
+
+Current limitations of using a custom sharding key:
+
+- It's not possible to update sharding keys automatically when the schema is
+updated on storages, see [#212](https://github.com/tarantool/crud/issues/212).
+However, it is possible to do it manually with
+`sharding_key.update_sharding_keys_cache()`.
+- CRUD select may lead to map-reduce in some cases, see
+[#213](https://github.com/tarantool/crud/issues/213).
 
 ### Insert
 
@@ -115,7 +160,8 @@ local object, err = crud.get(space_name, key, opts)
 where:
 
 * `space_name` (`string`) - name of the space
-* `key` (`any`) - primary key value
+* `key` (`any`) - primary key value in version < 0.8.0 and sharding key when
+  DDL sharding key is used in version >= 0.8.0. See section 'Sharding key' above.
 * `opts`:
   * `fields` (`?table`) - field names for getting only a subset of fields
   * `bucket_id` (`?number|cdata`) - bucket ID
@@ -152,7 +198,8 @@ local object, err = crud.update(space_name, key, operations, opts)
 where:
 
 * `space_name` (`string`) - name of the space
-* `key` (`any`) - primary key value
+* `key` (`any`) - primary key value in version < 0.8.0 and sharding key when
+  DDL sharding key is used in version >= 0.8.0. See section 'Sharding key' above.
 * `operations` (`table`) - update [operations](https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_space/#box-space-update)
 * `opts`:
   * `timeout` (`?number`) - `vshard.call` timeout (in seconds)
@@ -185,7 +232,8 @@ local object, err = crud.delete(space_name, key, opts)
 where:
 
 * `space_name` (`string`) - name of the space
-* `key` (`any`) - primary key value
+* `key` (`any`) - primary key value in version < 0.8.0 and sharding key when
+  DDL sharding key is used in version >= 0.8.0. See section 'Sharding key' above.
 * `opts`:
   * `timeout` (`?number`) - `vshard.call` timeout (in seconds)
   * `bucket_id` (`?number|cdata`) - bucket ID
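
Note on the patch above: the relationship between a sharding key definition, a sharding key, and a bucket ID may be easier to follow with a small standalone sketch. The Lua below is illustrative only and is not code from CRUD: `toy_sharding_func`, `extract_sharding_key`, `is_part_of`, and `BUCKET_COUNT` are hypothetical names, and the toy hash merely stands in for `vshard.router.bucket_id_strcrc32()`, whose algorithm is not reproduced here. It runs in plain Lua without Tarantool.

```lua
-- Assumed bucket count for the illustration; a real cluster configures its own.
local BUCKET_COUNT = 3000

-- Hypothetical stand-in for a sharding function: folds the bytes of every
-- key part into a number and maps it to the range 1..BUCKET_COUNT.
local function toy_sharding_func(key)
    local hash = 0
    for _, part in ipairs(key) do
        local s = tostring(part)
        for i = 1, #s do
            hash = (hash * 31 + s:byte(i)) % 2147483647
        end
    end
    return hash % BUCKET_COUNT + 1
end

-- Collect the sharding key (field values) from a tuple, given the space
-- format and the sharding key definition (field names).
local function extract_sharding_key(tuple, format, sharding_key_def)
    local positions = {}
    for pos, field in ipairs(format) do
        positions[field.name] = pos
    end
    local key = {}
    for _, name in ipairs(sharding_key_def) do
        local pos = positions[name]
        if pos == nil then
            error(('no field %q in space format'):format(name))
        end
        table.insert(key, tuple[pos])
    end
    return key
end

-- Check the documented restriction for get(), update() and delete():
-- every sharding key field must also be a primary key field.
local function is_part_of(sharding_key_def, primary_key_def)
    local in_primary_key = {}
    for _, name in ipairs(primary_key_def) do
        in_primary_key[name] = true
    end
    for _, name in ipairs(sharding_key_def) do
        if not in_primary_key[name] then
            return false
        end
    end
    return true
end

-- Usage: shard by `name` instead of the whole primary key {id, name}.
local format = {{name = 'id'}, {name = 'name'}, {name = 'age'}}
local tuple = {1, 'Ivan', 33}
local sharding_key_def = {'name'}

local sharding_key = extract_sharding_key(tuple, format, sharding_key_def)
print('bucket id:', toy_sharding_func(sharding_key))
print('usable with get/update/delete:',
      is_part_of(sharding_key_def, {'id', 'name'}))
```

The `is_part_of` check suggests one way to read the restriction on `get()`, `update()` and `delete()`: these methods receive only a key rather than a full tuple, so the sharding key fields have to be recoverable from the key that is passed in.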