Doc: use custom sharding key to calculate bucket id
Describe functionality and current limitations (#212, #213 and #219)
with custom sharding key in CHANGELOG and README.

Closes #166
ligurio committed Sep 30, 2021
1 parent f84143b commit 14bfdfb
Showing 2 changed files with 62 additions and 8 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -24,6 +24,10 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
* `crud.len()` function to calculate the number of tuples
in the space for memtx engine and calculate the maximum
approximate number of tuples in the space for vinyl engine.
* CRUD operations calculate bucket id automatically using the sharding
key specified in the DDL schema or in the `_ddl_sharding_key` space.
NOTE: CRUD methods delete(), get() and update() require the sharding key
to be a part of the primary key.

## [0.8.0] - 02-07-21

66 changes: 58 additions & 8 deletions README.md
@@ -53,11 +53,58 @@ crud.unflatten_rows(res.rows, res.metadata)
**Notes:**

* A space should have a format.
* By default, `bucket_id` is computed as `vshard.router.bucket_id_strcrc32(key)`,
where `key` is the primary key value.
Custom bucket ID can be specified as `opts.bucket_id` for each operation.
For operations that accepts tuple/object bucket ID can be specified as
tuple/object field as well as `opts.bucket_id` value.

**Sharding key and bucket id calculation**

*Sharding key* is a set of tuple field values used to calculate the *bucket ID*.
*Sharding key definition* is a set of tuple field names that describes which
tuple fields make up the sharding key. *Bucket ID* determines which
replicaset stores a given piece of data. The function used to calculate the
bucket ID is called the *sharding function*.
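
To make the terms concrete, here is a minimal sketch in plain Lua of how a sharding key could be picked out of a tuple using a sharding key definition. The helper `extract_sharding_key`, the field names and the sample tuple are hypothetical and are not part of the CRUD API; the sharding function itself is not modelled here.

```lua
-- Hypothetical helper (not CRUD's API): pick sharding key values out of a tuple.
-- `format` is a list of field names in tuple order;
-- `definition` is the sharding key definition (a list of field names).
local function extract_sharding_key(tuple, format, definition)
    -- Map each field name to its position in the tuple.
    local position = {}
    for i, name in ipairs(format) do
        position[name] = i
    end

    -- Collect the values of the fields named in the definition, in order.
    local key = {}
    for _, name in ipairs(definition) do
        local pos = position[name]
        if pos == nil then
            return nil, 'unknown field in sharding key definition: ' .. name
        end
        table.insert(key, tuple[pos])
    end
    return key
end

-- Made-up space format and tuple; bucket_id is left unset on purpose.
local format = {'id', 'bucket_id', 'name', 'age'}
local definition = {'name', 'age'}
local tuple = {1, nil, 'Ivan', 30}

-- `key` holds the values of the 'name' and 'age' fields, in that order.
local key = extract_sharding_key(tuple, format, definition)
```

In this picture, the sharding function would take `key` as its input and return the bucket ID.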

By default, CRUD calculates the bucket ID from the primary key using the function
`vshard.router.bucket_id_strcrc32(key)`. This happens automatically and doesn't
require any action from the user. Users can also calculate the bucket ID
themselves and pass it as an option to CRUD methods that accept a tuple or
object (see the `bucket_id` option below).

In versions > 0.8.0, users who don't want to use the primary key as the sharding key
can set a custom sharding key definition as part of the [DDL
schema](https://github.com/tarantool/ddl#input-data-format) or insert it manually
into the `_ddl_sharding_key` space (for both cases, consult the DDL module
documentation). As soon as a sharding key for a given space is available in
the `_ddl_sharding_key` space, CRUD uses it for bucket ID calculation
automatically. Note that the CRUD methods `delete()`, `get()` and `update()`
require the sharding key to be a part of the primary key.
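
As an illustration, a DDL schema with a custom sharding key might look like the table below. The space name, field names and index layout are made up for this sketch. The function `sharding_key_in_primary_key` is a hypothetical check, not something CRUD exposes, and it assumes that the first index in the list is the primary one. Applying the schema (for example with `ddl.set_schema(schema)`) is left out so that the snippet runs in plain Lua.

```lua
-- A DDL schema fragment with a custom sharding key.
-- The space 'customers' and its fields are made up for illustration.
local schema = {
    spaces = {
        customers = {
            engine = 'memtx',
            is_local = false,
            temporary = false,
            format = {
                {name = 'id', type = 'unsigned', is_nullable = false},
                {name = 'bucket_id', type = 'unsigned', is_nullable = false},
                {name = 'name', type = 'string', is_nullable = false},
            },
            indexes = {
                {
                    name = 'primary', type = 'TREE', unique = true,
                    parts = {
                        {path = 'id', type = 'unsigned', is_nullable = false},
                        {path = 'name', type = 'string', is_nullable = false},
                    },
                },
                {
                    name = 'bucket_id', type = 'TREE', unique = false,
                    parts = {
                        {path = 'bucket_id', type = 'unsigned', is_nullable = false},
                    },
                },
            },
            -- Custom sharding key definition: shard by 'name', not by 'id'.
            sharding_key = {'name'},
        },
    },
}

-- Hypothetical check (not part of CRUD): is every sharding key field
-- also a part of the primary key? Assumes indexes[1] is the primary index.
local function sharding_key_in_primary_key(space)
    local pk_fields = {}
    for _, part in ipairs(space.indexes[1].parts) do
        pk_fields[part.path] = true
    end
    for _, name in ipairs(space.sharding_key) do
        if not pk_fields[name] then
            return false
        end
    end
    return true
end

local ok = sharding_key_in_primary_key(schema.spaces.customers)
```

Here the primary key includes `name`, so the space satisfies the requirement that `delete()`, `get()` and `update()` place on the sharding key.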

The table below shows which operations support a custom sharding key:

| CRUD method | Sharding key support |
| -------------------------------- | -------------------------- |
| `get()` | Yes |
| `insert()` / `insert_object()` | Yes |
| `delete()` | Yes |
| `replace()` / `replace_object()` | Yes |
| `upsert()` / `upsert_object()` | Yes |
| `select()` / `pairs()` | Yes |
| `update()` | Yes |
| `min()` / `max()` | No (not required) |
| `cut_rows()` / `cut_objects()` | No (not required) |
| `truncate()` | No (not required) |
| `len()` | No (not required) |

Current limitations for using custom sharding key:

- It's not possible to update sharding keys automatically when the schema is
updated on storages, see [#212](https://github.com/tarantool/crud/issues/212).
However, it is possible to do it manually with
`sharding_key.update_sharding_keys_cache()`.
- CRUD select may lead to map reduce in some cases, see
[#213](https://github.com/tarantool/crud/issues/213).
- JSON paths are not supported in sharding keys, see
[#219](https://github.com/tarantool/crud/issues/219).

### Insert

@@ -115,7 +162,8 @@ local object, err = crud.get(space_name, key, opts)
where:

* `space_name` (`string`) - name of the space
* `key` (`any`) - primary key value
* `key` (`any`) - primary key value in versions <= 0.8.0; in versions > 0.8.0,
the sharding key value when a DDL sharding key is used. See the 'Sharding key and bucket id calculation' section above.
* `opts`:
* `fields` (`?table`) - field names for getting only a subset of fields
* `bucket_id` (`?number|cdata`) - bucket ID
@@ -152,7 +200,8 @@ local object, err = crud.update(space_name, key, operations, opts)
where:

* `space_name` (`string`) - name of the space
* `key` (`any`) - primary key value
* `key` (`any`) - primary key value in versions <= 0.8.0; in versions > 0.8.0,
the sharding key value when a DDL sharding key is used. See the 'Sharding key and bucket id calculation' section above.
* `operations` (`table`) - update [operations](https://www.tarantool.io/en/doc/latest/reference/reference_lua/box_space/#box-space-update)
* `opts`:
* `timeout` (`?number`) - `vshard.call` timeout (in seconds)
@@ -185,7 +234,8 @@ local object, err = crud.delete(space_name, key, opts)
where:

* `space_name` (`string`) - name of the space
* `key` (`any`) - primary key value
* `key` (`any`) - primary key value in versions <= 0.8.0; in versions > 0.8.0,
the sharding key value when a DDL sharding key is used. See the 'Sharding key and bucket id calculation' section above.
* `opts`:
* `timeout` (`?number`) - `vshard.call` timeout (in seconds)
* `bucket_id` (`?number|cdata`) - bucket ID