Doc: use custom sharding key to calculate bucket id
Describe functionality and current limitations (#212, #213 and #219)
with custom sharding key in CHANGELOG and README.

Closes #166
ligurio committed Nov 18, 2021
1 parent 6370568 commit d5593c6
Showing 2 changed files with 57 additions and 5 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,11 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

* CRUD operations calculate the bucket ID automatically using the sharding
key specified in the DDL schema or in the `_ddl_sharding_key` space.
NOTE: the CRUD methods `delete()`, `get()` and `update()` require the sharding
key to be a part of the primary key.

### Changed

### Fixed
57 changes: 52 additions & 5 deletions README.md
@@ -53,11 +53,58 @@ crud.unflatten_rows(res.rows, res.metadata)
**Notes:**

* A space should have a format.
* By default, `bucket_id` is computed as `vshard.router.bucket_id_strcrc32(key)`,
where `key` is the primary key value.
A custom bucket ID can be specified as `opts.bucket_id` for each operation.
For operations that accept a tuple/object, the bucket ID can be specified as a
tuple/object field as well as the `opts.bucket_id` value.

**Sharding key and bucket id calculation**

A *sharding key* is a set of tuple field values used to calculate the *bucket ID*.
A *sharding key definition* is a set of tuple field names that describes which
tuple fields make up the sharding key. The *bucket ID* determines which
replicaset stores a given tuple. The function used to calculate the bucket ID is
called the *sharding function*.
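
To make the terms concrete, here is a minimal plain-Lua sketch of how a sharding key could be extracted from a tuple given a sharding key definition. The space format, the field names and the `extract_sharding_key` helper are hypothetical and only illustrate the terminology; this is not how CRUD implements it internally.

```lua
-- Hypothetical space format: field order defines positions in a tuple.
local format = {
    {name = 'id', type = 'unsigned'},
    {name = 'bucket_id', type = 'unsigned'},
    {name = 'name', type = 'string'},
    {name = 'age', type = 'number'},
}

-- Sharding key definition: names of the fields that form the sharding key.
local sharding_key_def = {'name', 'age'}

-- Build a field name -> tuple position lookup from the space format.
local function field_positions(space_format)
    local positions = {}
    for i, field in ipairs(space_format) do
        positions[field.name] = i
    end
    return positions
end

-- Extract the sharding key (a list of field values) from a tuple.
local function extract_sharding_key(tuple, space_format, key_def)
    local positions = field_positions(space_format)
    local key = {}
    for i, name in ipairs(key_def) do
        local pos = positions[name]
        if pos == nil then
            error('unknown field in sharding key definition: ' .. name)
        end
        key[i] = tuple[pos]
    end
    return key
end

-- A stored tuple; 477 is an arbitrary example bucket ID.
local tuple = {1, 477, 'Ivan', 25}
local sharding_key = extract_sharding_key(tuple, format, sharding_key_def)
```

Here the definition is `{'name', 'age'}` (field names), while the sharding key of this particular tuple is the list of the corresponding values.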

By default, CRUD calculates the bucket ID from the primary key using the function
`vshard.router.bucket_id_strcrc32(key)`. This happens automatically and requires
no action from the user. A user can also calculate the bucket ID externally and
pass it as an option to the CRUD methods that accept a tuple or an object (see the
`bucket_id` option below).
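
The idea of a sharding function can be sketched in plain Lua as follows. This is an illustration only: `toy_hash` is a stand-in hash, not CRC32, and `toy_bucket_id` and the bucket count of 3000 are assumptions made for the example. In a real cluster the value comes from `vshard.router.bucket_id_strcrc32(key)`.

```lua
-- Assumed total number of buckets in the cluster.
local BUCKET_COUNT = 3000

-- Stand-in string hash (NOT CRC32); any deterministic hash works for the sketch.
local function toy_hash(str)
    local h = 0
    for i = 1, #str do
        h = (h * 31 + str:byte(i)) % 4294967296
    end
    return h
end

-- Toy sharding function: joins the key parts into one string and maps
-- its hash into the range 1..bucket_count.
local function toy_bucket_id(key, bucket_count)
    local parts = {}
    for i, part in ipairs(key) do
        parts[i] = tostring(part)
    end
    return toy_hash(table.concat(parts, ':')) % bucket_count + 1
end

local bucket_id = toy_bucket_id({'Ivan', 25}, BUCKET_COUNT)
```

A bucket ID calculated outside of CRUD in this manner is what would be passed as `opts.bucket_id`. The same key always maps to the same bucket, which is what lets a router find the replicaset that stores the tuple.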

Starting from 0.10.0, users who don't want to use the primary key as the sharding key
can set a custom sharding key definition as part of the [DDL
schema](https://github.com/tarantool/ddl#input-data-format) or insert it manually
into the `_ddl_sharding_key` space (for both cases, consult the DDL module
documentation). As soon as a sharding key for a space is available in the
`_ddl_sharding_key` space, CRUD uses it for bucket ID calculation
automatically. Note that the CRUD methods `delete()`, `get()` and `update()`
require the sharding key to be a part of the primary key.
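
As an illustration, a schema fragment with a custom sharding key might look like the sketch below. The `customers` space and its fields are hypothetical, and the layout follows the DDL input format as commonly documented, so check it against the DDL module documentation before use. The `sharding_key_in_primary_key` helper is not part of CRUD or DDL; it only demonstrates the constraint on `delete()`, `get()` and `update()`.

```lua
-- Hypothetical DDL schema fragment with a custom sharding key definition.
local schema = {
    spaces = {
        customers = {
            engine = 'memtx',
            is_local = false,
            temporary = false,
            format = {
                {name = 'id', type = 'unsigned', is_nullable = false},
                {name = 'bucket_id', type = 'unsigned', is_nullable = false},
                {name = 'name', type = 'string', is_nullable = false},
            },
            indexes = {
                {
                    name = 'id',
                    type = 'TREE',
                    unique = true,
                    parts = {
                        {path = 'id', type = 'unsigned', is_nullable = false},
                        {path = 'name', type = 'string', is_nullable = false},
                    },
                },
                {
                    name = 'bucket_id',
                    type = 'TREE',
                    unique = false,
                    parts = {
                        {path = 'bucket_id', type = 'unsigned', is_nullable = false},
                    },
                },
            },
            -- Custom sharding key definition: shard by `name`, not by the whole primary key.
            sharding_key = {'name'},
        },
    },
}

-- Hypothetical helper: true if every sharding key field is a part of the
-- primary index (assumed here to be the first index in the list).
local function sharding_key_in_primary_key(space_schema)
    local pk_fields = {}
    for _, part in ipairs(space_schema.indexes[1].parts) do
        pk_fields[part.path] = true
    end
    for _, name in ipairs(space_schema.sharding_key) do
        if not pk_fields[name] then
            return false
        end
    end
    return true
end

local ok = sharding_key_in_primary_key(schema.spaces.customers)
```

In this sketch `name` is a part of the primary index, so the sharding key can be derived from a primary key value alone, which is what `get()`, `delete()` and `update()` have to work with.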

The table below shows which operations support a custom sharding key:

| CRUD method | Sharding key support |
| -------------------------------- | -------------------------- |
| `get()` | Yes |
| `insert()` / `insert_object()` | Yes |
| `delete()` | Yes |
| `replace()` / `replace_object()` | Yes |
| `upsert()` / `upsert_object()` | Yes |
| `select()` / `pairs()` | Yes |
| `update()` | Yes |
| `min()` / `max()` | No (not required) |
| `cut_rows()` / `cut_objects()` | No (not required) |
| `truncate()` | No (not required) |
| `len()` | No (not required) |

Current limitations of using a custom sharding key:

- Sharding keys are not updated automatically when the schema is
updated on storages, see [#212](https://github.com/tarantool/crud/issues/212).
However, it is possible to update them manually with
`require('crud.sharding_key').update_sharding_keys_cache()`.
- CRUD `select()` may lead to map-reduce in some cases, see
[#213](https://github.com/tarantool/crud/issues/213).
- JSON paths are not supported in sharding keys, see
[#219](https://github.com/tarantool/crud/issues/219).

### Insert
