Doc: use custom sharding key to calculate bucket id
Describe functionality and current limitations (#212, #213, #219, #243)
with custom sharding key in CHANGELOG and README.

Thanks to Oleg Babin (@olegrok) and Alexander Turenko (@Totktonada) for
help with feature implementation.

Closes #166

Reviewed-by: Oleg Babin <[email protected]>
Reviewed-by: Alexander Turenko <[email protected]>
Co-authored-by: Georgy Moiseev <[email protected]>
ligurio and DifferentialOrange committed Nov 27, 2021
1 parent 76439df commit 0c5776d
Showing 2 changed files with 59 additions and 5 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,11 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

* CRUD operations calculate bucket id automatically using the sharding
key specified with DDL schema or in the `_ddl_sharding_key` space.
NOTE: CRUD methods delete(), get() and update() require the sharding key
to be a part of the primary key.

### Changed

### Fixed
59 changes: 54 additions & 5 deletions README.md
@@ -53,11 +53,60 @@ crud.unflatten_rows(res.rows, res.metadata)
**Notes:**

* A space should have a format.
* By default, `bucket_id` is computed as `vshard.router.bucket_id_strcrc32(key)`,
where `key` is the primary key value.
Custom bucket ID can be specified as `opts.bucket_id` for each operation.
For operations that accept a tuple/object, the bucket ID can be specified as
a tuple/object field as well as the `opts.bucket_id` value.

**Sharding key and bucket id calculation**

*Sharding key* is a set of tuple field values used to calculate the *bucket ID*.
*Sharding key definition* is a set of tuple field names that describes which
tuple fields are part of the sharding key. *Bucket ID* determines which
replicaset stores a given piece of data. The function used to calculate the
bucket ID is called the *sharding function*.
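
To make these terms concrete, here is a minimal plain-Lua sketch of how a sharding key could be picked out of a tuple and mapped to a bucket ID. It is an illustration only, not CRUD's implementation: the space format, the sharding key definition, the helper names and the bucket count are all hypothetical, and `toy_hash` is a stand-in that does not reproduce the values returned by `vshard.router.bucket_id_strcrc32()`.

```lua
-- Hypothetical space format: field names in tuple order.
local format = {'id', 'bucket_id', 'name', 'age'}

-- Hypothetical sharding key definition: names of the fields forming the key.
local sharding_key_def = {'name', 'age'}

-- Pick the sharding key values out of a tuple, in definition order.
local function extract_sharding_key(tuple, space_format, key_def)
    -- Map each field name to its position in the tuple.
    local fieldno_map = {}
    for fieldno, name in ipairs(space_format) do
        fieldno_map[name] = fieldno
    end

    local key = {}
    for i, name in ipairs(key_def) do
        local fieldno = fieldno_map[name]
        if fieldno == nil then
            error(('no field %q in space format'):format(name))
        end
        key[i] = tuple[fieldno]
    end
    return key
end

-- Stand-in string hash (NOT the CRC32 used by vshard).
local function toy_hash(str)
    local h = 0
    for i = 1, #str do
        h = (h * 31 + str:byte(i)) % 4294967296
    end
    return h
end

-- Toy sharding function: maps a sharding key to a bucket ID
-- in the range 1..bucket_count.
local function toy_sharding_func(key, bucket_count)
    local parts = {}
    for i, value in ipairs(key) do
        parts[i] = tostring(value)
    end
    return toy_hash(table.concat(parts)) % bucket_count + 1
end

-- The bucket_id field (position 2) is not known yet.
local tuple = {1, nil, 'Ivan', 42}
local key = extract_sharding_key(tuple, format, sharding_key_def)
local bucket_id = toy_sharding_func(key, 3000)
```

Here `key` holds the values of `name` and `age` taken from the tuple, and `bucket_id` is the number that a router would use to choose a replicaset.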

By default, CRUD calculates the bucket ID from the primary key using the
function `vshard.router.bucket_id_strcrc32(key)`. This happens automatically
and requires no action from the user. However, for operations that accept a
tuple/object, the bucket ID can be specified as a tuple/object field as well as
the `opts.bucket_id` value.

Starting from 0.10.0, users who don't want to use the primary key as a sharding
key may set a custom sharding key definition as part of the [DDL
schema](https://github.com/tarantool/ddl#input-data-format) or insert it
manually into the `_ddl_sharding_key` space (for both cases, consult the DDL
module documentation). As soon as a sharding key for a given space is available
in the `_ddl_sharding_key` space, CRUD uses it for bucket ID calculation
automatically. Note that the CRUD methods `delete()`, `get()` and `update()`
require the sharding key to be part of the primary key.
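
As an illustration, a DDL schema with a custom sharding key might look like the fragment below. This is a sketch that assumes the input format described in the DDL module documentation; the `customers` space and its field and index names are hypothetical. The sharding key field `name` is deliberately included in the primary index so that `get()`, `delete()` and `update()` stay usable.

```lua
-- Hypothetical schema; all space, field and index names are illustrative.
local schema = {
    spaces = {
        customers = {
            engine = 'memtx',
            is_local = false,
            temporary = false,
            format = {
                {name = 'id', type = 'unsigned', is_nullable = false},
                {name = 'bucket_id', type = 'unsigned', is_nullable = false},
                {name = 'name', type = 'string', is_nullable = false},
                {name = 'age', type = 'unsigned', is_nullable = false},
            },
            indexes = {
                {
                    -- The primary key contains the sharding key field 'name'.
                    name = 'primary',
                    type = 'TREE',
                    unique = true,
                    parts = {
                        {path = 'id', type = 'unsigned', is_nullable = false},
                        {path = 'name', type = 'string', is_nullable = false},
                    },
                },
                {
                    name = 'bucket_id',
                    type = 'TREE',
                    unique = false,
                    parts = {
                        {path = 'bucket_id', type = 'unsigned', is_nullable = false},
                    },
                },
            },
            -- Custom sharding key definition: a list of field names.
            sharding_key = {'name'},
        },
    },
}
```

Such a schema would then be applied through the DDL module as its documentation describes (for example with `set_schema()`), after which the definition should become visible in the `_ddl_sharding_key` space.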

The table below describes which operations support a custom sharding key:

| CRUD method | Sharding key support |
| -------------------------------- | -------------------------- |
| `get()` | Yes |
| `insert()` / `insert_object()` | Yes |
| `delete()` | Yes |
| `replace()` / `replace_object()` | Yes |
| `upsert()` / `upsert_object()` | Yes |
| `select()` / `pairs()` | Yes |
| `update()` | Yes |
| `min()` / `max()` | No (not required) |
| `cut_rows()` / `cut_objects()` | No (not required) |
| `truncate()` | No (not required) |
| `len()` | No (not required) |

Current limitations of using a custom sharding key:

- Sharding keys are not updated automatically when the schema is
updated on storages, see
[#212](https://github.com/tarantool/crud/issues/212). However, it is possible
to update them manually with `require('crud.sharding_key').update_cache()`.
- CRUD select may lead to map-reduce in some cases, see
[#213](https://github.com/tarantool/crud/issues/213).
- JSON paths are not supported for the sharding key, see
[#219](https://github.com/tarantool/crud/issues/219).
- `primary_index_fieldno_map` is not cached, see
[#243](https://github.com/tarantool/crud/issues/243).

### Insert
