Bootstrap the project #1

Closed
knazarov opened this issue Dec 20, 2017 · 0 comments

knazarov commented Dec 20, 2017

Let's implement a proof-of-concept of sharding based on the Virtual Buckets concept.

local router = require('vshard.router')
router.cfg { -- will be moved to box.cfg() sometime
    sharding = {
        { uri = '10.0.0.21:3301', replicaset = 1, master = true },
        { uri = '10.0.0.22:3301', replicaset = 1 },

        { uri = '10.0.0.21:3302', replicaset = 2, master = true },
        { uri = '10.0.0.22:3302', replicaset = 2 },

        { uri = '10.0.0.21:3303', replicaset = 3, master = true },
        { uri = '10.0.0.22:3303', replicaset = 3 },
    },
}
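
As a rough illustration of the router parsing step this issue asks for, the flat `sharding` list can be grouped into a hash keyed by replica set id, with the master first in each list. This is only a sketch: `group_by_replicaset` is a hypothetical helper, and it stores plain URIs where the real router would hold net.box connections.

```lua
-- Hypothetical helper: group a flat server list by replica set id.
-- Result shape: { replicaset_id => { master_uri, replica_uri, ... } }.
local function group_by_replicaset(sharding)
    local replicasets = {}
    local has_master = {}
    for _, server in ipairs(sharding) do
        local id = server.replicaset
        replicasets[id] = replicasets[id] or {}
        if server.master then
            assert(not has_master[id], 'replica set has two masters')
            has_master[id] = true
            -- Keep the master at position 1.
            table.insert(replicasets[id], 1, server.uri)
        else
            table.insert(replicasets[id], server.uri)
        end
    end
    for id in pairs(replicasets) do
        assert(has_master[id], 'replica set has no master')
    end
    return replicasets
end

local replicasets = group_by_replicaset({
    { uri = '10.0.0.21:3301', replicaset = 1, master = true },
    { uri = '10.0.0.22:3301', replicaset = 1 },
    { uri = '10.0.0.22:3302', replicaset = 2 },
    { uri = '10.0.0.21:3302', replicaset = 2, master = true },
})
for id, uris in pairs(replicasets) do
    print(id, table.concat(uris, ', '))
end
```

Putting the master first is an arbitrary choice made here so that the master can be found without scanning the list.
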
  • router: parse configuration into Lua hash { replicaset => [master, slave, ...] }, where master-slave are net.box objects. Use https://github.com/tarantool/shard/tree/dev for this code.
  • storage: create table _vbucket to store bucket_id (the single field)
  • storage: set box.cfg { read_only = true } if master is false
  • router: on shard.cfg() connect to all servers and check for _vbucket table
  • router: create routing table based on _vbucket contents from replicasets
  • router: add a method shard.bootstrap() which will make initial redistribution of buckets over shards. The number of vbuckets can be hardcoded for now.
  • storage: add a dispatcher function which will check bucket_id for every request and execute a Lua function with provided arguments
    vshard.storage.dispatch = function(bucket_id, func, args) end -- args = {arg1, arg2, ...}
  • router: implement a function to call procedures by bucket_id:
    shard.call(key, func, {args}) => result
  • Add tests
  • Add CI
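
The bootstrap and dispatcher items above could look roughly like the following. This is a sketch under assumptions, not the planned implementation: `bootstrap` and `make_dispatch` are hypothetical names, buckets are handed out round-robin (the issue does not prescribe a distribution), and the set of locally stored buckets is a plain Lua table rather than the `_vbucket` space.

```lua
-- Lua 5.1/LuaJIT expose unpack as a global, later versions as table.unpack.
local unpack = table.unpack or unpack

-- Router side: assign bucket ids 1..bucket_count to replica sets in turn.
-- Returns the routing table { bucket_id => replicaset id }.
local function bootstrap(replicaset_ids, bucket_count)
    local routes = {}
    for bucket_id = 1, bucket_count do
        local idx = (bucket_id - 1) % #replicaset_ids + 1
        routes[bucket_id] = replicaset_ids[idx]
    end
    return routes
end

-- Storage side: build a dispatcher that rejects buckets not stored locally.
-- owned is a set { bucket_id => true }; functions maps names to functions.
local function make_dispatch(owned, functions)
    return function(bucket_id, func, args)
        if not owned[bucket_id] then
            return nil, 'bucket ' .. tostring(bucket_id) .. ' is not stored here'
        end
        local f = functions[func]
        if f == nil then
            return nil, 'unknown function ' .. tostring(func)
        end
        return f(unpack(args))
    end
end

-- Usage: six buckets over three replica sets, then act as replica set 1.
local routes = bootstrap({ 1, 2, 3 }, 6)
local owned = {}
for bucket_id, replicaset_id in pairs(routes) do
    if replicaset_id == 1 then
        owned[bucket_id] = true
    end
end
local dispatch = make_dispatch(owned, {
    sum = function(a, b) return a + b end,
})
print(dispatch(1, 'sum', { 2, 3 })) -- bucket 1 is stored here
print(dispatch(2, 'sum', { 2, 3 })) -- bucket 2 belongs to replica set 2
```

Returning `nil, err` instead of raising an error lets a router tell a wrong-bucket reply apart from a failure inside the called function.
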
Gerold103 changed the title from "Config table passed to vshard.router.cfg() is destroyed" to "Bootstrap the project" on Jan 17, 2018
Gerold103 added a commit that referenced this issue Feb 29, 2020
Closes #207

@TarantoolBot document
Title: vshard.router.bucket_id_strcrc32() and .bucket_id_mpcrc32()
vshard.router.bucket_id() is deprecated; each use logs a
warning. It still works, but will be removed in the future.

Behaviour of the old bucket_id() function is now available as
vshard.router.bucket_id_strcrc32(). It works exactly like the old
function, but does not log a warning.

The reason why there is a new function bucket_id_mpcrc32() is that
the old bucket_id() and the new bucket_id_strcrc32() are not
consistent for cdata numbers. In particular, they return 3
different values for normal Lua numbers like 123, for unsigned
long long cdata (like 123ULL, or ffi.cast('unsigned long long',
123)), and for signed long long cdata (like 123LL, or
ffi.cast('long long', 123)). Note, this is important!

    vshard.router.bucket_id(123)
    vshard.router.bucket_id(123LL)
    vshard.router.bucket_id(123ULL)

    Return 3 different values!!!

For float and double cdata (ffi.cast('float', number),
ffi.cast('double', number)) these functions return different
values even for the same number of the same floating point type.
This is because tostring() on a floating point cdata number
returns not the number but a pointer to it, which is different on
each call.
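
The effect can be imitated without Tarantool. The sketch below assumes the old function hashes the string form of the key, as the explanation above implies, and uses a toy hash in place of the real CRC32, so the bucket ids it produces are not the ones vshard would return. The three strings are what LuaJIT's tostring() gives for 123, 123LL and 123ULL; `toy_hash` and `bucket_id_from_string` are hypothetical names.

```lua
-- Toy string hash standing in for CRC32 (NOT the hash vshard uses).
local function toy_hash(str)
    local h = 0
    for i = 1, #str do
        h = (h * 31 + str:byte(i)) % 4294967296
    end
    return h
end

-- Same shape as a string-based bucket id: hash(string) % count + 1.
local function bucket_id_from_string(key_as_string, bucket_count)
    return toy_hash(key_as_string) % bucket_count + 1
end

-- String forms LuaJIT produces for a Lua number, a signed 64-bit cdata
-- and an unsigned 64-bit cdata holding the same value.
local representations = { '123', '123LL', '123ULL' }
local ids = {}
for i, repr in ipairs(representations) do
    ids[i] = bucket_id_from_string(repr, 3000)
    print(repr, ids[i])
end
```

Because the suffix becomes part of the hashed string, equal numeric values end up in unrelated buckets.
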

vshard.router.bucket_id_strcrc32() behaves exactly the same as the
old bucket_id(), but does not log a warning. Use it if you need
that behaviour.

vshard.router.bucket_id_mpcrc32() is safer. It takes a CRC32 of
the MessagePack-encoded value. That is, the bucket_id of an
integer does not depend on its Lua type. However, it may still
return different values for different floating point types. That
is, ffi.cast('float', number) may map to a bucket id not equal to
that of ffi.cast('double', number). This can't be fixed, because
a float value, even when cast to double, may have a garbage
tail in its fraction.
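
To see why the MessagePack form is independent of the integer's Lua type, here is a minimal sketch of how the MessagePack format encodes a non-negative integer: the bytes depend only on the value. `mp_encode_uint` and `to_hex` are hypothetical helpers written for this note, not Tarantool's encoder, and values above 32 bits are left out. Floats and doubles use their own type tags in the format (0xca and 0xcb), which is why they hash differently from integers and from each other.

```lua
-- Big-endian bytes of n, exactly `count` bytes long.
local function be_bytes(n, count)
    local out = {}
    for i = count, 1, -1 do
        out[i] = string.char(n % 256)
        n = math.floor(n / 256)
    end
    return table.concat(out)
end

-- Encode a non-negative integer in the shortest MessagePack form.
local function mp_encode_uint(n)
    assert(type(n) == 'number' and n >= 0 and n == math.floor(n),
           'non-negative integer expected')
    if n <= 0x7f then
        return string.char(n)                       -- positive fixint
    elseif n <= 0xff then
        return string.char(0xcc) .. be_bytes(n, 1)  -- uint 8
    elseif n <= 0xffff then
        return string.char(0xcd) .. be_bytes(n, 2)  -- uint 16
    elseif n <= 0xffffffff then
        return string.char(0xce) .. be_bytes(n, 4)  -- uint 32
    end
    error('values above 32 bits are left out of this sketch')
end

-- Render a byte string as lowercase hex for printing.
local function to_hex(s)
    return (s:gsub('.', function(c)
        return string.format('%02x', c:byte())
    end))
end

print(to_hex(mp_encode_uint(123)))
print(to_hex(mp_encode_uint(300)))
```
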

Floating point keys should usually not be used to calculate a
bucket id.

P.S. #1: for a string key, bucket_id_mpcrc32() does not encode it
into MessagePack, but takes the hash directly from the string.
This does not affect consistency of the function, but makes it as
fast as bucket_id_strcrc32().

P.S. #2: be very careful if you store floating point types in
a space. When data is returned from a space, it is cast to a Lua
number. If that value had an empty fractional part, it will be
treated as an integer by bucket_id_mpcrc32(). So you need to do
explicit casts in such cases. Example of the problem:

s = box.schema.create_space('test', {format = {{'id', 'double'}}})
_ = s:create_index('pk')

ffi = require('ffi')
inserted = ffi.cast('double', 1)

-- Value is stored as double.
s:replace({inserted})

-- But when returned to Lua, it becomes a Lua number, not cdata.
returned = s:get({inserted}).id
type(returned), returned
---
- number
- 1
...

vshard.router.bucket_id_mpcrc32(inserted)
---
- 1411
...
vshard.router.bucket_id_mpcrc32(returned)
---
- 1614
...
Gerold103 added a commit that referenced this issue Mar 2, 2020
Closes #207

Serpentian added a commit to Serpentian/vshard that referenced this issue Dec 4, 2023
Part of tarantool#426

@TarantoolBot document
Title: vshard: config identification mode

The option `identification_mode` should be specified in the root of the
config. It can have one of these values:

* `'uuid_as_key'` - default. The default UUID-based config
    identification is used. replica.name is allowed and should not be
    interpreted as `box.cfg.instance_name`. replica/replicaset.uuid is
    forbidden. The config should have the following format:
    {
        ['cbf06940-0790-498b-948d-042b62cf3d29'] = { -- replicaset #1
            replicas = {
                ['8a274925-a26d-47fc-9e1b-af88ce939412'] = {
                    name = 'storage_1_a',
                    ...
                },
                ...
            },
        },
        ...
    }

* `'name_as_key'`. Name identification is used, supported only by
    Tarantool >= 3.0.0. Specifying replica.name is forbidden in
    this format. UUIDs are optional and can be specified via
    replicaset/replica.uuid:
    {
        replicaset_1 = {
            uuid = 'cbf06940-0790-498b-948d-042b62cf3d29',
            replicas = {
                replica_1_a = {
                    uuid = '8a274925-a26d-47fc-9e1b-af88ce939412'
                    ...
                },
                ...
            }
        },
        ...
    }

    Note that the names used as keys in the config are passed to
    box.cfg.replicaset/instance_name for storage. On
    reconfiguration it is strictly validated that both the
    replicaset and instance names correspond to the passed
    config. Vshard doesn't deal with changing or setting names;
    that must be done externally (using Tarantool's config module,
    for example).
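
The rules for the two modes can be summed up as a small check. This is a sketch, not vshard's own validation: `check_identification` is a hypothetical helper, and it assumes the replica set table shown above sits under the `sharding` key of the config.

```lua
-- Hypothetical check of the per-mode rules described above.
-- Returns true, or nil plus an error message.
local function check_identification(cfg)
    local mode = cfg.identification_mode or 'uuid_as_key'
    if mode ~= 'uuid_as_key' and mode ~= 'name_as_key' then
        return nil, 'unknown identification_mode: ' .. tostring(mode)
    end
    for rs_key, replicaset in pairs(cfg.sharding) do
        if mode == 'uuid_as_key' and replicaset.uuid ~= nil then
            return nil, 'replicaset.uuid is forbidden in ' .. rs_key
        end
        for replica_key, replica in pairs(replicaset.replicas) do
            if mode == 'uuid_as_key' and replica.uuid ~= nil then
                return nil, 'replica.uuid is forbidden in ' .. replica_key
            end
            if mode == 'name_as_key' and replica.name ~= nil then
                return nil, 'replica.name is forbidden in ' .. replica_key
            end
        end
    end
    return true
end

local by_uuid = {
    sharding = {
        ['cbf06940-0790-498b-948d-042b62cf3d29'] = {
            replicas = {
                ['8a274925-a26d-47fc-9e1b-af88ce939412'] = {
                    name = 'storage_1_a',
                },
            },
        },
    },
}
local by_name = {
    identification_mode = 'name_as_key',
    sharding = {
        replicaset_1 = {
            uuid = 'cbf06940-0790-498b-948d-042b62cf3d29',
            replicas = {
                replica_1_a = {
                    uuid = '8a274925-a26d-47fc-9e1b-af88ce939412',
                },
            },
        },
    },
}
print(check_identification(by_uuid))
print(check_identification(by_name))
```
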
Serpentian added a commit to Serpentian/vshard that referenced this issue Dec 13, 2023
Part of tarantool#426

Serpentian added a commit to Serpentian/vshard that referenced this issue Dec 14, 2023
Part of tarantool#426

Serpentian added a commit to Serpentian/vshard that referenced this issue Dec 15, 2023
Part of tarantool#426

Gerold103 pushed a commit that referenced this issue Dec 19, 2023
Part of #426
