Make automated invalidation of caches on router on schema reload or ddl sharding keys update #212

Closed
ligurio opened this issue Sep 13, 2021 · 3 comments · Fixed by #268

ligurio commented Sep 13, 2021

Imagine a user has a cluster running the CRUD module and wants to update the schema or the sharding keys in the _ddl_sharding_key space. In that case the sharding key cache on the router should be updated too.
We already have a mechanism to track schema changes on storages (see the add_space_schema_hash option in CRUD operations and the schema.get_space_schema_hash() function).

A patch for this could look like this:

diff --git a/crud/borders.lua b/crud/borders.lua
index ac77cef..b702d09 100644
--- a/crud/borders.lua
+++ b/crud/borders.lua
@@ -108,9 +108,6 @@ local function call_get_border_on_router(border_name, space_name, index_name, op
         local storage_result = storage_result[1]
         if storage_result.err ~= nil then
             local need_reload = schema.result_needs_reload(space, storage_result)
-            if need_reload == true then
-                sharding.schema_reload_actions(space_name, storage_result.ddl_sharding_key)
-            end
             return nil, BorderError:new("Failed to get %s: %s", border_name, storage_result.err), need_reload
         end
 
diff --git a/crud/common/schema.lua b/crud/common/schema.lua
index 93afc01..bbfae66 100644
--- a/crud/common/schema.lua
+++ b/crud/common/schema.lua
@@ -214,7 +214,6 @@ function schema.wrap_func_result(space, func, args, opts)
     else
         result.res = filter_tuple_fields(func_res, opts.field_names)
     end
-    result.ddl_sharding_key = schema.fetch_ddl_sharding_key(box, space.name)
 
     return result
 end
diff --git a/crud/common/sharding.lua b/crud/common/sharding.lua
index 528b4fe..07d0275 100644
--- a/crud/common/sharding.lua
+++ b/crud/common/sharding.lua
@@ -109,14 +109,6 @@ function sharding.is_sharding_key_in_primary_index(space_name, primary_index, sh
     return sharding_key_in_primary_index_cache[space_name]
 end
 
-function sharding.schema_reload_actions(space_name, sharding_key)
-    dev_checks('string', 'table')
-
-    ddl_sharding_keys_cache[space_name] = sharding_key
-    sharding_key_in_primary_index_cache[space_name] = nil
-    sharding_key_fieldnos_cache[space_name] = nil
-end
-
 -- Build an array with sharding key values.
 local function build_sharding_key(key, index_parts, sharding_key_fieldno_map)
     dev_checks('table', 'table', 'table')
diff --git a/crud/insert.lua b/crud/insert.lua
index 3b9fe1f..9d4332c 100644
--- a/crud/insert.lua
+++ b/crud/insert.lua
@@ -84,9 +84,6 @@ local function call_insert_on_router(space_name, tuple, opts)
 
     if storage_result.err ~= nil then
         local need_reload = schema.result_needs_reload(space, storage_result)
-        if need_reload == true then
-            sharding.schema_reload_actions(space_name, storage_result.ddl_sharding_key)
-        end
         return nil, InsertError:new("Failed to insert: %s", storage_result.err), need_reload
     end
 
diff --git a/crud/replace.lua b/crud/replace.lua
index f52f331..26d5721 100644
--- a/crud/replace.lua
+++ b/crud/replace.lua
@@ -88,9 +88,6 @@ local function call_replace_on_router(space_name, tuple, opts)
 
     if storage_result.err ~= nil then
         local need_reload = schema.result_needs_reload(space, storage_result)
-        if need_reload == true then
-            sharding.schema_reload_actions(space_name, storage_result.ddl_sharding_key)
-        end
         return nil, ReplaceError:new("Failed to replace: %s", storage_result.err), need_reload
     end
 
diff --git a/crud/upsert.lua b/crud/upsert.lua
index 87f16c0..d91d8ba 100644
--- a/crud/upsert.lua
+++ b/crud/upsert.lua
@@ -90,9 +90,6 @@ local function call_upsert_on_router(space_name, tuple, user_operations, opts)
 
     if storage_result.err ~= nil then
         local need_reload = schema.result_needs_reload(space, storage_result)
-        if need_reload == true then
-            sharding.schema_reload_actions(space_name, storage_result.ddl_sharding_key)
-        end
         return nil, UpsertError:new("Failed to upsert: %s", storage_result.err), need_reload
     end

Things missing from the patch:

  • schema.get_space_schema_hash tracks changes only in data spaces, not in _ddl_sharding_key and _ddl_sharding_func.
  • crud.insert() does not set opts.add_space_schema_hash = true (as opposed to crud.insert_object()).
  • anything else?

Part of #166

@Totktonada

  • anything else?

crud.insert() does not set opts.add_space_schema_hash = true (as opposed to crud.insert_object()).

ligurio added a commit that referenced this issue Sep 16, 2021
Previously there were two ways to obtain a bucket id in CRUD:

- calculate the bucket id automatically using the primary key (default)
- pass it explicitly in the options of a CRUD operation call

Users of the DDL module [1] may specify a sharding key (that is, a list
of tuple field names), but it was not possible to use the DDL sharding
key for bucket id calculation. Now CRUD can use that custom sharding key
to calculate the bucket id. This is done automatically when a DDL schema
with a non-empty sharding_key [1] is used, or when the _ddl_sharding_key
space contains a tuple with the space name and its sharding key.

The table below describes which operations support a custom sharding key:

| CRUD method                  | Added sharding key support |
| ---------------------------- | -------------------------- |
| get()                        | Yes                        |
| insert() / insert_object()   | Yes                        |
| delete()                     | Yes                        |
| replace() / replace_object() | Yes                        |
| upsert() / upsert_object()   | Yes                        |
| select() / pairs()           | Yes                        |
| update()                     | Yes                        |
| min() / max()                | No (not required)          |
| cut_rows() / cut_objects()   | No (not required)          |
| truncate()                   | No (not required)          |
| len()                        | No (not required)          |

Limitations:

- Sharding keys are not updated automatically when the schema is updated
  on storages, see [2]. However, they can be updated manually with
  sharding_key.update_sharding_keys_cache().
- CRUD select may lead to a map-reduce in some cases, see [3].

1. https://github.com/tarantool/ddl
2. #212
3. #213

Closes #166
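
To make the idea of a custom sharding key concrete, here is a minimal sketch in plain Lua of how sharding key values could be picked out of a tuple by field name and turned into a bucket id. All names here (`build_fieldno_map`, `extract_sharding_key`, `toy_bucket_id`) are hypothetical and are not part of the crud API. The hash is a toy stand-in, not the function vshard or crud actually uses.

```lua
-- Hypothetical helpers for illustration only; not the crud implementation.

-- Build a map from field name to field number for a space format.
local function build_fieldno_map(format)
    local fieldno_map = {}
    for fieldno, field in ipairs(format) do
        fieldno_map[field.name] = fieldno
    end
    return fieldno_map
end

-- Collect sharding key values from a tuple, in sharding key order.
local function extract_sharding_key(tuple, format, sharding_key_fields)
    local fieldno_map = build_fieldno_map(format)
    local key = {}
    for _, field_name in ipairs(sharding_key_fields) do
        local fieldno = fieldno_map[field_name]
        if fieldno == nil then
            return nil, ('No such field in the space format: %s'):format(field_name)
        end
        key[#key + 1] = tuple[fieldno]
    end
    return key
end

-- Toy stand-in for a real bucket id hash (assumption: any deterministic
-- hash of the key values works for the purpose of this sketch).
local function toy_bucket_id(key, bucket_count)
    local sum = 0
    for _, value in ipairs(key) do
        local str = tostring(value)
        for i = 1, #str do
            sum = (sum * 31 + str:byte(i)) % 2147483647
        end
    end
    return sum % bucket_count + 1
end

local format = {
    {name = 'id'}, {name = 'bucket_id'}, {name = 'name'}, {name = 'age'},
}
local key = extract_sharding_key({1, nil, 'Ivan', 30}, format, {'name', 'age'})
print(table.concat(key, ', '))
print(toy_bucket_id(key, 3000))
```

The point of the sketch is that the bucket id depends on the sharding key definition, so a router holding a stale definition would route requests to the wrong bucket. That is why the cache limitation above matters.
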
kyukhin added the bug and teamE labels Sep 24, 2021
ligurio added a commit that referenced this issue Sep 28, 2021
Describe functionality and current limitations (#212 and #213) with
custom sharding key in CHANGELOG and README.

Closes #166
ligurio added a commit that referenced this issue Sep 29, 2021
Describe functionality and current limitations (#212, #213 and #219)
with custom sharding key in CHANGELOG and README.

Closes #166
DifferentialOrange added a commit that referenced this issue Apr 19, 2022
Fetch sharding info hashes to router on ddl schema load. Hashes are
stored in router metadata cache together with sharding info.

Part of #212
DifferentialOrange added a commit that referenced this issue Apr 19, 2022
Return error if router sharding info differs from storage sharding info.
Comparison is based on sharding hash values. Hashes are provided with
each relevant request.

Hashes are extracted together with sharding key and sharding func
definitions on router during request execution.

After this patch, the performance of insert requests decreased by 5%,
the performance of select requests decreased by 1.5%.

Part of #212
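
The storage-side check described in this commit message could be sketched as follows. This is a plain Lua illustration under stated assumptions: `storage_hashes`, `call_on_storage` and the error shape are hypothetical names, and the hash values are arbitrary numbers rather than real sharding info hashes.

```lua
-- Hypothetical sketch; names and error shape are assumptions.

-- Hashes of the sharding info currently known to the storage.
local storage_hashes = {customers = 1001}

-- Run func only if the hash sent by the router matches the storage hash.
local function call_on_storage(space_name, request_hash, func, ...)
    local storage_hash = storage_hashes[space_name]
    if storage_hash ~= request_hash then
        return nil, {
            is_sharding_hash_mismatch = true,
            err = ('Sharding hash mismatch for space %s'):format(space_name),
        }
    end
    return func(...)
end

-- A router with up-to-date sharding info gets its request executed.
print(call_on_storage('customers', 1001, function(x) return x * 2 end, 21))

-- A router with stale sharding info gets an error instead.
local res, err = call_on_storage('customers', 999, function() return 'done' end)
print(res, err.err)
```

Comparing one number per request is cheap, though per the commit message above the real patch still measured a small cost.
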
DifferentialOrange added a commit that referenced this issue Apr 19, 2022
If a sharding info mismatch happens, sharding info is reloaded on the
router. After that, the request is retried with the new sharding info
(except for pairs requests: due to their nature, they must be retried
manually).

There are no detectable performance drops introduced in this patch.

Closes #212
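
A reload-and-retry loop of this kind might look roughly like the plain Lua sketch below. The storage and the router cache are simulated with local tables, and every name (`storage_call`, `reload_sharding_info`, `call_with_retry`, `MAX_ATTEMPTS`) is a hypothetical stand-in rather than the module's real code.

```lua
-- Hypothetical simulation; not the crud implementation.

-- Sharding info as the storage sees it.
local storage = {sharding_key = {'name', 'age'}, hash = 2}

-- Stale sharding info cached on the router.
local router_cache = {sharding_key = {'name'}, hash = 1}

-- The storage rejects requests that carry an outdated hash.
local function storage_call(request_hash)
    if request_hash ~= storage.hash then
        return nil, {mismatch = true, err = 'Sharding hash mismatch'}
    end
    return 'ok'
end

-- Refresh the router cache from the storage.
local function reload_sharding_info()
    router_cache = {sharding_key = storage.sharding_key, hash = storage.hash}
end

local MAX_ATTEMPTS = 2 -- assumption: one retry after a reload

local function call_with_retry()
    local last_err
    for _ = 1, MAX_ATTEMPTS do
        local res, err = storage_call(router_cache.hash)
        if err == nil then
            return res
        end
        last_err = err
        if not err.mismatch then
            break -- other errors are not retried here
        end
        reload_sharding_info()
    end
    return nil, last_err
end

print(call_with_retry())
print(router_cache.hash)
```

The attempt limit keeps the loop from spinning if the sharding info keeps changing between the reload and the retry.
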
DifferentialOrange added a commit that referenced this issue Apr 19, 2022
Compute and store sharding key and sharding func hashes on storages.
Hashes are updated with on_replace triggers.

Part of #212
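
As a rough illustration of keeping a hash in sync through a trigger, the plain Lua sketch below simulates the _ddl_sharding_key space with a table and calls a handler on every replace. In Tarantool the handler would be registered with the space's on_replace() method. Everything else here (`toy_hash`, `serialize_sharding_key`, `replace_sharding_key`, the record shape) is a hypothetical stand-in, and the checksum is a toy, not the hash the module uses.

```lua
-- Hypothetical simulation; not the crud implementation.

-- Deterministic serialization of a sharding key (a list of field names).
local function serialize_sharding_key(fields)
    return table.concat(fields, '\0')
end

-- Toy checksum standing in for a real hash function.
local function toy_hash(str)
    local h = 5381
    for i = 1, #str do
        h = (h * 33 + str:byte(i)) % 4294967296
    end
    return h
end

-- In-memory stand-in for the sharding key space and the hash cache.
local storage = {
    sharding_keys = {}, -- space_name -> record
    hashes = {},        -- space_name -> hash of the sharding key
}

-- Plays the role of an on_replace trigger: gets the old and new records.
local function on_sharding_key_replace(old, new)
    local record = new or old
    if record == nil then
        return
    end
    if new == nil then
        storage.hashes[record.space_name] = nil -- record was deleted
    else
        storage.hashes[record.space_name] =
            toy_hash(serialize_sharding_key(new.sharding_key))
    end
end

-- Replace (or delete, when fields is nil) a sharding key record.
local function replace_sharding_key(space_name, fields)
    local old = storage.sharding_keys[space_name]
    local new = nil
    if fields ~= nil then
        new = {space_name = space_name, sharding_key = fields}
    end
    storage.sharding_keys[space_name] = new
    on_sharding_key_replace(old, new)
end

replace_sharding_key('customers', {'name'})
print(storage.hashes.customers)
replace_sharding_key('customers', {'name', 'age'})
print(storage.hashes.customers)
```

Because the hash is recomputed only when the record changes, reading it on each request stays a simple table lookup.
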
DifferentialOrange added a commit that referenced this issue Apr 19, 2022
Rename sharding_metadata_cache to router_metadata_cache to distinguish it
from storage_metadata_hash.

Part of #212
DifferentialOrange added a commit that referenced this issue Apr 20, 2022
Since sharding schema reloads are processed automatically after this
patchset, users should not normally need to reload sharding info
manually. Thus the methods for manual sharding schema reload are
deprecated and will be removed in future releases.

Follows up #212
DifferentialOrange added a commit that referenced this issue Apr 20, 2022
If a crud request takes a tuple as an input argument (insert, upsert and
replace operations) and its bucket_id is empty, the module fills this
field in place and thus modifies the caller's tuple. This patch fixes
this behavior.

After this patch, the performance of insert, upsert and replace
decreased by 5%.

Part of #212
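
One way to avoid touching the caller's tuple is to fill bucket_id in a copy. The plain Lua sketch below shows that idea. `with_bucket_id` and its parameters are hypothetical, and the field count is passed explicitly on the assumption that a tuple with an empty bucket_id may contain a nil hole, which makes the length operator unreliable.

```lua
-- Hypothetical helper; not the crud implementation.

-- Return a copy of the tuple with bucket_id filled in if it was empty.
-- The input tuple is left untouched.
local function with_bucket_id(tuple, field_count, bucket_id_fieldno, bucket_id)
    local copy = {}
    for fieldno = 1, field_count do
        copy[fieldno] = tuple[fieldno]
    end
    if copy[bucket_id_fieldno] == nil then
        copy[bucket_id_fieldno] = bucket_id
    end
    return copy
end

local input = {1, nil, 'Ivan', 30}
local filled = with_bucket_id(input, 4, 2, 477)
print(filled[2])
print(input[2])
```

A caller can then retry the same request with the original tuple, and a new bucket id is computed for it instead of the stale one being reused.
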