Wait for master until a timeout #95

Closed
filonenko-mikhail opened this issue Dec 4, 2020 · 2 comments · Fixed by #337
Labels
bug Something isn't working

Comments

@filonenko-mikhail

  • Set up a cartridge cluster with failover
  • Set up separate crud-router and crud-storage
  • Generate a write load
  • Switch the master under load
  • Once, the following error occurred:
LuajitError: ...rk/starwars/.rocks/share/tarantool/crud/common/utils.lua:35: attempt to index field 'master' (a nil value)
@Totktonada added the bug label Jun 18, 2021
@Totktonada
Member

It seems this was crud-0.0.4. The relevant code is the following:

function utils.get_space(space_name, replicasets)
    local replicaset = select(2, next(replicasets))
    local space = replicaset.master.conn.space[space_name]
    return space
end

The code has remained unchanged since 0.0.4.

It seems logical that there is a time window when the old master is already demoted but the new one is not yet promoted. I also see that vshard contains 'is there a master?' checks, which supports my guess.

We should not rely on master presence unconditionally and should report a proper error if there is no master. The error should hint to the user that it is transient and that the request can be safely retried.
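
A minimal sketch of such a check, assuming the same `replicasets` shape as in the snippet above (a replicaset may or may not carry a `master` with a `conn`). The name `get_space_checked` and the error texts are hypothetical and are not part of crud's API; the point is only that a missing master becomes a returned error rather than a raised `attempt to index` failure.

```lua
-- Hypothetical guarded variant of utils.get_space (not crud's actual code).
-- Returns the space on success, or nil plus an error message that tells the
-- caller the condition is transient.
local function get_space_checked(space_name, replicasets)
    local _, replicaset = next(replicasets)
    if replicaset == nil then
        return nil, 'no replicasets are available'
    end

    local master = replicaset.master
    if master == nil then
        -- The old master is demoted and the new one is not promoted yet.
        return nil, 'replicaset has no master yet; this is transient, retry the request'
    end

    if master.conn == nil or master.conn.space == nil then
        return nil, 'master connection is not ready yet; this is transient, retry the request'
    end

    return master.conn.space[space_name]
end

-- Usage: a replicaset in the middle of a master switch.
local space, err = get_space_checked('customers', {rs1 = {}})
-- space is nil, err describes the transient condition
```

With this shape the caller can decide whether to retry or to propagate the error to the user.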

BTW, does crud have no auto-retry / retry strategy settings?

@Totktonada
Member

BTW, as far as I can see, vshard's call automatically waits for a master (until a timeout). It would be good to use the same approach.

NB: Don't forget to subtract the time spent waiting for a space from the net.box request timeout.
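
The wait-then-subtract idea can be sketched as follows. This is an illustration under assumptions, not the change that was eventually merged: `wait_for_master` and `POLL_INTERVAL` are hypothetical names, and `clock` and `sleep` are injected so the sketch runs in plain Lua (inside Tarantool one could pass `fiber.clock` and `fiber.sleep`).

```lua
-- Hypothetical helper: poll until replicaset.master appears or the timeout
-- expires. Returns master, remaining_timeout, err.
local POLL_INTERVAL = 0.05 -- seconds; an arbitrary choice for the sketch

local function wait_for_master(replicaset, timeout, clock, sleep)
    local deadline = clock() + timeout
    while replicaset.master == nil do
        local left = deadline - clock()
        if left <= 0 then
            return nil, 0, 'timed out waiting for master; the request can be retried'
        end
        -- Never sleep past the deadline.
        sleep(math.min(POLL_INTERVAL, left))
    end
    -- Whatever is left of the budget is what the actual request may use.
    return replicaset.master, math.max(deadline - clock(), 0), nil
end

-- Usage with a fake clock: the master shows up after three polls.
local now, polls, rs = 0, 0, {}
local function clock() return now end
local function sleep(dt)
    now = now + dt
    polls = polls + 1
    if polls == 3 then rs.master = {name = 'new-master'} end
end

local master, remaining, err = wait_for_master(rs, 1, clock, sleep)
-- `remaining` is the budget left for the net.box request after the wait.
```

Working from a single deadline keeps the total time bounded by the caller's timeout, no matter how long the master switch takes.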

@Totktonada changed the title from "Raise error when replace_object" to "Wait for master until a timeout" May 21, 2022
GRISHNOV added a commit that referenced this issue Jan 10, 2023
Added validation of the master availability to the `utils.get_space`
method before receiving the space through the connection.

Closes #95
Closes #331
GRISHNOV added a commit that referenced this issue Jan 18, 2023
Added validation of the master presence in replicaset to
the `utils.get_space` method with timeout condition.

Closes #95
GRISHNOV added a commit that referenced this issue Jan 20, 2023
Added timeout condition for the validation of master presence in
replicaset and for the master connection to the `utils.get_space`
method.

Closes #95
DifferentialOrange pushed a commit that referenced this issue Jan 27, 2023
Added timeout condition for the validation of master presence in
replicaset and for the master connection to the `utils.get_space`
method.

Closes #95
DifferentialOrange added a commit that referenced this issue Feb 2, 2023
Overview

  This release introduces a breaking change with removing a deprecated
  feature: `crud.len(space_id)`.

  This release also introduces a Cartridge clusterwide config to setup
  `crud.cfg`.

Breaking changes

  You cannot use space id as a space identifier in `crud.len` anymore.
  Use space name instead.

New features

  * Timeout condition for the validation of master presence in 
    replicaset and for the master connection (#95).
  * Cartridge clusterwide configuration for `crud.cfg` (#332).

Changes

  * Forbid using space id in `crud.len` (#255).

Fixes

  * Add validation of the master presence in replicaset and the 
    master connection to the `utils.get_space` method before 
    receiving the space from the connection (#331).
  * Fix fiber cancel on schema reload timeout in `call_reload_schema`
    (PR #337).
@DifferentialOrange DifferentialOrange mentioned this issue Feb 2, 2023
1 task
DifferentialOrange added a commit that referenced this issue Feb 2, 2023
Overview

  This release introduces a breaking change with removing a deprecated
  feature: `crud.len(space_id)`.

  This release also introduces a Cartridge clusterwide config to setup
  `crud.cfg`.

Breaking changes

  You cannot use space id as a space identifier in `crud.len` anymore.
  Use space name instead.

New features

  * Timeout condition for the validation of master presence in 
    replicaset and for the master connection (#95).
  * Cartridge clusterwide configuration for `crud.cfg` (#332).

Changes

  * Forbid using space id in `crud.len` (#255).

Fixes

  * Add validation of the master presence in replicaset and the 
    master connection to the `utils.get_space` method before 
    receiving the space from the connection (#331).
  * Fix fiber cancel on schema reload timeout in `call_reload_schema`
    (PR #336).

4 participants