Skip to content

Commit

Permalink
Timeout Fixes and VTOrc Improvement (#11881)
Browse files Browse the repository at this point in the history
* refactor: move tests out of newfeaturestest so that they run on upgrade-downgrade tests too

Signed-off-by: Manan Gupta <[email protected]>

* feat: add failing ers test for handling multiple vttablet failures with default values of flags

Signed-off-by: Manan Gupta <[email protected]>

* feat: add a new lock-timeout flag and use that instead of remote-operation-timeout

Signed-off-by: Manan Gupta <[email protected]>

* feat: augment DownPrimary test to reproduce the issue of VTOrc not handling multiple failures

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove LockShardTimeout configuration from VTOrc and add parallelism to refresh of tablets

Signed-off-by: Manan Gupta <[email protected]>

* log: add more logging lines around ers in vtorc

Signed-off-by: Manan Gupta <[email protected]>

* test: get the test to work

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix usage of wait for replicas timeout

Signed-off-by: Manan Gupta <[email protected]>

* test: fix flags expected output

Signed-off-by: Manan Gupta <[email protected]>

* test: fix race in test now that the function is called in parallel multiple times

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix default of onCloseTimeout to 1 second

Signed-off-by: Manan Gupta <[email protected]>

* test: add failing unit test to refreshTabletsInKeyspaceShard

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix vtorc to not forget a tablet which has been deleted

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix backward compatibility, add tests and release notes docs

Signed-off-by: Manan Gupta <[email protected]>

* test: fix flags output

Signed-off-by: Manan Gupta <[email protected]>

* test: use disable-replication-manager instead of disable-active-reparents to allow vttablets to setup replication when restarted

Signed-off-by: Manan Gupta <[email protected]>

* test: fix flaky test by not checking for an error

Signed-off-by: Manan Gupta <[email protected]>

* feat: handle the case of empty hostname in tablet initialization

Signed-off-by: Manan Gupta <[email protected]>

* feat: update onclose timeout to 10 seconds

Signed-off-by: Manan Gupta <[email protected]>

* test: fix unit test

Signed-off-by: Manan Gupta <[email protected]>

* feat: address review comments

Signed-off-by: Manan Gupta <[email protected]>

* docs: add comments explaining the test functions

Signed-off-by: Manan Gupta <[email protected]>

* feat: add summary docs for 'lock-shard-timeout' deprecation

Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>
  • Loading branch information
GuptaManan100 authored Dec 16, 2022
1 parent f7dc5c0 commit 46d63e9
Show file tree
Hide file tree
Showing 29 changed files with 403 additions and 223 deletions.
16 changes: 15 additions & 1 deletion doc/releasenotes/16_0_0_summary.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,14 @@ If you have code searching for error strings from Vitess, this is a breaking cha
Many error strings have been tweaked.
If your application is searching for specific errors, you might need to update your code.

#### <a id="lock-timeout-introduction"/> `lock-timeout` and `remote_operation_timeout` Changes
Earlier, the shard and keyspace locks used to be capped by the `remote_operation_timeout`. This is no longer the case and instead a new flag called `lock-timeout` is introduced.
For backward compatibility, if `lock-timeout` is unspecified and `remote_operation_timeout` flag is provided, then its value will also be used for `lock-timeout` as well.
The default value for `remote_operation_timeout` has also changed from 30 seconds to 15 seconds. The default for the new flag `lock-timeout` is 45 seconds.

During upgrades, if the users want to preserve the same behaviour as previous releases, then they should provide the `remote_operation_timeout` flag explicitly before upgrading.
After the upgrade, they should then alter their configuration to also specify `lock-timeout` explicitly.

### New command line flags and behavior

#### VTGate: Support query timeout --query-timeout
Expand Down Expand Up @@ -232,4 +240,10 @@ VSchema Example
"view2": "select * from t2",
}
}
```
```

### VTOrc

#### Flag Deprecations

The flag `lock-shard-timeout` has been deprecated. Please use the newly introduced `lock-timeout` instead. More detail [here](#lock-timeout-introduction).
2 changes: 1 addition & 1 deletion go/flags/endtoend/mysqlctl.txt
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ Global flags:
--mysqlctl_client_protocol string the protocol to use to talk to the mysqlctl server (default "grpc")
--mysqlctl_mycnf_template string template file to use for generating the my.cnf file during server init
--mysqlctl_socket string socket file to use for remote mysqlctl actions (empty for local actions)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--pool_hostname_resolve_interval duration if set force an update to all hostnames and reconnect if changed, defaults to 0 (disabled)
Expand Down
2 changes: 1 addition & 1 deletion go/flags/endtoend/mysqlctld.txt
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ Usage of mysqlctld:
--mysql_socket string Path to the mysqld socket file
--mysqlctl_mycnf_template string template file to use for generating the my.cnf file during server init
--mysqlctl_socket string socket file to use for remote mysqlctl actions (empty for local actions)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--pool_hostname_resolve_interval duration if set force an update to all hostnames and reconnect if changed, defaults to 0 (disabled)
Expand Down
3 changes: 2 additions & 1 deletion go/flags/endtoend/vtbackup.txt
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ Usage of vtbackup:
--keep-alive-timeout duration Wait until timeout elapses after a successful backup before shutting down.
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
Expand Down Expand Up @@ -119,7 +120,7 @@ Usage of vtbackup:
--port int port for the server
--pprof strings enable profiling
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--restart_before_backup Perform a mysqld clean/full restart after applying binlogs, but before taking the backup. Only makes sense to work around xtrabackup bugs.
--s3_backup_aws_endpoint string endpoint of the S3 backend (region must be provided).
--s3_backup_aws_region string AWS region to use. (default "us-east-1")
Expand Down
7 changes: 4 additions & 3 deletions go/flags/endtoend/vtctld.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
Usage of vtctld:
--action_timeout duration time to wait for an action before resorting to force (default 2m0s)
--action_timeout duration time to wait for an action before resorting to force (default 1m0s)
--alsologtostderr log to standard error as well as files
--azblob_backup_account_key_file string Path to a file containing the Azure Storage account key; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_KEY will be used as the key itself (NOT a file path).
--azblob_backup_account_name string Azure Storage Account name for backups; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_NAME will be used.
Expand Down Expand Up @@ -60,21 +60,22 @@ Usage of vtctld:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
--log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
--logtostderr log to standard error instead of files
--max-stack-size int configure the maximum stack size in bytes (default 67108864)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--opentsdb_uri string URI of opentsdb /api/put method
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--port int port for the server
--pprof strings enable profiling
--proxy_tablets Setting this true will make vtctld proxy the tablet status instead of redirecting to them
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--s3_backup_aws_endpoint string endpoint of the S3 backend (region must be provided).
--s3_backup_aws_region string AWS region to use. (default "us-east-1")
--s3_backup_aws_retries int AWS request retries. (default -1)
Expand Down
5 changes: 3 additions & 2 deletions go/flags/endtoend/vtgate.txt
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ Usage of vtgate:
--keyspaces_to_watch strings Specifies which keyspaces this vtgate should have access to while routing queries or accessing the vschema.
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--legacy_replication_lag_algorithm Use the legacy algorithm when selecting vttablets for serving. (default true)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--lock_heartbeat_time duration If there is lock function used. This will keep the lock connection active by using this heartbeat (default 5s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
Expand Down Expand Up @@ -119,7 +120,7 @@ Usage of vtgate:
--mysql_tcp_version string Select tcp, tcp4, or tcp6 to control the socket type. (default "tcp")
--no_scatter when set to true, the planner will fail instead of producing a plan that includes scatter queries
--normalize_queries Rewrite queries with bind vars. Turn this off if the app itself sends normalized queries with bind vars. (default true)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--opentsdb_uri string URI of opentsdb /api/put method
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
Expand All @@ -134,7 +135,7 @@ Usage of vtgate:
--querylog-format string format for query logs ("text" or "json") (default "text")
--querylog-row-threshold uint Number of rows a query has to return or affect before being logged; not useful for streaming queries. 0 means all queries will be logged.
--redact-debug-ui-queries redact full queries and bind variables from debug UI
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--retry-count int retry count (default 2)
--schema_change_signal Enable the schema tracker; requires queryserver-config-schema-change-signal to be enabled on the underlying vttablets for this to work (default true)
--schema_change_signal_user string User to be used to send down query to vttablet to retrieve schema changes
Expand Down
3 changes: 2 additions & 1 deletion go/flags/endtoend/vtgr.txt
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Usage of vtgr:
-h, --help display usage and exit
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
Expand All @@ -31,7 +32,7 @@ Usage of vtgr:
--pprof strings enable profiling
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--refresh_interval duration Refresh interval to load tablets. (default 10s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--scan_interval duration Scan interval to diagnose and repair. (default 3s)
--scan_repair_timeout duration Time to wait for a Diagnose and repair operation. (default 3s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
Expand Down
6 changes: 3 additions & 3 deletions go/flags/endtoend/vtorc.txt
Original file line number Diff line number Diff line change
Expand Up @@ -23,14 +23,14 @@ Usage of vtorc:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-shard-timeout duration Duration for which a shard lock is held when running a recovery (default 30s)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
--log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
--logtostderr log to standard error instead of files
--max-stack-size int configure the maximum stack size in bytes (default 67108864)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 10s)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--port int port for the server
Expand All @@ -40,7 +40,7 @@ Usage of vtorc:
--reasonable-replication-lag duration Maximum replication lag on replicas which is deemed to be acceptable (default 10s)
--recovery-period-block-duration duration Duration for which a new recovery is blocked on an instance after running a recovery (default 30s)
--recovery-poll-duration duration Timer duration on which VTOrc polls its database to run a recovery (default 1s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
--shutdown_wait_time duration Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM (default 30s)
--snapshot-topology-interval duration Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiple of hours
Expand Down
Loading

0 comments on commit 46d63e9

Please sign in to comment.