Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support throttling vstreamer copy table work on source tablets #9923

Merged
merged 17 commits into from
Apr 4, 2022

Conversation

mattlord
Copy link
Contributor

@mattlord mattlord commented Mar 18, 2022

Description

This provides mechanisms to limit the impact of large vreplication copy table (row streamer) workflow phases on production source tablets.

It supports throttling based on the InnoDB history list length and the MySQL replication lag seen (if the source is a replica). This gives vreplication source tablets an opportunity to catch up in between copy table phase cycles (copy, catchup, fastforward). You can control this using two new vttablet flags:

Related Issue(s)

Checklist

@mattlord mattlord changed the title Support throttling vstream work on source tablets Support throttling vstream copy table work on source tablets Mar 18, 2022
@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication release notes labels Mar 18, 2022
@mattlord mattlord force-pushed the copy_table_cost branch 10 times, most recently from 5d4a46a to b0e7f4c Compare March 21, 2022 05:38
@mattlord mattlord changed the title Support throttling vstream copy table work on source tablets Support throttling vstreamer copy table work on source tablets Mar 21, 2022
@mattlord mattlord force-pushed the copy_table_cost branch 5 times, most recently from f924c86 to ca8468c Compare March 24, 2022 19:46
@mattlord mattlord force-pushed the copy_table_cost branch 9 times, most recently from 5c27419 to 4bbabfb Compare March 25, 2022 02:11
And make it a gauge so that it's always showing the current value
as we allow this to be changed in the running process via /debug/env.

Signed-off-by: Matt Lord <[email protected]>
mattlord added a commit to vitessio/website that referenced this pull request Mar 31, 2022
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@rohit-nayak-ps rohit-nayak-ps merged commit 3d4f400 into vitessio:main Apr 4, 2022
@rohit-nayak-ps rohit-nayak-ps deleted the copy_table_cost branch April 4, 2022 18:51
mattlord added a commit to vitessio/website that referenced this pull request Apr 5, 2022
* Document work in vitessio/vitess#9923

Signed-off-by: Matt Lord <[email protected]>

* Move to two dashes for long flag names

Signed-off-by: Matt Lord <[email protected]>

* Add detailed flag info and related info

Signed-off-by: Matt Lord <[email protected]>
@rohit-nayak-ps
Copy link
Contributor

Starting this discussion here to follow up on @deepthi's DM observation that these flag names are too long. @mattlord and I did note that, but we thought more descriptive the better since they would anyway reside in scripts or config files. However as @deepthi points out this may contribute to hitting OS command line length limits and also shorter the better, for recall purposes.

@mattlord, thoughts?

To start off suggestions!:
vreplication_copy_phase_max_innodb_history_list_length => vrep_max_hist_len
vreplication_copy_phase_max_mysql_replication_lag => vrep_max_mysql_lag

@deepthi
Copy link
Member

deepthi commented Apr 6, 2022

I'm not opposed to keeping the full name of the innodb variable in the flag for clarity. At a minimum, we can drop phase from the flag names, and mysql from the second one. Beyond that, I leave it up to you two.
I went and looked at other flags and we don't seem to be following any sort of standard for naming flags which are durations. Sometimes we suffix them with seconds if they are int but expected to be seconds, sometimes we don't. So even though other replication_lag flags are durations, I don't feel like we can impose that as a standard at this point.

@mattlord
Copy link
Contributor Author

mattlord commented Apr 6, 2022

IMO related flags should all have the same prefix. I would have used vrepl but we already had a number of flags with the vreplication prefix and beyond that vreplication_copy_phase so I used that:

$ vttablet --help 2>&1 | grep -E "vrepl|vstream" 
  --vreplication_copy_phase_duration duration
  --vreplication_copy_phase_max_innodb_history_list_length int
	The maximum InnoDB transaction history that can exist on a vstreamer (source) before starting another round of copying rows. This helps to limit the impact on the source tablet. (default 1000000)
  --vreplication_copy_phase_max_mysql_replication_lag int
	The maximum MySQL replication lag (in seconds) that can exist on a vstreamer (source) before starting another round of copying rows. This helps to limit the impact on the source tablet. (default 43200)
  --vreplication_experimental_flags int
	(Bitmask) of experimental features in vreplication to enable (default 1)
  --vreplication_healthcheck_retry_delay duration
  --vreplication_healthcheck_timeout duration
  --vreplication_healthcheck_topology_refresh duration
  --vreplication_heartbeat_update_interval int
	Frequency (in seconds, default 1, max 60) at which the time_updated column of a vreplication stream when idling (default 1)
  --vreplication_replica_lag_tolerance duration
  --vreplication_retry_delay duration
  --vreplication_store_compressed_gtid
	Store compressed gtids in the pos column of _vt.vreplication
  --vreplication_tablet_type string
  --vstream_dynamic_packet_size
  --vstream_packet_size int

The mysql is important in the second var name IMO as we also have vreplication lag in this context and that's not what this flag is about.

I don't really agree that shorter is easier to remember, but I don't feel too strongly here.

I'm not sure that OS limits are a major concern, but it's possible (1 MiB on macOS and 2MiB on Linux):

$ sw_vers && echo -n "Max cmd-line value in bytes: " && getconf ARG_MAX
ProductName:	macOS
ProductVersion:	12.3.1
BuildVersion:	21E258
Max cmd-line value in bytes: 1048576


$ cat /etc/system-release && echo -n "Max cmd-line value in bytes: " && getconf ARG_MAX
Amazon Linux release 2 (Karoo)
Max cmd-line value in bytes: 2621440

Please note that the entire --help output — which includes the help text — is currently less than 60KiB:

$ vttablet --help 2>&1 | wc -c
   53842

IMO what we really need is a project/ticket to define some Vitess policies for command-line options/flags and then apply them to all flags across all of vitess. Want me to create an issue for that? I don't mind being the one to work on it either — the lack of a policy is the real issue IMO (don't feel too strongly on the specifics, but the lack of uniformity and predictability is bad for users).

@deepthi
Copy link
Member

deepthi commented Apr 6, 2022

Let's create an issue/task saying "we need to define a policy", and once there is a concrete proposal it can become an RFC. The current state is not good. We also have some flags with dashes and some with underscores.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants