kv,server: do more verbose logging of graceful drain range lease transfer, once the majority of ranges leases are transferred over #65659
NB: the logging you're requesting is already available via the vmodule setting.
That is my suggestion, but ONLY once we get down to N=10 (not sure what N should be) or so range leases still held on the node. Do you think that makes sense? I think there is no reason not to be very verbose once you are down to a small number of ranges. What do you think about that?
I'm not opposed to it.
Ack. I think how useful it would be depends on the scope of the issue we see on CC. I can get more info about scope without this change. I can post here when I have a sense of scope.
@tbg has suggested that we can use a verbose condition instead, since draining could stall or encounter errors at a point where the number of remaining leases is greater than N. We can set verbose to true when the number of remaining leases does not decrease (or, worse, increases) across graceful drain calls.
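The condition suggested above can be sketched as a small helper. This is a hedged illustration, not code from the CockroachDB codebase: the function name `shouldLogVerbose` and its parameters are hypothetical. It combines the two triggers discussed in this thread: a small-N threshold and stall detection across drain attempts.

```go
package main

import "fmt"

// shouldLogVerbose decides whether a drain loop should switch to verbose
// logging. It returns true either when the node is down to a small number
// of remaining leases (at or below threshold), or when the remaining
// count failed to decrease since the previous drain attempt, which
// suggests the drain has stalled.
func shouldLogVerbose(prevRemaining, curRemaining, threshold int) bool {
	if curRemaining <= threshold {
		return true
	}
	// Stall detection: the count did not go down (or went up).
	return curRemaining >= prevRemaining
}

func main() {
	fmt.Println(shouldLogVerbose(50, 40, 10)) // making progress, above threshold: false
	fmt.Println(shouldLogVerbose(40, 40, 10)) // stalled: true
	fmt.Println(shouldLogVerbose(12, 8, 10))  // below threshold: true
}
```

A caller would evaluate this once per drain iteration, passing the lease count observed on the previous iteration, so that a single non-decreasing step is enough to flip on verbosity.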
68488: server: add verbose logging when graceful drain range lease transfer stalls r=knz a=cameronnunez
Informs #65659. Debugging graceful drain is challenging due to limited information: it is not clear from the logs what is going wrong. This patch adds more detailed logging when graceful drain range lease transfer encounters issues. Specifically, range lease transfer attempt information (when an attempt occurs, the duration blocked on the transfer attempt, and the lease being transferred) is logged. The reason behind the failure of a transfer attempt may not be logged on the draining node, because that information lives on the node that serves the draining request; we do not currently have logging that aggregates all of this information in one place. See #74158.
Release note (cli change): If graceful drain range lease transfer encounters issues, verbose logging is automatically enabled to help with troubleshooting.

75271: sql,server: create toString to role option r=maryliag a=maryliag
Previously, the code used roleOption.String() to build a string for use in queries. This caused an issue with role options that contain spaces: e.g. the role option `VALIDUNTIL` was being rendered as `VALIDUNTIL` instead of the correct `VALID UNTIL`. This commit updates the String() function to correctly add the space in the `VALID UNTIL` case.
Release note (bug fix): Update the String() function of roleOption to add a space in the role option `VALID UNTIL`.

75293: backupccl: reuse allocated tenant prefix in rekey r=dt a=dt
Release note: none.

75303: migrations: make sure public role gets privs on public schema r=ajwerner a=rafiss
This matches how the public schema gets created in a non-migration scenario.
Release note: None

Co-authored-by: Cameron Nunez <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Is your feature request related to a problem? Please describe.
Here are some example logs:
The logs above show 1m passing without graceful drain finishing (i.e., without all range lease transfers completing), leading to some impact during the update.
More detailed logs might help with debugging. It's not clear what is going wrong from the above logs.
Describe the solution you'd like
I expect it would help debugging if we logged more verbose info once we got down to N=10 or so range leases still held on the node. We could log:
Describe alternatives you've considered
We could do nothing, relying on operators using vmodule to get the additional verbosity when needed.
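For reference, the vmodule-based alternative looks roughly like the following. This is a hedged config fragment: `--vmodule` is a real cockroach logging flag, but the specific module (file) names and verbosity levels shown here are illustrative assumptions and depend on the CockroachDB version.

```shell
# Raise per-file log verbosity for lease-transfer-related code paths.
# Module names (store, replica) and levels are illustrative; check the
# source files of your CockroachDB version for the right targets.
cockroach start \
  --store=/mnt/data1/cockroach \
  --vmodule=store=2,replica=2
```

The drawback, as noted above, is that this requires the operator to anticipate the problem and restart with the flag set, rather than getting verbose output automatically when the drain stalls.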
Additional context
On CC, graceful drain doesn't always finish before SIGKILL time. We want to fix this.
Epic: CRDB-8501
Jira issue: CRDB-7706