-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: admin server gives scheduled GC job "Retrying" status #95712
Comments
Seems like other filters are having the same issue: https://cockroachlabs.slack.com/archives/C0159JK877C/p1674533214747149 |
@kevin-v-ngo What should be the suggested fix here in terms of what status to show? Are we just removing the forced 'retry' status mentioned above altogether (i.e. just show the status from the jobs page without interpretation). |
I think it's odd that we aren't consistent with the internal table and SHOW command but I understand we have this behavior because we tried to be more specific into this "running" state. It seems like this is more of a sub-status of "running" that we're trying to convey but it caused more confusion because of this inconsistency. In the near term for this issue, let's be consistent and remove this retry status. Medium-longer term let's track adding in the UI that this 'running' state is retrying and then reflect this in the internal table as well. |
This commit removes the 'Retrying' status from the jobs UX. Previously, we were interpolating this status from the running status. This just added confusion and incorectness to the status of the job being displayed. The status being surfaced now aligns directly with what is shown in the `crdb_internal.jobs` table. Some missing job statuses were also added as request options to the 'Status' dropdown, including: - Pause Requested - Cancel Requested - Revert Failed Fixes: cockroachdb#95712 Release note (ui change): Retrying is no longer a status shown in the jobs page.
This commit removes the 'Retrying' status from the jobs UX. Previously, we were interpolating this status from the running status. This just added confusion and incorectness to the status of the job being displayed. The status being surfaced now aligns directly with what is shown in the `crdb_internal.jobs` table. Some missing job statuses were also added as request options to the 'Status' dropdown, including: - Pause Requested - Cancel Requested - Revert Failed Fixes: cockroachdb#95712 Release note (ui change): Retrying is no longer a status shown in the jobs page.
97465: c2c: gather perf metrics from prometheus r=stevendanna a=msbutler c2c roachtest performance metrics are now gathered by a prom/grafana instance running locally on the roachprod cluster. This change allows us to gather and process any metrics exposed to the crdb prom endpoint. Specifically, we now gather: `capacity_used`, `replication_logical_bytes`, `replication_sst_bytes` at various points during the c2c roachtest, allowing us to measure: - Initial Scan Throughput: initial scan size / initial scan duration - Workload Throughput: data ingested during workload / workload duration - Cutover Throughput: (data ingested between cutover time and cutover cmd) / (cutover process duration) where the size of these operations can be measured as either physical replicated bytes, logical ingested bytes, or physical ingested bytes on the source cluster. This patch also fixes a recent bug which mislabeled src cluster throughput as initial scan throughput. Informs #89176 Release note: None 97505: server, ui: remove interpreted jobs retrying status r=xinhaoz a=xinhaoz This commit removes the 'Retrying' status from the jobs UX. Previously, we were interpolating this status from the running status. This just added confusion and incorectness to the status of the job being displayed. The status being surfaced now aligns directly with what is shown in the `crdb_internal.jobs` table. Some missing job statuses were also added as request options to the 'Status' dropdown, including: - Pause Requested - Cancel Requested - Revert Failed Fixes: #95712 Release note (ui change): Retrying is no longer a status shown in the jobs page. <img width="1326" alt="image" src="https://user-images.githubusercontent.com/20136951/220738075-733b0cc8-9f77-4ace-a944-3791ff159c62.png"> Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]>
The query behind the jobs endpoint of the admin server assigns a
retrying
status to all jobs that meet the following condition:status='running' AND next_run > now() AND num_runs > 1
or
status='reverting' AND next_run > now() AND num_runs > 1
We need to either change the logic defining the
retry*Condition
here, or, further down, not assign a status ofrunning
to jobs that are scheduled but have not yet started.Background thread: https://cockroachlabs.slack.com/archives/CUVEFUU3C/p1674507323054389
Jira issue: CRDB-23683
The text was updated successfully, but these errors were encountered: