jobsprofiler: add support for a job diagnostic bundle #105076

Closed
adityamaru opened this issue Jun 16, 2023 · 1 comment · Fixed by #107759
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-disaster-recovery

Comments

adityamaru commented Jun 16, 2023

This issue tracks the work to be able to collect a diagnostic bundle for a running or finished job. This is similar to the support we have for collecting statement bundles. The job diagnostic bundle will contain information such as:

  • DistSQL diagram
  • Per component progress
  • Per component trace-driven metrics
  • Aggregate and per-node traces

This bundle will be downloadable from the DBConsole and via the SQL shell.

Epic: CRDB-8964

Jira issue: CRDB-28850

@adityamaru adityamaru added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-disaster-recovery T-jobs labels Jun 16, 2023
@adityamaru adityamaru self-assigned this Jun 16, 2023
blathers-crl bot commented Jun 16, 2023

cc @cockroachdb/disaster-recovery

@blathers-crl blathers-crl bot added the A-jobs label Jun 16, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jun 22, 2023
Similar to statement bundles, this change introduces the
infrastructure to collect and read job profiler bundles.
Right now, a job profiler bundle contains only the
latest DSP diagram for a job, but going forward this will
give us a place to dump raw files such as:
- cluster-wide job traces
- cpu profiles
- trace-driven aggregated stats
- raw payload and progress protos

Downloading this bundle will be exposed in a future patch in
all of the places where statement bundles are today:
- DBConsole
- CLI shell
- SQL shell

This change introduces a builtin that constructs
and writes the bundle for a job to the system.job_info
table. It also introduces a new endpoint on the status
server to read this constructed bundle. The next set of
PRs will add the necessary components to allow downloading
the bundle from the DBConsole.

Informs: cockroachdb#105076

Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jun 24, 2023
This change adds a new component to the `Profiler` tab
of the job details page that supports collecting and viewing
job profiler bundles. The component has a button to collect
job profiler bundles. These bundles are then listed in a sorted
table with the ability to download each bundle.

The above operations are backed by the infrastructure added
in cockroachdb#105384.

Note, the `Profiler` tab is currently disabled for CC but this
change allows for a future project to enable the collection of
bundles through the CC console as well.

Informs: cockroachdb#105076
Release note (ui change): collect and download job profiler
bundles from the `Profiler` tab on the job details page.
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jun 24, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 7, 2023
Similar to statement bundles, this change introduces the
infrastructure to request, collect and read the execution
details for a particular job.
Right now, the execution details will only contain the
latest DSP diagram for a job, but going forward this will
give us a place to dump raw files such as:
- cluster-wide job traces
- cpu profiles
- trace-driven aggregated stats
- raw payload and progress protos

Downloading some or all of these execution details will be
exposed in a future patch in all of the places where
statement bundles are today:
- DBConsole
- CLI shell
- SQL shell

This change introduces a builtin that allows the caller
to request the collection and persistence of a job's
current execution details.

This change also introduces a new endpoint on the status
server to read the data corresponding to the execution details
persisted for a job. The next set of
PRs will add the necessary components to allow downloading
the files from the DBConsole.

Informs: cockroachdb#105076

Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 11, 2023
craig bot pushed a commit that referenced this issue Jul 11, 2023
105384: jobsprofiler: enable requesting a job's execution details r=dt a=adityamaru

Informs: #105076

Release note: None

Co-authored-by: adityamaru <[email protected]>
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 11, 2023
In cockroachdb#105384
we added infrastructure to request and store execution details
for a job. This currently only includes the DistSQL diagram
generated during a job execution. Going forward this will
include several files such as traces, goroutines, profiles etc.

This change introduces an endpoint that allows listing all such
files that are available for consumption. This list will be displayed
on the job details page allowing the user to download any subset of
the files collected during job execution.

Informs: cockroachdb#105076
Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 12, 2023
This change collects cluster-wide goroutines that have
a pprof label tying them to the execution of the job
whose execution details have been requested. This relies
on the support added to the pprofui server to collect cluster-wide,
labelled goroutines in cockroachdb#105916.

Informs: cockroachdb#105076
Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 17, 2023
In cockroachdb#105384 and cockroachdb#106629 we added support to collect
and list files that had been collected as part of
a job's execution details. These files are meant
to provide improved observability into the state
of a job.

This change is the first of a few that exposes these
endpoints on the DBConsole job details page. This change
only adds support for listing files that have been
requested as part of a job's execution details.
A follow-up change will add support to request these files,
sort them and download them from the job details page.

This page is not available on the Cloud Console as it
is meant for advanced debugging.

This change also renames the `Profiler` tab to
`Advanced Debugging` as the users of this tab are
going to be internal CRDB support and engineering
for the time being.

Informs: cockroachdb#105076

Release note (ui change): add table in the Profiler job
details page that lists all the available files describing
a job's execution details
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 18, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 19, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 24, 2023
craig bot pushed a commit that referenced this issue Jul 24, 2023
106879: jobs: add table to display execution details r=maryliag a=adityamaru

In #105384 and #106629 we added support to collect and list files that had been collected as part of
a job's execution details. These files are meant
to provide improved observability into the state
of a job.

This change is the first of a few that exposes these endpoints on the DBConsole job details page. This change only adds support for listing files that have been requested as part of a job's execution details.
A future change will add support to request these files, sort them and download them from the job details page.

This page is not available on the Cloud Console as it is meant for advanced debugging.

Informs: #105076

Release note (ui change): add table in the Profiler job details page that lists all the available files describing a job's execution details

107236: sql: use txn.NewBatch instead of &kv.Batch{} r=fqazi a=rafiss

This will make these requests properly pass along the admission control headers.

informs #79212
Epic: None
Release note: None

107447: sql: fix CREATE MATERIALIZED VIEW AS schema change job description r=fqazi a=ecwall

Fixes #107445

This changes the CREATE MATERIALIZED VIEW AS schema change job description SQL syntax. For example
```
CREATE VIEW "v" AS "SELECT t.id FROM movr.public.t";
```
becomes
```
CREATE MATERIALIZED VIEW defaultdb.public.v AS SELECT t.id FROM defaultdb.public.t WITH DATA;
```

Release note (bug fix): Fix CREATE MATERIALIZED VIEW AS schema change job description SQL syntax.

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 26, 2023
In cockroachdb#106879 we added a table to the `Advanced Debugging`
tab of the job details page. This table lists out all
the execution detail files that are available for the
given job.

This change is a follow up to add download functionality
to each row in the table. The format of the downloaded
file is determined by the prefix of the filename.
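
As a rough illustration of the prefix-based dispatch described above, here is a minimal Python sketch. The prefixes (`distsql`, `goroutines`), the formats they map to, and the function name `download_format` are all assumptions made for illustration; the commit message does not list the actual prefixes or formats, and the real logic lives in the DBConsole code.

```python
# Hypothetical prefix-to-format table. The real prefixes and formats are not
# spelled out in the commit message, so these entries are placeholders.
PREFIX_FORMATS = (
    ("distsql", "text/html"),
    ("goroutines", "text/plain"),
)

# Fallback used when no known prefix matches (also an assumption).
DEFAULT_FORMAT = "application/octet-stream"


def download_format(filename):
    """Pick a download format by looking at the start of the filename."""
    for prefix, fmt in PREFIX_FORMATS:
        if filename.startswith(prefix):
            return fmt
    return DEFAULT_FORMAT
```

Keeping the mapping in a single table means a new file type only needs one extra entry, at the cost of relying on a naming convention for the stored files.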

A final change to allow users to generate execution details
will be added in the next follow up.

Informs: cockroachdb#105076
Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 26, 2023
craig bot pushed a commit that referenced this issue Jul 27, 2023
107198: jobsprofiler: stringify protobin files when requested r=dt a=adityamaru

This change is in preparation for a larger change that
will allow downloading debug files from the `Advanced Debugging`
tab on the job details page.

With this change a `binpb` file will have a `binpb.txt` version of the
file listed too. If the user requests to download
a `binpb.txt` file we unmarshal and stringify the contents
of the file before serving them to the user. Currently, there
is only one protobin file type written by a job resumer on
completion.
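
The listing and download behaviour described above could look roughly like the following Python sketch. The function names and the example filenames are hypothetical, and the real implementation is Go code in the `jobsprofiler` package; this only illustrates the idea that every `binpb` file gets a `binpb.txt` alias, and that a request for the alias is served from the stored binary file after it has been stringified.

```python
TXT_SUFFIX = ".txt"


def list_execution_detail_files(stored_files):
    """Return the stored names plus a `.txt` alias for every protobin file."""
    listed = []
    for name in stored_files:
        listed.append(name)
        if name.endswith("binpb"):
            listed.append(name + TXT_SUFFIX)
    return listed


def resolve_download(requested):
    """Map a requested name to (stored_name, needs_stringify).

    A `binpb.txt` request is backed by the stored `binpb` file, whose
    contents would be unmarshalled and stringified before being served.
    """
    if requested.endswith("binpb" + TXT_SUFFIX):
        return requested[: -len(TXT_SUFFIX)], True
    return requested, False
```

With a hypothetical stored file `stats.binpb`, the listing would show both `stats.binpb` and `stats.binpb.txt`, and only the second would trigger the stringify step.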

Informs: #105076
Release note: None

107700: netutil: fix a buglet r=erikgrinaker,stevendanna a=knz

I was noticing an excess number of conn objects remaining open after a test shutdown.

Release note: None
Epic: CRDB-28893

107711: backupccl: skip TestBackupRestoreTenant r=stevendanna a=adityamaru

Skip while we debug the timeouts in #107669.

Informs: #107669
Release note: None

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 27, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jul 27, 2023
This is the last of the three PRs to add support
for requesting, viewing and downloading execution
details from the job details page.

This change wires up the logic needed to request
the execution details for a given job. The request
is powered by the crdb_internal.request_job_execution_details
builtin that triggers the collection of execution details.
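
For readers who want to trigger the same collection outside the DBConsole, the sketch below builds the SQL statement a client might send. The builtin name comes from the commit message; that it takes the job ID as its only argument is an assumption here, and the helper name `request_execution_details_sql` is hypothetical.

```python
def request_execution_details_sql(job_id):
    """Build a statement that calls the builtin for one job.

    Assumes the builtin takes the job ID as its only argument. The ID is
    validated as a positive integer before being formatted into the query.
    """
    if isinstance(job_id, bool) or not isinstance(job_id, int) or job_id <= 0:
        raise ValueError("job_id must be a positive integer")
    return f"SELECT crdb_internal.request_job_execution_details({job_id})"
```

The resulting string can be run from any SQL client connected to the cluster; the job ID itself would come from the job details page or from `SHOW JOBS`.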

Fixes: cockroachdb#105076
Release note: None
craig bot pushed a commit that referenced this issue Jul 28, 2023
107210: jobs: enable downloading execution detail files r=maryliag a=adityamaru

Informs: #105076
Release note: None

107760: spanconfigccl: fix tests under multitenancy r=yuzefovich a=rafiss

fixes #106818
fixes #106821
Release note: None

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
craig bot pushed a commit that referenced this issue Aug 2, 2023
107759: jobs: add button to request execution details r=maryliag a=adityamaru

Fixes: #105076
Release note: None

107956: server: export distsender metrics from SQL pods r=knz a=nvanbenschoten

This commit exports the DistSender timeseries metrics from SQL pods.

```
distsender.batches
distsender.batches.partial
distsender.batch_requests.replica_addressed.bytes
distsender.batch_responses.replica_addressed.bytes
distsender.batch_requests.cross_region.bytes
distsender.batch_responses.cross_region.bytes
distsender.batch_requests.cross_zone.bytes
distsender.batch_responses.cross_zone.bytes
distsender.batches.async.sent
distsender.batches.async.throttled
distsender.rpc.sent
distsender.rpc.sent.local
distsender.rpc.sent.nextreplicaerror
distsender.errors.notleaseholder
distsender.errors.inleasetransferbackoffs
distsender.rangelookups
requests.slow.distsender
distsender.rpc.%s.sent # rpc name
distsender.rpc.err.%s  # error name
distsender.rangefeed.total_ranges
distsender.rangefeed.catchup_ranges
distsender.rangefeed.error_catchup_ranges
distsender.rangefeed.restart_ranges
distsender.rangefeed.restart_stuck
```

Epic: None
Release note: None

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in 6ea4efb Aug 2, 2023