-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] util/log,diagnostics: new TELEMETRY channel, send reports to logging #64218
Conversation
83c6ea4
to
18e7afc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the CRDB context give this a thumbs up or not but I like what I see! Thanks for this work @knz.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @thtruo and @vilamurugu)
See discussion in cockroachdb#63815: we would like to leverage the logging subsystem, in particular network logging, as a new way to channel diagnostic reports. This work will operate in multiple phases. This commit is only about the first phase: - creating a new logging channel TELEMETRY - ensuring the channel is not redirected to files by default. - copies the diagnostic report to the TELEMETRY channel, *in addition* of sending it to `register.cockroachdb.com` as previously. Later work will aim to further this project by subsuming the HTTP reporting to `register.cockroachdb.com` entirely under logging, e.g. via a HTTP sink (see issue cockroachdb#63980). Release note (cli change): CockroachDB now also sends telemetry diagnostics to the new logging channel TELEMETRY. By default, this channel is not connected to file output, so there is no observable difference in behavior for default deployments. To redirect the diagnostic output, one can define a new sink that captures this channel. For example, to see diagnostics reports on the standard error, one can use: `--log='sinks: {stderr: {channels: TELEMETRY, filter: INFO}}'`
18e7afc
to
e934980
Compare
cc @JuanLeon1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, Raphael! I like the inclusion of having diagnostic usage reports being sent as distinct logs through the TELEMETRY channel.
Reviewed 22 of 22 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @vilamurugu)
@knz One question came up in a conversation with @logston around the Serverless case. Will diagnostic usage reports be sent per tenant or per node? |
We want both, for different purposes:
You're preaching to the choir. That's exactly why we absolutely must use sampling for query logging. The other events of interest (feature counters, time to first query etc) are infrequent enough. |
cc @andy-kimball for awareness as this is one of the prerequisite pieces towards enabling the logging telemetry pipeline to support product analytics for CC Serverless |
superseded by #66427 |
66427: util/log,cli: new TELEMETRY channel r=maryliag a=knz Supersedes #64218 Informs #63815. See discussion in #63815: we would like to leverage the logging subsystem, in particular network logging, as a new way to channel diagnostic reports. This work will operate in multiple phases. This commit is only about the first phase: creating a new logging channel TELEMETRY. (see PRs #67809 and #67507 for example follow-ups) Release note (cli change): CockroachDB now supports a new logging channel called TELEMETRY. This will be used in later versions to report diagnostic events useful to Cockroach Labs for product analytics. (At the time of this writing, no events are defined for the TELEMETRY channel yet.) When no logging configuration is specified, this channel is connected to file output, with a maximum retention of 1MiB. To also produce the diagnostic output elsewhere, one can define a new sink that captures this channel. For example, to see diagnostics reports on the standard error, one can use: `--log='sinks: {stderr: {channels: TELEMETRY, filter: INFO}}'` When configuring file output, the operator should be careful to apply a separate maximum retention for the TELEMETRY channel from other file outputs, as telemetry data can be verbose and outcrowd other logging messages. For example: `--log='sinks: {file-groups: {telemetry: {channels: TELEMETRY, max-group-size: 1MB}, ...}}` 67913: roachtest: remove mention of 'roachstress' r=tbg a=knz The proposed 'roachstress' script linked from the repro steps at https://gist.github.com/tbg/fc4409e06b4ddc9abc58838cf5c0fe8b is utterly broken: it confuses local with remote execution. The correct steps must clarify that `roachprod` and `roachtest` binaries must be built for the local platform, and the `cockroach` and `workload` binaries must be built for the remote platform. Someone trying to run this on macos, or with a linux without the old glibc, gets inscrutable error messages with the current script. Additionally, the proposed roachstress script uses a `--cpu-quota` parameter that's incoherent with `roachtest`'s own default. Finally, I (@knz) directly witnessed a new hire who was utterly confused by the amount of automation and assumptions made in the roachtest script. (Should they install caffeinate? Why? How to customize the roachtest flags? How to change the number of runs? Can the test run locally on my computer?) In summary, the `roachstress` script is just too magical and actively hostile to newcomers. This patch removes it and replaces it by clearer instructiosn. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
Informs #63815.
See discussion in #63815: we would like to leverage the logging
subsystem, in particular network logging, as a new way
to channel diagnostic reports.
This work will operate in multiple phases. This commit is only
about the first phase:
in addition of sending it to
register.cockroachdb.com
as previously.
Later work will aim to further this project by subsuming
the HTTP reporting to
register.cockroachdb.com
entirelyunder logging, e.g. via a HTTP sink (see issue #63980).
Release note (cli change): CockroachDB now also sends
telemetry diagnostics to the new logging channel TELEMETRY.
By default, this channel is not connected to file output,
so there is no observable difference in behavior
for default deployments. To redirect the diagnostic
output, one can define a new sink that captures
this channel.
For example, to see diagnostics reports on the standard
error, one can use:
--log='sinks: {stderr: {channels: TELEMETRY, filter: INFO}}'