Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: util/log,server/debug: fixes to logging and the logspy api endpoint #66328

Merged
merged 8 commits into from
Jun 16, 2021

Conversation

knz
Copy link
Contributor

@knz knz commented Jun 10, 2021

Backport:

Please see individual PRs for details.

/cc @cockroachdb/release

@knz knz requested a review from tbg June 10, 2021 18:54
@knz knz requested a review from a team June 10, 2021 18:54
@knz knz requested a review from a team as a code owner June 10, 2021 18:54
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz force-pushed the backport21.1-66096-61378 branch 3 times, most recently from 01fe7d8 to 4688e49 Compare June 11, 2021 16:00
knz and others added 8 commits June 16, 2021 13:26
This simplifies the test and makes it more extensible.

Release note: None
…entries

Prior to this change, we did not have a public data structure able to
represent the various fields available in a `crdb-v2` log entry: the
existing `logpb.Entry` was not able to distinguish structured and
non-structured entries, and did not have the ability to delimit the
message and the stack trace.

This patch extends `logpb.Entry` to make it more able to represent
`crdb-v2` entries, at least for the purpose of extending `debug
merge-log` towards conversion between logging formats.

Additionally, the patch adds a *best effort* attempt at populating
the new fields in the `crdb-v1` entry parser. This is best effort
because the crdb-v1 parser is lossy so we cannot faithfully
parse its entries reliably.

Release note: None
Release note (cli change): When log entries are written to disk,
the first few header lines written at the start of every new file
now report the configured logging format.
Release note (bug fix): Previously, `--log='file-defaults: {format:
crdb-v1}'` was not handled properly. This has been fixed. This bug
existed since v21.1.0.
This increases troubleshootability.

Release note: None
Prior to this patch, the logspy mechanism was utterly broken: any time
it was running, it would cut off any and all log entries from going to
files, stderr, network sinks etc. This was a gross violation
of the guarantees we have constructed around structured logging,
as well as a security vulnerability (see release note below).

Additionally, it was impossible to launch multiple logspy sessions
simultaneously on the same node, for example using different grep
filters.

This commit rectifies the situation.

At a technical level, we have two changes: one in the logging Go API
and one in the `/debug/logspy` HTTP API.

**For the logging changes**, this patch replaces the "interceptor" singleton
callback function that takes over `(*loggerT).outputLogEntry()`, by a
new *interceptor sink* that lives alongside the other sinks on every
channel.

Because the interceptor logic is now a regular sink, log entries
continue to be delivered to other sinks while an interceptor is
active.

Reusing the sink abstraction, with its own sink configuration with no
severity filter, clarifies that the interceptor accepts all the log
entries regardless of which filters are configured on other sinks.

Additionally, the interceptor sink now supports multiple concurrent
interception sessions. Each log entry is delivered to all current
interceptors. The `logspy` logic is adapted to use this facility,
so that multiple `logspy` requests can run side-by-side.

**For the HTTP API change**, we are changing the `/debug/logspy`
semantics. This is explained in the release note below.

Release note (security update): All the logging output to files
or network sinks was previously disabled temporarily while an operator
was using the `/debug/logspy` HTTP API, resulting in lost entries
and a breach of auditability guarantees. This behavior has been corrected.

Release note (bug fix): Log entries are not lost any more while the
`/debug/logspy` HTTP API is being used. This bug had existed since
CockroachDB v1.1.

Release note (api change): The `/debug/logspy` HTTP API has changed.
The endpoint now returns JSON data by default.
This change is motivated as follows:

- the previous format, `crdb-v1`, cannot be parsed reliably.
- using JSON entries guarantees that the text of each entry
  all fits on a single line of output (newline characters
  inside the messages are escaped). This makes filtering
  easier and more reliable.
- using JSON enables the user to apply `jq` on the output, for
  example via `curl -s .../debug/logspy | jq ...`

If the previous format is desired, the user can pass the query
argument `&flatten=1` to the `logspy` URL to obtain the previous flat
text format (`crdb-v1`) instead.

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Prior to this patch, any ongoing `/debug/logspy` query would
trigger maximum verbosity in the logging package - i.e.
cause all logging API calls under `log.V` to be activated.

This was problematic, as it would cause a large amount
of logging traffic to be pumped through the interceptor logic,
increasing the chance for entries to be dropped (especially
when `logspy?grep=...` is not used).

Additionally, since the previous change to make the interceptor logic
a regular sink, all the entries captured by the interceptors are now
also duplicated to the other sinks (this is a feature / bug fix,
as explained in the previous commit message).

However, this change also meant that any ongoing `logspy` request
would cause maximum verbosity to all the sinks with threshold
INFO (most logging calls under `log.V` have severity INFO). For
example, the DEV channel accepts all entries at level INFO or higher
in the default config. This in turn could incur unacceptable disk
space or IOPS consumption in certain deployments.

In orde to mitigate this new problem, this patch removes the special
conditional from the logging package. From this point forward,
the verbosity of the entries delivered via `/debug/logspy` are those
configured via `vmodule` -- no more and no less.

To make this configurable for a running server, including one where
the SQL endpoint may not be available yet, this patch also introduces
a new `/debug/vmodule` HTTP API and extends `/debug/logspy` with
the ability to temporarily change `vmodule` *upon explicit request*.

Release note (api change): The `/debug/logspy` API does not any more
enable maximum logging verbosity automatically. To change the
verbosity, use the new `/debug/vmodule` endpoint or pass the
`&vmodule=` query parameter to the `/debug/logspy` endpoint.

For example, suppose you wish to run a 20s logspy session:

- Before:

  ```
  curl 'https://.../debug/logspy?duration=20s&...'
  ```

- Now:

  ```
  curl 'https://.../debug/logspy?duration=20s&vmodule=...'
  ```

  OR

  ```
  curl 'https://.../debug/vmodule?duration=22s&vmodule=...'
  curl 'https://.../debug/logspy?duration=20s'
  ```

As for the regular `vmodule` command-line flag, the maximum verbosity
across all the source code can be selected with the pattern `*=4`.

Note: at most one in-flight HTTP API request is allowed to modify the
`vmodule` parameter. This maintain the invariant that the
configuration restored at the end of each request is the same as when
the request started.

Release note (api change): The new `/debug/vmodule` API makes it
possible for an operator to configure the logging verbosity in a
similar way as the SQL built-in function
`crdb_internal.set_vmodule()`, or to query the current configuration
as in `crdb_internal.get_vmodule()`. Additionally, any configuration
change performed via this API can be automatically reverted after a
configurable delay. The API forms are:

- `/debug/vmodule` - retrieve the current configuration.
- `/debug/vmodule?set=[vmodule config]&duration=[duration]` - change
  the configuration to `[vmodule config]` . The previous configuration
  at the time the `/debug/vmodule` request started is restored after
  `[duration]`. This duration, if not specified, defaults to twice the
  default duration of a `logspy` request (currently, the `logspy`
  default duration is 5s, so the `vmodule` default duration is 10s).
  If the duration is zero or negative, the previous configuration
  is never restored.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants