log: merge the implementation for file buffering and network buffering #72452

knz · 2021-11-04T20:33:46Z

This is follow up work to #70330.

Currently file sinks support buffering via buffered-writes
and network sinks via buffering.
We need to merge the config mechanisms somehow.

Also we want to merge the implementations somehow.

Epic CRDB-6669

Jira issue: CRDB-11160

The text was updated successfully, but these errors were encountered:

Release note: None

Informs: cockroachdb#72452 For some time, two buffering mechanisms have existed within the log package. One is controlled by `buffered-writes`, which is only used by the file sink. `buffered-writes` buffers writes until a buffer is full, after which it flushes the buffer to an underlying file. This is not an asynchronous operation. Separately, the `bufferedSink` was introduced to provide an async buffering mechanism. This was originally intended for network log sinks such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink` was not allowed for use with the `fileSink` as we already had a buffering mechanism available. Today, we find ourselves in a situation where we aim to make CRDB even more resilient in the face of disk failures. However, some of these efforts like WAL failover are unable to achieve their goals if the logging subsystem isn't resilient to disk unavailability as well. Today, if a file sink is configured to write to an unavailable disk, any goroutine making a logging call to one of those sinks will hang as the underlying i/o operations used are unable to interact with the disk. This can have a severe negative impact on cluster performance and throughput, even when using `buffered-writes: true`. As a way to alleviate this pain, this patch experimentally enables the usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is used to "wrap" an underlying sink (such as a `fileSink`), and asynchronously outputs buffered data to said sink. It is especially good at shielding writers from any problems with the underlying sink, which makes it a great candidate to help us accomplish our resiliency goals when it comes to disk outage scenarios. This is gated by an environment variable during the experimental period, `COCKROACH_ENABLE_ASYNC_LOG_FILE_BUFFER`. With the env var enabled, users can leverage the `bufferedSink` by using the `buffering` config option. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` Note that we don't allow both `buffered-writes` and `buffering` to be used together. This patch adds a validation step to ensure this. We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks. Release note (ops change): Users have the ability to update their log configuration to enable asynchronous buffering of `file-group` log sinks via the `buffering` [configuration options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks). Note that this is an *experimental* feature, and therefore it is gated by the environment variable `COCKROACH_ENABLE_ASYNC_LOG_FILE_BUFFER`. Set `COCKROACH_ENABLE_ASYNC_LOG_FILE_BUFFER=true` to enable the feature. The `buffering` configuration can be applied to `file-defaults` or individual `file-groups` as needed. Note however that `buffering` and `buffered-writes` are incompatible for the time being. If you'd like to try the `buffering` option on a file log sink, it's necessary to explicitly set `buffered-writes: false`. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks.

Informs: cockroachdb#72452 For some time, two buffering mechanisms have existed within the log package. One is controlled by `buffered-writes`, which is only used by the file sink. `buffered-writes` buffers writes until a buffer is full, after which it flushes the buffer to an underlying file. This is not an asynchronous operation. Separately, the `bufferedSink` was introduced to provide an async buffering mechanism. This was originally intended for network log sinks such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink` was not allowed for use with the `fileSink` as we already had a buffering mechanism available. Today, we find ourselves in a situation where we aim to make CRDB even more resilient in the face of disk failures. However, some of these efforts like WAL failover are unable to achieve their goals if the logging subsystem isn't resilient to disk unavailability as well. Today, if a file sink is configured to write to an unavailable disk, any goroutine making a logging call to one of those sinks will hang as the underlying i/o operations used are unable to interact with the disk. This can have a severe negative impact on cluster performance and throughput, even when using `buffered-writes: true`. As a way to alleviate this pain, this patch experimentally enables the usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is used to "wrap" an underlying sink (such as a `fileSink`), and asynchronously outputs buffered data to said sink. It is especially good at shielding writers from any problems with the underlying sink, which makes it a great candidate to help us accomplish our resiliency goals when it comes to disk outage scenarios. Now, users can leverage the `bufferedSink` in `file-defaults` and `file-groups by using the `buffering` config option. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` Note that we don't allow both `buffered-writes` and `buffering` to be used together. This patch adds a validation step to ensure this. We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks. Release note (ops change): Users have the ability to update their log configuration to enable asynchronous buffering of `file-group` log sinks via the `buffering` [configuration options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks). The `buffering` configuration can be applied to `file-defaults` or individual `file-groups` as needed. Note however that `buffering` and `buffered-writes` are incompatible for the time being. If you'd like to try the `buffering` option on a file log sink, it's necessary to explicitly set `buffered-writes: false`. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks.

120428: pkg/util/log: enable bufferedSink usage with fileSink as experimental r=dhartunian,jbowens a=abarganier Informs: #72452 Epic: CRDB-35401 For some time, two buffering mechanisms have existed within the log package. One is controlled by `buffered-writes`, which is only used by the file sink. `buffered-writes` buffers writes until a buffer is full, after which it flushes the buffer to an underlying file. This is not an asynchronous operation. Separately, the `bufferedSink` was introduced to provide an async buffering mechanism. This was originally intended for network log sinks such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink` was not allowed for use with the `fileSink` as we already had a buffering mechanism available. Today, we find ourselves in a situation where we aim to make CRDB even more resilient in the face of disk failures. However, some of these efforts like WAL failover are unable to achieve their goals if the logging subsystem isn't resilient to disk unavailability as well. Today, if a file sink is configured to write to an unavailable disk, any goroutine making a logging call to one of those sinks will hang as the underlying i/o operations used are unable to interact with the disk. This can have a severe negative impact on cluster performance and throughput, even when using `buffered-writes: true`. As a way to alleviate this pain, this patch experimentally enables the usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is used to "wrap" an underlying sink (such as a `fileSink`), and asynchronously outputs buffered data to said sink. It is especially good at shielding writers from any problems with the underlying sink, which makes it a great candidate to help us accomplish our resiliency goals when it comes to disk outage scenarios. Users can leverage the `bufferedSink` by using the `buffering` config option. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 5s flush-trigger-size: 1.0MiB max-buffer-size: 50MiB ``` Note that we don't allow both `buffered-writes` and `buffering` to be used together. This patch adds a validation step to ensure this. Release note (ops change): Users have the ability to update their log configuration to enable asynchronous buffering of `file-group` log sinks via the `buffering` [configuration options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks). The `buffering` configuration can be applied to `file-defaults` or individual `file-groups` as needed. Note however that `buffering` and `buffered-writes` are incompatible for the time being. If you'd like to try the `buffering` option on a file log sink, it's necessary to explicitly set `buffered-writes: false`. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks. 120725: streamingccl: add show tenant without pts test r=stevendanna a=msbutler This patch adds a test for the fix in #120434 Epic: none Release note: none Co-authored-by: Alex Barganier <[email protected]> Co-authored-by: Michael Butler <[email protected]>

Informs: cockroachdb#72452 For some time, two buffering mechanisms have existed within the log package. One is controlled by `buffered-writes`, which is only used by the file sink. `buffered-writes` buffers writes until a buffer is full, after which it flushes the buffer to an underlying file. This is not an asynchronous operation. Separately, the `bufferedSink` was introduced to provide an async buffering mechanism. This was originally intended for network log sinks such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink` was not allowed for use with the `fileSink` as we already had a buffering mechanism available. Today, we find ourselves in a situation where we aim to make CRDB even more resilient in the face of disk failures. However, some of these efforts like WAL failover are unable to achieve their goals if the logging subsystem isn't resilient to disk unavailability as well. Today, if a file sink is configured to write to an unavailable disk, any goroutine making a logging call to one of those sinks will hang as the underlying i/o operations used are unable to interact with the disk. This can have a severe negative impact on cluster performance and throughput, even when using `buffered-writes: true`. As a way to alleviate this pain, this patch experimentally enables the usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is used to "wrap" an underlying sink (such as a `fileSink`), and asynchronously outputs buffered data to said sink. It is especially good at shielding writers from any problems with the underlying sink, which makes it a great candidate to help us accomplish our resiliency goals when it comes to disk outage scenarios. Now, users can leverage the `bufferedSink` in `file-defaults` and `file-groups by using the `buffering` config option. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` Note that we don't allow both `buffered-writes` and `buffering` to be used together. This patch adds a validation step to ensure this. We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks. Release note (ops change): Users have the ability to update their log configuration to enable asynchronous buffering of `file-group` log sinks via the `buffering` [configuration options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks). The `buffering` configuration can be applied to `file-defaults` or individual `file-groups` as needed. Note however that `buffering` and `buffered-writes` are incompatible for the time being. If you'd like to try the `buffering` option on a file log sink, it's necessary to explicitly set `buffered-writes: false`. For example: ``` file-defaults: buffered-writes: false buffering: max-staleness: 1s flush-trigger-size: 256KiB max-buffer-size: 50MiB ``` We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when using this feature with file log sinks.

knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) X-anchored-telemetry The issue number is anchored by telemetry references. A-logging In and around the logging infrastructure. labels Nov 4, 2021

knz added a commit to rauchenstein/cockroach that referenced this issue Nov 4, 2021

logconfig: link issue cockroachdb#72452 in the error message

8b94b76

Release note: None

knz mentioned this issue Nov 4, 2021

util/log: add buffer sink decorator #70330

Merged

knz added a commit to rauchenstein/cockroach that referenced this issue Nov 4, 2021

logconfig: link issue cockroachdb#72452 in the error message

27dd89a

Release note: None

exalate-issue-sync bot added the T-server-and-security DB Server & Security label Nov 29, 2021

thtruo added T-observability-inf and removed T-server-and-security DB Server & Security labels Nov 29, 2021

andf-crl mentioned this issue Oct 19, 2022

DOC-1132 log sink buffering cockroachdb/docs#15311

Merged

abarganier mentioned this issue Mar 13, 2024

pkg/util/log: enable bufferedSink usage with fileSink as experimental #120428

Merged

exalate-issue-sync bot added T-observability and removed T-observability-inf labels Mar 21, 2024

ebembi-crdb mentioned this issue Sep 29, 2024

pkg/util/log: enable bufferedSink usage with fileSink as experimental ebembi-crdb/cockroach#7

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

log: merge the implementation for file buffering and network buffering #72452

log: merge the implementation for file buffering and network buffering #72452

knz commented Nov 4, 2021 •

edited by cockroach-jira-scripts

Loading

log: merge the implementation for file buffering and network buffering #72452

log: merge the implementation for file buffering and network buffering #72452

Comments

knz commented Nov 4, 2021 • edited by cockroach-jira-scripts Loading

knz commented Nov 4, 2021 •

edited by cockroach-jira-scripts

Loading