Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: merge the implementation for file buffering and network buffering #72452

Open
knz opened this issue Nov 4, 2021 · 0 comments
Open

log: merge the implementation for file buffering and network buffering #72452

knz opened this issue Nov 4, 2021 · 0 comments
Labels
A-logging In and around the logging infrastructure. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability X-anchored-telemetry The issue number is anchored by telemetry references.

Comments

@knz
Copy link
Contributor

knz commented Nov 4, 2021

This is follow up work to #70330.

Currently file sinks support buffering via buffered-writes
and network sinks via buffering.
We need to merge the config mechanisms somehow.

Also we want to merge the implementations somehow.

Epic CRDB-6669

Jira issue: CRDB-11160

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) X-anchored-telemetry The issue number is anchored by telemetry references. A-logging In and around the logging infrastructure. labels Nov 4, 2021
knz added a commit to rauchenstein/cockroach that referenced this issue Nov 4, 2021
knz added a commit to rauchenstein/cockroach that referenced this issue Nov 4, 2021
@exalate-issue-sync exalate-issue-sync bot added the T-server-and-security DB Server & Security label Nov 29, 2021
@thtruo thtruo added T-observability-inf and removed T-server-and-security DB Server & Security labels Nov 29, 2021
abarganier added a commit to abarganier/cockroach that referenced this issue Mar 13, 2024
Informs: cockroachdb#72452

For some time, two buffering mechanisms have existed within the log
package. One is controlled by `buffered-writes`, which is only used by
the file sink. `buffered-writes` buffers writes until a buffer is full,
after which it flushes the buffer to an underlying file. This is not an
asynchronous operation.

Separately, the `bufferedSink` was introduced to provide an async
buffering mechanism. This was originally intended for network log sinks
such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink`
was not allowed for use with the `fileSink` as we already had a
buffering mechanism available.

Today, we find ourselves in a situation where we aim to make CRDB even
more resilient in the face of disk failures. However, some of these
efforts like WAL failover are unable to achieve their goals if the
logging subsystem isn't resilient to disk unavailability as well.

Today, if a file sink is configured to write to an unavailable disk, any
goroutine making a logging call to one of those sinks will hang as the
underlying i/o operations used are unable to interact with the disk.
This can have a severe negative impact on cluster performance and
throughput, even when using `buffered-writes: true`.

As a way to alleviate this pain, this patch experimentally enables the
usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is
used to "wrap" an underlying sink (such as a `fileSink`), and
asynchronously outputs buffered data to said sink. It is especially good
at shielding writers from any problems with the underlying sink, which
makes it a great candidate to help us accomplish our resiliency goals
when it comes to disk outage scenarios.

This is gated by an environment variable during the experimental period,
`COCKROACH_ENABLE_ASYNC_LOG_FILE_BUFFER`.

With the env var enabled, users can leverage the `bufferedSink` by using
the `buffering` config option. For example:

```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```

Note that we don't allow both `buffered-writes` and `buffering` to be
used together. This patch adds a validation step to ensure this.

We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.

Release note (ops change): Users have the ability to update their log
configuration to enable asynchronous buffering of `file-group` log
sinks via the `buffering` [configuration
options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks).

Note that this is an *experimental* feature, and therefore it is gated
by the environment variable `COCKROACH_ENABLE_ASYNC_LOG_FILE_BUFFER`.
Set `COCKROACH_ENABLE_ASYNC_LOG_FILE_BUFFER=true` to enable the feature.

The `buffering` configuration can be applied to `file-defaults` or
individual `file-groups` as needed.

Note however that `buffering` and `buffered-writes` are incompatible for
the time being. If you'd like to try the `buffering` option on a file
log sink, it's necessary to explicitly set `buffered-writes: false`. For
example:
```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```
We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.
abarganier added a commit to abarganier/cockroach that referenced this issue Mar 18, 2024
Informs: cockroachdb#72452

For some time, two buffering mechanisms have existed within the log
package. One is controlled by `buffered-writes`, which is only used by
the file sink. `buffered-writes` buffers writes until a buffer is full,
after which it flushes the buffer to an underlying file. This is not an
asynchronous operation.

Separately, the `bufferedSink` was introduced to provide an async
buffering mechanism. This was originally intended for network log sinks
such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink`
was not allowed for use with the `fileSink` as we already had a
buffering mechanism available.

Today, we find ourselves in a situation where we aim to make CRDB even
more resilient in the face of disk failures. However, some of these
efforts like WAL failover are unable to achieve their goals if the
logging subsystem isn't resilient to disk unavailability as well.

Today, if a file sink is configured to write to an unavailable disk, any
goroutine making a logging call to one of those sinks will hang as the
underlying i/o operations used are unable to interact with the disk.
This can have a severe negative impact on cluster performance and
throughput, even when using `buffered-writes: true`.

As a way to alleviate this pain, this patch experimentally enables the
usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is
used to "wrap" an underlying sink (such as a `fileSink`), and
asynchronously outputs buffered data to said sink. It is especially good
at shielding writers from any problems with the underlying sink, which
makes it a great candidate to help us accomplish our resiliency goals
when it comes to disk outage scenarios.

Now, users can leverage the `bufferedSink` in `file-defaults` and
`file-groups  by using the `buffering` config option. For example:

```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```

Note that we don't allow both `buffered-writes` and `buffering` to be
used together. This patch adds a validation step to ensure this.

We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.

Release note (ops change): Users have the ability to update their log
configuration to enable asynchronous buffering of `file-group` log
sinks via the `buffering` [configuration
options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks).

The `buffering` configuration can be applied to `file-defaults` or
individual `file-groups` as needed.

Note however that `buffering` and `buffered-writes` are incompatible for
the time being. If you'd like to try the `buffering` option on a file
log sink, it's necessary to explicitly set `buffered-writes: false`. For
example:
```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```
We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.
craig bot pushed a commit that referenced this issue Mar 20, 2024
120428: pkg/util/log: enable bufferedSink usage with fileSink as experimental r=dhartunian,jbowens a=abarganier

Informs: #72452

Epic: CRDB-35401

For some time, two buffering mechanisms have existed within the log
package. One is controlled by `buffered-writes`, which is only used by
the file sink. `buffered-writes` buffers writes until a buffer is full,
after which it flushes the buffer to an underlying file. This is not an
asynchronous operation.

Separately, the `bufferedSink` was introduced to provide an async
buffering mechanism. This was originally intended for network log sinks
such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink`
was not allowed for use with the `fileSink` as we already had a
buffering mechanism available.

Today, we find ourselves in a situation where we aim to make CRDB even
more resilient in the face of disk failures. However, some of these
efforts like WAL failover are unable to achieve their goals if the
logging subsystem isn't resilient to disk unavailability as well.

Today, if a file sink is configured to write to an unavailable disk, any
goroutine making a logging call to one of those sinks will hang as the
underlying i/o operations used are unable to interact with the disk.
This can have a severe negative impact on cluster performance and
throughput, even when using `buffered-writes: true`.

As a way to alleviate this pain, this patch experimentally enables the
usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is
used to "wrap" an underlying sink (such as a `fileSink`), and
asynchronously outputs buffered data to said sink. It is especially good
at shielding writers from any problems with the underlying sink, which
makes it a great candidate to help us accomplish our resiliency goals
when it comes to disk outage scenarios.

Users can leverage the `bufferedSink` by using the `buffering` config option. 
For example:

```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 5s
    flush-trigger-size: 1.0MiB
    max-buffer-size: 50MiB
```

Note that we don't allow both `buffered-writes` and `buffering` to be
used together. This patch adds a validation step to ensure this.

Release note (ops change): Users have the ability to update their log
configuration to enable asynchronous buffering of `file-group` log
sinks via the `buffering` [configuration
options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks).

The `buffering` configuration can be applied to `file-defaults` or
individual `file-groups` as needed.

Note however that `buffering` and `buffered-writes` are incompatible for
the time being. If you'd like to try the `buffering` option on a file
log sink, it's necessary to explicitly set `buffered-writes: false`. For
example:
```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```
We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.

120725: streamingccl: add show tenant without pts test r=stevendanna a=msbutler

This patch adds a test for the fix in #120434

Epic: none

Release note: none

Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
ebembi-crdb added a commit to ebembi-crdb/cockroach that referenced this issue Sep 28, 2024
Informs: cockroachdb#72452

For some time, two buffering mechanisms have existed within the log
package. One is controlled by `buffered-writes`, which is only used by
the file sink. `buffered-writes` buffers writes until a buffer is full,
after which it flushes the buffer to an underlying file. This is not an
asynchronous operation.

Separately, the `bufferedSink` was introduced to provide an async
buffering mechanism. This was originally intended for network log sinks
such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink`
was not allowed for use with the `fileSink` as we already had a
buffering mechanism available.

Today, we find ourselves in a situation where we aim to make CRDB even
more resilient in the face of disk failures. However, some of these
efforts like WAL failover are unable to achieve their goals if the
logging subsystem isn't resilient to disk unavailability as well.

Today, if a file sink is configured to write to an unavailable disk, any
goroutine making a logging call to one of those sinks will hang as the
underlying i/o operations used are unable to interact with the disk.
This can have a severe negative impact on cluster performance and
throughput, even when using `buffered-writes: true`.

As a way to alleviate this pain, this patch experimentally enables the
usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is
used to "wrap" an underlying sink (such as a `fileSink`), and
asynchronously outputs buffered data to said sink. It is especially good
at shielding writers from any problems with the underlying sink, which
makes it a great candidate to help us accomplish our resiliency goals
when it comes to disk outage scenarios.

Now, users can leverage the `bufferedSink` in `file-defaults` and
`file-groups  by using the `buffering` config option. For example:

```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```

Note that we don't allow both `buffered-writes` and `buffering` to be
used together. This patch adds a validation step to ensure this.

We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.

Release note (ops change): Users have the ability to update their log
configuration to enable asynchronous buffering of `file-group` log
sinks via the `buffering` [configuration
options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks).

The `buffering` configuration can be applied to `file-defaults` or
individual `file-groups` as needed.

Note however that `buffering` and `buffered-writes` are incompatible for
the time being. If you'd like to try the `buffering` option on a file
log sink, it's necessary to explicitly set `buffered-writes: false`. For
example:
```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```
We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.
ebembi-crdb added a commit to ebembi-crdb/cockroach that referenced this issue Sep 29, 2024
Informs: cockroachdb#72452

For some time, two buffering mechanisms have existed within the log
package. One is controlled by `buffered-writes`, which is only used by
the file sink. `buffered-writes` buffers writes until a buffer is full,
after which it flushes the buffer to an underlying file. This is not an
asynchronous operation.

Separately, the `bufferedSink` was introduced to provide an async
buffering mechanism. This was originally intended for network log sinks
such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink`
was not allowed for use with the `fileSink` as we already had a
buffering mechanism available.

Today, we find ourselves in a situation where we aim to make CRDB even
more resilient in the face of disk failures. However, some of these
efforts like WAL failover are unable to achieve their goals if the
logging subsystem isn't resilient to disk unavailability as well.

Today, if a file sink is configured to write to an unavailable disk, any
goroutine making a logging call to one of those sinks will hang as the
underlying i/o operations used are unable to interact with the disk.
This can have a severe negative impact on cluster performance and
throughput, even when using `buffered-writes: true`.

As a way to alleviate this pain, this patch experimentally enables the
usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is
used to "wrap" an underlying sink (such as a `fileSink`), and
asynchronously outputs buffered data to said sink. It is especially good
at shielding writers from any problems with the underlying sink, which
makes it a great candidate to help us accomplish our resiliency goals
when it comes to disk outage scenarios.

Now, users can leverage the `bufferedSink` in `file-defaults` and
`file-groups  by using the `buffering` config option. For example:

```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```

Note that we don't allow both `buffered-writes` and `buffering` to be
used together. This patch adds a validation step to ensure this.

We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.

Release note (ops change): Users have the ability to update their log
configuration to enable asynchronous buffering of `file-group` log
sinks via the `buffering` [configuration
options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks).

The `buffering` configuration can be applied to `file-defaults` or
individual `file-groups` as needed.

Note however that `buffering` and `buffered-writes` are incompatible for
the time being. If you'd like to try the `buffering` option on a file
log sink, it's necessary to explicitly set `buffered-writes: false`. For
example:
```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```
We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-logging In and around the logging infrastructure. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
None yet
Development

No branches or pull requests

2 participants