Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c2c: prevent NPE in SHOW TENANT WITH REPLICATION STATUS #120434

Merged
merged 1 commit into from
Mar 15, 2024

Conversation

msbutler
Copy link
Collaborator

If the user ran SHOW TENANT ... WITH REPLICATION STATUS before the stream ingestion job pts was set, a null pointer exception would occur and crash the node. This patch fixes this bug.

Epic: None

Release note: none

If the user ran SHOW TENANT ... WITH REPLICATION STATUS before the stream
ingestion job pts was set, a null pointer exception would occur and crash the
node. This patch fixes this bug.

Epic: None

Release note: none
@msbutler msbutler added T-disaster-recovery backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Mar 13, 2024
@msbutler msbutler requested review from dt and stevendanna March 13, 2024 19:44
@msbutler msbutler self-assigned this Mar 13, 2024
Copy link

blathers-crl bot commented Mar 13, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Do we know what SHOW TENANT will show in this case? NULL?

@msbutler
Copy link
Collaborator Author

It should show the zero timestamp. This field isn't a pointer to it is safe to leave unset.

protectedTimestamp hlc.Timestamp

@msbutler
Copy link
Collaborator Author

TFTR!

bors r=stevendanna

@craig
Copy link
Contributor

craig bot commented Mar 15, 2024

@craig craig bot merged commit f2985c3 into cockroachdb:master Mar 15, 2024
19 checks passed
msbutler added a commit to msbutler/cockroach that referenced this pull request Mar 19, 2024
This patch adds a test for the fix in cockroachdb#120434

Epic: none

Release note: none
craig bot pushed a commit that referenced this pull request Mar 20, 2024
120428: pkg/util/log: enable bufferedSink usage with fileSink as experimental r=dhartunian,jbowens a=abarganier

Informs: #72452

Epic: CRDB-35401

For some time, two buffering mechanisms have existed within the log
package. One is controlled by `buffered-writes`, which is only used by
the file sink. `buffered-writes` buffers writes until a buffer is full,
after which it flushes the buffer to an underlying file. This is not an
asynchronous operation.

Separately, the `bufferedSink` was introduced to provide an async
buffering mechanism. This was originally intended for network log sinks
such as the `fluentSink` and `httpSink`. Originally, the `bufferedSink`
was not allowed for use with the `fileSink` as we already had a
buffering mechanism available.

Today, we find ourselves in a situation where we aim to make CRDB even
more resilient in the face of disk failures. However, some of these
efforts like WAL failover are unable to achieve their goals if the
logging subsystem isn't resilient to disk unavailability as well.

Today, if a file sink is configured to write to an unavailable disk, any
goroutine making a logging call to one of those sinks will hang as the
underlying i/o operations used are unable to interact with the disk.
This can have a severe negative impact on cluster performance and
throughput, even when using `buffered-writes: true`.

As a way to alleviate this pain, this patch experimentally enables the
usage of the `bufferedSink` with the `fileSink`. The `bufferedSink` is
used to "wrap" an underlying sink (such as a `fileSink`), and
asynchronously outputs buffered data to said sink. It is especially good
at shielding writers from any problems with the underlying sink, which
makes it a great candidate to help us accomplish our resiliency goals
when it comes to disk outage scenarios.

Users can leverage the `bufferedSink` by using the `buffering` config option. 
For example:

```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 5s
    flush-trigger-size: 1.0MiB
    max-buffer-size: 50MiB
```

Note that we don't allow both `buffered-writes` and `buffering` to be
used together. This patch adds a validation step to ensure this.

Release note (ops change): Users have the ability to update their log
configuration to enable asynchronous buffering of `file-group` log
sinks via the `buffering` [configuration
options](https://www.cockroachlabs.com/docs/stable/configure-logs#log-buffering-for-network-sinks).

The `buffering` configuration can be applied to `file-defaults` or
individual `file-groups` as needed.

Note however that `buffering` and `buffered-writes` are incompatible for
the time being. If you'd like to try the `buffering` option on a file
log sink, it's necessary to explicitly set `buffered-writes: false`. For
example:
```
file-defaults:
  buffered-writes: false
  buffering:
    max-staleness: 1s
    flush-trigger-size: 256KiB
    max-buffer-size: 50MiB
```
We recommend a `max-staleness: 1s` and `flush-trigger-size: 256KiB` when
using this feature with file log sinks.

120725: streamingccl: add show tenant without pts test r=stevendanna a=msbutler

This patch adds a test for the fix in #120434

Epic: none

Release note: none

Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
msbutler added a commit to msbutler/cockroach that referenced this pull request Mar 20, 2024
This patch adds a test for the fix in cockroachdb#120434

Epic: none

Release note: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. T-disaster-recovery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants