Add known issues to Raft WAL docs. (#16600)
* Add known issues to Raft WAL docs.

* Refactor update based on review feedback
banks authored Mar 15, 2023
1 parent ad25ba3 commit e557fb4
Showing 2 changed files with 21 additions and 7 deletions.
15 changes: 12 additions & 3 deletions website/content/docs/agent/wal-logstore/enable.mdx
@@ -7,7 +7,7 @@ description: >-

# Enable the experimental WAL LogStore backend

This topic describes how to safely configure and test the WAL backend in your Consul deployment.

The overall process for enabling the WAL LogStore backend for one server consists of the following steps. In production environments, we recommend starting by enabling the backend on a single server. If you eventually choose to expand the test to additional servers, you must repeat these steps for each one.

@@ -17,9 +17,9 @@ The overall process for enabling the WAL LogStore backend for one server consist
1. Remove data directory from target server.
1. Update target server's configuration.
1. Start the target server.
1. Monitor target server raft metrics and logs.

!> **Experimental feature:** The WAL LogStore backend is experimental.
!> **Experimental feature:** The WAL LogStore backend is experimental and may contain bugs that could cause data loss. Follow this guide to manage risk during testing.

## Requirements

@@ -32,6 +32,15 @@ We recommend taking the following additional measures:
- Monitor Consul server metrics and logs, and set an alert on specific log events that occur when WAL is enabled. Refer to [Monitor Raft metrics and logs for WAL](/consul/docs/agent/wal-logstore/monitoring) for more information.
- Enable WAL in a pre-production environment and run it for several days before enabling it in production.

## Known issues

The following issues were discovered after release of Consul 1.15.1 and will be
fixed in a future patch release.

* A follower that is disconnected may be unable to catch up if it is using the WAL backend.
* Restoring user snapshots can break replication to WAL-enabled followers.
* Restoring user snapshots can cause a WAL-enabled leader to panic.

## Risks

While their likelihood remains low to very low, be aware of the following risks before implementing the WAL backend:
13 changes: 9 additions & 4 deletions website/content/docs/agent/wal-logstore/index.mdx
@@ -7,17 +7,22 @@ description: >-

# Experimental WAL LogStore backend overview

This topic provides an overview of the experimental WAL (write-ahead log) LogStore backend.
This topic provides an overview of the WAL (write-ahead log) LogStore backend.
The WAL backend is an experimental feature. Refer to
[Requirements](/consul/docs/agent/wal-logstore/enable#requirements) for
supported environments and known issues.

!> **Experimental feature:** The WAL LogStore backend is experimental.
We do not recommend enabling the WAL backend in production without following
[our guide for safe
testing](/consul/docs/agent/wal-logstore/enable).

## WAL versus BoltDB

WAL implements a traditional log with rotating, append-only log files. WAL resolves many issues with the existing `LogStore` provided by the BoltDB backend. The BoltDB `LogStore` is a copy-on-write BTree, which is not optimized for append-only, write-heavy workloads.

### BoltDB storage scalability issues

The existing BoltDB log store inefficiently stores append-only logs to disk because it was designed as a full key-value database. It is a single file that only ever grows. Deleting the oldest logs, which Consul does regularly when it makes new snapshots of the state, leaves free space in the file. The free space must be tracked in a `freelist` so that BoltDB can reuse it on future writes. By contrast, a simple segmented log can delete the oldest log files from disk.

A burst of writes at double or triple the normal volume can suddenly cause the log file to grow to several times its steady-state size. After Consul takes the next snapshot and truncates the oldest logs, the resulting file is mostly empty space.
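
The burst scenario above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how BoltDB or the Consul WAL backend is implemented: the class names `SingleFileLog` and `SegmentedLog`, the function `simulate_burst`, the segment capacity, and the entry counts are all hypothetical, and disk usage is measured in log entries rather than bytes.

```python
class SingleFileLog:
    """Toy model of a log kept in one file that only ever grows."""

    def __init__(self):
        self.file_size = 0  # space allocated on disk, in entries
        self.live = 0       # entries currently stored

    def append(self, n):
        # Reuse free space left by earlier deletions first; grow the
        # file only when the free space is not enough.
        free = self.file_size - self.live
        self.file_size += max(0, n - free)
        self.live += n

    def truncate_oldest(self, n):
        # Deleted entries become free space inside the file;
        # the file itself does not shrink.
        self.live -= min(n, self.live)


class SegmentedLog:
    """Toy model of a log split into fixed-size segment files."""

    def __init__(self, segment_capacity):
        self.cap = segment_capacity
        self.first = 0  # index of the oldest live entry
        self.next = 0   # index the next append will use

    def append(self, n):
        self.next += n

    def truncate_oldest(self, n):
        self.first = min(self.first + n, self.next)

    @property
    def file_size(self):
        # Only segments that still hold a live entry stay on disk.
        if self.first == self.next:
            return 0
        first_seg = self.first // self.cap
        last_seg = (self.next - 1) // self.cap
        return (last_seg - first_seg + 1) * self.cap


def simulate_burst():
    single, segmented = SingleFileLog(), SegmentedLog(segment_capacity=100)
    for log in (single, segmented):
        log.append(1000)           # steady state
        log.append(3000)           # burst of writes
        log.truncate_oldest(3000)  # snapshot truncates the oldest logs
    return single, segmented


single, segmented = simulate_burst()
print("single file:", single.file_size, "entries on disk,", single.live, "live")
print("segmented:  ", segmented.file_size, "entries on disk")
```

In this model the single file stays at its burst-time size after truncation, with most of its space free, while the segmented log returns to roughly its steady-state size because whole obsolete segments are deleted.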
