From ac39342d47062c1a9aec9b744b40ad7fd2de27b2 Mon Sep 17 00:00:00 2001 From: Luke Chen Date: Thu, 14 Sep 2023 20:21:00 +0800 Subject: [PATCH] KAFKA-15442: add a section in doc for tiered storage (#14382) Added 6.11: Tiered Storage section and notable changes in v3.6.0 Reviewers: Satish Duggana, Gantigmaa Selenge --- docs/ops.html | 92 +++++++++++++++++++++++++++++++++++++++++++++++ docs/toc.html | 8 +++++ docs/upgrade.html | 5 +++ 3 files changed, 105 insertions(+) diff --git a/docs/ops.html b/docs/ops.html index 7f56c8567d04b..7c4b85aacd161 100644 --- a/docs/ops.html +++ b/docs/ops.html @@ -3859,6 +3859,98 @@

Finalizing the migration

# Other configs ... + +

6.11 Tiered Storage

+ +

Tiered Storage Overview

+ +

Kafka data is mostly consumed in a streaming fashion using tail reads. Tail reads leverage the OS page cache to serve the data instead of disk reads. + Older data is typically read from disk for backfill or failure recovery, which happens infrequently.

+ +

In the tiered storage approach, a Kafka cluster is configured with two tiers of storage: local and remote. + The local tier is the same as in current Kafka: it uses the local disks on the Kafka brokers to store the log segments. + The new remote tier uses external storage systems, such as HDFS or S3, to store the completed log segments. + Please check KIP-405 for more information. +

+ +

Note: Tiered storage is considered an early access feature and is not recommended for use in production environments.

+ +

Configuration

+ +
Broker Configurations
+ +

By default, the Kafka server does not enable the tiered storage feature. remote.log.storage.system.enable + is the property that controls whether tiered storage functionality is enabled in a broker. Setting it to "true" enables this feature. +

+ +

RemoteStorageManager is an interface that manages the lifecycle of remote log segments and indexes. The Kafka server + doesn't provide an out-of-the-box implementation of RemoteStorageManager. Configure remote.log.storage.manager.class.name + and remote.log.storage.manager.class.path to specify the implementation of RemoteStorageManager. +

+ +

RemoteLogMetadataManager is an interface that manages the lifecycle of metadata about remote log segments with strongly consistent semantics. + By default, Kafka provides an implementation that uses an internal topic as storage. This implementation can be changed by configuring + remote.log.metadata.manager.class.name and remote.log.metadata.manager.class.path. + When using the default Kafka internal topic based implementation, remote.log.metadata.manager.listener.name + is a mandatory property that specifies which listener the clients created by the default RemoteLogMetadataManager implementation use to talk to the brokers. +
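The broker-side rules described above can be expressed as a small pre-flight check. The following is a minimal sketch in Python, not part of Kafka: the function name check_tiered_storage_broker_config is hypothetical, and it assumes the broker properties have already been loaded into a dictionary of strings. It only encodes the two requirements stated in this section.

```python
from typing import Dict, List


def check_tiered_storage_broker_config(props: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: it only encodes the rules described in this section.
    """
    problems: List[str] = []

    # Tiered storage is off by default, so there is nothing to check.
    if props.get("remote.log.storage.system.enable", "false").lower() != "true":
        return problems

    # Kafka ships no out-of-the-box RemoteStorageManager, so one must be named.
    if not props.get("remote.log.storage.manager.class.name"):
        problems.append("remote.log.storage.manager.class.name is required")

    # The default (internal topic based) RemoteLogMetadataManager needs a listener.
    uses_default_rlmm = not props.get("remote.log.metadata.manager.class.name")
    if uses_default_rlmm and not props.get("remote.log.metadata.manager.listener.name"):
        problems.append(
            "remote.log.metadata.manager.listener.name is required "
            "with the default RemoteLogMetadataManager"
        )
    return problems
```

Running such a check before starting a broker may catch a missing mandatory property earlier than a startup failure would.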

+ + +
Topic Configurations
+ +

After the broker-side configurations for tiered storage are set correctly, some topic-level configurations still need to be set. + remote.storage.enable is the switch that determines whether a topic uses tiered storage. By default it is set to false. + After enabling the remote.storage.enable property, the next thing to consider is log retention. + When tiered storage is enabled for a topic, there are two additional log retention configurations, local.retention.ms and local.retention.bytes, to set alongside the existing retention.ms and retention.bytes: + 

  • local.retention.ms
  • retention.ms
  • local.retention.bytes
  • retention.bytes
+ + The configurations prefixed with local specify the time/size limit the "local" log can reach before its segments are moved to remote storage and then deleted locally. + If unset, the values of retention.ms and retention.bytes will be used. +
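The fallback from the local settings to the overall retention settings can be sketched as follows. This is an illustrative Python helper, not Kafka code: the name effective_local_retention is hypothetical, and the sentinel value -2 is an assumption based on Kafka's topic configuration documentation, where it is described as the default meaning "use the retention.ms / retention.bytes value".

```python
from typing import Optional

# Assumption: -2 is the documented default of local.retention.ms and
# local.retention.bytes, meaning "fall back to retention.ms / retention.bytes".
USE_RETENTION_SENTINEL = -2


def effective_local_retention(local_value: Optional[int], total_value: int) -> int:
    """Resolve the retention that applies to the local tier.

    local_value: local.retention.ms or local.retention.bytes (None if unset).
    total_value: the matching retention.ms or retention.bytes.
    """
    if local_value is None or local_value == USE_RETENTION_SENTINEL:
        return total_value
    return local_value


# Topic keeps 7 days overall but only 1 second on the local tier.
print(effective_local_retention(1000, 7 * 24 * 60 * 60 * 1000))
```

The same helper applies to both the time-based and the size-based pair, since the fallback rule described above is identical for each.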

+ +

Configurations Example

+ +

Here is a sample configuration to enable the tiered storage feature on the broker side: +

+# Sample ZooKeeper/KRaft broker server.properties listening on PLAINTEXT://:9092
+remote.log.storage.system.enable=true
+# Provide an implementation of RemoteStorageManager. This configuration is mandatory for tiered storage.
+# remote.log.storage.manager.class.name=org.apache.kafka.server.log.remote.storage.NoOpRemoteStorageManager
+# Use the "PLAINTEXT" listener for the clients in RemoteLogMetadataManager to talk to the brokers.
+remote.log.metadata.manager.listener.name=PLAINTEXT
+
+

+ +

After the broker is started, create a topic with tiered storage enabled and a small local retention time to try this feature: +

bin/kafka-topics.sh --create --topic tieredTopic --bootstrap-server localhost:9092 --config remote.storage.enable=true --config local.retention.ms=1000
+
+
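When the same command has to be issued for several topics or retention values, it can help to assemble it programmatically. The sketch below is a hypothetical Python helper (build_create_topic_command is not part of Kafka) that only reproduces the flags used in the command above. It builds the argument list and prints it rather than executing it.

```python
import shlex
from typing import Dict, List


def build_create_topic_command(
    topic: str, bootstrap_server: str, configs: Dict[str, str]
) -> List[str]:
    """Build the argv for kafka-topics.sh --create with per-topic configs."""
    argv = [
        "bin/kafka-topics.sh",
        "--create",
        "--topic", topic,
        "--bootstrap-server", bootstrap_server,
    ]
    # Each topic-level setting becomes its own --config key=value pair.
    for key, value in configs.items():
        argv += ["--config", f"{key}={value}"]
    return argv


command = build_create_topic_command(
    "tieredTopic",
    "localhost:9092",
    {"remote.storage.enable": "true", "local.retention.ms": "1000"},
)
# Print a shell-safe rendering; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Keeping the arguments as a list avoids shell quoting problems if a topic name or config value ever contains special characters.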

+ +

Then, after the active segment is rolled, the old segment should be moved to the remote storage and then deleted from local storage. +

+ +

Limitations

+ +

While the early access release of Tiered Storage offers the opportunity to try out this new feature, it is important to be aware of the following limitations: +

  • No support for clusters with multiple log directories (i.e. JBOD feature)
  • No support for compacted topics
  • Cannot disable tiered storage at the topic level
  • Deleting tiered storage enabled topics is required before disabling tiered storage at the broker level
  • Admin actions related to tiered storage feature are only supported on clients from version 3.0 onwards
+
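Two of the limitations above can be checked mechanically from configuration alone. The following Python sketch is illustrative only: the function name tiered_storage_limitations is hypothetical, and it assumes the broker and topic settings are available as string dictionaries keyed by the standard log.dirs and cleanup.policy property names.

```python
from typing import Dict, List


def tiered_storage_limitations(
    broker_props: Dict[str, str], topic_config: Dict[str, str]
) -> List[str]:
    """List the early access limitations a topic/broker pair would hit."""
    issues: List[str] = []

    # Only topics that opt in to tiered storage are affected.
    if topic_config.get("remote.storage.enable", "false").lower() != "true":
        return issues

    # More than one entry in log.dirs means the broker uses JBOD.
    log_dirs = [d for d in broker_props.get("log.dirs", "").split(",") if d.strip()]
    if len(log_dirs) > 1:
        issues.append("multiple log directories (JBOD) are not supported")

    # cleanup.policy may be a comma-separated list such as "compact,delete".
    policies = {p.strip() for p in topic_config.get("cleanup.policy", "delete").split(",")}
    if "compact" in policies:
        issues.append("compacted topics are not supported")

    return issues
```

The remaining limitations concern operational actions (disabling the feature, client versions) and are not visible from static configuration, so this sketch does not attempt to cover them.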

+ +

For more information, please check Tiered Storage Early Access Release Note. +

+ +
diff --git a/docs/toc.html b/docs/toc.html index 88dd62c92dd4e..737ef887cd1eb 100644 --- a/docs/toc.html +++ b/docs/toc.html @@ -169,6 +169,14 @@
  • ZooKeeper to KRaft Migration
  • +
  • 6.11 Tiered Storage + +
  • 7. Security diff --git a/docs/upgrade.html b/docs/upgrade.html index 13e76b79cc359..ca86b1c839f13 100644 --- a/docs/upgrade.html +++ b/docs/upgrade.html @@ -50,6 +50,11 @@
    Notable changes in 3 replication.policy.internal.topic.separator.enabled property. If upgrading from 3.0.x or earlier, it may be necessary to set this property to false; see the property's documentation for more details.
  • +
  • The tiered storage feature is available in early access and is not recommended for use in production environments. + You are welcome to test it and provide feedback. + For more information about the early access tiered storage feature, please check KIP-405 and + Tiered Storage Early Access Release Note. +
  • Upgrading to 3.5.0 from any version 0.8.x through 3.4.x