KAFKA-15442: add a section in doc for tiered storage (apache#14382)
Added 6.11: Tiered Storage section and notable changes in v3.6.0

Reviewers: Satish Duggana <[email protected]>, Gantigmaa Selenge <[email protected]>
showuon authored Sep 14, 2023
1 parent 2a41beb commit ac39342
Showing 3 changed files with 105 additions and 0 deletions.
92 changes: 92 additions & 0 deletions docs/ops.html
@@ -3859,6 +3859,98 @@ <h3>Finalizing the migration</h3>

# Other configs ...</pre>


<h3 class="anchor-heading"><a id="tiered_storage" class="anchor-link"></a><a href="#tiered_storage">6.11 Tiered Storage</a></h3>

<h4 class="anchor-heading"><a id="tiered_storage_overview" class="anchor-link"></a><a href="#tiered_storage_overview">Tiered Storage Overview</a></h4>

<p>Kafka data is mostly consumed in a streaming fashion using tail reads. Tail reads are served from the OS page cache rather than from disk.
Older data is typically read from disk for backfill or failure recovery, and such reads are infrequent.</p>

<p>In the tiered storage approach, a Kafka cluster is configured with two tiers of storage: local and remote.
The local tier is the same as in current Kafka: it uses the local disks on the Kafka brokers to store the log segments.
The new remote tier uses external storage systems, such as HDFS or S3, to store the completed log segments.
See <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage">KIP-405</a> for more information.
</p>

<p><b>Note: Tiered storage is an early access feature and is not recommended for use in production environments.</b></p>

<h4 class="anchor-heading"><a id="tiered_storage_config" class="anchor-link"></a><a href="#tiered_storage_config">Configuration</a></h4>

<h5 class="anchor-heading"><a id="tiered_storage_config_broker" class="anchor-link"></a><a href="#tiered_storage_config_broker">Broker Configurations</a></h5>

<p>By default, the Kafka server does not enable the tiered storage feature. The <code>remote.log.storage.system.enable</code>
property controls whether tiered storage functionality is enabled in a broker. Setting it to "true" enables the feature.
</p>

<p><code>RemoteStorageManager</code> is an interface that manages the lifecycle of remote log segments and indexes. The Kafka server
does not provide an out-of-the-box implementation of RemoteStorageManager. Configure <code>remote.log.storage.manager.class.name</code>
and <code>remote.log.storage.manager.class.path</code> to specify the implementation of RemoteStorageManager.
</p>

<p><code>RemoteLogMetadataManager</code> is an interface that manages the lifecycle of metadata about remote log segments with strongly consistent semantics.
By default, Kafka provides an implementation that uses an internal topic as storage. This implementation can be changed by configuring
<code>remote.log.metadata.manager.class.name</code> and <code>remote.log.metadata.manager.class.path</code>.
When the default Kafka internal topic based implementation is used, <code>remote.log.metadata.manager.listener.name</code>
is a mandatory property that specifies which listener the clients created by the default RemoteLogMetadataManager implementation use to talk to the brokers.
</p>
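<p>As a minimal sketch of the rules above, the following shell functions check a broker properties file before the broker is started. The helper names <code>prop</code> and <code>check_tiered_config</code>, the temporary file, and the class name <code>com.example.MyRemoteStorageManager</code> are hypothetical and not part of Kafka. The check assumes plain <code>key=value</code> lines and the default topic based RemoteLogMetadataManager, which is why it treats the listener name as required.</p>

```shell
#!/bin/sh
# Illustrative pre-flight check; not shipped with Kafka.

# Print the value of property $2 in file $1 (empty if absent or commented out).
prop() {
  key_re=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
  sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Return non-zero if tiered storage is enabled but a required property is missing.
check_tiered_config() {
  file=$1
  status=0
  if [ "$(prop "$file" remote.log.storage.system.enable)" != "true" ]; then
    echo "tiered storage is not enabled in $file"
    return 0
  fi
  for key in remote.log.storage.manager.class.name \
             remote.log.metadata.manager.listener.name; do
    if [ -z "$(prop "$file" "$key")" ]; then
      echo "missing required property: $key"
      status=1
    fi
  done
  return $status
}

# Demo with a throwaway file; the RemoteStorageManager class is left commented out.
conf=$(mktemp)
cat > "$conf" <<'EOF'
remote.log.storage.system.enable=true
# remote.log.storage.manager.class.name=com.example.MyRemoteStorageManager
remote.log.metadata.manager.listener.name=PLAINTEXT
EOF
check_tiered_config "$conf" || echo "fix the configuration before starting the broker"
```

<p>Run against the demo file, the check reports the missing <code>remote.log.storage.manager.class.name</code> because that line is commented out.</p>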


<h5 class="anchor-heading"><a id="tiered_storage_config_topic" class="anchor-link"></a><a href="#tiered_storage_config_topic">Topic Configurations</a></h5>

<p>After the broker-side configurations for tiered storage are set correctly, topic-level configurations are still needed.
<code>remote.storage.enable</code> is the switch that determines whether a topic uses tiered storage. By default it is set to false.
After enabling <code>remote.storage.enable</code>, the next thing to consider is log retention.
When tiered storage is enabled for a topic, two additional log retention configurations, <code>local.retention.ms</code> and <code>local.retention.bytes</code>, apply alongside the existing <code>retention.ms</code> and <code>retention.bytes</code>:

<ul>
<li><code>local.retention.ms</code></li>
<li><code>retention.ms</code></li>
<li><code>local.retention.bytes</code></li>
<li><code>retention.bytes</code></li>
</ul>

The configurations prefixed with <code>local</code> specify the time or size the "local" log can reach before its segments are moved to remote storage and then deleted locally.
If they are unset, the values of <code>retention.ms</code> and <code>retention.bytes</code> are used.
</p>
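<p>The fallback rule above can be sketched as a small shell function. The function name <code>effective_local_retention</code> is hypothetical, and the sketch assumes that <code>-2</code> is the sentinel for an unset <code>local.retention.ms</code> or <code>local.retention.bytes</code>, which is not stated in this section.</p>

```shell
#!/bin/sh
# Resolve the retention that applies to the local tier.
# Assumption: -2 means "not set, fall back to the topic-wide value".
effective_local_retention() {
  local_value=$1   # local.retention.ms or local.retention.bytes
  total_value=$2   # retention.ms or retention.bytes
  if [ "$local_value" -eq -2 ]; then
    echo "$total_value"
  else
    echo "$local_value"
  fi
}

effective_local_retention -2 604800000     # unset local value: prints 604800000
effective_local_retention 1000 604800000   # local.retention.ms=1000: prints 1000
```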

<h4 class="anchor-heading"><a id="tiered_storage_config_ex" class="anchor-link"></a><a href="#tiered_storage_config_ex">Configurations Example</a></h4>

<p>Here is a sample configuration to enable the tiered storage feature on the broker side:
<pre>
# Sample ZooKeeper/KRaft broker server.properties listening on PLAINTEXT://:9092
remote.log.storage.system.enable=true
# Provide the implementation of RemoteStorageManager. This configuration is mandatory for tiered storage.
# remote.log.storage.manager.class.name=org.apache.kafka.server.log.remote.storage.NoOpRemoteStorageManager
# Use the "PLAINTEXT" listener for the clients in RemoteLogMetadataManager to talk to the brokers.
remote.log.metadata.manager.listener.name=PLAINTEXT
</pre>
</p>

<p>After the broker is started, create a topic with tiered storage enabled and a small local retention time to try this feature:
<pre>bin/kafka-topics.sh --create --topic tieredTopic --bootstrap-server localhost:9092 --config remote.storage.enable=true --config local.retention.ms=1000
</pre>
</p>

<p>Then, after the active segment is rolled, the old segment should be moved to the remote storage and deleted from the local disk.
</p>

<h4 class="anchor-heading"><a id="tiered_storage_limitation" class="anchor-link"></a><a href="#tiered_storage_limitation">Limitations</a></h4>

<p>While the early access release of Tiered Storage offers the opportunity to try out this new feature, it is important to be aware of the following limitations:
<ul>
<li>No support for clusters with multiple log directories (i.e. the JBOD feature)</li>
<li>No support for compacted topics</li>
<li>Tiered storage cannot be disabled at the topic level</li>
<li>Topics with tiered storage enabled must be deleted before tiered storage is disabled at the broker level</li>
<li>Admin actions related to the tiered storage feature are only supported on clients from version 3.0 onwards</li>
</ul>
</p>

<p>For more information, see the <a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes">Tiered Storage Early Access Release Notes</a>.
</p>


</script>

<div class="p-ops"></div>
8 changes: 8 additions & 0 deletions docs/toc.html
@@ -169,6 +169,14 @@
<li><a href="#kraft_zk_migration">ZooKeeper to KRaft Migration</a></li>
</ul>
</li>
<li><a href="#tiered_storage">6.11 Tiered Storage</a>
<ul>
<li><a href="#tiered_storage_overview">Tiered Storage Overview</a></li>
<li><a href="#tiered_storage_config">Configuration</a></li>
<li><a href="#tiered_storage_config_ex">Configurations Example</a></li>
<li><a href="#tiered_storage_limitation">Limitations</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#security">7. Security</a>
5 changes: 5 additions & 0 deletions docs/upgrade.html
@@ -50,6 +50,11 @@ <h5><a id="upgrade_360_notable" href="#upgrade_360_notable">Notable changes in 3
<code>replication.policy.internal.topic.separator.enabled</code>
property. If upgrading from 3.0.x or earlier, it may be necessary to set this property to <code>false</code>; see the property's
<a href="#mirror_connector_replication.policy.internal.topic.separator.enabled">documentation</a> for more details.</li>
<li>Tiered storage is available as an early access feature and is not recommended for use in production environments.
You are welcome to test it and provide feedback.
For more information about the early access tiered storage feature, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage">KIP-405</a> and the
<a href="https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes">Tiered Storage Early Access Release Notes</a>.
</li>
</ul>

<h4><a id="upgrade_3_5_0" href="#upgrade_3_5_0">Upgrading to 3.5.0 from any version 0.8.x through 3.4.x</a></h4>