Skip to content

Commit

Permalink
HDFS-15374. Add documentation for fedbalance tool. Contributed by Jin…
Browse files Browse the repository at this point in the history
…glun.
  • Loading branch information
linyiqun committed Jul 1, 2020
1 parent de2cb86 commit ff8bb67
Show file tree
Hide file tree
Showing 4 changed files with 202 additions and 0 deletions.
1 change: 1 addition & 0 deletions hadoop-project/src/site/site.xml
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,7 @@
<item name="Hadoop Archives" href="hadoop-archives/HadoopArchives.html"/>
<item name="Hadoop Archive Logs" href="hadoop-archive-logs/HadoopArchiveLogs.html"/>
<item name="DistCp" href="hadoop-distcp/DistCp.html"/>
<item name="HDFS Federation Balance" href="hadoop-federation-balance/HDFSFederationBalance.html"/>
<item name="GridMix" href="hadoop-gridmix/GridMix.html"/>
<item name="Rumen" href="hadoop-rumen/Rumen.html"/>
<item name="Resource Estimator Service" href="hadoop-resourceestimator/ResourceEstimator.html"/>
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

HDFS Federation Balance Guide
=====================

<!-- MACRO{toc|fromDepth=0|toDepth=3} -->

Overview
--------

HDFS Federation Balance is a tool balancing data across different federation
namespaces. It uses [DistCp](../hadoop-distcp/DistCp.html) to copy data from
the source path to the target path. First it creates a snapshot at the source
path and submits the initial distcp. Second it uses distcp diff to do the
incremental copy until the source and the target are the same. Then If it's
working in RBF mode it updates the mount table in Router. Finally it moves the
source to trash.

This document aims to describe the usage and design of the HDFS Federation
Balance.

Usage
-----

### Basic Usage

The hdfs federation balance tool supports both normal federation cluster and
router-based federation cluster. Taking rbf for example. Supposing we have a
mount entry in Router:

Source Destination
/foo/src hdfs://namespace-0/foo/src

The command below runs an hdfs federation balance job. The first parameter is
the mount entry. The second one is the target path which must include the
target cluster. The option `-router` indicates this is in router-based
federation mode.

bash$ /bin/hadoop fedbalance -router submit /foo/src hdfs://namespace-1/foo/dst

It copies data from hdfs://namespace-0/foo/src to hdfs://namespace-1/foo/dst
incrementally and finally updates the mount entry to:

Source Destination
/foo/src hdfs://namespace-1/foo/dst

If the hadoop shell process exits unexpectedly, we can use the command below
to continue the unfinished job:

bash$ /bin/hadoop fedbalance continue

This will scan the journal to find all the unfinished jobs, recover and
continue to execute them.

If we want to balance in a normal federation cluster, use the command below.

bash$ /bin/hadoop fedbalance submit hdfs://namespace-0/foo/src hdfs://namespace-1/foo/dst

In normal federation mode the source path must includes the path schema.

### RBF Mode And Normal Federation Mode

The hdfs federation balance tool has 2 modes:

* the router-based federation mode (RBF mode).
* the normal federation mode.

By default the command runs in the normal federation mode. You can specify the
rbf mode by using the option `-router`.

In the rbf mode the first parameter is taken as the mount point. It disables
write by setting the mount point readonly.

In the normal federation mode the first parameter is taken as the full path of
the source. The first parameter must include the source cluster. It disables
write by cancelling all the permissions of the source path.

Details about disabling write see [HDFS FedBalance](#HDFS_FedBalance).

### Command Options

Command `submit` has 5 options:

| Option key | Description | Default |
| ------------------------------ | ------------------------------------ | ------- |
| -router | Run in router-based federation mode. | Normal federation mode. |
| -forceCloseOpen | Force close all open files when there is no diff in the DIFF_DISTCP stage. | Wait until there is no open files. |
| -map | Max number of concurrent maps to use for copy. | 10 |
| -bandwidth | Specify bandwidth per map in MB. | 10 |
| -delay | Specify the delayed duration(millie seconds) when the job needs to retry. | 1000 |
| -moveToTrash | This options has 3 values: `trash` (move the source path to trash), `delete` (delete the source path directly) and `skip` (skip both trash and deletion). By default the server side trash interval is used. If the trash is disabled in the server side, the default trash interval 60 minutes is used. | trash |

### Configuration Options
--------------------

Set configuration options at fedbalance-site.xml.

| Configuration key | Description | Default |
| ------------------------------ | ------------------------------------ | ------- |
| hdfs.fedbalance.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. BalanceProcedureScheduler is responsible for scheduling a balance job, including submit, run, delay and recover. | 10 |
| hdfs.fedbalance.procedure.scheduler.journal.uri | The uri of the journal, the journal file is used for handling the job persistence and recover. | hdfs://localhost:8020/tmp/procedure |

Architecture of HDFS Federation Balance
----------------------

The components of the HDFS Federation Balance can be classified into the
following categories:

* Balance Procedure Scheduler
* HDFS FedBalance

### Balance Procedure Scheduler

The Balance Procedure Scheduler implements a state machine. It's responsible
for scheduling a balance job, including submit, run, delay and recover.
The model is showed below:

![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)

* After a job is submitted, the job is added to the pendingQueue.
* The worker threads take jobs and run them. Journals are written to storage.
* If writing the journal fails, the job is added to the recoverQueue for later
recovery. If Worker thread catches a RetryTaskException, it adds the job to
the delayQueue.
* Rooster thread takes job from delayQueue and adds it back to pendingQueue.
* When a scheduler starts, it scans all the unfinished jobs from the journal
and add them to the recoverQueue. The recover thread will recover them from
the journal and add them back to the pendingQueue.

### HDFS FedBalance

HDFS FedBalance is implemented as a job of the state machine. All the distcp
balance logic are implemented here. An HDFS FedBalance job consists of 3
procedures:

* DistCpProcedure: This is the first procedure. It handles all the data copy
works. There are 6 stages:
* PRE_CHECK: Do the pre-check of the src and dst path.
* INIT_DISTCP: Create a snapshot of the source path and distcp it to the
target.
* DIFF_DISTCP: Submit distcp with `-diff` round by round to sync source and
target paths. If `-forceCloseOpen` is set, this stage will finish when
there is no diff between src and dst. Otherwise this stage only finishes
when there is no diff and no open files.
* DISABLE_WRITE: Disable write operations so the src won't be changed. When
working in router mode, it is done by making the mount point readonly.
In normal federation mode it is done by cancelling all the permissions of
the source path.
* FINAL_DISTCP: Force close all the open files and submit the final distcp.
* FINISH: Do the cleanup works. In normal federation mode the finish stage
also restores the permission of the dst path.

* MountTableProcedure: This procedure updates the mount entry in Router. The
readonly is unset and the destination is updated of the mount point. This
procedure is activated only when option `-router`.

* TrashProcedure: This procedure moves the source path to trash.

After all 3 procedures finish, the balance job is done.
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#banner {
height: 93px;
background: none;
}

#bannerLeft img {
margin-left: 30px;
margin-top: 10px;
}

#bannerRight img {
margin: 17px;
}

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit ff8bb67

Please sign in to comment.