The HFSA tool provides a summary overview of the HDFS data files and directories of users and groups (answering 'who has how many/big/small files...').
- Download distribution
- Select the latest version folder
- Download the hfsa-tool-<VERSION>-bin.zip archive
- Unpack archive
- Run command
hfsa-tool-<VERSION>-SNAPSHOT/bin/hfsa-tool
You need a JDK 8+ installation and java in your PATH.
Analyze Hadoop FSImage file for user/group reports
Usage: hfsa-tool [-hV] [-v]... [-fun=<userNameFilter>] [-p=<dirs>[,
<dirs>...]]... FILE [COMMAND]
FILE FSImage file to process.
-fun, --filter-by-user=<userNameFilter>
Filter user name by <regexp>.
-h, --help Show this help message and exit.
-p, --path=<dirs>[,<dirs>...]
Directory path(s) to start traversing (default: [/]).
Default: [/]
-v Turns on verbose output. Use `-vv` for debug output.
-V, --version Print version information and exit.
Commands:
summary Generates an HDFS usage summary (default command if no other
command specified)
smallfiles, sf Reports on small file usage
inode, i Shows INode details
path, p Lists INode paths
Runs summary command by default.
> hfsa-tool src/test/resources/fsi_small.img
HDFS Summary : /
----------------
#Groups | #Users | #Directories | #Symlinks | #Files | Size [MB] | #Blocks | File Size Buckets
| | | | | | | 0 B 1 MiB 2 MiB 4 MiB 8 MiB 16 MiB 32 MiB 64 MiB 128 MiB 256 MiB
----------------------------------------------------------------------------------------------------------------------------------------------------------
3 | 3 | 8 | 0 | 11 | 331 | 12 | 0 2 1 2 1 0 2 1 1 1
By group: 3 | #Directories | #SymLinks | #File | Size [MB] | #Blocks | File Size Buckets
| | | | | | 0 B 1 MiB 2 MiB 4 MiB 8 MiB 16 MiB 32 MiB 64 MiB 128 MiB 256 MiB
---------------------------------------------------------------------------------------------------------------------------------------------------------
root | 0 | 0 | 1 | 0 | 1 | 0 1 0 0 0 0 0 0 0 0
supergroup | 8 | 0 | 8 | 151 | 8 | 0 1 1 2 1 0 1 1 1 0
nobody | 0 | 0 | 2 | 180 | 3 | 0 0 0 0 0 0 1 0 0 1
By user: 3 | #Directories | #SymLinks | #File | Size [MB] | #Blocks | File Size Buckets
| | | | | | 0 B 1 MiB 2 MiB 4 MiB 8 MiB 16 MiB 32 MiB 64 MiB 128 MiB 256 MiB
---------------------------------------------------------------------------------------------------------------------------------------------------------
root | 0 | 0 | 1 | 0 | 1 | 0 1 0 0 0 0 0 0 0 0
foo | 0 | 0 | 1 | 160 | 2 | 0 0 0 0 0 0 0 0 0 1
mm | 8 | 0 | 9 | 171 | 9 | 0 1 1 2 1 0 2 1 1 0
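The "File Size Buckets" columns can be read as a histogram over file sizes. The following is a minimal Python sketch of such a bucketing, not the tool's actual implementation. It assumes that each column label is the inclusive upper bound of its bucket and that larger files fall into the last bucket; the names `BUCKET_LIMITS`, `bucket_index` and `histogram` are made up for this illustration.

```python
from bisect import bisect_left

MIB = 1024 * 1024

# Assumed bucket upper bounds, copied from the column labels of the summary table.
BUCKET_LIMITS = [0, 1 * MIB, 2 * MIB, 4 * MIB, 8 * MIB, 16 * MIB,
                 32 * MIB, 64 * MIB, 128 * MIB, 256 * MIB]


def bucket_index(size_bytes):
    """Return the index of the first bucket whose upper bound is >= size_bytes."""
    # Assumption: sizes above the last bound are counted in the last bucket.
    return min(bisect_left(BUCKET_LIMITS, size_bytes), len(BUCKET_LIMITS) - 1)


def histogram(sizes):
    """Count how many of the given file sizes (in bytes) fall into each bucket."""
    counts = [0] * len(BUCKET_LIMITS)
    for size in sizes:
        counts[bucket_index(size)] += 1
    return counts


# One 20 MiB file and one 160 MiB file
print(histogram([20 * MIB, 160 * MIB]))
```

Under these assumptions, a 20 MiB file and a 160 MiB file land in the 32 MiB and 256 MiB buckets, which is the same pattern as the `nobody` group row above.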
Usage: hfsa-tool summary [-hV] [-s=<sort>]
Generates an HDFS usage summary (default command if no other command specified)
-h, --help Show this help message and exit.
-s, --sort=<sort> Sort by <fs> size, <fc> file count, <dc> directory count or
<bc> block count (default: fs).
Default: fs
-V, --version Print version information and exit.
Usage: hfsa-tool smallfiles [-hV] [--fsl=<fileSizeLimitBytes>]
[--uphl=<hotspotsLimit>]
Reports on small file usage
--fsl, -fileSizeLimit=<fileSizeLimitBytes>
Small file size limit in bytes (IEC binary formatted, eg 2MiB).
Every file less equals the limit counts as a small file.
Default: 2097152
--uphl, -userPathHotspotLimit=<hotspotsLimit>
Limit of directory hotspots containing most small files.
Default: 10
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
Report on small files with a size limit of 3 MiB, for all users matching the regexp m.*:
> hfsa-tool -fun="m.*" src/test/resources/fsi_small.img smallfiles --fsl 3MiB
Small files report (< 3 MiB)
Overall small files : 4
User (filtered) small files : 3
#Small files | Path (top 10)
------------------------------
4 | /
3 | /test3
2 | /test3/foo
1 | /test3/foo/bar
Username | #Small files
-----------------------
mm | 3
Username | Small files hotspots (top 10 count/path)
---------------------------------------------------
mm | 3 | /
| 2 | /test3
| 1 | /test3/foo
| 1 | /test3/foo/bar
---------------------------------------------------
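The hotspot lists above count small files per directory, including all parent directories. A rough Python sketch of that idea follows; it is an illustration under assumptions, not the tool's code. The helper names (`parse_iec`, `ancestors`, `hotspots`) and the sample file list are hypothetical, and the `<=` comparison follows the help text ("every file less equals the limit").

```python
from collections import Counter
import posixpath
import re

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def parse_iec(text):
    """Parse an IEC binary size such as '3MiB' (or a plain byte count) into bytes."""
    match = re.fullmatch(r"(\d+)\s*(B|KiB|MiB|GiB)?", text.strip())
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or "B"]


def ancestors(path):
    """Yield every ancestor directory of an absolute path, from its parent up to '/'."""
    parent = posixpath.dirname(path)
    while True:
        yield parent
        if parent in ("/", ""):
            break
        parent = posixpath.dirname(parent)


def hotspots(files, limit_bytes, top=10):
    """Count small files per ancestor directory and return the top entries."""
    counts = Counter()
    for path, size in files:
        if size <= limit_bytes:  # a file equal to the limit counts as small
            counts.update(ancestors(path))
    return counts.most_common(top)


# Hypothetical (path, size in bytes) pairs, not read from a real FSImage
sample = [
    ("/test3/foo/bar/a.img", 2 * 1024 ** 2),
    ("/test3/b.img", 1),
    ("/c.img", 100),
    ("/test3/foo/big.img", 20 * 1024 ** 2),
]
for directory, count in hotspots(sample, parse_iec("3MiB")):
    print(count, directory)
```

Counting every ancestor is what makes `/` always show the total: each small file contributes once to each directory on its path.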
Show details of selected INodes, specified by directory path, file path, or inode ID:
> hfsa-tool src/test/resources/fsi_small.img inode "/test3" "/test3/test_160MiB.img"
type: DIRECTORY
id: 16388
name: "test3"
directory {
modificationTime: 1497734744891
nsQuota: 18446744073709551615
dsQuota: 18446744073709551615
permission: 1099511759341
}
type: FILE
id: 16402
name: "test_160MiB.img"
file {
replication: 1
modificationTime: 1497734744886
accessTime: 1497734743534
preferredBlockSize: 134217728
permission: 5497558401444
blocks {
blockId: 1073741834
genStamp: 1010
numBytes: 134217728
}
blocks {
blockId: 1073741835
genStamp: 1011
numBytes: 33554432
}
storagePolicyID: 0
}
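The `permission`, `nsQuota` and `dsQuota` values are printed as raw unsigned numbers. The Python sketch below shows one way to decode them, assuming the packed layout of Hadoop's FSImage protobuf format: the lowest 16 bits hold the mode, the next 24 bits a group string ID, and the next 24 bits a user string ID. The function names are made up for this example, and the IDs are indexes into the FSImage string table, not names.

```python
def decode_permission(value):
    """Split a packed permission value into (user string ID, group string ID, mode)."""
    mode = value & 0xFFFF                # lowest 16 bits: POSIX-style mode bits
    group_id = (value >> 16) & 0xFFFFFF  # next 24 bits: group string ID (assumed layout)
    user_id = (value >> 40) & 0xFFFFFF   # next 24 bits: user string ID (assumed layout)
    return user_id, group_id, mode


def mode_string(mode):
    """Render the nine permission bits the way 'ls -l' does, e.g. 'rwxr-xr-x'."""
    flags = "rwxrwxrwx"
    return "".join(flag if mode & (1 << (8 - i)) else "-" for i, flag in enumerate(flags))


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


user_id, group_id, mode = decode_permission(1099511759341)  # directory /test3
print(user_id, group_id, oct(mode), mode_string(mode))
print(to_signed64(18446744073709551615))  # the nsQuota / dsQuota value above
```

With this layout, the directory's permission decodes to mode 755 and the file's to mode 644, and the quota values read as -1 when interpreted as signed 64-bit numbers.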
Lists all INode paths (files, directories, symlinks), similar to a recursive 'ls'.
Example filtering users with the regexp m.* and restricting the listing to the paths /test3 and /test1:
> hfsa-tool -fun="m.*" -p "/test3","/test1" src/test/resources/fsi_small.img p
Path report (paths=[/test3, /test1], user=~m.*) :
-------------------------------------------------
8 files, 4 directories and 0 symlinks
drwxr-xr-x mm supergroup /test1/test1
drwxr-xr-x mm supergroup /test3/foo
drwxr-xr-x mm supergroup /test3/foo/bar
-rw-r--r-- mm nobody /test3/foo/bar/test_20MiB.img
-rw-r--r-- mm supergroup /test3/foo/bar/test_2MiB.img
-rw-r--r-- mm supergroup /test3/foo/bar/test_40MiB.img
-rw-r--r-- mm supergroup /test3/foo/bar/test_4MiB.img
-rw-r--r-- mm supergroup /test3/foo/bar/test_5MiB.img
-rw-r--r-- mm supergroup /test3/foo/bar/test_80MiB.img
-rw-r--r-- mm supergroup /test3/foo/test_20MiB.img
-rw-r--r-- mm supergroup /test3/test.img
drwxr-xr-x mm supergroup /test3/test3
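The combination of a user filter and start directories can be pictured as two independent predicates. The Python sketch below illustrates that, with two assumptions that the output above does not confirm: the regexp must match the whole user name, and a path is selected when it equals or lies below one of the start directories. The function `select` and the sample entries are hypothetical.

```python
import re


def select(entries, user_regex, start_dirs):
    """Return the sorted (path, user) entries matching the user regexp and start directories."""
    pattern = re.compile(user_regex)
    # Normalise each start directory to end with '/', so '/test3' does not match '/test33'.
    prefixes = [directory.rstrip("/") + "/" for directory in start_dirs]
    return sorted(
        (path, user)
        for path, user in entries
        if pattern.fullmatch(user)  # assumption: whole-name match
        and any((path + "/").startswith(prefix) for prefix in prefixes)
    )


# Hypothetical entries, not read from a real FSImage
entries = [
    ("/test3/foo", "mm"),
    ("/test1/test1", "mm"),
    ("/test2/x", "mm"),
    ("/test3/y", "foo"),
    ("/test33/z", "mm"),
]
for path, user in select(entries, "m.*", ["/test3", "/test1"]):
    print(user, path)
```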
Reports directory sizes for a user, restricted to data last modified before a given age. Useful for finding locations with old data:
> hfsa-tool src/test/resources/fsi_small.img uu -a 60d mm
Size report (user=mm, start dir=/, last modification older 2021-05-12T23:49:44.203)
/ | 172 MiB
/test3 | 172 MiB
/test3/foo | 171 MiB
/test3/foo/bar | 151 MiB
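The age argument translates into a cutoff timestamp that modification times are compared against. A small Python sketch of that calculation follows, assuming an age syntax of `<number>d` for days (the only form shown above) and modification times in milliseconds since the epoch, as in the INode details. The function names and the chosen "now" are illustrative only.

```python
from datetime import datetime, timedelta, timezone
import re


def parse_age(text):
    """Parse an age such as '60d' into a timedelta (only days are handled here)."""
    match = re.fullmatch(r"(\d+)d", text)
    if match is None:
        raise ValueError(f"unsupported age: {text!r}")
    return timedelta(days=int(match.group(1)))


def cutoff(now, age):
    """Everything modified before this instant counts as old."""
    return now - age


def is_older(modification_time_ms, now, age):
    """True if the modification time (epoch milliseconds) lies before the cutoff."""
    modified = datetime.fromtimestamp(modification_time_ms / 1000, tz=timezone.utc)
    return modified < cutoff(now, age)


# An assumed "now", 60 days after the cutoff date printed in the report above
now = datetime(2021, 7, 11, 23, 49, 44, tzinfo=timezone.utc)
print(cutoff(now, parse_age("60d")).isoformat())
print(is_older(1497734744891, now, parse_age("60d")))  # modificationTime of /test3
```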
See requirements