HsunTzu

Version

Beta 1.0

A very fast tool to compress, decompress, tar, and untar original files on HDFS

LICENSE

MIT

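"Good tools are prerequisite to the successful execution of a job." --- Xunzi (荀子)

This tool is mainly used on HDFS to compress and archive files and logs, and to perform the reverse operation of decompressing them. It supports compressing multiple directories in parallel and supports the six compression formats currently available in HDFS. It has been tested on PB-scale data without problems. The tool does not occupy the MapReduce job queue, runs well from a shell REPL, and can also be integrated into a standalone project.

Before using it, check your HDFS cluster environment: you need to configure the cluster address and related information. When running the command, you also need to supply the required parameters: the path of the files to process, the output path, the compression type, the properties file path, the input compression format, and the output compression format.

Four operation types are currently supported:

1. Compressing original files
2. Packing original files into a tarball
3. Extracting tar files back into original files
4. Compressing or packing files in multiple directories as a batch

The project is still gaining new features. You are welcome to try it out to ease the pain of archiving files on HDFS and free up more HDFS space.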

First

You need to install JDK 8, Scala 2.12.1+, sbt 1.0.4+, and Hadoop 2.8.1+.

You can also change these versions in build.sbt and ./project/build.properties.
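
Before building, it may help to confirm that the required tools are on the PATH. The following is a minimal sketch: the helper name `missing_tools` is hypothetical and not part of HsunTzu, and it only checks that each command exists, not which version is installed.

```shell
#!/bin/sh
# Print each required command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of HsunTzu.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the prerequisites above.
missing=$(missing_tools java scala sbt hadoop)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```

Version checks are left out on purpose, since each tool reports its version in a different format.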

Get

git clone [email protected]:mullerhai/HsunTzu.git

cd ./HsunTzu

Compile

sbt clean compile

Package

sbt update

sbt assembly

Run

hadoop jar HsunTzuPro-beat-2.0.jar inputPath outPath CompressType PropertiesFilePath inputCodec OutputCodec

The log output is printed to the console.
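
The run command takes six positional arguments, so a small wrapper can catch a missing argument before the job is submitted. This is only a sketch under assumptions: the function name `build_run_cmd` is hypothetical, the jar is assumed to be in the current directory, and the function prints the command rather than running it. The accepted values for CompressType and the two codecs are not listed here, so the usage note uses placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper: checks the argument count and prints the hadoop
# command without executing it. The jar name comes from the Run section.
JAR="HsunTzuPro-beat-2.0.jar"

build_run_cmd() {
    if [ "$#" -ne 6 ]; then
        echo "usage: build_run_cmd inputPath outPath CompressType PropertiesFilePath inputCodec OutputCodec" >&2
        return 1
    fi
    # One %s for the jar plus one for each of the six arguments.
    printf 'hadoop jar %s %s %s %s %s %s %s\n' "$JAR" "$@"
}
```

For example, `build_run_cmd /in /out TYPE ./conf.properties INCODEC OUTCODEC` prints the full command line, where `TYPE`, `INCODEC`, and `OUTCODEC` stand in for real values. To execute instead of print, the `printf` line could be replaced with `hadoop jar "$JAR" "$@"`.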