This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Support HDFS 3.x (#1913)
PHILO-HE authored Aug 28, 2018
1 parent ee43b9c commit 4389259
Showing 71 changed files with 3,980 additions and 489 deletions.
2 changes: 2 additions & 0 deletions .travis.yml
@@ -18,6 +18,8 @@ matrix:
env: PROFILE="-Phadoop-2.7"
- jdk: "openjdk8"
env: PROFILE="-Phadoop-cdh-2.6"
- jdk: "openjdk8"
env: PROFILE="-Phadoop-3.1"

cache:
directories:
4 changes: 4 additions & 0 deletions bin/common.sh
@@ -267,9 +267,13 @@ function smart_stop_daemon() {

function reorder_lib() {
local ajar="lib/jersey-core-1.9.jar"
local bjar="lib/jsr311-api-1.1.1.jar"
if [ -f "${SMART_HOME}/${ajar}" ]; then
SMART_CLASSPATH="${SMART_HOME}/${ajar}:${SMART_CLASSPATH}"
fi
if [ -f "${SMART_HOME}/${bjar}" ]; then
SMART_CLASSPATH="${SMART_HOME}/${bjar}:${SMART_CLASSPATH}"
fi
}

function init_command() {
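A note on the `reorder_lib` change above: because each jar is prepended in turn, the jar handled last ends up first on the classpath. The sketch below reproduces the function against a temporary directory to make the resulting order visible. The `mktemp` directory standing in for `SMART_HOME` and the initial `SMART_CLASSPATH` value are assumptions for illustration, not part of the commit.

```shell
# Illustrative only: a temporary directory stands in for SMART_HOME.
SMART_HOME="$(mktemp -d)"
mkdir -p "${SMART_HOME}/lib"
touch "${SMART_HOME}/lib/jersey-core-1.9.jar" "${SMART_HOME}/lib/jsr311-api-1.1.1.jar"

# Hypothetical starting value; the real script builds this elsewhere.
SMART_CLASSPATH="${SMART_HOME}/lib/*"

# Same logic as the function in the diff: prepend each jar if it exists.
reorder_lib() {
  local ajar="lib/jersey-core-1.9.jar"
  local bjar="lib/jsr311-api-1.1.1.jar"
  if [ -f "${SMART_HOME}/${ajar}" ]; then
    SMART_CLASSPATH="${SMART_HOME}/${ajar}:${SMART_CLASSPATH}"
  fi
  if [ -f "${SMART_HOME}/${bjar}" ]; then
    SMART_CLASSPATH="${SMART_HOME}/${bjar}:${SMART_CLASSPATH}"
  fi
}

reorder_lib

# Print one classpath entry per line so the order is easy to read.
echo "${SMART_CLASSPATH}" | tr ':' '\n'
```

When both jars are present, `jsr311-api-1.1.1.jar` is listed ahead of `jersey-core-1.9.jar`, and both precede the rest of the classpath.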
2 changes: 1 addition & 1 deletion docs/block-level-ec.md
@@ -14,7 +14,7 @@ lot from its design and experience.

Striped EC vs Block EC
======================
EC with striped layout can achieve storage saving with both small files and large files. It supports online erasure coding and when data writen to HDFS, the coding is already done. On high speed network environment, it outperforms than replica because at the same time it can write to more than one datanode in parallevel. While one drawback is it loses the data locality advantage which may impact the performance of upper layer applications especially for those particularly optimized according to the advantage. For more Striped EC introduction,
EC with striped layout can achieve storage saving with both small files and large files. It supports online erasure coding and when data writen to HDFS, the coding is already done. On high speed network environment, it outperforms than replica because at the same time it can write to more than one datanode in parallel. While one drawback is it loses the data locality advantage which may impact the performance of upper layer applications especially for those particularly optimized according to the advantage. For more Striped EC introduction,
refer to this [our joint blog with Cloudera](https://blog.cloudera.com/blog/2015/09/introduction-to-hdfs-erasure-coding-in-apache-hadoop/).

Block EC is very suitable for very large files of enough blocks needed by a erasure coding group. The erasure coding is done offline and in background instead of online while client writing data. Compared with striped EC, block EC keeps data locality, being of less performance impact to some frameworks and applications.
35 changes: 20 additions & 15 deletions docs/ssm-deployment-guide.md
@@ -1,18 +1,18 @@
# SSM Deployment Guide with Hadoop (CDH5.10.1 or Apache Hadoop 2.7.3)
# SSM Deployment Guide with Hadoop
----------------------------------------------------------------------------------
## Requirements:

* Unix/Unix-like Operation System
* JDK 1.7 for CDH5.10.1 or JDK 1.8 for Apache Hadoop 2.7.3
* CDH 5.10.1 or Apache Hadoop 2.7.3
* Unix/Unix-like OS
* JDK 1.7 for CDH 5.10.1 or JDK 1.8 for Apache Hadoop 2.7.3/3.1.0
* CDH 5.10.1 or Apache Hadoop 2.7.3/3.1.0
* MySQL Community 5.7.18+
* Maven 3.1.1+
* Maven 3.1.1+ (merely for build use)

## Why JDK 1.7 is preferred
## Why JDK 1.7 is preferred for CDH 5.10.1

It is because by default CDH5.10.1 supports compile and run with JDK 1.7. If you
It is because by default CDH 5.10.1 supports compile and run with JDK 1.7. If you
want to use JDK1.8, please turn to Cloudera web site for how to support JDK1.8
in CDH5.10.1.
in CDH 5.10.1.
For SSM, JDK 1.7 and 1.8 are both supported.


@@ -24,14 +24,17 @@ Download SSM branch from Github https://github.com/Intel-bigdata/SSM/

## **Build SSM**

### For CDH5.10.1
### For CDH 5.10.1

`mvn clean package -Pdist,web,hadoop-cdh-2.6 -DskipTests`

### For Hadoop 2.7.3

`mvn clean package -Pdist,web,hadoop-2.7 -DskipTests`

### For Hadoop 3.1.0

`mvn clean package -Pdist,web,hadoop-3.1 -DskipTests`

A tar distribution package will be generated under 'smart-dist/target'. unzip the tar distribution package to ${SMART_HOME} directory, the configuration files of SSM is under '${SMART_HOME}/conf'.
More detailed information, please refer to BUILDING.txt file.
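The three build commands in this hunk differ only in the Hadoop profile name. As a rough sketch, the mapping can be captured in a small shell helper. The function name `ssm_build_cmd` and the target labels (`cdh-5.10.1`, `hadoop-2.7.3`, `hadoop-3.1.0`) are hypothetical and not part of SSM; the profile names are the ones given in the guide. The helper only prints the command and does not run Maven.

```shell
# Hypothetical helper: map a target distribution to the Maven profile
# named in the deployment guide and print the matching build command.
ssm_build_cmd() {
  local profile
  case "$1" in
    cdh-5.10.1)   profile="hadoop-cdh-2.6" ;;
    hadoop-2.7.3) profile="hadoop-2.7" ;;
    hadoop-3.1.0) profile="hadoop-3.1" ;;
    *)
      echo "unsupported target: $1" >&2
      return 1
      ;;
  esac
  echo "mvn clean package -Pdist,web,${profile} -DskipTests"
}

# Example: print the build command for the newly supported Hadoop 3.1.0.
ssm_build_cmd hadoop-3.1.0
```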
@@ -190,7 +193,9 @@ After finishing the SSM configuration, we can start to deploy the SSM package wi
# Deploy SSM
---------------------------------------------------------------------------------

SSM supports two running modes, standalone service and SSM service with multiple Smart Agents. If file move performance is not the concern, then standalone service mode is enough. If better performance is desired, we recommend to deploy one agent on each Datanode. To deploy SSM to Smart Server nodes and Smart Agent nodes (if configured), you can enter into ${SMART_HOME} directory and type "./bin/install.sh". You can use --config <config-dir> to specify where SSM's config directory is. ${SMART_HOME}/conf is the default config directory if the config option is not used.
SSM supports two running modes, standalone service and SSM service with multiple Smart Agents. If file move performance is not the concern, then standalone service mode is enough. If better performance is desired, we recommend to deploy one agent on each Datanode.

To deploy SSM to Smart Server nodes and Smart Agent nodes (if configured), please execute command `./bin/install.sh` in ${SMART_HOME}. You can use --config <config-dir> to specify where SSM's config directory is. ${SMART_HOME}/conf is the default config directory if the config option is not used.

## Standalone SSM Service

@@ -251,12 +256,12 @@ Enter into ${SMART_HOME} directory for running SSM. You can type `./bin/ssm vers

# Hadoop Configuration
----------------------------------------------------------------------------------
After install CDH5.10.1 or Apache Hadoop 2.7.3, please do the following configurations for integrating SSM.
Please do the following configurations for integrating SSM with CDH 5.10.1, Apache Hadoop 2.7.3 or Apache Hadoop 3.1.0.

**Warning: This step may lead to `Hadoop not working issue` if it's not configured correctly. So, during testing, we don't recommend changing any configurations in Hadoop.** Actually, SSM can work with an existing Hadoop cluster without any configuration change in Hadoop. Although in that case SSM cannot collect access count or data temperature from Hadoop, you can still use SSM action to change access count or data temperature. For example, you can use `read -file XXX` to change access count or data temperature of file `XXX`.


## Apache Hadoop 2.7.3
## Apache Hadoop 2.7.3 or 3.1.0

### core-site.xml changes

@@ -327,7 +332,7 @@ Copy the SSM jars to the default Hadoop class path
2. Distribute the jars starts with smart to one of default hadoop classpath in each NameNode/DataNode. For example, copy SSM jars to `$HADOOP_HOME/share/hadoop/common/lib`.


## CDH5.10.1
## CDH 5.10.1

### core-site.xml changes

@@ -527,7 +532,7 @@ SSM choose to save cmdlet and action execution history in metastore for audit an
```
SSM service restart is required after the configuration changes.

## Batch Size of Namespace fetcher
## Batch size of Namespace fetcher

SSM will fetch/sync namespace from namenode when it is started. According to our tests, a large namespace may lead to long start up time. To avoid this, we add a parameter named `smart.namespace.fetcher.batch`, its default value is 500. You can change it if namespace is very large, e.g., 100M or more. A larger batch size will greatly speed up fetcher efficiency, and reduce start up time.
```xml
@@ -551,7 +556,7 @@ SSM supports concurrently fetching namespace. You can set a large value for each
<description>Number of consumers in namespace fetcher</description>
</property>
```
## Disable SSM Client
## Disable SSM SmartDFSClient

For some reasons, if you do want to disable SmartDFSClients on a specific host from contacting SSM server, it can be realized by using the following commands. After that, newly created SmartDFSClients on that node will not try to connect SSM server while other functions (like HDFS read/write) will remain unaffected.

48 changes: 48 additions & 0 deletions pom.xml
@@ -91,6 +91,22 @@
</repository>
</repositories>
</profile>
<profile>
<id>hadoop-3.1</id>
<modules>
<module>smart-hadoop-support</module>
</modules>
<properties>
<hadoop.version>3.1.0</hadoop.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>alluxio</id>
<modules>
@@ -277,6 +293,26 @@
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-servlet</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-webapp</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty.websocket</groupId>
<artifactId>websocket-server</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
@@ -316,6 +352,18 @@
<groupId>javax.servlet.jsp</groupId>
<artifactId>jsp-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util-ajax</artifactId>
</exclusion>
</exclusions>
</dependency>

8 changes: 1 addition & 7 deletions smart-agent/pom.xml
@@ -38,12 +38,6 @@
<version>1.5.0-SNAPSHOT</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client</artifactId>
<version>1.5.0-SNAPSHOT</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-tidb</artifactId>
@@ -67,7 +61,7 @@
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
14 changes: 14 additions & 0 deletions smart-common/pom.xml
@@ -39,6 +39,20 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<exclusions>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-servlet</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-webapp</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>junit</groupId>
49 changes: 48 additions & 1 deletion smart-dist/pom.xml
@@ -43,8 +43,22 @@
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client</artifactId>
<artifactId>smart-inputstream</artifactId>
<version>1.5.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-2</artifactId>
</exclusion>
<exclusion>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-2.7</artifactId>
</exclusion>
<exclusion>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-2.7</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
@@ -62,10 +76,18 @@
<version>1.5.0-SNAPSHOT</version>
<!--Work around for https://issues.apache.org/jira/browse/MNG-1388 -->
<exclusions>
<exclusion>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-2</artifactId>
</exclusion>
<exclusion>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-2.7</artifactId>
</exclusion>
<exclusion>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-2.7</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
@@ -124,6 +146,11 @@
<artifactId>smart-hadoop-2.7</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-2.7</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
</dependencies>
</profile>
<profile>
@@ -134,6 +161,26 @@
<artifactId>smart-hadoop-cdh-2.6</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-cdh-2.6</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>hadoop-3.1</id>
<dependencies>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-3.1</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-3.1</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
</dependencies>
</profile>
<profile>
41 changes: 36 additions & 5 deletions smart-engine/pom.xml
@@ -46,11 +46,12 @@
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client</artifactId>
<version>1.5.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
</exclusion>
</exclusions>
</dependency>
<!--
<dependency>
@@ -144,6 +145,11 @@
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.25</version>
</dependency>
</dependencies>

<profiles>
@@ -159,6 +165,11 @@
<artifactId>smart-hadoop-2.7</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-2.7</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
</dependencies>
</profile>
<profile>
@@ -169,6 +180,26 @@
<artifactId>smart-hadoop-cdh-2.6</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-cdh-2.6</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>hadoop-3.1</id>
<dependencies>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-3.1</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.smartdata</groupId>
<artifactId>smart-hadoop-client-3.1</artifactId>
<version>1.5.0-SNAPSHOT</version>
</dependency>
</dependencies>
</profile>
<profile>