Support HDFS 3.x (#1913)

Intel-bigdata · Aug 28, 2018 · 4389259
1 parent ee43b9c

Showing 71 changed files with 3,980 additions and 489 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -18,6 +18,8 @@ matrix:
       env: PROFILE="-Phadoop-2.7"
     - jdk: "openjdk8"
       env: PROFILE="-Phadoop-cdh-2.6"
+    - jdk: "openjdk8"
+      env: PROFILE="-Phadoop-3.1"
 
 cache:
   directories:

diff --git a/bin/common.sh b/bin/common.sh
@@ -267,9 +267,13 @@ function smart_stop_daemon() {
 
 function reorder_lib() {
   local ajar="lib/jersey-core-1.9.jar"
+  local bjar="lib/jsr311-api-1.1.1.jar"
   if [ -f "${SMART_HOME}/${ajar}" ]; then
     SMART_CLASSPATH="${SMART_HOME}/${ajar}:${SMART_CLASSPATH}"
   fi
+  if [ -f "${SMART_HOME}/${bjar}" ]; then
+    SMART_CLASSPATH="${SMART_HOME}/${bjar}:${SMART_CLASSPATH}"
+  fi
 }
 
 function init_command() {
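
Note on the `reorder_lib` hunk above: each jar is prepended to `SMART_CLASSPATH`, so the jar handled last (`jsr311-api-1.1.1.jar`) ends up ahead of `jersey-core-1.9.jar`. Entries earlier on a Java classpath are searched first, which is presumably the reason for the reordering. The sketch below reproduces that prepend logic in isolation. The temporary directory standing in for `SMART_HOME`, the empty placeholder jars, and the initial classpath value are assumptions made for illustration, not part of the commit.

```shell
#!/usr/bin/env bash
# Sketch only: a throwaway directory stands in for SMART_HOME (hypothetical),
# and the jars are empty placeholder files.
SMART_HOME="$(mktemp -d)"
mkdir -p "${SMART_HOME}/lib"
touch "${SMART_HOME}/lib/jersey-core-1.9.jar" \
      "${SMART_HOME}/lib/jsr311-api-1.1.1.jar"

# Hypothetical starting classpath; the real script builds this elsewhere.
SMART_CLASSPATH="${SMART_HOME}/lib/*"

# Same prepend logic as the patched function in bin/common.sh.
reorder_lib() {
  local ajar="lib/jersey-core-1.9.jar"
  local bjar="lib/jsr311-api-1.1.1.jar"
  if [ -f "${SMART_HOME}/${ajar}" ]; then
    SMART_CLASSPATH="${SMART_HOME}/${ajar}:${SMART_CLASSPATH}"
  fi
  if [ -f "${SMART_HOME}/${bjar}" ]; then
    SMART_CLASSPATH="${SMART_HOME}/${bjar}:${SMART_CLASSPATH}"
  fi
}

reorder_lib

# Split off the first two colon-separated entries to inspect the order.
first_entry="${SMART_CLASSPATH%%:*}"
rest="${SMART_CLASSPATH#*:}"
second_entry="${rest%%:*}"

echo "first:  ${first_entry}"
echo "second: ${second_entry}"
```

If either jar is missing from `lib/`, the corresponding prepend is skipped and the remaining order is unchanged.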

diff --git a/docs/block-level-ec.md b/docs/block-level-ec.md
@@ -14,7 +14,7 @@ lot from its design and experience.
 
 Striped EC vs Block EC
 ======================
-EC with striped layout can achieve storage saving with both small files and large files. It supports online erasure coding and when data writen to HDFS, the coding is already done. On high speed network environment, it outperforms than replica because at the same time it can write to more than one datanode in parallevel. While one drawback is it loses the data locality advantage which may impact the performance of upper layer applications especially for those particularly optimized according to the advantage. For more Striped EC introduction,
+EC with striped layout can achieve storage saving with both small files and large files. It supports online erasure coding and when data writen to HDFS, the coding is already done. On high speed network environment, it outperforms than replica because at the same time it can write to more than one datanode in parallel. While one drawback is it loses the data locality advantage which may impact the performance of upper layer applications especially for those particularly optimized according to the advantage. For more Striped EC introduction,
 refer to this [our joint blog with Cloudera](https://blog.cloudera.com/blog/2015/09/introduction-to-hdfs-erasure-coding-in-apache-hadoop/).
 
 Block EC is very suitable for very large files of enough blocks needed by a erasure coding group. The erasure coding is done offline and in background instead of online while client writing data. Compared with striped EC, block EC keeps data locality, being of less performance impact to some frameworks and applications.

diff --git a/docs/ssm-deployment-guide.md b/docs/ssm-deployment-guide.md
@@ -1,18 +1,18 @@
-# SSM Deployment Guide with Hadoop (CDH5.10.1 or Apache Hadoop 2.7.3)
+# SSM Deployment Guide with Hadoop
 ----------------------------------------------------------------------------------
 ## Requirements:
 
-* Unix/Unix-like Operation System
-* JDK 1.7 for CDH5.10.1 or JDK 1.8 for Apache Hadoop 2.7.3
-* CDH 5.10.1 or Apache Hadoop 2.7.3
+* Unix/Unix-like OS
+* JDK 1.7 for CDH 5.10.1 or JDK 1.8 for Apache Hadoop 2.7.3/3.1.0
+* CDH 5.10.1 or Apache Hadoop 2.7.3/3.1.0
 * MySQL Community 5.7.18+
-* Maven 3.1.1+
+* Maven 3.1.1+ (merely for build use)
 
-## Why JDK 1.7 is preferred
+## Why JDK 1.7 is preferred for CDH 5.10.1
 
-  It is because by default CDH5.10.1 supports compile and run with JDK 1.7. If you
+  It is because by default CDH 5.10.1 supports compile and run with JDK 1.7. If you
   want to use JDK1.8, please turn to Cloudera web site for how to support JDK1.8
-  in CDH5.10.1.
+  in CDH 5.10.1.
   For SSM, JDK 1.7 and 1.8 are both supported.
 
 
@@ -24,14 +24,17 @@ Download SSM branch from Github https://github.com/Intel-bigdata/SSM/
 
 ##  **Build SSM**
 
-###   For CDH5.10.1
+###   For CDH 5.10.1
 
   	`mvn clean package -Pdist,web,hadoop-cdh-2.6 -DskipTests`
 
 ###   For Hadoop 2.7.3
 
 	`mvn clean package -Pdist,web,hadoop-2.7 -DskipTests`
 
+###   For Hadoop 3.1.0
+
+	`mvn clean package -Pdist,web,hadoop-3.1 -DskipTests`
 
 A tar distribution package will be generated under 'smart-dist/target'. unzip the tar distribution package to ${SMART_HOME} directory, the configuration files of SSM is under '${SMART_HOME}/conf'.
 More detailed information, please refer to BUILDING.txt file.
@@ -190,7 +193,9 @@ After finishing the SSM configuration, we can start to deploy the SSM package wi
 # Deploy SSM
 ---------------------------------------------------------------------------------
 
-SSM supports two running modes, standalone service and SSM service with multiple Smart Agents. If file move performance is not the concern, then standalone service mode is enough. If better performance is desired, we recommend to deploy one agent on each Datanode. To deploy SSM to Smart Server nodes and Smart Agent nodes (if configured), you can enter into ${SMART_HOME} directory and type "./bin/install.sh". You can use --config <config-dir> to specify where SSM's config directory is. ${SMART_HOME}/conf is the default config directory if the config option is not used.
+SSM supports two running modes, standalone service and SSM service with multiple Smart Agents. If file move performance is not the concern, then standalone service mode is enough. If better performance is desired, we recommend to deploy one agent on each Datanode.
+
+To deploy SSM to Smart Server nodes and Smart Agent nodes (if configured), please execute command `./bin/install.sh` in ${SMART_HOME}. You can use --config <config-dir> to specify where SSM's config directory is. ${SMART_HOME}/conf is the default config directory if the config option is not used.
 
    ## Standalone SSM Service
 
@@ -251,12 +256,12 @@ Enter into ${SMART_HOME} directory for running SSM. You can type `./bin/ssm vers
 
 # Hadoop Configuration
 ----------------------------------------------------------------------------------
-After install CDH5.10.1 or Apache Hadoop 2.7.3, please do the following configurations for integrating SSM.
+Please do the following configurations for integrating SSM with CDH 5.10.1, Apache Hadoop 2.7.3 or Apache Hadoop 3.1.0.
 
 **Warning: This step may lead to `Hadoop not working issue` if it's not configured correctly. So, during testing, we don't recommend changing any configurations in Hadoop.** Actually, SSM can work with an existing Hadoop cluster without any configuration change in Hadoop. Although in that case SSM cannot collect access count or data temperature from Hadoop, you can still use SSM action to change access count or data temperature. For example, you can use `read -file XXX` to change access count or data temperature of file `XXX`.
 
 
-## Apache Hadoop 2.7.3
+## Apache Hadoop 2.7.3 or 3.1.0
 
 ### core-site.xml changes 
 
@@ -327,7 +332,7 @@ Copy the SSM jars to the default Hadoop class path
   2. Distribute the jars starts with smart to one of default hadoop classpath in each NameNode/DataNode. For example, copy SSM jars to `$HADOOP_HOME/share/hadoop/common/lib`.
 
 
-## CDH5.10.1
+## CDH 5.10.1
 
 ### core-site.xml changes 
 
@@ -527,7 +532,7 @@ SSM choose to save cmdlet and action execution history in metastore for audit an
 ```
 SSM service restart is required after the configuration changes.
 
-## Batch Size of Namespace fetcher
+## Batch size of Namespace fetcher
 
 SSM will fetch/sync namespace from namenode when it is started. According to our tests, a large namespace may lead to long start up time. To avoid this, we add a parameter named `smart.namespace.fetcher.batch`, its default value is 500. You can change it if namespace is very large, e.g., 100M or more. A larger batch size will greatly speed up fetcher efficiency, and reduce start up time.
 ```xml
@@ -551,7 +556,7 @@ SSM supports concurrently fetching namespace. You can set a large value for each
         <description>Number of consumers in namespace fetcher</description>
     </property>
 ```
-##  Disable SSM Client
+##  Disable SSM SmartDFSClient
 
 For some reasons, if you do want to disable SmartDFSClients on a specific host from contacting SSM server, it can be realized by using the following commands. After that, newly created SmartDFSClients on that node will not try to connect SSM server while other functions (like HDFS read/write) will remain unaffected.
 

diff --git a/pom.xml b/pom.xml
@@ -91,6 +91,22 @@
         </repository>
       </repositories>
     </profile>
+    <profile>
+      <id>hadoop-3.1</id>
+      <modules>
+        <module>smart-hadoop-support</module>
+      </modules>
+      <properties>
+        <hadoop.version>3.1.0</hadoop.version>
+      </properties>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-hdfs-client</artifactId>
+          <version>${hadoop.version}</version>
+        </dependency>
+      </dependencies>
+    </profile>
     <profile>
       <id>alluxio</id>
       <modules>
@@ -277,6 +293,26 @@
             <groupId>com.google.code.findbugs</groupId>
             <artifactId>jsr305</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty</groupId>
+            <artifactId>jetty-server</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty</groupId>
+            <artifactId>jetty-servlet</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty</groupId>
+            <artifactId>jetty-util</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty</groupId>
+            <artifactId>jetty-webapp</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty.websocket</groupId>
+            <artifactId>websocket-server</artifactId>
+          </exclusion>
         </exclusions>
       </dependency>
       <dependency>
@@ -316,6 +352,18 @@
             <groupId>javax.servlet.jsp</groupId>
             <artifactId>jsp-api</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty</groupId>
+            <artifactId>jetty-server</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty</groupId>
+            <artifactId>jetty-util</artifactId>
+          </exclusion>
+          <exclusion>
+            <groupId>org.eclipse.jetty</groupId>
+            <artifactId>jetty-util-ajax</artifactId>
+          </exclusion>
         </exclusions>
       </dependency>
 

diff --git a/smart-agent/pom.xml b/smart-agent/pom.xml
@@ -38,12 +38,6 @@
       <version>1.5.0-SNAPSHOT</version>
       <scope>provided</scope>
     </dependency>
-    <dependency>
-      <groupId>org.smartdata</groupId>
-      <artifactId>smart-hadoop-client</artifactId>
-      <version>1.5.0-SNAPSHOT</version>
-      <scope>runtime</scope>
-    </dependency>
     <dependency>
       <groupId>org.smartdata</groupId>
       <artifactId>smart-tidb</artifactId>
@@ -67,7 +61,7 @@
       <scope>test</scope>
     </dependency>
   </dependencies>
-
+  
   <build>
     <plugins>
       <plugin>

diff --git a/smart-common/pom.xml b/smart-common/pom.xml
@@ -39,6 +39,20 @@
         <dependency>
             <groupId>org.apache.hadoop</groupId>
             <artifactId>hadoop-common</artifactId>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.eclipse.jetty</groupId>
+                    <artifactId>jetty-server</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.eclipse.jetty</groupId>
+                    <artifactId>jetty-servlet</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.eclipse.jetty</groupId>
+                    <artifactId>jetty-webapp</artifactId>
+                </exclusion>
+            </exclusions>
         </dependency>
         <dependency>
             <groupId>junit</groupId>

diff --git a/smart-dist/pom.xml b/smart-dist/pom.xml
@@ -43,8 +43,22 @@
         </dependency>
         <dependency>
             <groupId>org.smartdata</groupId>
-            <artifactId>smart-hadoop-client</artifactId>
+            <artifactId>smart-inputstream</artifactId>
             <version>1.5.0-SNAPSHOT</version>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-2</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-2.7</artifactId>
+                </exclusion>
+                <exclusion>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-2.7</artifactId>
+                </exclusion>
+            </exclusions>
         </dependency>
         <dependency>
             <groupId>org.smartdata</groupId>
@@ -62,10 +76,18 @@
             <version>1.5.0-SNAPSHOT</version>
             <!--Work around for https://issues.apache.org/jira/browse/MNG-1388 -->
             <exclusions>
+                <exclusion>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-2</artifactId>
+                </exclusion>
                 <exclusion>
                     <groupId>org.smartdata</groupId>
                     <artifactId>smart-hadoop-2.7</artifactId>
                 </exclusion>
+                <exclusion>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-2.7</artifactId>
+                </exclusion>
             </exclusions>
         </dependency>
         <dependency>
@@ -124,6 +146,11 @@
                     <artifactId>smart-hadoop-2.7</artifactId>
                     <version>1.5.0-SNAPSHOT</version>
                 </dependency>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-2.7</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
             </dependencies>
         </profile>
         <profile>
@@ -134,6 +161,26 @@
                     <artifactId>smart-hadoop-cdh-2.6</artifactId>
                     <version>1.5.0-SNAPSHOT</version>
                 </dependency>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-cdh-2.6</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
+            </dependencies>
+        </profile>
+        <profile>
+            <id>hadoop-3.1</id>
+            <dependencies>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-3.1</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-3.1</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
             </dependencies>
         </profile>
         <profile>

diff --git a/smart-engine/pom.xml b/smart-engine/pom.xml
@@ -46,11 +46,12 @@
             <groupId>org.smartdata</groupId>
             <artifactId>smart-hadoop</artifactId>
             <version>1.5.0-SNAPSHOT</version>
-        </dependency>
-        <dependency>
-            <groupId>org.smartdata</groupId>
-            <artifactId>smart-hadoop-client</artifactId>
-            <version>1.5.0-SNAPSHOT</version>
+            <exclusions>
+                <exclusion>
+                    <groupId>commons-configuration</groupId>
+                    <artifactId>commons-configuration</artifactId>
+                </exclusion>
+            </exclusions>
         </dependency>
         <!--
         <dependency>
@@ -144,6 +145,11 @@
             <artifactId>junit</artifactId>
             <scope>test</scope>
         </dependency>
+        <dependency>
+            <groupId>org.slf4j</groupId>
+            <artifactId>slf4j-api</artifactId>
+            <version>1.7.25</version>
+        </dependency>
     </dependencies>
 
     <profiles>
@@ -159,6 +165,11 @@
                     <artifactId>smart-hadoop-2.7</artifactId>
                     <version>1.5.0-SNAPSHOT</version>
                 </dependency>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-2.7</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
             </dependencies>
         </profile>
         <profile>
@@ -169,6 +180,26 @@
                     <artifactId>smart-hadoop-cdh-2.6</artifactId>
                     <version>1.5.0-SNAPSHOT</version>
                 </dependency>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-cdh-2.6</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
+            </dependencies>
+        </profile>
+        <profile>
+            <id>hadoop-3.1</id>
+            <dependencies>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-3.1</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
+                <dependency>
+                    <groupId>org.smartdata</groupId>
+                    <artifactId>smart-hadoop-client-3.1</artifactId>
+                    <version>1.5.0-SNAPSHOT</version>
+                </dependency>
             </dependencies>
         </profile>
         <profile>