Merge pull request #471 from sufism/cassandra
cassandra plugins
TrafalgarLuo authored Oct 11, 2019
2 parents 57c8dd8 + 2f8cc74 commit 1037110
Showing 33 changed files with 2,321 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -51,6 +51,7 @@ DataX already has a fairly comprehensive plugin system: mainstream RDBMS databases, N
| | Phoenix5.x |||[Reader](https://github.com/alibaba/DataX/blob/master/hbase20xsqlreader/doc/hbase20xsqlreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/hbase20xsqlwriter/doc/hbase20xsqlwriter.md)|
| | MongoDB |||[Reader](https://github.com/alibaba/DataX/blob/master/mongoreader/doc/mongoreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/mongowriter/doc/mongowriter.md)|
| | Hive |||[Reader](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
| | Cassandra |||[Reader](https://github.com/alibaba/DataX/blob/master/cassandrareader/doc/cassandrareader.md), [Writer](https://github.com/alibaba/DataX/blob/master/cassandrawriter/doc/cassandrawriter.md)|
| Unstructured data storage | TxtFile |||[Reader](https://github.com/alibaba/DataX/blob/master/txtfilereader/doc/txtfilereader.md), [Writer](https://github.com/alibaba/DataX/blob/master/txtfilewriter/doc/txtfilewriter.md)|
| | FTP |||[Reader](https://github.com/alibaba/DataX/blob/master/ftpreader/doc/ftpreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/ftpwriter/doc/ftpwriter.md)|
| | HDFS |||[Reader](https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md), [Writer](https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md)|
217 changes: 217 additions & 0 deletions cassandrareader/doc/cassandrareader.md
@@ -0,0 +1,217 @@

# CassandraReader Plugin Documentation


___



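## 1 Quick Introduction

The CassandraReader plugin reads data from Cassandra. Under the hood, CassandraReader connects to a Cassandra instance through the DataStax Java driver and executes the corresponding CQL statements to SELECT data out of Cassandra.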


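## 2 How It Works

In short, CassandraReader connects to a Cassandra instance through the Java driver, generates a CQL SELECT statement from the user's configuration, and sends it to Cassandra. It then assembles the returned result into an abstract dataset using DataX's own data types and passes it to the downstream Writer.

CassandraReader builds the CQL statement it sends to Cassandra from the user-configured Table and Column settings.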
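
A minimal Java sketch (Java 8+) of how the configured keyspace, table, column list, and the optional `where` and `allowFiltering` settings (described in section 3.2) could be assembled into a single CQL SELECT string. `CqlSelectSketch` and `build` are hypothetical names, not the plugin's API; the actual reader may quote identifiers or lay out the statement differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: illustrates turning reader settings into a CQL SELECT.
// This is not the actual CassandraReader implementation.
public class CqlSelectSketch {

    public static String build(String keyspace, String table, List<String> columns,
                               String where, boolean allowFiltering) {
        if (columns == null || columns.isEmpty()) {
            throw new IllegalArgumentException("column list must not be empty");
        }
        StringBuilder cql = new StringBuilder("SELECT ");
        // Column entries are used verbatim, so "writetime(col)" passes through unchanged.
        cql.append(String.join(",", columns));
        cql.append(" FROM ").append(keyspace).append('.').append(table);
        // The optional "where" setting becomes the WHERE clause as-is.
        if (where != null && !where.trim().isEmpty()) {
            cql.append(" WHERE ").append(where.trim());
        }
        // allowFiltering appends the ALLOW FILTERING keyword.
        if (allowFiltering) {
            cql.append(" ALLOW FILTERING");
        }
        return cql.toString();
    }

    public static void main(String[] args) {
        String cql = build("test", "datax_src",
                Arrays.asList("textCol", "writetime(blobCol)"),
                "textcol='a'", true);
        System.out.println(cql);
    }
}
```

Passing column entries through verbatim is what lets a `writetime(...)` entry reach Cassandra unchanged.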


## 3 Features

### 3.1 Sample Configuration

* Configure a job that extracts data from Cassandra to the local machine:

```json
{
    "job": {
        "setting": {
            "speed": {
                "channel": 3
            }
        },
        "content": [
            {
                "reader": {
                    "name": "cassandrareader",
                    "parameter": {
                        "host": "localhost",
                        "port": 9042,
                        "useSSL": false,
                        "keyspace": "test",
                        "table": "datax_src",
                        "column": [
                            "textCol",
                            "blobCol",
                            "writetime(blobCol)",
                            "boolCol",
                            "smallintCol",
                            "tinyintCol",
                            "intCol",
                            "bigintCol",
                            "varintCol",
                            "floatCol",
                            "doubleCol",
                            "decimalCol",
                            "dateCol",
                            "timeCol",
                            "timeStampCol",
                            "uuidCol",
                            "inetCol",
                            "durationCol",
                            "listCol",
                            "mapCol",
                            "setCol",
                            "tupleCol",
                            "udtCol"
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
```


### 3.2 Parameters

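* **host**

    * Description: Domain name or IP address of the Cassandra contact point(s); separate multiple nodes with commas. <br />

    * Required: Yes <br />

    * Default: none <br />

* **port**

    * Description: Cassandra port. <br />

    * Required: Yes <br />

    * Default: 9042 <br />

* **username**

    * Description: Username for the data source. <br />

    * Required: No <br />

    * Default: none <br />

* **password**

    * Description: Password for the specified username. <br />

    * Required: No <br />

    * Default: none <br />

* **useSSL**

    * Description: Whether to connect over SSL. <br />

    * Required: No <br />

    * Default: false <br />

* **keyspace**

    * Description: The keyspace that contains the table to synchronize. <br />

    * Required: Yes <br />

    * Default: none <br />

* **table**

    * Description: The table to synchronize. <br />

    * Required: Yes <br />

    * Default: none <br />

* **column**

    * Description: The columns of the table to synchronize. <br />
      Each element can be a column name or `writetime(column_name)`; the latter form reads the write timestamp of column_name rather than its data.

    * Required: Yes <br />

    * Default: none <br />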


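* **where**

    * Description: A CQL expression used to filter the data, for example: <br />
      ```
      "where":"textcol='a'"
      ```

    * Required: No <br />

    * Default: none <br />

* **allowFiltering**

    * Description: Whether to filter data on the server side. See the description of the ALLOW FILTERING keyword in the Cassandra documentation. <br />

    * Required: No <br />

    * Default: none <br />

* **consistancyLevel**

    * Description: Data consistency level. Valid values: ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY|TWO|THREE|LOCAL_ONE <br />

    * Required: No <br />

    * Default: LOCAL_QUORUM <br />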
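
The defaults and allowed values in the parameter list above can be summarized in a small Java sketch. `ReaderParamsSketch` and its methods are hypothetical, not the plugin's configuration code; the sketch assumes exact, case-sensitive matching of consistency-level names and recognizes the `writetime(` prefix case-insensitively, neither of which this document confirms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: applies the documented defaults and value checks for a few
// CassandraReader parameters. Not the plugin's actual configuration code.
public class ReaderParamsSketch {

    static final int DEFAULT_PORT = 9042;                     // documented default for "port"
    static final String DEFAULT_CONSISTENCY = "LOCAL_QUORUM"; // documented default for "consistancyLevel"
    static final Set<String> CONSISTENCY_LEVELS = new HashSet<>(Arrays.asList(
            "ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL",
            "ANY", "TWO", "THREE", "LOCAL_ONE"));

    // "host" may list several contact points separated by commas.
    public static List<String> parseHosts(String host) {
        if (host == null || host.trim().isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        List<String> hosts = new ArrayList<>();
        for (String h : host.split(",")) {
            String trimmed = h.trim();
            if (!trimmed.isEmpty()) {
                hosts.add(trimmed);
            }
        }
        return hosts;
    }

    // Uses 9042 when "port" is not configured.
    public static int resolvePort(Integer port) {
        return port == null ? DEFAULT_PORT : port;
    }

    // Uses LOCAL_QUORUM when the level is missing; rejects values outside the documented list.
    // Assumption: names are matched exactly (case-sensitive).
    public static String resolveConsistency(String level) {
        if (level == null || level.trim().isEmpty()) {
            return DEFAULT_CONSISTENCY;
        }
        String trimmed = level.trim();
        if (!CONSISTENCY_LEVELS.contains(trimmed)) {
            throw new IllegalArgumentException("unsupported consistancyLevel: " + level);
        }
        return trimmed;
    }

    // For a column entry of the form writetime(name), returns "name"; otherwise null.
    // Assumption: the "writetime(" prefix is matched case-insensitively.
    public static String writetimeTarget(String columnEntry) {
        String c = columnEntry.trim();
        String prefix = "writetime(";
        if (c.toLowerCase(Locale.ROOT).startsWith(prefix) && c.endsWith(")")) {
            return c.substring(prefix.length(), c.length() - 1).trim();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseHosts("10.0.0.1, 10.0.0.2"));
        System.out.println(resolvePort(null));
        System.out.println(resolveConsistency(null));
        System.out.println(writetimeTarget("writetime(blobCol)"));
    }
}
```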


### 3.3 Type Conversion

CassandraReader currently supports all Cassandra types except the counter and custom types.

The table below lists how CassandraReader converts Cassandra types:


| DataX internal type | Cassandra data type |
| -------- | ----- |
| Long | int, tinyint, smallint, varint, bigint, time |
| Double | float, double, decimal |
| String | ascii, varchar, text, uuid, timeuuid, duration, list, map, set, tuple, udt, inet |
| Date | date, timestamp |
| Boolean | boolean |
| Bytes | blob |



Note:

* The counter and custom types are not currently supported.
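
A Java sketch of the conversion table above expressed as a lookup. The names `CassandraTypeMappingSketch`, `DataXType`, and `toDataX` are hypothetical and do not come from the plugin's source. The sketch assumes bare type names exactly as they appear in the table: `udt` stands for any user-defined type, and parameterized forms such as `list<int>` or `frozen<...>` are not parsed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup mirroring the type-conversion table; the enum and
// method names are illustrative, not the plugin's API.
public class CassandraTypeMappingSketch {

    public enum DataXType { LONG, DOUBLE, STRING, DATE, BOOLEAN, BYTES }

    private static final Map<String, DataXType> MAPPING = new HashMap<>();

    static {
        register(DataXType.LONG, "int", "tinyint", "smallint", "varint", "bigint", "time");
        register(DataXType.DOUBLE, "float", "double", "decimal");
        register(DataXType.STRING, "ascii", "varchar", "text", "uuid", "timeuuid", "duration",
                "list", "map", "set", "tuple", "udt", "inet");
        register(DataXType.DATE, "date", "timestamp");
        register(DataXType.BOOLEAN, "boolean");
        register(DataXType.BYTES, "blob");
    }

    private static void register(DataXType target, String... cassandraTypes) {
        for (String t : cassandraTypes) {
            MAPPING.put(t, target);
        }
    }

    // Expects a bare type name as listed in the table (e.g. "list", not "list<int>").
    // counter, custom, and any unlisted type are rejected.
    public static DataXType toDataX(String cassandraType) {
        DataXType target = MAPPING.get(cassandraType.trim().toLowerCase(Locale.ROOT));
        if (target == null) {
            throw new IllegalArgumentException("unsupported Cassandra type: " + cassandraType);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(toDataX("varint"));
        System.out.println(toDataX("timestamp"));
    }
}
```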

## 4 Performance Report


## 5 Constraints and Limitations

### 5.1 Data Recovery Issues with Primary/Standby Synchronization


## 6 FAQ



133 changes: 133 additions & 0 deletions cassandrareader/pom.xml
@@ -0,0 +1,133 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.alibaba.datax</groupId>
        <artifactId>datax-all</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </parent>
    <artifactId>cassandrareader</artifactId>
    <name>cassandrareader</name>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-common</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>slf4j-log4j12</artifactId>
                    <groupId>org.slf4j</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </dependency>

        <dependency>
            <groupId>com.datastax.cassandra</groupId>
            <artifactId>cassandra-driver-core</artifactId>
            <version>3.7.2</version>
            <classifier>shaded</classifier>
            <exclusions>
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>16.0.1</version>
        </dependency>
        <dependency>
            <groupId>commons-codec</groupId>
            <artifactId>commons-codec</artifactId>
            <version>1.9</version>
        </dependency>

        <!-- for test -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba.datax</groupId>
            <artifactId>datax-core</artifactId>
            <version>${datax-project-version}</version>
            <exclusions>
                <exclusion>
                    <groupId>com.alibaba.datax</groupId>
                    <artifactId>datax-service-face</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-common</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.hive</groupId>
                    <artifactId>hive-exec</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.hive</groupId>
                    <artifactId>hive-serde</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>javolution</groupId>
                    <artifactId>javolution</artifactId>
                </exclusion>
            </exclusions>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-all</artifactId>
            <version>1.9.5</version>
            <scope>test</scope>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <!-- compiler plugin -->
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.6</source>
                    <target>1.6</target>
                    <encoding>${project-sourceEncoding}</encoding>
                </configuration>
            </plugin>
            <!-- assembly plugin -->
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/main/assembly/package.xml</descriptor>
                    </descriptors>
                    <finalName>datax</finalName>
                </configuration>
                <executions>
                    <execution>
                        <id>dwzip</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
35 changes: 35 additions & 0 deletions cassandrareader/src/main/assembly/package.xml
@@ -0,0 +1,35 @@
<assembly
        xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
    <id></id>
    <formats>
        <format>dir</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>src/main/resources</directory>
            <includes>
                <include>plugin.json</include>
                <include>plugin_job_template.json</include>
            </includes>
            <outputDirectory>plugin/reader/cassandrareader</outputDirectory>
        </fileSet>
        <fileSet>
            <directory>target/</directory>
            <includes>
                <include>cassandrareader-0.0.1-SNAPSHOT.jar</include>
            </includes>
            <outputDirectory>plugin/reader/cassandrareader</outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>false</useProjectArtifact>
            <outputDirectory>plugin/reader/cassandrareader/libs</outputDirectory>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>