Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-31551] Add support for CrateDB #29

Merged
merged 23 commits into from
Jun 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
18c2639
[FLINK-31551] Add support for CrateDB
matriv Feb 22, 2023
8efe18d
Add test for CrateDB dialect
matriv Mar 27, 2023
9ce39fd
Use MiniClusterExtension and remove from archunit violations
matriv Mar 29, 2023
5c25e78
Address comments
matriv Mar 29, 2023
455c743
Fix choice of catalog
matriv Mar 29, 2023
f75f74c
Revert "Use MiniClusterExtension and remove from archunit violations"
matriv Mar 29, 2023
a5b3953
User MiniCluster only for CrateDB
matriv Mar 29, 2023
6ade070
Address comments #2
matriv Mar 29, 2023
d7cc61c
Address comments #3
matriv Mar 29, 2023
36d1140
Address comments #4
matriv Mar 29, 2023
234939d
Fixup - just remove methods from abstract class
matriv Mar 29, 2023
0deb433
apply spotless
matriv Mar 29, 2023
8f02209
Use vanilla PostgreSQL driver isntead of custom CrateDB one
matriv Apr 4, 2023
9bfd55b
Revert "Use vanilla PostgreSQL driver isntead of custom CrateDB one"
matriv Apr 4, 2023
28a5f06
fixed timestamp related tests
matriv Apr 5, 2023
50c2a3d
Simplify test container
matriv Apr 7, 2023
c3d1068
Fix imports, upgrade CrateDB and its JDBC driver
matriv May 15, 2023
66d65fd
Merge remote-tracking branch 'upstream/main' into mt/add-cratedb
matriv May 24, 2023
e44173b
revert uneeded change
matriv May 24, 2023
f2dd226
exclude jna as transitive dep of crate-jdbc
matriv May 24, 2023
5fcd588
Merge remote-tracking branch 'upstream/main' into mt/add-cratedb
matriv May 26, 2023
8499707
fixup after merging with main
matriv May 26, 2023
217ba92
Move CrateDB below postgres
matriv May 31, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 41 additions & 14 deletions docs/content.zh/docs/connectors/table/jdbc.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,13 +48,14 @@ JDBC 连接器不是二进制发行版的一部分,请查阅[这里]({{< ref "

在连接到具体数据库时,也需要对应的驱动依赖,目前支持的驱动如下:

| Driver | Group Id | Artifact Id | JAR |
| :-----------| :------------------| :----------------------| :----------------|
| MySQL | `mysql` | `mysql-connector-java` | [下载](https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/) |
| Oracle | `com.oracle.database.jdbc` | `ojdbc8` | [下载](https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8)
| PostgreSQL | `org.postgresql` | `postgresql` | [下载](https://jdbc.postgresql.org/download/) |
| Derby | `org.apache.derby` | `derby` | [下载](http://db.apache.org/derby/derby_downloads.html) | |
| SQL Server | `com.microsoft.sqlserver` | `mssql-jdbc` | [下载](https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver16) |
| Driver | Group Id | Artifact Id | JAR |
|:-----------|:---------------------------|:-----------------------| :----------------|
| MySQL | `mysql` | `mysql-connector-java` | [下载](https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/) |
| Oracle | `com.oracle.database.jdbc` | `ojdbc8` | [下载](https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8)
| PostgreSQL | `org.postgresql` | `postgresql` | [下载](https://jdbc.postgresql.org/download/) |
| Derby | `org.apache.derby` | `derby` | [下载](http://db.apache.org/derby/derby_downloads.html) | |
| SQL Server | `com.microsoft.sqlserver` | `mssql-jdbc` | [下载](https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver16) |
| CrateDB | `io.crate` | `crate-jdbc` | [Download](https://repo1.maven.org/maven2/io/crate/crate-jdbc/) |

当前,JDBC 连接器和驱动不在 Flink 二进制发布包中,请参阅[这里]({{< ref "docs/dev/configuration/overview" >}})了解在集群上执行时何连接它们。

Expand Down Expand Up @@ -609,14 +610,15 @@ SELECT * FROM given_database.test_table2;

数据类型映射
----------------
Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、Oracle、PostgreSQL、Derby 等。其中,Derby 通常是用于测试目的。下表列出了从关系数据库数据类型到 Flink SQL 数据类型的类型映射,映射表可以使得在 Flink 中定义 JDBC 表更加简单。
Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、Oracle、PostgreSQL、CrateDB, Derby 等。其中,Derby 通常是用于测试目的。下表列出了从关系数据库数据类型到 Flink SQL 数据类型的类型映射,映射表可以使得在 Flink 中定义 JDBC 表更加简单。

<table class="table table-bordered">
<thead>
<tr>
<th class="text-left"><a href="https://dev.mysql.com/doc/refman/8.0/en/data-types.html">MySQL type</a></th>
<th class="text-left"><a href="https://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF30020">Oracle type</a></th>
<th class="text-left"><a href="https://www.postgresql.org/docs/12/datatype.html">PostgreSQL type</a></th>
<th class="text-left"><a href="https://crate.io/docs/crate/reference/en/master/general/ddl/data-types.html">CrateDB type</a></th>
<th class="text-left"><a href="https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver16">SQL Server type</a></th>
<th class="text-left"><a href="{{< ref "docs/dev/table/types" >}}">Flink SQL type</a></th>
</tr>
Expand All @@ -626,6 +628,7 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<td><code>TINYINT</code></td>
<td></td>
<td></td>
<td></td>
<td><code>TINYINT</code></td>
<td><code>TINYINT</code></td>
</tr>
Expand All @@ -639,6 +642,9 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<code>INT2</code><br>
<code>SMALLSERIAL</code><br>
<code>SERIAL2</code></td>
<td>
<code>SMALLINT</code>
<code>SHORT</code></td>
<td><code>SMALLINT</code></td>
<td><code>SMALLINT</code></td>
</tr>
Expand All @@ -651,6 +657,9 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<td>
<code>INTEGER</code><br>
<code>SERIAL</code></td>
<td>
<code>INTEGER</code><br>
<code>INT</code></td>
<td><code>INT</code></td>
<td><code>INT</code></td>
</tr>
Expand All @@ -662,6 +671,9 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<td>
<code>BIGINT</code><br>
<code>BIGSERIAL</code></td>
<td>
<code>BIGINT</code><br>
<code>LONG</code></td>
<td><code>BIGINT</code></td>
<td><code>BIGINT</code></td>
</tr>
Expand All @@ -670,21 +682,19 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<td></td>
<td></td>
<td></td>
<td></td>
<td><code>DECIMAL(20, 0)</code></td>
</tr>
<tr>
matriv marked this conversation as resolved.
Show resolved Hide resolved
<td><code>BIGINT</code></td>
<td></td>
<td><code>BIGINT</code></td>
<td><code>BIGINT</code></td>
</tr>
<tr>
<td><code>FLOAT</code></td>
<td>
<code>BINARY_FLOAT</code></td>
<td>
<code>REAL</code><br>
<code>FLOAT4</code></td>
<td>
<code>REAL</code><br>
<code>FLOAT</code></td>
<td><code>REAL</code></td>
<td><code>FLOAT</code></td>
</tr>
Expand All @@ -696,6 +706,9 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<td>
<code>FLOAT8</code><br>
<code>DOUBLE PRECISION</code></td>
<td>
<code>DOUBLE</code><br>
<code>DOUBLE PRECISION</code></td>
<td><code>FLOAT</code></td>
<td><code>DOUBLE</code></td>
</tr>
Expand All @@ -712,6 +725,7 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<td>
<code>NUMERIC(p, s)</code><br>
<code>DECIMAL(p, s)</code></td>
<td><code>NUMERIC(p, s)</code></td>
<td><code>DECIMAL(p, s)</code></td>
<td><code>DECIMAL(p, s)</code></td>
</tr>
Expand All @@ -721,27 +735,31 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<code>TINYINT(1)</code></td>
<td></td>
<td><code>BOOLEAN</code></td>
<td><code>BOOLEAN</code></td>
<td><code>BIT</code></td>
<td><code>BOOLEAN</code></td>
</tr>
<tr>
<td><code>DATE</code></td>
<td><code>DATE</code></td>
<td><code>DATE</code></td>
<td><code>DATE</code> (only in expressions - not stored type)</td>
<td><code>DATE</code></td>
<td><code>DATE</code></td>
</tr>
<tr>
<td><code>TIME [(p)]</code></td>
<td><code>DATE</code></td>
<td><code>TIME [(p)] [WITHOUT TIMEZONE]</code></td>
<td><code>TIME</code> (only in expressions - not stored type)</td>
<td><code>TIME(0)</code></td>
<td><code>TIME [(p)] [WITHOUT TIMEZONE]</code></td>
</tr>
<tr>
<td><code>DATETIME [(p)]</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
<td>
<code>DATETIME</code>
<code>DATETIME2</code>
Expand All @@ -763,6 +781,13 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<code>VARCHAR(n)</code><br>
<code>CHARACTER VARYING(n)</code><br>
<code>TEXT</code></td>
<td>
<code>CHAR(n)</code><br>
<code>CHARACTER(n)</code><br>
<code>VARCHAR(n)</code><br>
<code>CHARACTER VARYING(n)</code><br>
<code>TEXT</code>
<code>STRING</code></td>
<td>
<code>CHAR(n)</code><br>
<code>NCHAR(n)</code><br>
Expand All @@ -781,6 +806,7 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<code>RAW(s)</code><br>
<code>BLOB</code></td>
<td><code>BYTEA</code></td>
<td></td>
<td>
<code>BINARY(n)</code><br>
<code>VARBINARY(n)</code><br></td>
Expand All @@ -790,6 +816,7 @@ Flink 支持连接到多个使用方言(dialect)的数据库,如 MySQL、O
<td></td>
<td></td>
<td><code>ARRAY</code></td>
<td><code>ARRAY</code></td>
<td></td>
<td><code>ARRAY</code></td>
</tr>
Expand Down
81 changes: 73 additions & 8 deletions docs/content/docs/connectors/table/jdbc.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,13 +45,14 @@ See how to link with it for cluster execution [here]({{< ref "docs/dev/configura

A driver dependency is also required to connect to a specified database. Here are drivers currently supported:

| Driver | Group Id | Artifact Id | JAR |
|:-----------| :------------------| :----------------------| :----------------|
| MySQL | `mysql` | `mysql-connector-java` | [Download](https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/) |
| Oracle | `com.oracle.database.jdbc` | `ojdbc8` | [Download](https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8) |
| PostgreSQL | `org.postgresql` | `postgresql` | [Download](https://jdbc.postgresql.org/download/) |
| Derby | `org.apache.derby` | `derby` | [Download](http://db.apache.org/derby/derby_downloads.html) |
| SQL Server | `com.microsoft.sqlserver` | `mssql-jdbc` | [Download](https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver16) |
| Driver | Group Id | Artifact Id | JAR |
|:-----------|:---------------------------|:-----------------------| :----------------|
| MySQL | `mysql` | `mysql-connector-java` | [Download](https://repo.maven.apache.org/maven2/mysql/mysql-connector-java/) |
| Oracle | `com.oracle.database.jdbc` | `ojdbc8` | [Download](https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8) |
| PostgreSQL | `org.postgresql` | `postgresql` | [Download](https://jdbc.postgresql.org/download/) |
| Derby | `org.apache.derby` | `derby` | [Download](http://db.apache.org/derby/derby_downloads.html) |
| SQL Server | `com.microsoft.sqlserver` | `mssql-jdbc` | [Download](https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver16) |
| CrateDB | `io.crate` | `crate-jdbc` | [Download](https://repo1.maven.org/maven2/io/crate/crate-jdbc/) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also help to update the corresponding section in 'zh' doc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a section in the "JDBC Catalog".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but this part requires translation to chinese, could you please take care of this?



JDBC connector and drivers are not part of Flink's binary distribution. See how to link with them for cluster execution [here]({{< ref "docs/dev/configuration/overview" >}}).
Expand Down Expand Up @@ -602,17 +603,50 @@ SELECT * FROM test_table;
SELECT * FROM mysql_catalog.given_database.test_table2;
SELECT * FROM given_database.test_table2;
```
### JDBC Catalog for CrateDB

#### CrateDB Metaspace Mapping

CrateDB is similar to PostgreSQL, but it has only on database which defaults to `crate`. It has an additional namespace as `schema`, a CrateDB instance can have multiple schemas with a default one named "doc", each schema can have multiple tables.
In Flink, when querying tables registered by CrateDB catalog, users can use either `schema_name.table_name` or just `table_name`. The `schema_name` is optional and defaults to `doc`.

Therefore, the metaspace mapping between Flink Catalog and CrateDB is as following:

| Flink Catalog Metaspace Structure | CrateDB Metaspace Structure |
| :------------------------------------|:-------------------------------|
| catalog name (defined in Flink only) | N/A |
| database name | database name (always `crate`) |
| table name | [schema_name.]table_name |

The full path of CrateDB table in Flink should be ``"<catalog>.<db>.`<schema.table>`"`` if schema is specified, note the `<schema.table>` should be escaped.

Here are some examples to access CrateDB tables:

```sql
-- scan table 'test_table' of 'doc' schema (i.e. the default schema), the schema name can be omitted
SELECT * FROM mycatalog.crate.doc.test_table;
SELECT * FROM crate.doc.test_table;
SELECT * FROM doc.test_table;
SELECT * FROM test_table;

-- scan table 'test_table2' of 'custom_schema' schema,
-- the custom schema can not be omitted and must be escaped with table.
SELECT * FROM mycatalog.crate.`custom_schema.test_table2`
SELECT * FROM crate.`custom_schema.test_table2`;
SELECT * FROM `custom_schema.test_table2`;
```

Data Type Mapping
----------------
Flink supports connect to several databases which uses dialect like MySQL, Oracle, PostgreSQL, Derby. The Derby dialect usually used for testing purpose. The field data type mappings from relational databases data types to Flink SQL data types are listed in the following table, the mapping table can help define JDBC table in Flink easily.
Flink supports connect to several databases which uses dialect like MySQL, Oracle, PostgreSQL, CrateDB, Derby. The Derby dialect usually used for testing purpose. The field data type mappings from relational databases data types to Flink SQL data types are listed in the following table, the mapping table can help define JDBC table in Flink easily.

<table class="table table-bordered">
<thead>
<tr>
<th class="text-left"><a href="https://dev.mysql.com/doc/refman/8.0/en/data-types.html">MySQL type</a></th>
<th class="text-left"><a href="https://docs.oracle.com/database/121/SQLRF/sql_elements001.htm#SQLRF30020">Oracle type</a></th>
<th class="text-left"><a href="https://www.postgresql.org/docs/12/datatype.html">PostgreSQL type</a></th>
<th class="text-left"><a href="https://crate.io/docs/crate/reference/en/master/general/ddl/data-types.html">CrateDB type</a></th>
<th class="text-left"><a href="https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver16">SQL Server type</a></th>
<th class="text-left"><a href="{{< ref "docs/dev/table/types" >}}">Flink SQL type</a></th>
</tr>
Expand All @@ -622,6 +656,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td><code>TINYINT</code></td>
<td></td>
<td></td>
<td></td>
<td><code>TINYINT</code></td>
<td><code>TINYINT</code></td>
</tr>
Expand All @@ -635,6 +670,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>INT2</code><br>
<code>SMALLSERIAL</code><br>
<code>SERIAL2</code></td>
<td>
<code>SMALLINT</code>
<code>SHORT</code></td>
<td><code>SMALLINT</code></td>
<td><code>SMALLINT</code></td>
</tr>
Expand All @@ -647,6 +685,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td>
<code>INTEGER</code><br>
<code>SERIAL</code></td>
<td>
<code>INTEGER</code><br>
<code>INT</code></td>
<td><code>INT</code></td>
<td><code>INT</code></td>
</tr>
Expand All @@ -658,6 +699,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td>
<code>BIGINT</code><br>
<code>BIGSERIAL</code></td>
<td>
<code>BIGINT</code><br>
<code>LONG</code></td>
<td><code>BIGINT</code></td>
<td><code>BIGINT</code></td>
</tr>
Expand All @@ -666,6 +710,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td></td>
<td></td>
<td></td>
<td></td>
<td><code>DECIMAL(20, 0)</code></td>
</tr>
<tr>
Expand All @@ -675,6 +720,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td>
<code>REAL</code><br>
<code>FLOAT4</code></td>
<td>
<code>REAL</code><br>
<code>FLOAT</code></td>
<td><code>REAL</code></td>
<td><code>FLOAT</code></td>
</tr>
Expand All @@ -686,6 +734,9 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td>
<code>FLOAT8</code><br>
<code>DOUBLE PRECISION</code></td>
<td>
<code>DOUBLE</code><br>
<code>DOUBLE PRECISION</code></td>
<td><code>FLOAT</code></td>
<td><code>DOUBLE</code></td>
</tr>
Expand All @@ -702,6 +753,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td>
<code>NUMERIC(p, s)</code><br>
<code>DECIMAL(p, s)</code></td>
<td><code>NUMERIC(p, s)</code></td>
<td><code>DECIMAL(p, s)</code></td>
<td><code>DECIMAL(p, s)</code></td>
</tr>
Expand All @@ -711,27 +763,31 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>TINYINT(1)</code></td>
<td></td>
<td><code>BOOLEAN</code></td>
<td><code>BOOLEAN</code></td>
<td><code>BIT</code></td>
<td><code>BOOLEAN</code></td>
</tr>
<tr>
<td><code>DATE</code></td>
<td><code>DATE</code></td>
<td><code>DATE</code></td>
<td><code>DATE</code> (only in expressions - not stored type)</td>
<td><code>DATE</code></td>
<td><code>DATE</code></td>
</tr>
<tr>
<td><code>TIME [(p)]</code></td>
<td><code>DATE</code></td>
<td><code>TIME [(p)] [WITHOUT TIMEZONE]</code></td>
<td><code>TIME</code> (only in expressions - not stored type)</td>
<td><code>TIME(0)</code></td>
<td><code>TIME [(p)] [WITHOUT TIMEZONE]</code></td>
</tr>
<tr>
<td><code>DATETIME [(p)]</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
<td><code>TIMESTAMP [(p)] [WITHOUT TIMEZONE]</code></td>
<td>
<code>DATETIME</code>
<code>DATETIME2</code>
Expand All @@ -753,6 +809,13 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>VARCHAR(n)</code><br>
<code>CHARACTER VARYING(n)</code><br>
<code>TEXT</code></td>
<td>
<code>CHAR(n)</code><br>
<code>CHARACTER(n)</code><br>
<code>VARCHAR(n)</code><br>
<code>CHARACTER VARYING(n)</code><br>
<code>TEXT</code>
<code>STRING</code></td>
<td>
<code>CHAR(n)</code><br>
<code>NCHAR(n)</code><br>
Expand All @@ -771,6 +834,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<code>RAW(s)</code><br>
<code>BLOB</code></td>
<td><code>BYTEA</code></td>
<td></td>
<td>
<code>BINARY(n)</code><br>
<code>VARBINARY(n)</code><br></td>
Expand All @@ -780,6 +844,7 @@ Flink supports connect to several databases which uses dialect like MySQL, Oracl
<td></td>
<td></td>
<td><code>ARRAY</code></td>
<td><code>ARRAY</code></td>
<td></td>
<td><code>ARRAY</code></td>
</tr>
Expand Down
Loading