[SPARK-30878][SQL][DOC] Improve the CREATE TABLE document
### What changes were proposed in this pull request?

Improve the CREATE TABLE document:
1. mention that some clauses can appear in any order.
2. refine the description for some parameters.
3. mention how a data source table interacts with the underlying data source.
4. make the examples consistent between data source and hive serde tables.

### Why are the changes needed?

improve doc

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

N/A

Closes apache#27638 from cloud-fan/doc.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan authored and Seongjin Cho committed Apr 14, 2020
1 parent a5dbd11 commit ed1d7e7
Showing 3 changed files with 72 additions and 28 deletions.
49 changes: 38 additions & 11 deletions docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -27,7 +27,7 @@ The `CREATE TABLE` statement defines a new table using a Data Source.
 {% highlight sql %}
 CREATE TABLE [ IF NOT EXISTS ] table_identifier
 [ ( col_name1 col_type1 [ COMMENT col_comment1 ], ... ) ]
-USING data_source
+[USING data_source]
 [ OPTIONS ( key1=val1, key2=val2, ... ) ]
 [ PARTITIONED BY ( col_name1, col_name2, ... ) ]
 [ CLUSTERED BY ( col_name3, col_name4, ... )
@@ -39,6 +39,9 @@ CREATE TABLE [ IF NOT EXISTS ] table_identifier
 [ AS select_statement ]
 {% endhighlight %}

+Note that, the clauses between the USING clause and the AS SELECT clause can come in
+as any order. For example, you can write COMMENT table_comment after TBLPROPERTIES.
+
 ### Parameters

 <dl>
@@ -78,32 +81,56 @@ CREATE TABLE [ IF NOT EXISTS ] table_identifier

 <dl>
 <dt><code><em>COMMENT</em></code></dt>
-<dd>Table comments are added.</dd>
+<dd>A string literal to describe the table.</dd>
 </dl>

 <dl>
 <dt><code><em>TBLPROPERTIES</em></code></dt>
-<dd>Table properties that have to be set are specified, such as `created.by.user`, `owner`, etc.
-</dd>
+<dd>A list of key-value pairs that is used to tag the table definition.</dd>
 </dl>

 <dl>
 <dt><code><em>AS select_statement</em></code></dt>
 <dd>The table is populated using the data from the select statement.</dd>
 </dl>

+### Data Source Interaction
+A Data Source table acts like a pointer to the underlying data source. For example, you can create
+a table "foo" in Spark which points to a table "bar" in MySQL using JDBC Data Source. When you
+read/write table "foo", you actually read/write table "bar".
+
+In general CREATE TABLE is creating a "pointer", and you need to make sure it points to something
+existing. An exception is file source such as parquet, json. If you don't specify the LOCATION,
+Spark will create a default table location for you.
+
+For CREATE TABLE AS SELECT, Spark will overwrite the underlying data source with the data of the
+input query, to make sure the table gets created contains exactly the same data as the input query.
+
 ### Examples
 {% highlight sql %}

---Using data source
-CREATE TABLE Student (Id INT,name STRING ,age INT) USING CSV;
+--Use data source
+CREATE TABLE student (id INT, name STRING, age INT) USING CSV;

+--Use data from another table
+CREATE TABLE student_copy USING CSV
+AS SELECT * FROM student;
+
+--Omit the USING clause, which uses the default data source (parquet by default)
+CREATE TABLE student (id INT, name STRING, age INT);
+
+--Specify table comment and properties
+CREATE TABLE student (id INT, name STRING, age INT) USING CSV
+COMMENT 'this is a comment'
+TBLPROPERTIES ('foo'='bar');
+
---Using data from another table
-CREATE TABLE StudentInfo
-AS SELECT * FROM Student;
+--Specify table comment and properties with different clauses order
+CREATE TABLE student (id INT, name STRING, age INT) USING CSV
+TBLPROPERTIES ('foo'='bar')
+COMMENT 'this is a comment';

---Partitioned and bucketed
-CREATE TABLE Student (Id INT,name STRING ,age INT)
+--Create partitioned and bucketed table
+CREATE TABLE student (id INT, name STRING, age INT)
 USING CSV
 PARTITIONED BY (age)
 CLUSTERED BY (Id) INTO 4 buckets;
47 changes: 32 additions & 15 deletions docs/sql-ref-syntax-ddl-create-table-hiveformat.md
@@ -37,6 +37,9 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier

 {% endhighlight %}

+Note that, the clauses between the columns definition clause and the AS SELECT clause can come in
+as any order. For example, you can write COMMENT table_comment after TBLPROPERTIES.
+
 ### Parameters

 <dl>
@@ -77,14 +80,12 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier

 <dl>
 <dt><code><em>COMMENT</em></code></dt>
-<dd>Table comments are added.</dd>
+<dd>A string literal to describe the table.</dd>
 </dl>

 <dl>
 <dt><code><em>TBLPROPERTIES</em></code></dt>
-<dd>
-Table properties that have to be set are specified, such as `created.by.user`, `owner`, etc.
-</dd>
+<dd>A list of key-value pairs that is used to tag the table definition.</dd>
 </dl>

 <dl>
@@ -96,21 +97,37 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier
 ### Examples
 {% highlight sql %}

---Using Comment and loading data from another table into the created table
-CREATE TABLE StudentInfo
-COMMENT 'Table is created using existing data'
-AS SELECT * FROM Student;
+--Use hive format
+CREATE TABLE student (id INT, name STRING, age INT) STORED AS ORC;
+
+--Use data from another table
+CREATE TABLE student_copy STORED AS ORC
+AS SELECT * FROM student;
+
+--Specify table comment and properties
+CREATE TABLE student (id INT, name STRING, age INT)
+COMMENT 'this is a comment'
+STORED AS ORC
+TBLPROPERTIES ('foo'='bar');
+
+--Specify table comment and properties with different clauses order
+CREATE TABLE student (id INT, name STRING, age INT)
+STORED AS ORC
+TBLPROPERTIES ('foo'='bar')
+COMMENT 'this is a comment';

---Partitioned table
-CREATE TABLE Student (Id INT,name STRING)
+--Create partitioned table
+CREATE TABLE student (id INT, name STRING)
 PARTITIONED BY (age INT)
-TBLPROPERTIES ('owner'='xxxx');
+STORED AS ORC;

-CREATE TABLE Student (Id INT,name STRING,age INT)
-PARTITIONED BY (name,age);
+--Create partitioned table with different clauses order
+CREATE TABLE student (id INT, name STRING)
+STORED AS ORC
+PARTITIONED BY (age INT);

---Using Row Format and file format
-CREATE TABLE Student (Id INT,name STRING)
+--Use Row Format and file format
+CREATE TABLE student (id INT,name STRING)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE;

4 changes: 2 additions & 2 deletions docs/sql-ref-syntax-ddl-create-table.md
@@ -20,10 +20,10 @@ license: |
 ---

 ### Description
-`CREATE TABLE` statement is used to define a table in an exsisting database.
+`CREATE TABLE` statement is used to define a table in an existing database.

 The CREATE statements:
-* [CREATE TABLE USING DATASOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+* [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
 * [CREATE TABLE USING HIVE FORMAT](sql-ref-syntax-ddl-create-table-hiveformat.html)
 * [CREATE TABLE LIKE](sql-ref-syntax-ddl-create-table-like.html)

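Illustration (not part of the commit): the Data Source Interaction text added in docs/sql-ref-syntax-ddl-create-table-datasource.md describes a table as a pointer to an underlying source, with file sources as the exception. A minimal Spark SQL sketch of that idea might look like the following. The JDBC URL, the `shop` database, the credentials, the `events` tables, and the `/data/events` path are all hypothetical placeholders, and the first statement assumes a MySQL JDBC driver is available to Spark.

{% highlight sql %}
-- Hypothetical: Spark table "foo" points at table "bar" in MySQL.
-- The url, user, and password values are placeholders, not real endpoints.
CREATE TABLE foo
USING JDBC
OPTIONS (
  url 'jdbc:mysql://db-host:3306/shop',
  dbtable 'bar',
  user 'spark_user',
  password 'change_me'
);

-- Reading foo reads bar in MySQL; "bar" must already exist there.
SELECT * FROM foo;

-- File sources are the exception: with no LOCATION clause,
-- Spark creates a default table location.
CREATE TABLE events (id INT, name STRING) USING PARQUET;

-- With LOCATION, the table points at a path you choose (hypothetical path).
CREATE TABLE events_ext (id INT, name STRING) USING PARQUET
LOCATION '/data/events';
{% endhighlight %}

The JDBC case shows why the target has to exist before the table is created, while the two parquet statements contrast the default location with an explicit one.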