[SPARK-39503][SQL] Add session catalog name for v1 database table and function

### What changes were proposed in this pull request?

- Add the session catalog name to identifiers, so that all identifiers become 3-part names

### Why are the changes needed?

To make it clearer which catalog a table or function comes from. This affects:

- the scan table/permanent view of the query plan
- the target table of the data writing
- desc database
- desc table
- desc function

Note that temporary views are not supported, since a temporary view does not belong to any database or catalog.

This is a new approach to #36936 that:
- adds a catalog field to the identifier, so the identifier simply prints the catalog if it is defined
- injects the catalog at the beginning of the identifier's life
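
As a rough illustration of the two points above, the following Java sketch models an identifier with an optional catalog part that is printed only when defined. The class `QualifiedTableName` and its methods are hypothetical and are not Spark's `TableIdentifier`; the only name taken from Spark is `spark_catalog`, the name of the built-in session catalog.

```java
import java.util.Optional;
import java.util.StringJoiner;

/**
 * Hypothetical stand-in for a table identifier with an optional catalog part.
 * This is NOT Spark's TableIdentifier; names and shapes are assumptions.
 */
final class QualifiedTableName {
    // Name of Spark's built-in session catalog.
    static final String SESSION_CATALOG = "spark_catalog";

    private final Optional<String> catalog;
    private final Optional<String> database;
    private final String table;

    QualifiedTableName(Optional<String> catalog, Optional<String> database, String table) {
        // Assumption for this sketch: a catalog without a database is rejected,
        // because it would print as an ambiguous two-part name.
        if (catalog.isPresent() && !database.isPresent()) {
            throw new IllegalArgumentException("catalog requires a database");
        }
        this.catalog = catalog;
        this.database = database;
        this.table = table;
    }

    /** Injects the session catalog once, at the point the identifier is created. */
    static QualifiedTableName qualify(String database, String table) {
        return new QualifiedTableName(Optional.of(SESSION_CATALOG), Optional.of(database), table);
    }

    /** A temporary view belongs to no database or catalog, so it stays one-part. */
    static QualifiedTableName tempView(String name) {
        return new QualifiedTableName(Optional.empty(), Optional.empty(), name);
    }

    /** Prints only the parts that are defined, joined by dots. */
    String unquotedString() {
        StringJoiner joiner = new StringJoiner(".");
        catalog.ifPresent(joiner::add);
        database.ifPresent(joiner::add);
        joiner.add(table);
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(qualify("default", "t").unquotedString()); // spark_catalog.default.t
        System.out.println(tempView("v").unquotedString());           // v
    }
}
```

Because the catalog is attached when the identifier is created, every later consumer (plan output, `desc` commands) can print the same string without looking the catalog up again.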

### Does this PR introduce _any_ user-facing change?

Possibly yes, so this PR adds a new config, `spark.sql.legacy.nonIdentifierOutputCatalogName`, to restore the old behavior.

### How was this patch tested?

change list:
```text
docs/sql-migration-guide.md                                                                          |   1 +

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala              |  10 +++++++---
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala               |  28 ++++++++++++++++++---------
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala                    |   1 +
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala                          |  56 ++++++++++++++++++++++++++++++++++++++++++++---------
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala                       |   4 ++--
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala          |  13 +++++++++----
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala               |   8 ++++++--
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala                              |   9 +++++++++
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala                                   |   7 ++++---
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala                             |  14 ++++++++------
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala                       |   6 ++++--
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala    |   1 +
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala                        |   5 +++--
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala                              |   5 +++--

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala         |  5 +++--
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala          |  58 ++++++++++++++++++++++++++++++-------------------------
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/LookupCatalogSuite.scala          |   4 +++-
sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala                                |   4 +++-
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala                            |  13 +++++++++----
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala                                     |   7 +++++--
sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala                     |   4 +++-
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala                            |  39 ++++++++++++++++++++++++------------
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala                        |   7 +++++--
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala                        |  36 +++++++++++++++++++++--------------
sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala             |  51 +++++++++++++++++++++++++++++----------------------
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeNamespaceSuite.scala       |  12 +++++++-----
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala           |   2 ++
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTblPropertiesSuite.scala       |   3 ++-
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala       |   1 +
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala                  |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala                    |  25 +++++++++++++-----------
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala                              |   7 ++++---
sql/hive/src/test/scala/org/apache/spark/sql/hive/UDFSuite.scala                                     |   9 +++++----
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala                       |  27 ++++++++++++++++----------
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala                   |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala                   |   4 +++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala                 |  13 +++++++------
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala                       |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala                      |   8 +++++---
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/DescribeTableSuite.scala         |   2 ++
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowFunctionsSuite

sql/core/src/test/resources/sql-tests/results/charvarchar.sql.out                                    |  20 ++++++++++++++++++-
sql/core/src/test/resources/sql-tests/results/describe-table-after-alter-table.sql.out               |   5 +++++
sql/core/src/test/resources/sql-tests/results/describe.sql.out                                       |  22 +++++++++++++--------
sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out                                    | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------
sql/core/src/test/resources/sql-tests/results/explain-cbo.sql.out                                    |   8 ++++----
sql/core/src/test/resources/sql-tests/results/explain.sql.out                                        | 100 +++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out                                   |   2 +-
sql/core/src/test/resources/sql-tests/results/postgreSQL/create_view.sql.out                         |  40 +++++++++++++++++++++++++-------------
sql/core/src/test/resources/sql-tests/results/postgreSQL/numeric.sql.out                             |   2 +-
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out                                    |  10 ++++++----
sql/core/src/test/resources/sql-tests/results/show-tblproperties.sql.out                             |   2 +-
sql/core/src/test/resources/sql-tests/results/udaf.sql.out                                           |   4 ++--
sql/core/src/test/resources/sql-tests/results/udf/udf-udaf.sql.out                                   |   4 ++--
```

Closes #37021 from ulysses-you/output-catalog-2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
a0x8o committed Jul 1, 2022
1 parent 7fae655 commit f3ab1ea
Showing 842 changed files with 11,476 additions and 7,046 deletions.
2 changes: 1 addition & 1 deletion R/pkg/tests/fulltests/test_sparkSQL.R
@@ -4018,7 +4018,7 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
                paste0("Error in setCurrentDatabase : analysis error - Database ",
                       "'zxwtyswklpf' does not exist"))
   dbs <- collect(listDatabases())
-  expect_equal(names(dbs), c("name", "description", "locationUri"))
+  expect_equal(names(dbs), c("name", "catalog", "description", "locationUri"))
   expect_equal(which(dbs[, 1] == "default"), 1)
 })

@@ -149,8 +149,9 @@ protected void initChannel(SocketChannel ch) {
     channelFuture = bootstrap.bind(address);
     channelFuture.syncUninterruptibly();

-    port = ((InetSocketAddress) channelFuture.channel().localAddress()).getPort();
-    logger.debug("Shuffle server started on port: {}", port);
+    InetSocketAddress localAddress = (InetSocketAddress) channelFuture.channel().localAddress();
+    port = localAddress.getPort();
+    logger.debug("Shuffle server started on {} with port {}", localAddress.getHostString(), port);
   }

   public MetricSet getAllMetrics() {

@@ -195,6 +195,8 @@ private[sql] class AvroDeserializer(
       case b: ByteBuffer =>
         val bytes = new Array[Byte](b.remaining)
         b.get(bytes)
+        // Do not forget to reset the position
+        b.rewind()
         bytes
       case b: Array[Byte] => b
       case other =>

@@ -360,4 +360,25 @@ class AvroCatalystDataConversionSuite extends SparkFunSuite
       None,
       new OrderedFilters(Seq(Not(EqualTo("Age", 39))), sqlSchema))
   }
+
+  test("AvroDeserializer with binary type") {
+    val jsonFormatSchema =
+      """
+        |{
+        | "type": "record",
+        | "name": "record",
+        | "fields" : [
+        | {"name": "a", "type": "bytes"}
+        | ]
+        |}
+      """.stripMargin
+    val avroSchema = new Schema.Parser().parse(jsonFormatSchema)
+    val avroRecord = new GenericData.Record(avroSchema)
+    val bb = java.nio.ByteBuffer.wrap(Array[Byte](97, 48, 53))
+    avroRecord.put("a", bb)
+
+    val expected = InternalRow(Array[Byte](97, 48, 53))
+    checkDeserialization(avroSchema, avroRecord, Some(expected))
+    checkDeserialization(avroSchema, avroRecord, Some(expected))
+  }
 }
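
The test above calls `checkDeserialization` twice on the same record, presumably to confirm that the deserializer no longer leaves the `ByteBuffer` consumed. A minimal, self-contained Java sketch of the underlying `java.nio.ByteBuffer` behavior follows; the class and method names are illustrative and not part of Spark.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: shows why a relative bulk get needs a rewind before a re-read. */
final class ByteBufferRewindDemo {
    /** Copies the remaining bytes and leaves the buffer's position at its limit. */
    static byte[] readWithoutRewind(ByteBuffer b) {
        byte[] bytes = new byte[b.remaining()];
        b.get(bytes); // relative bulk get: advances position by bytes.length
        return bytes;
    }

    /** Same copy, then resets the position to 0 so the buffer can be read again. */
    static byte[] readWithRewind(ByteBuffer b) {
        byte[] bytes = new byte[b.remaining()];
        b.get(bytes);
        b.rewind(); // position = 0, limit unchanged
        return bytes;
    }

    public static void main(String[] args) {
        ByteBuffer consumed = ByteBuffer.wrap(new byte[] {97, 48, 53});
        System.out.println(Arrays.toString(readWithoutRewind(consumed))); // [97, 48, 53]
        System.out.println(Arrays.toString(readWithoutRewind(consumed))); // []

        ByteBuffer reusable = ByteBuffer.wrap(new byte[] {97, 48, 53});
        System.out.println(Arrays.toString(readWithRewind(reusable)));    // [97, 48, 53]
        System.out.println(Arrays.toString(readWithRewind(reusable)));    // [97, 48, 53]
    }
}
```

Note that `rewind()` resets the position to 0 rather than to wherever it was before the read, which matches the buffer in the test because `ByteBuffer.wrap(array)` starts at position 0.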

@@ -120,7 +120,7 @@ abstract class KafkaSourceTest extends StreamTest with SharedSparkSession with K

     val sources: Seq[SparkDataStream] = {
       query.get.logicalPlan.collect {
-        case StreamingExecutionRelation(source: KafkaSource, _) => source
+        case StreamingExecutionRelation(source: KafkaSource, _, _) => source
         case r: StreamingDataSourceV2Relation if r.stream.isInstanceOf[KafkaMicroBatchStream] ||
             r.stream.isInstanceOf[KafkaContinuousStream] =>
           r.stream
@@ -1392,7 +1392,7 @@ class KafkaMicroBatchV1SourceSuite extends KafkaMicroBatchSourceSuiteBase {
       makeSureGetOffsetCalled,
       AssertOnQuery { query =>
         query.logicalPlan.collect {
-          case StreamingExecutionRelation(_: KafkaSource, _) => true
+          case StreamingExecutionRelation(_: KafkaSource, _, _) => true
         }.nonEmpty
       }
     )
48 changes: 24 additions & 24 deletions core/benchmarks/MapStatusesSerDeserBenchmark-jdk11-results.txt
@@ -1,64 +1,64 @@
-OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 200000 MapOutputs, 10 blocks w/ broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 -------------------------------------------------------------------------------------------------------------------------
-Serialization 151 165 8 1.3 754.7 1.0X
-Deserialization 223 303 69 0.9 1112.6 0.7X
+Serialization 191 202 8 1.0 954.7 1.0X
+Deserialization 250 335 75 0.8 1249.4 0.8X

 Compressed Serialized MapStatus sizes: 410 bytes
 Compressed Serialized Broadcast MapStatus sizes: 2 MB


-OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 200000 MapOutputs, 10 blocks w/o broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 --------------------------------------------------------------------------------------------------------------------------
-Serialization 132 137 6 1.5 661.0 1.0X
-Deserialization 222 287 88 0.9 1111.8 0.6X
+Serialization 161 171 8 1.2 806.3 1.0X
+Deserialization 248 318 83 0.8 1238.5 0.7X

 Compressed Serialized MapStatus sizes: 2 MB
 Compressed Serialized Broadcast MapStatus sizes: 0 bytes


-OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 200000 MapOutputs, 100 blocks w/ broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 --------------------------------------------------------------------------------------------------------------------------
-Serialization 299 314 12 0.7 1493.4 1.0X
-Deserialization 248 315 77 0.8 1238.8 1.2X
+Serialization 354 380 19 0.6 1767.9 1.0X
+Deserialization 272 341 86 0.7 1361.9 1.3X

 Compressed Serialized MapStatus sizes: 429 bytes
 Compressed Serialized Broadcast MapStatus sizes: 13 MB


-OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 200000 MapOutputs, 100 blocks w/o broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-Serialization 250 257 12 0.8 1251.7 1.0X
-Deserialization 246 315 84 0.8 1230.8 1.0X
+Serialization 307 321 14 0.7 1535.6 1.0X
+Deserialization 305 457 255 0.7 1524.3 1.0X

 Compressed Serialized MapStatus sizes: 13 MB
 Compressed Serialized Broadcast MapStatus sizes: 0 bytes


-OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 200000 MapOutputs, 1000 blocks w/ broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-Serialization 1238 1245 10 0.2 6190.5 1.0X
-Deserialization 605 667 81 0.3 3027.1 2.0X
+Serialization 1612 1673 86 0.1 8061.2 1.0X
+Deserialization 752 806 62 0.3 3758.3 2.1X

 Compressed Serialized MapStatus sizes: 555 bytes
 Compressed Serialized Broadcast MapStatus sizes: 121 MB


-OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
 200000 MapOutputs, 1000 blocks w/o broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ----------------------------------------------------------------------------------------------------------------------------
-Serialization 1127 1138 17 0.2 5632.6 1.0X
-Deserialization 621 680 53 0.3 3102.5 1.8X
+Serialization 1416 1424 11 0.1 7081.1 1.0X
+Deserialization 728 734 9 0.3 3639.1 1.9X

 Compressed Serialized MapStatus sizes: 121 MB
 Compressed Serialized Broadcast MapStatus sizes: 0 bytes
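
For reading the tables above: the derived columns appear to follow from Best Time and the row count (200000 MapOutputs). The sketch below recomputes them for the JDK 11 rows `Serialization 191` and `Deserialization 250` (10 blocks w/ broadcast). The formulas are inferred from the published numbers rather than taken from the benchmark harness, and the helper names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helpers; the formulas are inferred from the result tables. */
final class BenchmarkColumns {
    /** Rows per microsecond, which is the same as millions of rows per second. */
    static double rateMillionsPerSecond(long rows, double bestMs) {
        return rows / (bestMs * 1000.0);
    }

    /** Nanoseconds spent per row at the best observed time. */
    static double perRowNanos(long rows, double bestMs) {
        return bestMs * 1_000_000.0 / rows;
    }

    /** Speed relative to the first (baseline) row; above 1.0 means faster. */
    static double relative(double baselineBestMs, double bestMs) {
        return baselineBestMs / bestMs;
    }

    /** Formats with one decimal place, as the tables do. */
    static String oneDecimal(double value) {
        return String.format(Locale.ROOT, "%.1f", value);
    }

    public static void main(String[] args) {
        long rows = 200_000L;
        double serializationMs = 191.0;   // baseline row
        double deserializationMs = 250.0;

        System.out.println(oneDecimal(rateMillionsPerSecond(rows, serializationMs))); // 1.0
        System.out.println(oneDecimal(perRowNanos(rows, serializationMs)));           // 955.0
        System.out.println(oneDecimal(relative(serializationMs, deserializationMs))); // 0.8
    }
}
```

The recomputed per-row figure (955.0 ns) differs slightly from the table's 954.7 ns, most likely because the Best Time column is itself rounded to whole milliseconds.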
50 changes: 25 additions & 25 deletions core/benchmarks/MapStatusesSerDeserBenchmark-jdk17-results.txt
@@ -1,64 +1,64 @@
-OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 200000 MapOutputs, 10 blocks w/ broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 -------------------------------------------------------------------------------------------------------------------------
-Serialization 148 155 6 1.3 741.4 1.0X
-Deserialization 237 257 30 0.8 1184.1 0.6X
+Serialization 131 139 6 1.5 657.2 1.0X
+Deserialization 245 267 34 0.8 1223.6 0.5X

 Compressed Serialized MapStatus sizes: 410 bytes
 Compressed Serialized Broadcast MapStatus sizes: 2 MB


-OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 200000 MapOutputs, 10 blocks w/o broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 --------------------------------------------------------------------------------------------------------------------------
-Serialization 132 136 3 1.5 660.7 1.0X
-Deserialization 257 261 2 0.8 1287.3 0.5X
+Serialization 126 129 2 1.6 628.8 1.0X
+Deserialization 244 252 10 0.8 1222.0 0.5X

 Compressed Serialized MapStatus sizes: 2 MB
 Compressed Serialized Broadcast MapStatus sizes: 0 bytes


-OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 200000 MapOutputs, 100 blocks w/ broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 --------------------------------------------------------------------------------------------------------------------------
-Serialization 275 285 18 0.7 1376.4 1.0X
-Deserialization 263 283 31 0.8 1316.8 1.0X
+Serialization 259 270 11 0.8 1293.9 1.0X
+Deserialization 272 302 32 0.7 1359.5 1.0X

 Compressed Serialized MapStatus sizes: 429 bytes
 Compressed Serialized Broadcast MapStatus sizes: 13 MB


-OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 200000 MapOutputs, 100 blocks w/o broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-Serialization 256 259 3 0.8 1279.6 1.0X
-Deserialization 259 269 16 0.8 1292.7 1.0X
+Serialization 248 250 2 0.8 1238.5 1.0X
+Deserialization 287 309 32 0.7 1436.1 0.9X

 Compressed Serialized MapStatus sizes: 13 MB
 Compressed Serialized Broadcast MapStatus sizes: 0 bytes


-OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 200000 MapOutputs, 1000 blocks w/ broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-Serialization 1144 1145 2 0.2 5718.2 1.0X
-Deserialization 483 492 14 0.4 2416.8 2.4X
+Serialization 1143 1161 25 0.2 5716.1 1.0X
+Deserialization 499 551 54 0.4 2495.6 2.3X

-Compressed Serialized MapStatus sizes: 556 bytes
+Compressed Serialized MapStatus sizes: 554 bytes
 Compressed Serialized Broadcast MapStatus sizes: 121 MB


-OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 200000 MapOutputs, 1000 blocks w/o broadcast: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ----------------------------------------------------------------------------------------------------------------------------
-Serialization 1090 1094 5 0.2 5451.7 1.0X
-Deserialization 486 505 16 0.4 2431.9 2.2X
+Serialization 933 967 30 0.2 4665.8 1.0X
+Deserialization 493 520 31 0.4 2465.0 1.9X

 Compressed Serialized MapStatus sizes: 121 MB
 Compressed Serialized Broadcast MapStatus sizes: 0 bytes