[SPARK-2177][SQL] describe table result contains only one column #1118

yhuai · 2014-06-18T19:15:25Z

scala> hql("describe src").collect().foreach(println)

[key                    string                  None                ]
[value                  string                  None                ]

Each row should contain three columns (column name, data type, comment) instead of one. This breaks JDBC clients and any downstream consumer of the Scala/Java/Python APIs.

I am providing a workaround. We handle a subset of describe commands in Spark SQL, which are defined by ...

DESCRIBE [EXTENDED] [db_name.]table_name

All other cases are treated as Hive native commands.
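
As a rough illustration of the subset described above, the following sketch classifies a statement as either the form handled in Spark SQL or a Hive native command. The regular expression and the names `DescribeSubset`, `Handled`, and `Native` are hypothetical and exist only for this example; the patch itself may recognize the statement differently.

```scala
object DescribeSubset {
  sealed trait Target
  // A statement of the form DESCRIBE [EXTENDED] [db_name.]table_name.
  final case class Handled(db: Option[String], table: String, extended: Boolean) extends Target
  // Anything else is passed through unchanged as a Hive native command.
  final case class Native(sql: String) extends Target

  // Assumption: identifiers consist of word characters only (no quoting).
  private val Describe =
    """(?i)\s*DESCRIBE\s+(EXTENDED\s+)?(?:(\w+)\.)?(\w+)\s*""".r

  def classify(sql: String): Target = sql match {
    // Optional groups that did not participate in the match are null.
    case Describe(ext, db, table) => Handled(Option(db), table, ext != null)
    case _                        => Native(sql)
  }
}
```

Under these assumptions, `DESCRIBE EXTENDED default.src` falls in the handled subset, while `DESCRIBE src key` (describing a column) does not match and is left to Hive.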

Also, if we upgrade Hive to 0.13, we need to check the result of context.sessionState.isHiveServerQuery() to determine how to split the output. This method was introduced by https://issues.apache.org/jira/browse/HIVE-4545. We may want to set Hive to use JsonMetaDataFormatter for the output of a DDL statement (set hive.ddl.output.format=json, introduced by https://issues.apache.org/jira/browse/HIVE-2822).

The link to JIRA: https://issues.apache.org/jira/browse/SPARK-2177

AmplabJenkins · 2014-06-18T19:19:52Z

Merged build triggered.

AmplabJenkins · 2014-06-18T19:20:02Z

Merged build started.

rxin · 2014-06-18T19:45:48Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/hiveOperators.scala

+        // This method is introduced by https://issues.apache.org/jira/browse/HIVE-4545.
+        // Right now, we split every string by any number of consecutive spaces.
+        sideEffectResult.map(
+          r => r.split("\\s+")).map(r => new GenericRow(r.asInstanceOf[Array[Any]]))


actually for describe can we only split up to 3 columns?

scala> "a b c d e".split("\\s+", 3)
res2: Array[String] = Array(a, b, c d e)
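
The suggestion can be tried in isolation. The sketch below assumes that each describe output line is whitespace-separated with the comment last, and caps the split at three fields so that a comment containing spaces stays in one column. `DescribeRows` and `splitRow` are hypothetical names, and the `trim` call is an addition for this example rather than something taken from the patch.

```scala
object DescribeRows {
  // Split one line of describe output into at most three fields:
  // column name, data type, and comment. The limit of 3 keeps spaces
  // inside the comment from producing extra columns.
  def splitRow(line: String): Array[String] =
    line.trim.split("\\s+", 3)
}
```

For a line such as `id int the primary key`, an unlimited `split("\\s+")` yields five fields, while `splitRow` yields `id`, `int`, and `the primary key`.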

AmplabJenkins · 2014-06-18T20:14:52Z

Merged build triggered.

AmplabJenkins · 2014-06-18T20:15:02Z

Merged build started.

concretevitamin · 2014-06-18T20:19:30Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/hiveOperators.scala

@@ -445,7 +445,19 @@ case class NativeCommand(
    if (sideEffectResult.size == 0) {
      context.emptyResult
    } else {
-      val rows = sideEffectResult.map(r => new GenericRow(Array[Any](r)))
+      // TODO: Need a better way to handle the result of a native command.
+      // We may want to consider to use JsonMetaDataFormatter in Hive.


Instead of introducing a special case here, can we put this piece of logic in a separate DescribeCommand? SetCommand / ExplainCommand / CacheCommand were introduced a while ago partly to reduce special-casing in random places -- pinging @liancheng on this too.

That sounds good. Let's merge this first and submit another PR for that. (Reason is this should make it into 1.0.1)

Yeah, it sounds good.

Ah, my bad, when I said "just refer to NativeCommand", I actually meant adding a DescribeCommand that follows the pattern of NativeCommand in hiveOperators.scala.

Actually, as briefly mentioned at the end of the "PR Overview" section of the PR #1071 description, we should specialize all native commands in the same way, and use NativeCommand as the default handler for commands that haven't been specialized yet.
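
The design discussed in this thread can be sketched as follows: a specialized command produces structured rows itself, and `NativeCommand` remains the fallback for everything else. All types here are simplified, hypothetical stand-ins rather than the real Spark SQL classes; the `Catalog` map, the `runNative` callback, and the fixed `"None"` comment value are assumptions made for illustration.

```scala
object CommandDispatch {
  sealed trait Command
  // Specialized command: knows how to build its own result rows.
  final case class DescribeCommand(table: String, isExtended: Boolean) extends Command
  // Default handler for statements that have no specialized command yet.
  final case class NativeCommand(sql: String) extends Command

  // Hypothetical catalog: table name -> (column name, data type) pairs.
  type Catalog = Map[String, Seq[(String, String)]]

  def execute(
      cmd: Command,
      catalog: Catalog,
      runNative: String => Seq[String]): Seq[Seq[String]] = cmd match {
    case DescribeCommand(table, _) =>
      // Specialized path: three-column rows come straight from the schema,
      // so no string splitting is needed.
      catalog.getOrElse(table, Seq.empty).map { case (name, dataType) =>
        Seq(name, dataType, "None")
      }
    case NativeCommand(sql) =>
      // Fallback path: each output line stays a single-column row.
      runNative(sql).map(line => Seq(line))
  }
}
```

With this shape, adding another specialized command means adding one case class and one `case` branch, while unspecialized statements keep flowing through `NativeCommand`.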

AmplabJenkins · 2014-06-18T20:34:16Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-18T20:34:16Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15879/

AmplabJenkins · 2014-06-18T21:24:03Z

Merged build finished.

AmplabJenkins · 2014-06-18T21:24:03Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15881/

AmplabJenkins · 2014-06-19T05:59:54Z

Merged build triggered.

AmplabJenkins · 2014-06-19T06:00:01Z

Merged build started.

rxin · 2014-06-19T06:01:10Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala

@@ -17,7 +17,7 @@

 package org.apache.spark.sql.hive

-import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.{SQLContext}


no need to change this

AmplabJenkins · 2014-06-19T06:04:54Z

Merged build triggered.

AmplabJenkins · 2014-06-19T06:05:01Z

Merged build started.

rxin · 2014-06-19T06:06:14Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala

+            Seq(DescribeHiveTableCommand(
+              t, describe.output, describe.isFormatted, describe.isExtended)(context))
+          case o: LogicalPlan =>
+            if (describe.isFormatted)


Maybe for non-metastore tables, we can just add some formatted/extended information saying they are registered as temporary tables? Then we can get rid of the extra lines here ...

AmplabJenkins · 2014-06-19T06:06:26Z

Merged build finished.

AmplabJenkins · 2014-06-19T06:06:27Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15898/

rxin · 2014-06-19T06:06:28Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/hiveOperators.scala

@@ -19,8 +19,10 @@ package org.apache.spark.sql.hive.execution

 import org.apache.hadoop.hive.common.`type`.{HiveDecimal, HiveVarchar}
 import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.metastore.api.FieldSchema


api should go after MetaStoreUtils since api is a package

AmplabJenkins · 2014-06-19T20:15:06Z

Merged build started.

AmplabJenkins · 2014-06-19T21:27:40Z

Merged build finished.

AmplabJenkins · 2014-06-19T21:27:40Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15923/

AmplabJenkins · 2014-06-19T22:14:56Z

Merged build triggered.

AmplabJenkins · 2014-06-19T22:15:02Z

Merged build started.

AmplabJenkins · 2014-06-19T23:29:39Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-19T23:29:40Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15929/

rxin · 2014-06-20T00:14:17Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/commands.scala

@@ -60,3 +60,23 @@ case class ExplainCommand(plan: LogicalPlan) extends Command {
 * Returned for the "CACHE TABLE tableName" and "UNCACHE TABLE tableName" command.
 */
 case class CacheCommand(tableName: String, doCache: Boolean) extends Command
+
+/**


remove this block

yhuai · 2014-06-20T00:49:34Z

For this PR, if a user wants to describe a column, DESCRIBE tableName columnName should be used (right now, the command to describe a column is treated as a Hive native command). If the user writes DESCRIBE tableName.columnName instead, tableName will be treated as the database name and columnName will be treated as the table name. We need a follow-up JIRA to address this issue.
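
The ambiguity can be shown with a minimal sketch of how a dotted name is interpreted under the rule described above. `DescribeTarget` and `resolve` are hypothetical names used only for this example and do not appear in the patch.

```scala
object DescribeTarget {
  // In a dotted name, the part before the first dot is read as the database
  // and the rest as the table, so "table.column" is misread as "db.table".
  def resolve(name: String): (Option[String], String) =
    name.split("\\.", 2) match {
      case Array(db, table) => (Some(db), table)
      case Array(table)     => (None, table)
    }
}
```

Here `resolve("src.key")` returns a database named `src` and a table named `key`, which is why the column form has to be written as `DESCRIBE src key` for now.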

AmplabJenkins · 2014-06-20T00:49:57Z

Merged build triggered.

AmplabJenkins · 2014-06-20T00:50:03Z

Merged build started.

AmplabJenkins · 2014-06-20T02:04:19Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-20T02:04:19Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15932/

rxin · 2014-06-20T06:41:19Z

Ok I'm merging this in master & branch-1.0. Thanks!

```
scala> hql("describe src").collect().foreach(println)
[key string None ]
[value string None ]
```

The result should contain 3 columns instead of one. This screws up JDBC or even the downstream consumer of the Scala/Java/Python APIs.

I am providing a workaround. We handle a subset of describe commands in Spark SQL, which are defined by ...

```
DESCRIBE [EXTENDED] [db_name.]table_name
```

All other cases are treated as Hive native commands.

Also, if we upgrade Hive to 0.13, we need to check the results of context.sessionState.isHiveServerQuery() to determine how to split the result. This method is introduced by https://issues.apache.org/jira/browse/HIVE-4545. We may want to set Hive to use JsonMetaDataFormatter for the output of a DDL statement (`set hive.ddl.output.format=json` introduced by https://issues.apache.org/jira/browse/HIVE-2822).

The link to JIRA: https://issues.apache.org/jira/browse/SPARK-2177

Author: Yin Huai <[email protected]>

Closes #1118 from yhuai/SPARK-2177 and squashes the following commits:

fd2534c [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2177
b9b9aa5 [Yin Huai] rxin's comments.
e7c4e72 [Yin Huai] Fix unit test.
656b068 [Yin Huai] 100 characters.
6387217 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2177
8003cf3 [Yin Huai] Generate strings with the format like Hive for unit tests.
9787fff [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2177
440c5af [Yin Huai] rxin's comments.
f1a417e [Yin Huai] Update doc.
83adb2f [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2177
366f891 [Yin Huai] Add describe command.
74bd1d4 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2177
342fdf7 [Yin Huai] Split to up to 3 parts.
725e88c [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2177
bb8bbef [Yin Huai] Split every string in the result of a describe command.

(cherry picked from commit f397e92)
Signed-off-by: Reynold Xin <[email protected]>



yhuai added 2 commits June 18, 2014 12:09

Split every string in the result of a describe command.

bb8bbef

Merge remote-tracking branch 'upstream/master' into SPARK-2177

725e88c

rxin reviewed Jun 18, 2014

yhuai added 2 commits June 18, 2014 13:12

Split to up to 3 parts.

342fdf7

Merge remote-tracking branch 'upstream/master' into SPARK-2177

74bd1d4

concretevitamin reviewed Jun 18, 2014

yhuai added 2 commits June 18, 2014 22:54

Add describe command.

366f891

Merge remote-tracking branch 'upstream/master' into SPARK-2177

83adb2f

Update doc.

f1a417e

rxin reviewed Jun 19, 2014

Fix unit test.

e7c4e72

rxin reviewed Jun 20, 2014

yhuai added 2 commits June 19, 2014 17:45

rxin's comments.

b9b9aa5

Merge remote-tracking branch 'upstream/master' into SPARK-2177

fd2534c

asfgit closed this in f397e92 Jun 20, 2014

yhuai deleted the SPARK-2177 branch July 31, 2014 21:12

schlosna mentioned this pull request Feb 8, 2018

Bump codahale metrics palantir/spark#309

Closed

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024

[MAPRYARN-397] Proxy should respond to client with a redirect if it g…

98afd45

…ets SSL errors from server (apache#1118) Co-authored-by: Egor Krivokon <>
