[#1550] feat(spark-connector) support partition, bucket, sort order table #2540
Conversation
It's ready for review now. @jerryshao @qqqttt123 @yuqi1129 @mchades @diqiu50, please help to review when you are free.
@jerryshao, please help to review when you are free, thanks.
@@ -27,6 +28,10 @@ dependencies {
  implementation("org.apache.kyuubi:kyuubi-spark-connector-hive_$scalaVersion:$kyuubiVersion")
  implementation("org.apache.spark:spark-catalyst_$scalaVersion:$sparkVersion")
  implementation("org.apache.spark:spark-sql_$scalaVersion:$sparkVersion")
+ implementation("org.scala-lang.modules:scala-java8-compat_$scalaVersion:$scalaJava8CompatVersion")
Does it work on JDK 11 or 17?
Yes, it's a Java 8 (and up) compatibility kit for Scala, and the IT passes with it.
split
// Gravitino uses ["a","b"] for nested fields while Spark uses "a.b"
private static String getFieldNameFromGravitinoNamedReference(
    NamedReference gravitinoNamedReference) {
  return String.join(ConnectorConstants.DOT, gravitinoNamedReference.fieldName());
@mchades Does Gravitino support nested fields? I remember in ["a","b"], a means the table reference and b is the real column name?
No, the fieldName array is used to represent access to nested fields. For example, if column a is a struct type {b int, c string}, then we can use a.b or a.c to reference a nested field.
bucketNum, createSparkNamedReference(bucketFields), createSparkNamedReference(sortFields));
}

// columnName could be "a" or "a.b" for nested column
So do you need to handle the nested column case here?
I prefer to handle it, because both the Spark and Gravitino interfaces support nested columns.
So I don't see you do it here?
Sorry, I see the code there, please ignore.
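For the opposite direction (Spark to Gravitino), the dotted name has to be split back into path segments. A minimal sketch under the same assumption that `"."` separates nested fields; the method name is hypothetical, not the PR's actual helper:

```java
import java.util.Arrays;

public class FieldNameSplitSketch {
    // Spark writes a nested column as "a.b"; Gravitino expects ["a", "b"].
    static String[] toGravitinoFieldName(String sparkColumnName) {
        // Split on the literal dot; a plain column like "a" stays one segment.
        return sparkColumnName.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toGravitinoFieldName("a.b"))); // prints [a, b]
    }
}
```

Note the regex escape: `split` takes a regular expression, so an unescaped `.` would match every character.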
@mchades @qqqttt123 @yuqi1129 @jerryshao @diqiu50 all comments are addressed, please help to review again.
I have no further comments; I think we can go ahead to unblock other PRs. If there are missing parts, we can fix them in another PR.
What changes were proposed in this pull request?
Add partition, distribution, and sort order support for the Spark connector.
Why are the changes needed?
Fix: #1550
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added UT and IT; also verified in a local environment.