[Feature Request] Hive Metastore compatibility between different systems #1045
Comments
Hi @zsxwing,
For example, when reading Delta tables using Hive, we need to run a CREATE EXTERNAL TABLE command (a sketch follows this comment). In other words, the Delta connector for Hive requires certain metadata to be present in the table's Hive Metastore entry. However, if we use Spark to create a Delta table today, Spark won't write that metadata. In addition, if we run that command in Hive, Hive will write metadata that Spark doesn't recognize. I tried this before and hit the above issue. After we solve it, there may be other incompatibility issues I'm not aware of as well.
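For reference, a minimal sketch of the kind of `CREATE EXTERNAL TABLE` statement the Delta connector for Hive expects, based on the connector's documentation; the table name, columns, and path below are placeholders:

```sql
-- Sketch only: placeholder schema and path.
-- The column list must match the Delta table's schema, and LOCATION points at
-- the Delta table's root directory.
CREATE EXTERNAL TABLE delta_table_in_hive (col1 INT, col2 STRING)
STORED BY 'io.delta.hive.DeltaStorageHandler'
LOCATION '/path/to/delta/table';
```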
Hi @zsxwing, is there any plan for when this feature could be done and released?
@TCGOGOGO this is a complicated issue. We haven't finished the entire investigation of how HMS works across different engines. There is no ETA right now.
@dnskr
@mtthsbrr I'm running Spark Thrift Server in Kubernetes as a Deployment (de facto as one Pod) with the following command:

```sh
/opt/spark/sbin/start-thriftserver.sh --name sts \
  --conf spark.driver.host=$(hostname -I) \
  --conf spark.kubernetes.driver.pod.name=$(hostname) \
  && tail -f /opt/spark/logs/*.out
```

There is a basic config injected into the Spark pods through a ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  ...
data:
  spark-defaults.conf: >-
    spark.master k8s://https://kubernetes.default.svc
    spark.kubernetes.namespace namespace-to-use
    spark.kubernetes.container.image private.registry/spark:v3.3.2-delta2.2.0
    spark.kubernetes.container.image.pullSecrets private.registry.secret.key.name
    spark.kubernetes.authenticate.serviceAccountName spark-service-account
    spark.hive.metastore.uris thrift://hive-metastore.hive-metastore-namespace.svc.cluster.local:9083
    spark.hive.server2.enable.doAs false
    spark.sql.catalog.spark_catalog org.apache.spark.sql.delta.catalog.DeltaCatalog
    spark.sql.extensions io.delta.sql.DeltaSparkSessionExtension
    # Driver and Executor resources properties
    ...
```

Custom Catalog definition for Presto, see the Delta Lake Connector docs for more details:
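The catalog file itself isn't reproduced above; as a rough sketch (not the exact config from this setup), a Delta Lake catalog for Presto/Trino is typically a small catalog properties file along these lines. The connector name varies between PrestoDB and Trino versions, so treat the values below as assumptions to verify against the Delta Lake Connector docs:

```properties
# Hypothetical catalog file, e.g. etc/catalog/delta.properties
# connector.name is delta_lake in recent Trino releases; other engines/versions use a different name.
connector.name=delta_lake
hive.metastore.uri=thrift://hive-metastore.hive-metastore-namespace.svc.cluster.local:9083
```

With a catalog like this, the table created below would typically be queried from the Presto/Trino side as `SELECT * FROM delta.default.my_delta_table;` (the catalog and schema names here are assumptions as well).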
Create table query executed in Spark Thrift Server:

```sql
CREATE OR REPLACE TABLE my_delta_table
USING DELTA
AS SELECT * FROM my_parquet_table;
```
Any conclusions about this? I'm having the same problem now:

```
23/04/07 17:12:42 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table ... into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
```
@zsxwing I would like to know if this is coming soon. I reproduced the issue as per the doc https://github.com/delta-io/connectors/tree/master/hive:
Such compute engine (e.g. Spark/Hive) interoperability would be nice, especially in shared HMS scenarios.
When will this be available to use? One of our biggest customers, who is using Hive, is not able to use Delta data because of this. We need interoperability between Delta tables created in Spark and Hive.
I find myself entangled in this bug as well. |
Here's an example of getting it to work with Spark SQL, HMS, MinIO S3 and StarRocks: https://github.com/StarRocks/demo/tree/master/documentation-samples/deltalake
@zsxwing Hi, do we have any plans to implement compatibility across different systems?
Feature request
Overview
Currently, when we create a Delta table in Hive Metastore using different systems, we store different formats in Hive Metastore. This causes interoperability issues: for example, a Delta table created by Spark cannot be read by the Hive connector, and vice versa. Similar issues happen in Presto and Flink as well. It would be great if we could define a unified format in Hive Metastore for Delta.
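As a concrete illustration of where the divergence shows up (not part of the original request), one can compare what each engine records in the metastore for the same table; `my_delta_table` is a placeholder name, and the commented properties are what each engine typically writes or expects:

```sql
-- Illustration only: inspect the metastore entry each engine produced for the same table.
-- A table created from Spark with `USING DELTA` typically carries the table property
-- spark.sql.sources.provider = delta, whereas the Hive connector expects a table
-- defined with STORED BY 'io.delta.hive.DeltaStorageHandler'.
DESCRIBE FORMATTED my_delta_table;   -- run in Spark SQL
SHOW CREATE TABLE my_delta_table;    -- run in Hive
```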
Motivation
If we define a unified format in Hive Metastore, and all of the systems (Spark, Hive, Presto, Flink) use the same format, then no matter how a table is created, it can be accessed by all of them.