-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#91]feat(catalog): Hive schema entity serde and store support #208
Conversation
Code Coverage Report
Files
|
Done |
catalog-hive/src/main/java/com/datastrato/graviton/catalog/hive/HiveCatalogOperations.java
Show resolved
Hide resolved
catalog-hive/src/main/java/com/datastrato/graviton/catalog/hive/HiveCatalogOperations.java
Show resolved
Hide resolved
catalog-hive/src/main/java/com/datastrato/graviton/catalog/hive/HiveCatalogOperations.java
Show resolved
Hide resolved
catalog-hive/src/main/java/com/datastrato/graviton/catalog/hive/HiveCatalogOperations.java
Show resolved
Hide resolved
catalog-hive/src/main/java/com/datastrato/graviton/catalog/hive/HiveCatalogOperations.java
Show resolved
Hide resolved
@yuqi1129 If we move hive-related operations out of the backend transaction method, once hive-related failed, the Graviton store will not rollback |
core/src/main/java/com/datastrato/graviton/proto/AuditInfoSerDe.java
Outdated
Show resolved
Hide resolved
|
||
return commonSchema; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like this class CommonSchema
here. CommonSchema
has nothing to do with HiveSchema
. Semantically you'll both have CommonSchema and HiveSchema, but actually CommonSchema
is just a placeholder for schema info, not an entity.
If it is a placeholder, then it should have a different name, if it is a base class of HiveSchema, then the class organization should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original intention was to use commonSchema
as a generic class for storing basic information. Any entity that does not require additional information can be serialized and deserialized using commonSchema
.
Another mainly reason for needing the commonSchema
class is that BaseSchema
is an abstract class and cannot be instantiated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, I think we can have several options:
- Remove
abstract
limitation forBaseSchema
class. - Directly create
HiveSchema
's serde to protoSchema
.
Basically, each entity (with identifier) should be mapping to a metadata object. Semantically CommonSchema
and HiveSchema
are two different entities if they don't have inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, option one was chosen
uint64 catalog_id = 2; | ||
string name = 3; | ||
optional string comment = 4; | ||
map<string, string> properties = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to store comment/properties, they will be stored in hive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
.setName(schemaEntity.name()) | ||
.setAuditInfo(AuditInfoSerDe.ser(schemaEntity.auditInfo())) | ||
.setComment(schemaEntity.getComment()) | ||
.putAllProperties(schemaEntity.getProperties()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you check the nullability of comment
and properties
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment
and properties
are removed for now.
"com.datastrato.graviton.meta.rel.BaseSchema.CommonSchema", | ||
"com.datastrato.graviton.proto.SchemaEntitySerDe", | ||
"com.datastrato.graviton.catalog.hive.HiveSchema", | ||
"com.datastrato.graviton.proto.SchemaEntitySerDe"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- why do you both have
commonSchema
andHiveSchema
here? - If you put CommonSchema and SchemaEntitySerDe here in core module, why do you meet class not found issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's review the serde procedure for HiveSchema
as follows:
- serialize:
- Find
SchemaEntitySerDe
class according toHiveSchema
class. SchemaEntitySerDe
serializeHiveSchema
entity toSchema
proto
- deserialize:
- Find
Schema
proto class according to passed parameter class ofHiveSchema
- deserialize data from bytesArray to
Schema
proto instance - Find
SchemaEntitySerDe
class according to passed parameter class ofHiveSchema
SchemaEntitySerDe
deserialize fromSchema
proto toBaseSchema
instance
With above describtioin, we really don't need map key of CommonSchema
(BaseSchema
), I have removed CommonSchema
.
client.dropDatabase(ident.name(), false, false, cascade); | ||
store.executeInTransaction( | ||
() -> { | ||
store.delete(ident, SCHEMA); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need to delete all tables under this schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, delete tables operation added.
catalog-hive/src/main/java/com/datastrato/graviton/catalog/hive/HiveCatalogOperations.java
Show resolved
Hide resolved
} catch (EntityAlreadyExistsException e) { | ||
throw new SchemaAlreadyExistsException( | ||
String.format( | ||
"Hive schema (database) '%s' already exists in Graviton store", ident.name())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A blank line after each exception to make the code more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
@@ -24,13 +24,20 @@ public class ProtoEntitySerDe implements EntitySerDe { | |||
"com.datastrato.graviton.meta.BaseMetalake", | |||
"com.datastrato.graviton.proto.BaseMetalakeSerDe", | |||
"com.datastrato.graviton.meta.CatalogEntity", | |||
"com.datastrato.graviton.proto.CatalogEntitySerDe"); | |||
"com.datastrato.graviton.proto.CatalogEntitySerDe", | |||
"com.datastrato.graviton.catalog.hive.HiveSchema", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you're using HiveSchema
with SchemaEntitySerDe
, but the type parameter of SchemaEntitySerDe
is BaseSchema
, do we need to unify to BaseSchema
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mapping indicates that the HiveSchema
can utilize SchemaEntitySerDe
.
In the future, we may also require other catalogs like xxxSchema
(if xxx also do not require storing additional information) to make use of this schemaEntitySerDe
, at that time, we can directly add a mapping relationship: xxxSchema
-> SchemaEntitySerDe
.
So the BaseSchema
type parameter of SchemaEntitySerDe
is used to reusing serder logic.
// TODO. We should also fetch the customized HiveSchema entity fields from our own | ||
// underlying storage, like id, auditInfo, etc. | ||
EntityStore store = GravitonEnv.getInstance().entityStore(); | ||
BaseSchema baseSchema = store.get(ident, SCHEMA, HiveSchema.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use HiveSchema.class
as a parameter, but the returned type is BaseSchema
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to the specific type of schema
when converting schema from a byte array, what's is why we need HiveSchema.class
not BaseSchema.class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to avoid confusion, I changed it to BaseSchema baseSchema = store.get(ident, SCHEMA, BaseSchema.class)
here and then added BaseSchema
serde mapping.
What changes were proposed in this pull request?
CommonSchema
class andSchema
proto for serdeHiveSchema
asCommonSchema
Why are the changes needed?
we could store the additional entity information to our own storage.
Fix: #91
Does this PR introduce any user-facing change?
Add some check for Graviton store
How was this patch tested?
UTs added