Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#3607] feat(core): Support to import the entities when loading entities #3623

Merged
merged 17 commits into from
Jun 18, 2024

Conversation

qqqttt123
Copy link
Contributor

@qqqttt123 qqqttt123 commented May 29, 2024

What changes were proposed in this pull request?

Support to import the entities when loading entities

Why are the changes needed?

Fix: #3607

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add ut.

@qqqttt123 qqqttt123 marked this pull request as draft May 29, 2024 11:09
@jerqi jerqi force-pushed the ISSUE-3607 branch 7 times, most recently from 9587cea to 30b5053 Compare May 30, 2024 06:40
@jerryshao
Copy link
Contributor

From my point, I think:

  1. Maybe we can use "managed/unmanaged" to differentiate the entity, not "imported"? We'd better figure out a better property name.
  2. I saw that you're using a unified object API to import all the entities. As we only need to import schema, table, topic, maybe we can add APIs to each interface, like "..../table/{table_name}/manage" to mange the table, WDYT?
  3. Is it better to use a property to demonstrate whether the entity is managed or not, not a field, the pro is that it is more compatible, the con is that user may not be aware of this, WDYT?

@jerqi
Copy link
Contributor

jerqi commented May 30, 2024

2. /table/{table_name}/mana

  1. I think import is better than managed/unmanaged. I think registered and metaImported is an alternative. Because some tables are created by us, but they didn't have entity in our storage. It's not suitable to use managed. Managed table has other meanings. So I prefer not using it.
  2. I use unified api to do batch operations. It will be convenient for user.
  3. I think it's better to use field. Because the property will write to the underlying system, some system doesn't support the property like schema of PG. This property is very important, so I prefer using field instead of property.
  4. I don't update the property id for unimported tables in the underlying system, Because I just want to simplify the cases. Maybe I can do it in the future. If we don't update the property id for unimported tables in the underlying system, we can't handle the cases when users rename underlying system tables directly.

@lw-yang
Copy link
Contributor

lw-yang commented May 31, 2024

I'm considering a point: is it necessary to expose the concept of import to users?

when enterprises integrating Gravitino , there will be two phases

  • first is the transitional state during migration
  • second is the final state after migration is complete.

During the transitional state of migration, users request cannot yet converge on Gravitino, so we need to periodically import newly added tables, schemas... from outside into Gravitino. but for users, they don't need to care whether the state is "imported" or not.

When the migration is complete, all request will go through Gravitino, and at this point, "import" is no longer necessary.


another thing, why don't we automatically check and insert metadata during the get operation? This way, users wouldn't have to periodically import data.

@qqqttt123
Copy link
Contributor Author

I ever had concern about the issue of the lock.
We use the read lock when we read a table. If we add the import operation, we need add a parent lock for it. But I find I can use the double check to reduce the influence of the lock.

@jerqi jerqi force-pushed the ISSUE-3607 branch 6 times, most recently from ebf6696 to 20f57e5 Compare June 5, 2024 03:13
@qqqttt123 qqqttt123 changed the title [ISSUE-3607] Support to sync the entities [#3607] feat: Support to import the entities when loading entities Jun 5, 2024
@jerqi jerqi force-pushed the ISSUE-3607 branch 2 times, most recently from 219641e to d0910a7 Compare June 5, 2024 15:38
@qqqttt123 qqqttt123 marked this pull request as ready for review June 6, 2024 06:13
@qqqttt123 qqqttt123 requested a review from jerryshao June 6, 2024 06:13
@qqqttt123
Copy link
Contributor Author

@lw-yang @jerryshao Could you help me review this pull request?

@qqqttt123 qqqttt123 requested review from jerryshao and yuqi1129 June 11, 2024 13:16
@qqqttt123
Copy link
Contributor Author

All comments are addressed. @jerryshao @yuqi1129 Could you take an another look?

Copy link
Contributor

@jerryshao jerryshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please carefully polish your code, I feel like that you didn't carefully check the behavior of each return value and the corner case. This makes me think that you don't have UTs to cover your different cases.

@@ -23,6 +24,9 @@ public final class EntityCombinedSchema implements Schema {

// Sets of properties that should be hidden from the user.
private Set<String> hiddenProperties;
// If imported is true, it means that storage backend have stored the correct entity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a blank line above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -272,6 +273,15 @@ boolean isManagedEntity(NameIdentifier catalogIdent, Capability.Scope scope) {
IllegalArgumentException.class);
}

boolean isEntityExist(NameIdentifier ident, Entity.EntityType type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need thing method to be package private, or protected is enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package private is enough. It should keep consistent with other methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean should protected be enough?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to protected. But other methods in this class are package public. I think this method should have the same visibility with other methods with the same class.

catalogIdentifier,
HasPropertyMetadata::schemaPropertiesMetadata,
schema.properties()));
if (schema == null || schema.imported()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method definition here will throw a NoSuchSchemaException instead of returning null, you should follow this, the method definition doesn't return null, instead it throws an exception.

catalogIdentifier,
HasPropertyMetadata::schemaPropertiesMetadata,
schema.properties()));
if (schema == null || schema.imported()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, do you really check that the code above will return null or throw an exception?

catalogIdentifier,
HasPropertyMetadata::schemaPropertiesMetadata,
schema.properties()));
return schema;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you return a schema you got previously here, IIUC, should we return a schema after it is imported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If schema is imported, we don't need to import it again, so we can return it directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, what I mean is that you will have a new entity combined schema in importSchema(), you should return the new one, not the old one. If you don't understand what I'm talking about, just ping me offline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this comment been resolved? I have the same question, I think here should return EntityCombinedSchema after import.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After digging into the code, the difference of schema before and after import is only about AuditInfo, and since the imported entity doesn't have the new AuditInfo, so schema is the same before and after import, no need to create a new one. It is also the same for table and topic. @mchades .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. If there is no change in the auditInfo then it is acceptable not to use imported EntityCombinedSchema here.

@@ -330,4 +303,102 @@ public boolean dropSchema(NameIdentifier ident, boolean cascade) throws NonEmpty
? droppedFromStore
: droppedFromCatalog;
}

private boolean importSchema(NameIdentifier identifier) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't leverage the return value of this method, so why do you need to define a return value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, Maybe I should add some logs.

}

StringIdentifier stringId = getStringIdFromProperties(schema.properties());
// Case 1: The schema is not created by Gravitino.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't update as what @yuqi1129 mentioned here.

ident,
identifier -> store.get(identifier, SCHEMA, SchemaEntity.class),
"GET",
stringId.id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For entity store, it throws a NoSuchEntityException instead of returning null. So the code below is not valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

operateOnEntity will catch NoSuchEntityException and return null.

@@ -24,6 +25,10 @@ public final class EntityCombinedSchema implements Schema {
// Sets of properties that should be hidden from the user.
private Set<String> hiddenProperties;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can handle this not in this PR, we may need a class named 'combinedEntity' to unify the handling of entities from different sources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mchades what is the meaning here, can you explain more?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean do we need to provide an abstract class CombinedEntity<T> for EntityCombinedSchema, EntityCombinedTable, and EntityCombinedTopic?

// Load the schema to make sure the schema is imported.
SchemaDispatcher schemaDispatcher = GravitinoEnv.getInstance().schemaDispatcher();
NameIdentifier schemaIdent = NameIdentifier.of(ident.namespace().levels());
schemaDispatcher.loadSchema(schemaIdent);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will produce an extra event of loadSchema if we use the loadSchema instead of SchemaExists. Is it expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But using SchemaExists seems a bit weird...

Copy link
Contributor Author

@qqqttt123 qqqttt123 Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor Author

@qqqttt123 qqqttt123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jerryshao jerryshao changed the title [#3607] feat: Support to import the entities when loading entities [#3607] feat(core): Support to import the entities when loading entities Jun 18, 2024
@jerryshao
Copy link
Contributor

@yuqi1129 @mchades can you please help to review?

@jerryshao
Copy link
Contributor

Please review again.

Copy link
Contributor

@yuqi1129 yuqi1129 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mchades
Copy link
Contributor

mchades commented Jun 18, 2024

Overall LGTM

@jerryshao jerryshao merged commit c0f8be0 into apache:main Jun 18, 2024
33 checks passed
@jerqi jerqi deleted the ISSUE-3607 branch June 18, 2024 11:44
shaofengshi pushed a commit to shaofengshi/gravitino that referenced this pull request Jun 24, 2024
… entities (apache#3623)

### What changes were proposed in this pull request?
Support to import the entities when loading entities

### Why are the changes needed?

Fix: apache#3607 

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add ut.

---------

Co-authored-by: Heng Qin <[email protected]>
Co-authored-by: Rory <[email protected]>
Co-authored-by: Jerry Shao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Supports to allocate an entity id for every external entity
6 participants