Expose get_entities_v2 endpoint in python client #10694

Merged
merged 2 commits into from
Jun 14, 2024

Conversation

noggi
Collaborator

@noggi noggi commented Jun 12, 2024

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@jjoyce0510

@github-actions bot added the labels "ingestion" (PR or Issue related to the ingestion of metadata) and "community-contribution" (PR or Issue raised by member(s) of DataHub Community) on Jun 12, 2024
Collaborator

@jjoyce0510 jjoyce0510 left a comment

LGTM. Let's get @hsheth2's sign-off!

metadata-ingestion/src/datahub/ingestion/graph/client.py (outdated)
aspect_raw_value = entity_aspects.get(aspect.ASPECT_NAME).get(
    "value"
)
aspect_value = aspect.from_obj(aspect_raw_value)
Collaborator

Does this always work? I thought there were some differences between the OpenAPI format and the Rest.li format that we currently use.

Note that from_obj ignores unknown fields, so it's possible this is silently dropping data. This needs to be tested more thoroughly.
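
One way to make the silent-drop concern testable is to compare the keys of the raw payload against the fields the model knows about. The sketch below is illustrative only: `ToyAspect` and `unknown_fields` are hypothetical names, and `ToyAspect.from_obj` merely imitates the "ignore unknown fields" behavior described above rather than reproducing DataHub's generated classes.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional, Set


@dataclass
class ToyAspect:
    """Hypothetical stand-in for a generated aspect class (not a DataHub class)."""

    description: Optional[str] = None
    type: Optional[str] = None

    @classmethod
    def from_obj(cls, obj: Dict[str, Any]) -> "ToyAspect":
        # Imitates the behavior under discussion: unknown keys are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in obj.items() if k in known})


def unknown_fields(cls: type, obj: Dict[str, Any]) -> Set[str]:
    """Return the top-level keys of `obj` that `cls.from_obj` would drop."""
    known = {f.name for f in fields(cls)}
    return set(obj) - known


# "newField" plays the role of a field the server knows but the model does not.
raw = {"description": "row count check", "type": "DATASET", "newField": 1}
parsed = ToyAspect.from_obj(raw)
dropped = unknown_fields(ToyAspect, raw)
```

A check like this only looks at top-level keys; nested records would need the same comparison applied recursively.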

Collaborator Author

What is the alternative, though? We can return a raw dict if that's preferred, but if the field is unknown to the model, what else can we do?

Collaborator Author

I guess the alternative might be to raise if the dict has unknown fields? That would make this API unusable for everyone whenever a new field is introduced and the Python model is not up to date, whereas with the current implementation it is unusable only for those who depend on the specific new field(s), which have to be added to the model anyway. In that case I'd vote for the current behavior.
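
The trade-off can be made concrete with two toy parsers. This is a sketch under stated assumptions: `KNOWN_FIELDS`, `parse_lenient`, and `parse_strict` are hypothetical names invented for illustration and are not part of the DataHub client.

```python
from typing import Any, Dict, Set

# Hypothetical set of fields that an out-of-date Python model knows about.
KNOWN_FIELDS: Set[str] = {"description", "type"}


def parse_lenient(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Keep known fields and silently ignore the rest (the current behavior)."""
    return {k: v for k, v in obj.items() if k in KNOWN_FIELDS}


def parse_strict(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Raise as soon as the payload contains a field the model does not know."""
    unknown = set(obj) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return dict(obj)


# A payload from a server that has gained a field the model lacks.
new_payload = {"description": "row count check", "newField": 1}

# Lenient: every caller still gets a result; only "newField" is lost.
lenient_result = parse_lenient(new_payload)

# Strict: every caller fails, including those who never read "newField".
try:
    parse_strict(new_payload)
    strict_failed = False
except ValueError:
    strict_failed = True
```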

Collaborator Author

@noggi noggi Jun 13, 2024

Also as a compromise, we can return both the raw dict and the native objects.
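
A minimal sketch of what that compromise could look like, assuming a small container that carries both forms. `AspectResult` and `wrap_aspect` are hypothetical names, and treating any parser exception as "no native object" is a design choice made here for illustration, not something taken from the PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class AspectResult(Generic[T]):
    """Hypothetical container pairing the raw payload with its parsed form."""

    raw: Dict[str, Any]
    parsed: Optional[T]


def wrap_aspect(
    raw: Dict[str, Any], parser: Callable[[Dict[str, Any]], T]
) -> AspectResult[T]:
    """Parse `raw` with `parser`, keeping the raw dict even if parsing fails."""
    try:
        parsed: Optional[T] = parser(raw)
    except Exception:
        # Assumption: a failed parse is reported as None, not re-raised.
        parsed = None
    return AspectResult(raw=raw, parsed=parsed)


ok = wrap_aspect({"type": "DATASET"}, lambda d: d["type"])
failed = wrap_aspect({}, lambda d: d["type"])
```

Callers that need a field the model dropped can still read it from `raw`, at the cost of a slightly heavier return type.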

Collaborator

My point was simply that from_obj is going to crash for a bunch of object types, because the format OpenAPI uses is slightly different from the one used by Rest.li. The Python classes only support the Rest.li format and Avro.

As a simple example, try this on dev01:

from pprint import pp

import datahub.metadata.schema_classes as models
from datahub.ingestion.graph.client import get_default_graph

graph = get_default_graph()

urn = "urn:li:assertion:81c2e13a-1f41-43d5-8637-1171977badee"
info = graph.get_entities_v2(
    entity_name="assertion",
    urns=[urn],
    aspects=[models.AssertionInfoClass],
)
assert info is not None
pp(info[urn])

Collaborator Author

For posterity: we discussed this offline and agreed that, for now, this method should return a raw dict of aspects and leave it to the user to deserialize it as appropriate. This approach seems the most reliable and is not affected by the silent data loss issue. Once data model consistency improves, we can implement proper deserialization support, perhaps as a wrapper method.
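
For readers consuming the raw result, a small accessor may be enough. The sketch below assumes the response is shaped as `{urn: {aspect_name: {"value": {...}}}}`, which is inferred from the outdated snippet and the example earlier in this thread rather than from a documented contract. `get_aspect_value`, the sample URN, and the sample payload are hypothetical.

```python
from typing import Any, Dict, Optional

# Assumed response shape: {urn: {aspect_name: {"value": {...}}}}.
RawEntities = Dict[str, Dict[str, Dict[str, Any]]]


def get_aspect_value(
    entities: RawEntities, urn: str, aspect_name: str
) -> Optional[Dict[str, Any]]:
    """Return the raw value of one aspect, or None if the URN or aspect is absent."""
    aspect = entities.get(urn, {}).get(aspect_name)
    if aspect is None:
        # Guard against a missing aspect instead of calling .get() on None.
        return None
    return aspect.get("value")


# Illustrative payload only; not real server output.
sample: RawEntities = {
    "urn:li:assertion:example": {
        "assertionInfo": {"value": {"type": "DATASET"}},
    }
}

value = get_aspect_value(sample, "urn:li:assertion:example", "assertionInfo")
missing = get_aspect_value(sample, "urn:li:assertion:example", "status")
```

From here the caller decides how to deserialize `value`, which keeps any format mismatch visible instead of hidden inside the client.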

self,
entity_name: str,
urns: List[str],
aspects: List[str],
Collaborator

Suggested change:
- aspects: List[str],
+ aspects: Optional[List[str]],

@noggi noggi merged commit e66726b into master Jun 14, 2024
57 of 58 checks passed
@noggi noggi deleted the ak--entities-v2 branch June 14, 2024 20:46
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024

3 participants