-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose get_entities_v2 endpoint in python client #10694
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's get @hsheth2 signoff !
aspect_raw_value = entity_aspects.get(aspect.ASPECT_NAME).get( | ||
"value" | ||
) | ||
aspect_value = aspect.from_obj(aspect_raw_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this always work? I thought there's some differences between the openapi format and the restli format that we currently use?
note that from_obj ignores unknown fields, so it's possible this is silently dropping data - needs to be tested more completely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the alternative though? We can return raw dict if that's preferred, but if the field is unknown to the model, what else can we do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the alternative might be to raise if the dict has unknown fields? That means that this API becomes unusable for everyone every time a new field is introduced but python model is not up to date, whereas in the current implementation it's unusable only for those who depend on specific new field(s) that have to be added to the model anyway. In such case I'd vote for the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also as a compromise, we can return both the raw dict and the native objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was simply that from_obj is going to crash for a bunch of object types, because the formatting openapi uses is slightly different from the format used by restli. The python classes only support restli format and avro.
As a simple example, try this on dev01
from pprint import pp
import datahub.metadata.schema_classes as models
from datahub.ingestion.graph.client import get_default_graph
graph = get_default_graph()
urn = "urn:li:assertion:81c2e13a-1f41-43d5-8637-1171977badee"
info = graph.get_entities_v2(
entity_name="assertion",
urns=[urn],
aspects=[models.AssertionInfoClass],
)
assert info is not None
pp(info[urn])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For posterity: we discussed offline and agreed that at this point this method should return raw dict with aspects and it's up to the user to deserialize it as appropriate. This approach seems the most reliable and is not affected by silent data loss issue. As soon as data model consistency situation improves, we can implement proper deserialization support, perhaps as a wrapper method.
1c7d9b1
to
35728a9
Compare
self, | ||
entity_name: str, | ||
urns: List[str], | ||
aspects: List[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aspects: List[str], | |
aspects: Optional[List[str]], |
Checklist
@jjoyce0510