-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(graph service): only query for entities that should have lineage [Breaking Change] #5539
fix(graph service): only query for entities that should have lineage [Breaking Change] #5539
Conversation
@@ -147,7 +147,7 @@ export const OperationsTab = () => { | |||
inputs: run?.inputs?.relationships.map((relationship) => relationship.entity), | |||
outputs: run?.outputs?.relationships.map((relationship) => relationship.entity), | |||
externalUrl: run?.externalUrl, | |||
parentTemplate: run?.parentTemplate?.relationships?.[0].entity, | |||
parentTemplate: run?.parentTemplate?.relationships?.[0]?.entity, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small bug fix i noticed while testing
@@ -19,6 +19,8 @@ export const CompactEntityNameList = ({ entities, onClick, linkUrlParams, showTo | |||
return ( | |||
<> | |||
{entities.map((entity, index) => { | |||
if (!entity) return <></>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, small bug fix here ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is entity not a required field? Didn't realize... Nice!
@@ -86,6 +89,14 @@ public LineageSpec getLineageSpec(String entityName) { | |||
return _lineageSpecMap.get(entityName.toLowerCase()); | |||
} | |||
|
|||
public Set<String> getEntitiesWithLineage() { | |||
Map<String, EntitySpec> specs = _entityRegistry.getEntitySpecs(); | |||
return _lineageSpecMap.keySet().stream().filter(key -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: You maybe could use _lineageSpecMap.entrySet() to avoid the .get(key) calls
|
||
@Data | ||
@AllArgsConstructor | ||
public class GraphFilters { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other services (e.g. search service) we use PDLs to model the filter types...Any reason why we shouldn't keep that consistent here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't exposed via API yet so ill keep them just in java for now so its easier to change. Once we've settled on them let's move them to pdl.
@@ -79,11 +80,12 @@ public interface GraphService { | |||
* - RelatedEntity("DownstreamOf", "dataset three") | |||
*/ | |||
@Nonnull | |||
RelatedEntitiesResult findRelatedEntities(@Nullable final String sourceType, @Nonnull final Filter sourceEntityFilter, | |||
@Nullable final String destinationType, @Nonnull final Filter destinationEntityFilter, | |||
RelatedEntitiesResult findRelatedEntities(@Nullable final List<String> sourceTypes, @Nonnull final Filter sourceEntityFilter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing : We don't really need to remove this API, we could just have the old API call the new API with the different arguments.
RelatedEntitiesResult findRelatedEntities(final String sourceType, Filter filter,...) {
return findRelatedEntities(ImmutableList.of(sourceType, ...);
}
RelatedEntitiesResult findRelatedEntities(final List<String> sourceType, Filter filter,...) {
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually creates ambiguity if the user provides null for both fields (which is something we support), so I'll keep it as is for now.
* Unless overridden, it uses the lineage registry to fetch valid edge types and queries for them | ||
*/ | ||
@Nonnull | ||
default EntityLineageResult getLineage(@Nonnull Urn entityUrn, @Nonnull LineageDirection direction, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we can only filter for entity types that the current URN is related to via a lineage edge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we are just filtering for anything that has lineage... That will solve our immedaite problem with data process instance but in the long tem could this pop up again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed- only relevant entity types will be pulled
@@ -291,16 +291,20 @@ protected static String getQueryForRelatedEntities(@Nullable String sourceType, | |||
List<String> destinationFilterNames = new ArrayList<>(); | |||
List<String> relationshipTypeFilterNames = new ArrayList<>(); | |||
|
|||
if (sourceType != null) { | |||
if (sourceTypes != null && sourceTypes.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!!
@@ -294,9 +295,22 @@ public QueryBuilder getQueryForLineage(List<Urn> urns, List<EdgeInfo> lineageEdg | |||
incomingEdgeQuery.must(buildEdgeFilters(incomingEdges)); | |||
query.should(incomingEdgeQuery); | |||
} | |||
|
|||
if (graphFilters != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: this function is getting a bit long.. consider breaking into sub methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -50,8 +51,9 @@ public BulkByScrollResponse deleteByQuery(@Nullable final String sourceType, @No | |||
@Nullable final String destinationType, @Nonnull final Filter destinationEntityFilter, | |||
@Nonnull final List<String> relationshipTypes, @Nonnull final RelationshipFilter relationshipFilter) { | |||
BoolQueryBuilder finalQuery = | |||
buildQuery(sourceType, sourceEntityFilter, destinationType, destinationEntityFilter, relationshipTypes, | |||
relationshipFilter); | |||
buildQuery(sourceType == null ? ImmutableList.of() : ImmutableList.of(sourceType), sourceEntityFilter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can also use Collections.emptyList for an empty list : )
also, this method can take null right?
if it shouldn't take null values you can mark the param as @nonnull
@@ -108,9 +109,9 @@ public void addEdge(@Nonnull final Edge edge) { | |||
|
|||
@Nonnull | |||
public RelatedEntitiesResult findRelatedEntities( | |||
@Nullable final String sourceType, | |||
@Nullable final List<String> sourceTypes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we keep nullable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below- yes
@@ -94,9 +95,9 @@ public void addEdge(@Nonnull final Edge edge) { | |||
|
|||
@Nonnull | |||
public RelatedEntitiesResult findRelatedEntities( | |||
@Nullable final String sourceType, | |||
@Nullable final List<String> sourceTypes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we keep nullable? if we can avoid less program states we should go for it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, null is going to represent all
as it has historically
// Build Statement strings | ||
String whereClause = ""; | ||
|
||
Boolean hasSourceTypes = sourceTypes != null && !sourceTypes.isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance we can break this longer block into a sep method?
maybe
computeEntityTypeWhereClause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -359,6 +389,12 @@ private static String criterionToString(@Nonnull CriterionArray criterionArray) | |||
return joiner.length() <= 2 ? "" : joiner.toString(); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
doTestFindRelatedEntitiesEntityType(anyType, null, downstreamOf, outgoingRelationships, service, downstreamOfDatasetOneRelatedEntity); | ||
|
||
service.addEdge(new Edge(datasetOneUrn, nullUrn, downstreamOf)); | ||
syncAfterWrite(); | ||
doTestFindRelatedEntitiesEntityType(anyType, "null", downstreamOf, outgoingRelationships, service, nullRelatedEntity); | ||
doTestFindRelatedEntitiesEntityType(anyType, ImmutableList.of("null"), downstreamOf, outgoingRelationships, service, nullRelatedEntity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right? Immutable list of null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same as above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! in the test there is actually an entity type of "null"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
counterintuitive but okay :)
metadata-io/src/test/java/com/linkedin/metadata/graph/GraphServiceTestBase.java
Show resolved
Hide resolved
@@ -67,7 +68,7 @@ private RelatedEntitiesResult getRelatedEntities(String rawUrn, List<String> rel | |||
start = start == null ? 0 : start; | |||
count = count == null ? MAX_DOWNSTREAM_CNT : count; | |||
|
|||
return _graphService.findRelatedEntities("", newFilter("urn", rawUrn), "", QueryUtils.EMPTY_FILTER, | |||
return _graphService.findRelatedEntities(ImmutableList.of(), newFilter("urn", rawUrn), ImmutableList.of(), QueryUtils.EMPTY_FILTER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like its going to become confusing to provide either null or list... one other thing we couild do to simplify is just add another method that doesn't take either source types or dest types!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed- made null all and empty list none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good!
Few questions or comments about simplifying things. Otherwise looks good
Our graph service historically queried for all entity types. However, the set of relationship types it queries is the set of relationship types that may exist in some lineage. This means entities that are not included in lineage were getting pulled in. Rather than querying for specific entity->relationship type->entity pairs, I opted for a simper approach of changing the graph service interface to accept a list of entity types.
This approach sets us up to be able to select the set of entity types we are interested in querying as a config or from the UI.
This is a breaking change for anyone who has built their own implementation of the GraphService interface
Checklist