Duplicate relations after importing aliases #81
Comments
This is, to some extent, the code that causes it: https://github.com/granoproject/grano/blob/master/grano/logic/entities.py#L136 The question is how that code decides when to delete duplicate links, because it may need to consider more than just source and target. The only fully logical solution I can see is to load all entities first, then de-dupe them, and only then load relations. But that would be a major refactor.
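To make the "entities first, then de-dupe, then relations" ordering concrete, here is a minimal in-memory sketch. It does not use grano's actual code or API: the dict layout, `canonical_ids`, `load_relations`, and the choice of `(schema, source, target)` as relation identity are all assumptions made for illustration.

```python
def canonical_ids(entities, key_fn):
    """Map every entity id to the id of the first entity that shares its key.

    `key_fn` is a hypothetical stand-in for whatever rule decides that two
    entities are the same (for example, a normalised name or alias).
    """
    first_seen = {}
    mapping = {}
    for ent in entities:
        canonical = first_seen.setdefault(key_fn(ent), ent["id"])
        mapping[ent["id"]] = canonical
    return mapping


def load_relations(relations, mapping):
    """Rewrite relation endpoints to canonical ids, dropping repeats.

    Assumption: a relation is identified by (schema, source, target). A real
    loader may need to include more fields in this key.
    """
    seen = set()
    loaded = []
    for rel in relations:
        src = mapping[rel["source"]]
        tgt = mapping[rel["target"]]
        ident = (rel["schema"], src, tgt)
        if ident in seen:
            continue  # same link already loaded via another alias
        seen.add(ident)
        loaded.append({**rel, "source": src, "target": tgt})
    return loaded
```

Because relations are only touched after every entity has a canonical id, two links that point at different aliases of the same entity collapse into one instead of surviving as duplicates.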
Would it not be possible to merge relations based on the uniqueness constraints in the schemata?
Hm, but the uniqueness constraints aren't actually in the schema; they're in the loaders. That may be a problem anyway: if the schema knew about de-dupe, we could just POST whole objects without checking for them first, which would halve the number of HTTP requests needed to load a dataset.
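As a rough illustration of what "the schema knows about de-dupe" could mean, the sketch below models the server side as an in-memory store that performs an upsert keyed on per-schema unique fields. The `Store` class, its `unique_fields` mapping, and the merge rule (newer values win) are hypothetical and are not part of grano.

```python
class Store:
    """In-memory stand-in for a server that knows each schema's unique fields."""

    def __init__(self, unique_fields):
        # Hypothetical layout, e.g. {"owns": ["source", "target"]}
        self.unique_fields = unique_fields
        self.objects = {}

    def upsert(self, obj):
        """Create the object, or merge it into the existing one with the same key.

        Returns (stored_object, created). The caller sends one request per
        object and never has to look the object up first.
        """
        fields = self.unique_fields[obj["schema"]]
        key = (obj["schema"],) + tuple(obj[f] for f in fields)
        created = key not in self.objects
        # Assumed merge rule: values from the newer object overwrite older ones.
        self.objects[key] = {**self.objects.get(key, {}), **obj}
        return self.objects[key], created
```

With something like this behind the API, a loader would issue a single write per object rather than a lookup followed by a write, which is where the halving of requests would come from.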
I was thinking of something along the lines of a grano command that takes the schema file as an argument and de-dupes the relations. What are good reasons for keeping grano ignorant of uniqueness constraints? Simpler code?
Well, there could be different uniqueness constraints for different data sources, but that actually seems more like a bug now that I think of it.
Perhaps for now I can add a relation de-duping command to granoloader. It should be able to merge relations efficiently enough by paging through relations ordered by their unique fields.
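The paging idea could look roughly like the sketch below. It is not granoloader code: `fetch_page` is a hypothetical callable standing in for whatever API call returns one page of relations already sorted by the unique fields, and the field names are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def iter_relations(fetch_page, page_size=100):
    """Yield relations one by one from a paged, pre-sorted source.

    `fetch_page(offset, limit)` is a hypothetical function that returns a
    list of relation dicts, or an empty list when there is nothing left.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += page_size


def find_duplicates(relations, unique_fields):
    """Group adjacent relations that share the same unique-field values.

    Yields (survivor, duplicates) pairs. This only works if the input is
    sorted by `unique_fields`, since groupby merges adjacent items only.
    """
    key = itemgetter(*unique_fields)
    for _, group in groupby(relations, key=key):
        group = list(group)
        yield group[0], group[1:]
```

Since the input is ordered, only one group of duplicates is held in memory at a time. With offset-based paging it is safer to collect the duplicates during the scan and delete them afterwards, because deleting rows while paging would shift later offsets.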