Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4917 - Remove O(n) queries in m2m updates #5093

Closed
wants to merge 5 commits into from
Closed

#4917 - Remove O(n) queries in m2m updates #5093

wants to merge 5 commits into from

Conversation

orf
Copy link
Contributor

@orf orf commented Apr 24, 2017

Re: #4917

Lets see if the tests pass.

@tomchristie
Copy link
Member

Not looked into this in any detail, but the tests are failing against hyperlinked relationships. Which can also use the pk only optimisation... https://github.com/orf/django-rest-framework/blob/1d990eaba10d542978cbf35d57497858e6f0c3a7/rest_framework/relations.py#L284

@orf
Copy link
Contributor Author

orf commented Apr 24, 2017

I removed that check and added an isinstance check for PrimaryKeyRelatedField. I think for this case it's safe to do, and I think that would be a common case. To support hyperlinked relationships it would take a lot of work.

for item in data
]
if isinstance(self.child_relation, PrimaryKeyRelatedField):
values = list(self.child_relation.get_queryset().filter(pk__in=data))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use values_list('pk', flat=True) here to just fetch the ids and skip object instantiation?

Copy link
Contributor Author

@orf orf Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we then return the values below (line 507), which need to be full model instances.

]
if isinstance(self.child_relation, PrimaryKeyRelatedField):
values = list(self.child_relation.get_queryset().filter(pk__in=data))
missing_primary_keys = set(v.pk for v in values) - set(data)
Copy link

@EmreSoner EmreSoner Apr 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use set for unique list, why not use distinct on values filter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not bothered about distinct values, Model.objects.filter(pk__in=[1, 1]) will only return 1 model AFAIK. We use set() here to just easily find any differences between the returned list of primary keys and the given list of primary keys.

Copy link
Collaborator

@carltongibson carltongibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea here but not that it just solves the issue for PrimaryKeyRelatedField.

The if isinstance(self.child_relation, PrimaryKeyRelatedField) special case looks to me like it's queuing up work for later.

to_internal_value is performing two tasks: mapping from the wire representation to lookup parameters and the lookup itself.

In the many case the lookup is always an __in on whatever the lookup field was, so it should be pretty generic.

Even HyperlinkRelatedField has two clearly separate phases: the Map and the lookup

If these were separately callableManyRelatedField should be able to build the lookup (for each RelatedField subclass) for a single fetch, as per the intention.

@orf
Copy link
Contributor Author

orf commented Apr 25, 2017

I agree that it's a special case @carltongibson, but doing anything else would be a major breaking change. Third party fields would rely on to_internal_value returning a whole 'thing' and changing that may be a fair bit of work.

Perhaps we could add another method that converts from the wire API and returns something we can use as a filter (perhaps a Q object would be best?) and have the base to_internal_value call that and do the .get() to maintain compatibility. But that wouldn't help with third party fields anyway, so... I don't know.

Again, queuing up work isn't a bad thing unless nobody wants to do the work. This MR doesn't solve it for the general case, but it solves it for the a common case. I think that's worthwhile until a better solution is implemented.

And its backwards compatible 👍

@carltongibson
Copy link
Collaborator

@orf I'm not suggesting we break to_internal_value.

If it were refactored to look a bit like this:

def to_internal_value(...):
   # get lookup
   # use that to get object

... then ManyRelatedField could do use those methods to do a little bit of magic such as yours to avoid the O(n) separate lookups.

@orf
Copy link
Contributor Author

orf commented Apr 25, 2017

oh right, so perhaps a to_lookup function that returns a Q object perhaps? Then we can do a hasattr(child, 'to_lookup') and use that if present? I like that idea.

@carltongibson
Copy link
Collaborator

@orf I added a Proof of Concept for HyperlinkedRelatedField in #5150. (I've just added code to HyperlinkedRelatedField there, obviously it would need refactoring properly but it should communicate the idea.)

@carltongibson
Copy link
Collaborator

Closing as a known limitation in line with #5150. (Full featured PRs welcomed.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants