SearchFilter time grows exponentially by # of search terms #4655
Comments
These docs are relevant here. At the end of the day, you're getting two different queries that return two completely different sets of results. Regardless of performance, I'd argue that the proposed changes are more correct.
Would you like me to submit a PR? Some thoughts:
If you believe this represents an issue in Django core then raise a ticket on Trac. It'd be worth reviewing what happens in the admin, and whether this is replicable in the search there too. I'd be surprised if the issue hadn't already come up before if that's the case.
Start by seeing what tests fail if you do remove it. We can then take the conversation from there.
In the past we had a similar problem with a pure Django project (not using Django REST framework at all). We used django-tagging and searched in the tags (which are many-to-many to the object). We used MySQL as the database engine, and when the query string in our form contained a lot of words, MySQL raised an error saying that it cannot join more than 40 tables (or 41, I can't remember exactly). We fixed that by using Q objects. @rpkilby yes, in the end you have two different SQL queries, but you still have the same result set because you are using @cdosborn
@vstoykov - The search fields per term are grouped together with
GET https://localhost/api/users?search=bob,joe
With the existing implementation, we should get a queryset equivalent to the following:
    User.objects \
        .filter(Q(name__icontains='bob') | Q(groups__name__icontains='bob')) \
        .filter(Q(name__icontains='joe') | Q(groups__name__icontains='joe'))
The proposed changes would result in this query:
    User.objects.filter((Q(name__icontains='bob') | Q(groups__name__icontains='bob'))
                        & (Q(name__icontains='joe') | Q(groups__name__icontains='joe')))
I'd have to double check, but this seems to fall under the caveats described in the docs.
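To make the difference between the two querysets concrete, here is a minimal sketch using only the standard-library sqlite3 module. The table names (app_user, app_group, app_user_groups), the sample rows, and the hand-written SQL are all assumptions for illustration; the SQL only approximates what the ORM would emit for each form (one set of joins per chained filter call versus one shared set of joins).

```python
import sqlite3


def make_db():
    """Build a tiny in-memory schema with a many-to-many user/group relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE app_group (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE app_user_groups (user_id INTEGER, group_id INTEGER);
        INSERT INTO app_user VALUES (1, 'alice'), (2, 'carol');
        INSERT INTO app_group VALUES (1, 'bob-team'), (2, 'joe-team'), (3, 'bob-and-joe');
        INSERT INTO app_user_groups VALUES (1, 1), (1, 2), (2, 3);
    """)
    return conn


# Approximation of chained .filter() calls: each term gets its own joins,
# so different related rows may satisfy different terms.
CHAINED_SQL = """
    SELECT DISTINCT u.name FROM app_user u
    LEFT JOIN app_user_groups ug1 ON ug1.user_id = u.id
    LEFT JOIN app_group g1 ON g1.id = ug1.group_id
    LEFT JOIN app_user_groups ug2 ON ug2.user_id = u.id
    LEFT JOIN app_group g2 ON g2.id = ug2.group_id
    WHERE (u.name LIKE '%bob%' OR g1.name LIKE '%bob%')
      AND (u.name LIKE '%joe%' OR g2.name LIKE '%joe%')
"""

# Approximation of a single .filter() call: one shared join,
# so the same related row has to satisfy every term.
SINGLE_SQL = """
    SELECT DISTINCT u.name FROM app_user u
    LEFT JOIN app_user_groups ug ON ug.user_id = u.id
    LEFT JOIN app_group g ON g.id = ug.group_id
    WHERE (u.name LIKE '%bob%' OR g.name LIKE '%bob%')
      AND (u.name LIKE '%joe%' OR g.name LIKE '%joe%')
"""


def matching_names(conn, sql):
    """Return the sorted user names selected by the given query."""
    return sorted(row[0] for row in conn.execute(sql))


conn = make_db()
print(matching_names(conn, CHAINED_SQL))  # ['alice', 'carol']
print(matching_names(conn, SINGLE_SQL))   # ['carol']
```

In this sample data, alice belongs to one group matching "bob" and a different group matching "joe", so only the chained form returns her. That is the sense in which the single-filter form can return fewer results.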
@rpkilby Sorry I totally missed
This makes the situation complex. On one hand, the search needs to return as many matching results as possible; on the other hand, it should not DoS the application. Probably there should be something that can configure this (
From one point of view, the current behavior is a bug w.r.t. handling m2m. From the docs:
As you mentioned, if we went ahead with the changes, then applications would see fewer results.
Hey folks,
If anyone wants to progress this issue, I'd suggest making a pull request so we can look at the effects of this change on the current test suite, which would help highlight any problems it might have.
Checklist
master branch of Django REST framework.
Steps to reproduce
Use filters.SearchFilter and include a search_fields entry which is a many-to-many lookup.
Make a query against this view with several search terms.
Expected behavior
The search time would increase somewhat linearly with the # of terms.
Actual behavior
The search time grows exponentially with each added term. In our application, three search terms resulted in a 30-second query against a model that had only several hundred entries. Adding another term would take several minutes, and so on.
Summary
I was able to change a single block in drf and the performance became linear as I would expect. The problem and (a potential) solution are known. I wanted to bring them to your attention.
The culprit
Chaining filters on querysets in Django doesn't behave as one would expect when dealing with ManyToMany relations. If you look at the gist below, you'll see that the second bit of SQL is quite different from the first because of this.
https://gist.github.com/cdosborn/cb4bdfd0467feaf987476f4aefdf7ee5
From looking at the SQL, you'll notice the first bit generated a bunch of unnecessary joins. Each of these joins multiplies the number of rows the query has to process. Notice how the bottom query doesn't have the redundant joins. So we can conclude that chaining filters can produce unnecessary joins, which can dramatically affect performance.
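The multiplicative effect of the extra joins can be reproduced outside Django. The following is a rough sketch with the standard-library sqlite3 module; the table names, the row counts, and the helper functions make_db and joined_row_count are assumptions made for illustration, and only the many-to-many through table is joined, which is where the multiplication comes from.

```python
import sqlite3


def make_db(n_users=2, groups_per_user=5):
    """Create users that each belong to `groups_per_user` groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE app_user_groups (user_id INTEGER, group_id INTEGER)")
    for user_id in range(1, n_users + 1):
        conn.execute("INSERT INTO app_user VALUES (?, ?)", (user_id, f"user{user_id}"))
        for group_id in range(1, groups_per_user + 1):
            conn.execute(
                "INSERT INTO app_user_groups VALUES (?, ?)", (user_id, group_id)
            )
    return conn


def joined_row_count(conn, n_joins):
    """Count the rows produced when the through table is joined `n_joins` times.

    This mimics one extra join per search term, as with chained filters.
    """
    joins = "\n".join(
        f"LEFT JOIN app_user_groups ug{i} ON ug{i}.user_id = u.id"
        for i in range(n_joins)
    )
    sql = f"SELECT COUNT(*) FROM app_user u\n{joins}"
    return conn.execute(sql).fetchone()[0]


conn = make_db()
for n_terms in range(1, 4):
    # With 2 users in 5 groups each: 10, 50, 250 rows.
    print(n_terms, joined_row_count(conn, n_terms))
```

Each added join multiplies the joined row count by the number of groups per user, so the intermediate result grows as users × groups^terms before any WHERE clause or DISTINCT is applied. A real database may prune some of this work, so treat the counts as an upper-bound illustration rather than a measurement.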
So there is a bit of code in drf, which chains filter for each term in the search query. This explodes whenever the search_fields contains a ManyToMany.
A solution
Rather than chaining filters in SearchFilter, we build up a query first and call filter once.
This may not be the fix you want. My guess is that must_call_distinct was trying to fix this problem, but it's not sufficient. My impression is that this is a pretty serious issue that Django needs to resolve.