Query explosion even on simple models with no included serializers #337

Closed
aidanlister opened this issue Apr 14, 2017 · 79 comments

@aidanlister
Contributor

I love the idea of this module but it seems like it's impossible to build fast APIs with it.

Let's say I have Books and Authors. When I fetch a list of books, even with no "include" specified, I get a query for each author, just to display the FK?

As a broader issue, as Author increases in complexity, developers will add more included_serializers to that serializer, which inadvertently affects my Books viewset with a query explosion.

Could we define a lambda for each included_serializer to alter the query? Anything would be more robust than magic.

@mblayman
Collaborator

@aidanlister thanks for your concern about performance. As a production user of DJA, I want my site to be performant too. The built-in DJA ModelViewSet uses a prefetch_related on querysets to avoid an N+1 query bug when including additional models. Do you have a specific example and/or data to explain the "query explosion" that you're referring to?

FYI, there was a recent bug fix (which I don't think has been released yet) for a bug that was preventing the prefetch_related code from executing. Ideally, a DJA model viewset should only have to do 1 db query for each included model type. If that's not the behavior you're seeing in practice, then maybe there is another bug.
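For readers who are not familiar with the pattern, here is the general Django behaviour being discussed (plain ORM, nothing DJA-specific; a rough sketch that assumes the example app's Comment/Author models and field names):

from example.models import Comment

# N+1 pattern: 1 query for the list, plus 1 extra query per comment
# every time the author FK is followed.
for comment in Comment.objects.all():
    print(comment.author.name)

# Prefetched: 2 queries in total, however many comments there are.
for comment in Comment.objects.all().prefetch_related('author'):
    print(comment.author.name)  # served from the prefetch cache, no extra query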

@aidanlister
Contributor Author

aidanlister commented Apr 14, 2017

@mblayman so load up the example app and generate some fake data:

factories.CommentFactory.bulk_create(50)

Then look at the comments list:

[screenshot, 2017-04-15: the comments list with the debug toolbar showing 5 comments and 15 queries]

We can see there are 5 comments but 15 queries, with no included relationships ... it's doing a query just to display the FK.

This is the latest develop code.

@aidanlister
Contributor Author

aidanlister commented Apr 15, 2017

I've added a failing test:
uptick@66609ae
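For anyone who doesn't want to click through, it's essentially a query-count assertion. A minimal sketch of the idea (the URL, threshold and fixtures here are illustrative; the linked commit has the real test):

import pytest
from django.db import connection
from django.test.utils import CaptureQueriesContext


@pytest.mark.django_db
def test_query_count_no_includes(client):
    # Listing comments without ?include should not cost one query per row.
    with CaptureQueriesContext(connection) as context:
        response = client.get('/comments')
        assert response.status_code == 200
    # A handful of queries (list, paginator count, ...) is fine; 3 * page_size is not.
    assert len(context.captured_queries) <= 5, context.captured_queries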

@mblayman
Collaborator

Thanks @aidanlister, that's really helpful info. I agree that seems wrong for sure.

It seems like the next steps are to track down those excessive queries and find a way to delay evaluation or group things together to reduce the number to where it should be.

@aidanlister
Contributor Author

@mblayman I don't think it's about delaying the evaluation; we don't need a query just to show the FK.

I'll add a series of performance test cases to properly test this, but have you thought about providing a way of specifying the select_related / prefetch_related calls via a lambda or method on the serializer? Forcing it to be explicit seems better than magic.
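To make the idea concrete, something along these lines is what I have in mind (entirely hypothetical: the include_query_hints hook does not exist in DJA today, and the imports assume the example app):

from rest_framework_json_api import serializers
from example.models import Comment


class CommentSerializer(serializers.ModelSerializer):
    included_serializers = {
        'author': 'example.serializers.AuthorSerializer',
    }

    # Hypothetical hook: one callable per relation, so the viewset applies
    # exactly the joins the serializer asks for instead of guessing.
    include_query_hints = {
        'author': lambda qs: qs.select_related('author'),
    }

    class Meta:
        model = Comment
        fields = ('body', 'author', 'entry')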

@mblayman
Collaborator

mblayman commented Apr 15, 2017

Truthfully, I haven't given it a lot of thought. I stepped into a maintainer role less than a couple of months ago. I'm not an expert in all the internal workings of the project, though I would like to understand it better.

Explicit over implicit definitely fits the Python ethos. I suspect that a reasonable default would be needed to maintain backwards compatibility. Adding a method to help control the select_related / prefetch_related aspects of the serializer sounds good.

I'm trying to fire up the example now so I can look at the queries and see where the library is being wasteful.

@mblayman
Collaborator

Ok, I got the example app going, and debug toolbar is showing me that each comment in the comment listing is adding two queries. This is definitely an N+1 bug, because debug toolbar shows the extra queries come from fetching the author and entry of each comment to get the respective IDs. I think the problem is in the example app's viewset code.

With 4 comments, I had 12 queries. Some are unavoidable because of the need to get the count and use the paginator (as you noted in your test). The other 8 came from each of the 4 comments fetching its author and entry.

I went to the CommentViewSet and changed the queryset:

    queryset = Comment.objects.all().select_related('author', 'entry')

After this change, the query count dropped to 5 and stayed there even when I added more comments.

I do see the possibility of optimizing the model serializer to use entry_id instead of entry.id (for example). That would be a nice benefit: it avoids going to the entry db table entirely, since the ID is already stored in the foreign key column of the comment model. I think this pattern could generalize.
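For the record, this is the plain Django behaviour that makes the entry_id approach attractive (a sketch against the example app's Comment model):

comment = Comment.objects.first()  # 1 query, loads the comment row only

comment.entry.id   # follows the FK: loads the whole Entry row first (extra query)
comment.entry_id   # reads the FK column already on the comment (no query at all)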

@aidanlister
Contributor Author

Ahhhh, yes you're right, it's the author.id that's causing it to blow out. If we can fix that, that solves a good chunk of my issue.

@aidanlister
Contributor Author

Looks like line 120 in renderers.py,
resolved, relation_instance = utils.get_relation_instance(resource_instance, source, field.parent)

?

@aidanlister
Contributor Author

Actually I take that back ... the renderer has no effect.

Line 143: the JSON API ModelSerializer has this:
serializer_related_field = ResourceRelatedField

Somewhere in ResourceRelatedField ...

@aidanlister
Contributor Author

So, on ResourceRelatedField, use_pk_only_optimization() is overridden to return False:

class ResourceRelatedField(PrimaryKeyRelatedField):
    def use_pk_only_optimization(self):
        # We need the real object to determine its type...
        return False

This means that this method from the parent class (PrimaryKeyRelatedField) runs:

    def get_attribute(self, instance):
        if self.use_pk_only_optimization() and self.source_attrs:
            # Optimized case, return a mock object only containing the pk attribute.
            try:
                instance = get_attribute(instance, self.source_attrs[:-1])
                value = instance.serializable_value(self.source_attrs[-1])
                if is_simple_callable(value):
                    # Handle edge case where the relationship `source` argument
                    # points to a `get_relationship()` method on the model
                    value = value().pk
                return PKOnlyObject(pk=value)
            except AttributeError:
                pass

        # Standard case, return the object instance.
        return get_attribute(instance, self.source_attrs)

The optimized case is what ordinarily stops the query explosion: it creates a mock object (PKOnlyObject) with only the PK set, so there's no object lookup.
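For reference, PKOnlyObject in DRF is little more than a holder for the pk (paraphrased from rest_framework/relations.py), which is why the optimized path never has to touch the related table:

class PKOnlyObject(object):
    # Mock object used when we only need the pk of a related instance but
    # still want something with a .pk attribute, like a real model instance.
    def __init__(self, pk):
        self.pk = pk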

@jsenecal looks like you made the commit two years ago, any insight?

23d0af6

@aidanlister
Contributor Author

I think I'm getting there ... this brings the query count down, but I had to kill some functionality.

uptick@94f79c4

@aidanlister
Contributor Author

I think I've gone as far as I can go with this ...

This is a simple pattern I've used for managing the queryset based on what is included; I think this makes sense:

class CommentViewSet(viewsets.ModelViewSet):
    queryset = Comment.objects.all()
    serializer_class = CommentSerializer

    def get_queryset(self):
        qs = super().get_queryset()
        if 'author' in self.request.GET.get('include', '').split(','):
            qs = qs.prefetch_related('author')
        return qs
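With that in place, GET /comments stays at the base query count, and GET /comments?include=author only adds a single prefetch query for the authors, regardless of the page size (assuming the per-row FK lookups above are also fixed).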

The other location I mentioned before was also causing an explosion; the simplest fix is removing the get_links functionality:

field_links = field.get_links(resource_instance)

@aidanlister
Contributor Author

aidanlister commented Apr 15, 2017

This is where I've finished up:

https://github.com/django-json-api/django-rest-framework-json-api/compare/develop...ABASystems:hotfix/performance?expand=1

I've eliminated the query explosion for the simple case ... I've done this by switching off the links at the settings level, and by passing the type of the object around rather than expecting the full instance to be available.

Rather than disabling the links at the settings level, we should be able to skip fetching them when the resource type isn't in the include list, but I didn't get that far.

On the more complicated example, comments?include=author, the bug manifests itself again with a query for each AuthorBio just to display the FK ...

For now, I'm giving up, I don't get the feeling that this is easily solvable.

@jerel @jsenecal any insight onto how I might fix this would be much appreciated.

@aidanlister
Contributor Author

@mblayman have you had a chance to take a look at this yet?

@mblayman
Collaborator

No, I stopped looking at it. I thought the conversation reached a conclusion for your case, and you were reaching out to the other devs. I don't see anything directly actionable with this issue. The theme of this issue seems to be "find a way to use fewer queries." While I totally agree with the goal, without a PR and some code that attempts to reduce the query count, I don't see what we could do next with this issue.

@aidanlister
Contributor Author

@mblayman ah, I assumed when you said you were the maintainer that you meant the package was actively maintained.

The package has an m*(n+1) bug, where m is the number of relations a model has and n is the page size, making it completely unsuitable for any production environment or large data sets.

So I guess I'm worried if the package isn't being actively maintained we might need to rethink our API strategy.

@mblayman
Collaborator

As a community effort, this project depends on contributions from the community. It is actively maintained and had a release on Friday for 2.2 which includes many months of contributions from a wide variety of individuals.

I respectfully disagree with your conclusion that it is "completely unsuitable for any production environment or large data sets" (emphasis added). Aside from being hyperbolic (which I don't think is fruitful for technical conversation), I see this claim as false, knowing that I and many others have deployed DJA in a production setting.

IMO, the problem you are reporting is a limitation that is not unique to DJA. The need to select an appropriate queryset applies to DJA, to DRF, and to the Django underneath them. I think this is an area where Django (and Python in general) prefers being explicit over implicit. Rather than trying to intuit the appropriate select_related additions, it is up to the developer to include them in the queryset. This strategy prevents code from over-selecting from other tables and bloating a db query. I thought I demonstrated that as I walked through how to eliminate the N+1 query bug from the example app. If I failed to make that clear enough, I apologize.

I think your comment from a few days ago (#337 (comment)) gets at the heart of the piece that can be solved by a PR. If someone could find a way to make sure the ID is fetched from model_id instead of model.id, that would be awesome. I'd be happy to review a PR fixing this issue; however, I don't have the time to fix it myself.

@jsenecal
Member

jsenecal commented Apr 23, 2017 via email

@jsenecal
Member

jsenecal commented Apr 23, 2017 via email

@aidanlister
Contributor Author

aidanlister commented Apr 26, 2017

@jsenecal I'm not sure who's complaining about the package? Several of our developers have made contributions to this package over the last six months, and I'm very glad it's available. In fact, our product depends on it.

That's why I'm pretty worried that a critical issue like this was met with "I don't see anything directly actionable with this issue" ... perhaps my comment was a bit hyperbolic, but if I re-state it as "completely unsuitable for any production environment with large data sets" you might understand where I'm coming from.

Here's a scenario:

Suppose you have a single model, e.g. Book, which has four relations, e.g. Author, Publisher, CopyrightHolder, and Category.

To display 25 books in DRF without any includes, I would need a single query: SELECT * FROM book.

To display 25 books in DRF-JSONAPI without any includes, I would need either:
a) 1 query, à la SELECT * FROM books LEFT JOIN author LEFT JOIN publisher LEFT JOIN CopyrightHolder LEFT JOIN Category, or
b) 4 additional queries using prefetch_related.

Let's say I have 1M books, 50k authors, 10k categories, and 10k copyright holders. In the select_related scenario, you've just created an in-memory table with something like 1e18 rows ... do this a few times per second and you have melted your database. All to display 25 rows, with no included relationships. So select_related is only going to work if you have a small dataset or a small volume of traffic.

... so maybe we can do the prefetch option. Sure, now we need 4x more queries, but at least they don't cause the database to melt down.

However, this doesn't scale with the app complexity. If you want to include the Author, Category and CopyrightHolder with the books, then you have to prefetch any relationships those models might have! Just to display the FK!

You end up writing your ViewSets like this:

class BookViewSet(ModelViewSet):
    queryset = Book.objects.all().prefetch_related(
        'book__author__tags',
        'book__author__client__tags',
        'book__author__client__primary_contact',
        'book__author__client__billingcard',
        'book__author__client__account',
        'book__copyrightholder__client__account_manager',
        'book__copyrightholder__client__clientgroup__accounts',
        'book__copyrightholder__branch__groups',
        'book__copyrightholder__branch__user_set',
        'book__publisher__address',
        'book__publisher__servicecontract_set',
        'book__publisher__account_manager',
        'book__property__billingcard',
        'book__service_group',
        'book__type__standard',
        'book__category_set__type',
        'book__type__categorytype_set',
    )

This is not a contrived example. That's an actual ModelViewSet we have (though I had to slightly obfuscate the model names).

Once you have even a moderately complex app, it becomes impossible to stop query explosions as new relationships get added -- every API that might include that model needs to be updated.

So while I agree with you that my earlier comment was to some extent hyperbole, it depends on your definition of production, and if this isn't fixable then we're in deep shit and need to rethink our API strategy (which is of course, not your problem).

@aidanlister
Contributor Author

@mblayman you talk about trying to serialize entry_id rather than entry.id, but to do that you need to understand how the serializer works; take a look at how DRF itself solves this issue with use_pk_only_optimization and PKOnlyObject.

@jsenecal have you got any time to help our team fix this? I've gone as far as I can. Surely we can somehow get the type from the relationship itself (not the model) and pass that along to be used by the downstream serializer? I tried here... uptick@94f79c4

Then maybe we can avoid doing work on models that aren't included, e.g. get_links shouldn't be called all the way down the stack ... uptick@94f79c4#diff-e2e58e3e5c345fc3b14c160b3a00a64cL119

If that can't be done easily, would it be reasonable to create a fork of this package as a "fast mode" which aims to implement less of the spec, but does it in a reasonably performant way?

@mblayman
Collaborator

mblayman commented Apr 26, 2017

First, please allow me to apologize. Sometimes text really sucks as a communication format. 😄 While I wrote "I don't see anything directly actionable with this issue," it would have been more accurate to write "I see a few performance problems here and I don't know which of them we can/should focus on specifically."

Second, what is the outcome that you're hoping for? It's pretty clear you're looking for a fix to these performance problems. You have provided a heroic amount of data and I appreciate that. I think the remaining problem is that you're hoping one of the maintainers will fix the performance issues. That's where I think this issue is crashing into the waves of open source. A quick peek at the contributor graph will reveal that the biggest contributors have not made code contributions in over 6 months (jerel's most recent stuff was doing the release work for 2.2). I was trying to be truthful calling the project "active." It is active in the sense that PRs get reviewed and merged and issues get commented on. It is not active in the sense that some person(s) is working on the reported issues with any regularity.

I think that leads us back to a pull request. From what you've written, it seems like your team is acutely feeling the pain of performance issues in your business. You've demonstrated really great knowledge of what the problem is and where it occurs. I will happily review any PRs that start to fix these performance issues. That's as much time as I can personally offer as this is a side project for me and no one is paying me to do any of this. 😄

Thanks for your efforts so far.

@jsenecal
Member

jsenecal commented Apr 26, 2017 via email

@aidanlister
Contributor Author

@mblayman my goals were:

a) have a maintainer see the issue and fix it
b) if there's no maintainer time then get a point in the right direction
c) failing that, ensure that if we do commit the time and money to fixing the issue that it's a sensible investment, e.g. that the package is maintained and the maintainers are reasonable.

We've decided to go ahead and put some more senior developer time into fixing the issue, but I would like to have one of our team added as a maintainer at some point, either @bor3ham, @jarekwg, or @sksavant, once you're comfortable with their contributions and maintainer style.

@sksavant will follow up on this issue if he's able to come up with a solution.

@mblayman
Collaborator

mblayman commented May 2, 2017

Awesome! I'm excited to have other folks involved and look forward to any future contributions.

@emojiface-ops

@aidanlister I appreciate your work on this issue; I'm curious whether you've gotten any traction fixing the performance problems.

@aidanlister
Contributor Author

Yes, actually the work I've done in this thread covers most of the issue; we expect to have a new production version up in the next four weeks.

@emojiface-ops

@aidanlister thanks for the quick response, I'll check out your fork. Are you planning to commit the "production version" on your fork?

@aidanlister
Contributor Author

Yes, it should go through to our fork when it's finished ... and I'll submit a PR when we finish testing.

@aidanlister
Contributor Author

Looks great! Let's get it merged into develop and I'll rebase my WIP to look at the two parts of the query explosion.

@aidanlister
Contributor Author

@jsenecal it would be amusing if we both caused and then later reported this explosion ... that commit you identified came from @jarekwg and @bor3ham (both from Uptick/ABAS). Here's the repository that PR came from:

https://github.com/jarekwg/django-rest-framework-json-api/commits/optimize/relationship-extraction

jsenecal added a commit that referenced this issue Jul 22, 2017
@jsenecal
Member

I will now sign off, but we need to open a separate issue as this is not yet completely resolved (we still have an n*relations query explosion if the queryset does not use prefetch_related).

@aidanlister
Contributor Author

👍

@aidanlister
Contributor Author

Keep this issue open? The PR is good but doesn't actually address this yet :)

@jsenecal
Member

I guess you are right - let's keep track of the discussion here

@aidanlister
Contributor Author

Here's my progress on the explosion now:
develop...ABASystems:wip/find-query-explosions

Note to anyone else playing along: you can run the project with django-admin.py shell --settings=example.settings.dev and run the tests with py.test -s -k test_performance

This code gets the first test (test_query_count_no_includes) to pass, which is great ... but it breaks on a simple include (test_query_count_include_author), which I haven't been able to debug yet (it seems to start from extract_included)

@xgilest

xgilest commented Jul 24, 2017

Hi, @mcanaves is working on a solution for the problem of the renderer querying the database just to check that the ids of foreign keys exist.
As stated earlier, this meant for us that the API was too slow to be operational; hopefully, with the great effort made by the community lately, this will change.

@aidanlister
Contributor Author

How's everyone going on this? Harder than it looks huh!

@jsenecal
Member

jsenecal commented Jul 29, 2017 via email

@gregorypease280
Contributor

Anyone still actively working on this?

@aidanlister
Contributor Author

No, not as far as I am aware. I wasn't able to make progress on it, nor was @sksavant ... it's just whack-a-mole, pushing the problem down the line.

DRF / JSON API has gotten so slow for us we're starting to think about something like Go, https://github.com/bor3ham/reja

@santiavenda2
Contributor

Hi guys,

Some time ago we did this PR trying to fix some of the problems you mentioned during this conversation.

Let me know if the changes make sense and if they are ready to be merged.

All tests are running fine and we have been using this version in production without any problems.

@mblayman
Collaborator

Ack, my bad, @santiavenda2. I missed that the PR posted was related to this. I'll try to get it reviewed soon (hopefully this weekend if I have some free time). After all the discussion that happened here, I certainly did not intentionally overlook the PR. Thanks for the poke.

@aidanlister
Contributor Author

@santiavenda2 neat, we will take a look too. Anything we should watch out for?

@santiavenda2
Contributor

santiavenda2 commented Dec 14, 2017

@aidanlister yes, these are the main changes
https://github.com/django-json-api/django-rest-framework-json-api/pull/374/files#diff-e2e58e3e5c345fc3b14c160b3a00a64cR127
https://github.com/django-json-api/django-rest-framework-json-api/pull/374/files#diff-e2e58e3e5c345fc3b14c160b3a00a64cR259
https://github.com/django-json-api/django-rest-framework-json-api/pull/374/files#diff-8f2fdb5cb1d819dce410d6b74d0b77b6R172

We tried to make two improvements:

  • avoid fetching related objects when we already have the id of the related object and know its type.
  • avoid fetching related objects when the related object is None.

You can see these improvements in this viewset.
As you can see, the select_related and some of the prefetch_for_includes entries are not needed any more.

Other changes are just code quality improvements and fixes for compatibility with older Django versions.
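In other words, the renderer can usually get both the id and the type of a to-one relation without touching the related table. A minimal illustration of the idea against the example app's models (not the actual code in the PR):

from example.models import Comment

comment = Comment.objects.first()                                # 1 query

related_id = comment.entry_id                                    # no extra query
related_model = Comment._meta.get_field('entry').related_model   # the Entry class, no query
# The JSON API "type" can be derived from related_model, so we never have to
# load the Entry row just to find out what it is.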

@mblayman
Collaborator

mblayman commented Jan 9, 2018

With the work done in #374, do we have enough in place to call this issue fixed? Are there areas where DJA still issues too many queries?

@sliverc
Member

sliverc commented Jun 8, 2018

Query explosion on simple models without inclusions now looks fine to me. I do still see issues when using included serializers, though.

views.AutoPrefetchMixin does the lookup for directly included relationships. Such a relationship can itself have reverse relationships, though, and each of those still executes a query. I currently use views.PrefetchForIncludesHelperMixin as a workaround, but AutoPrefetchMixin should prefetch reverse included relationships automatically.
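For anyone hitting the same thing, the workaround looks roughly like this (the include keys and lookup paths are illustrative, based on the example app):

from rest_framework import viewsets
from rest_framework_json_api import views

from example.models import Comment
from example.serializers import CommentSerializer


class CommentViewSet(views.PrefetchForIncludesHelperMixin, viewsets.ModelViewSet):
    queryset = Comment.objects.all()
    serializer_class = CommentSerializer

    prefetch_for_includes = {
        '__all__': ['author', 'entry'],
        'author': ['author', 'author__bio'],
    }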

@timcbaoth
Contributor

I stumbled upon this thread when I noticed a query explosion in a project at work. I found a bug in https://github.com/django-json-api/django-rest-framework-json-api/pull/374/files#diff-e2e58e3e5c345fc3b14c160b3a00a64cR127 which prevented the optimization of not querying forward relations from being applied.

I will provide a PR shortly, if that is OK.

@sliverc
Member

sliverc commented Apr 18, 2021

Most of the concerns in this issue have actually been addressed. There are still ways to reduce the number of queries, but at this point in the thread it is not clear what more needs to be done.

So I am closing this issue and forwarding the conversation to discussion #921; I encourage everyone with an idea for reducing the number of queries to engage there. Out of that discussion, concrete new issues can then be extracted to work on.
