Efficiency of "in" with BaseList #1116

Closed
kengruven opened this issue Sep 21, 2015 · 4 comments

Comments

@kengruven

Suppose I have something like this:

    import mongoengine
    from mongoengine import ListField, ReferenceField, StringField

    class Item(mongoengine.Document):
        title = StringField()

    class Container(mongoengine.Document):
        items = ListField(ReferenceField('Item'))

Now suppose I do:

    container_x = Container.objects.get(id=some_id_from_user)
    item_y = Item.objects.get(id=another_id_from_user)

    print(container_x._data['items'])  # it's a list of DBRefs here

    is_y_in_x = item_y in container_x.items  # this loads container_x.items

    print(container_x._data['items'])  # it's a list of Items here

The result is correct, but it's less efficient than I thought it would be. I didn't anticipate that evaluating a "Document in Document.ListField" expression would cause the items in the ListField to be loaded.

I don't see why the loading needs to occur here. The item_y has an ID (a UUID, in fact). The container_x._data['items'] is a list of IDs (DBRefs, which are a class and an ID). This should be enough to determine list membership.

I think this would just involve writing a BaseList.__contains__ method that checks for this case, but I'm no expert on MongoEngine internals.
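A minimal sketch of that idea, using stand-in classes rather than MongoEngine's real ones. `DBRef` here only imitates the collection-plus-id shape of a BSON reference, and `Item` and `RefList` are hypothetical names invented for the example. It is not how BaseList is implemented; it only shows that membership can be decided from IDs without loading anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DBRef:
    """Stand-in for a BSON DBRef: a collection name plus an id (assumption)."""
    collection: str
    id: str


class Item:
    """Stand-in for a loaded document; not a real mongoengine.Document."""
    collection = "item"

    def __init__(self, pk):
        self.pk = pk


class RefList(list):
    """Hypothetical list subclass whose membership test never dereferences."""

    @staticmethod
    def _key(value):
        # Reduce both raw references and loaded documents to (collection, id).
        if isinstance(value, DBRef):
            return (value.collection, value.id)
        if isinstance(value, Item):
            return (value.collection, value.pk)
        return None

    def __contains__(self, value):
        key = self._key(value)
        if key is None:
            # Not a document or reference: fall back to plain list semantics.
            return super().__contains__(value)
        return any(self._key(entry) == key for entry in self)


refs = RefList([DBRef("item", "a1"), DBRef("item", "b2")])
found = Item("b2") in refs    # compares ids only
missing = Item("zz") in refs  # no matching reference
```

After both checks the list still holds only the raw references, which is the behaviour the issue asks for.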

@touilleMan
Member

I think your example is broken:

    is_y_in_x = item_y in container_x  # this loads container_x.items

Shouldn't it be:

    is_y_in_x = item_y in container_x.items

Anyway, I think you're right that we should be able to optimise this. For the moment I've created a unit test to demonstrate the problem (see touilleMan@a02cc83).

I can think of two ways of doing this:

  • Subclass the list containing references
  • Make the Document much more lazy by not fetching anything on creation

My guess is that solution 2 is more difficult to implement but will likely offer greater performance improvements. For example, doing doc.ref_on_other_doc.id, where ref_on_other_doc is a ReferenceField, currently triggers a database query, which is unnecessary.
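To illustrate what "more lazy" could mean for the .id case, here is a rough sketch. `LazyRef` and `fake_loader` are hypothetical names made up for this example, and the loader merely stands in for a database lookup; nothing here reflects MongoEngine's actual internals.

```python
from types import SimpleNamespace


class LazyRef:
    """Hypothetical proxy for a referenced document; fetches only on demand."""

    def __init__(self, collection, pk, loader):
        self.collection = collection
        self.id = pk            # already known from the stored reference
        self._loader = loader   # callable standing in for a database lookup
        self._doc = None

    def fetch(self):
        # Load the target once and cache it for later attribute access.
        if self._doc is None:
            self._doc = self._loader(self.collection, self.id)
        return self._doc

    def __getattr__(self, name):
        # Only reached for names not set in __init__, i.e. real document fields.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.fetch(), name)


queries = []  # records every simulated lookup


def fake_loader(collection, pk):
    queries.append((collection, pk))
    return SimpleNamespace(id=pk, title="hello")


ref = LazyRef("item", "a1", fake_loader)
ref_id = ref.id                   # answered from the reference itself
count_after_id = len(queries)
title = ref.title                 # first real field access triggers the lookup
count_after_title = len(queries)
```

Reading `ref.id` leaves `queries` empty, while reading `ref.title` performs exactly one simulated lookup and caches the result.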

@MRigal @thedrow Any idea whether this is achievable?

@kengruven
Author

> I think your example is broken: ...

Oops, yes, you're completely right. I copied it over incorrectly when markdown-izing it.

> (for example when doing doc.ref_on_other_doc.id with ref_on_other_doc a ReferenceField a query to the database is currently done which is useless)

This particular case looks like issue #298. But yeah, that would be great to have, too!

@thedrow
Contributor

thedrow commented Sep 28, 2015

Can you please elaborate on option 2?
What exactly would you make more lazy in the document class, when and why?

@wojcikstefan
Member

#298 is the way to solve this issue. Let's track the progress there.
