-
-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Only generate ModelSerializer fields once #7093
Conversation
thanks for tackling this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally in favor of this PR, left a few comments below.
I think the only major concern is if there are any common use cases that rely on any of the serializer instance attributes. e.g., depending on partial
or the context
.
# Methods for determining the set of field names to include... | ||
|
||
def get_field_names(self, declared_fields, info): | ||
@classmethod | ||
def get_field_names(cls, declared_fields, info): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these methods are part of the public API, we can expect that users have overridden them as instance methods. Since we're transitioning to classmethod
s, we should provide a user-friendly warning informing them of the change. Should be fairly straightforward as:
# this will fail if `get_field_names` is an instance method.
assert inspect.ismethod(cls.get_field_names), "<helpful message>"
# or
assert inspect.ismethod(type(self).get_field_names), "<helpful message>"
This should probably be raised sometime during class creation (possibly the aforementioned metaclass).
Thanks for the comments @rpkilby. I remembered from a previous PR that DRF has a requirement to work even before Django is initialized (checked by I changed the name to Regarding the compat concerns, I left it out for now, to get feedback on the "end state" first. But if we agree on this approach, I'll come up with a proposal for the transition. I think we'll want at least one deprecation cycle where the |
One quick comment - DRF itself needs to be importable, however this does not extend to user-defined serializers. Building fields would only happen in user-defined serializer classes, so these changes don't affect DRF importability. |
Here is a commit which does this (not for this PR): bluetech@067147c |
Hi @rsiemens. At this point, it's an issue of bandwidth of the maintenance team. While the changes aren't complicated, they are non-trivial (the public-facing serializer API has been updated). We need to consider things like deprecations for users who have overridden these methods, etc. I'm adding to the 3.12 milestone to help ensure this is looked at, but no guarantees. |
Thanks for the update! Happy to lend a hand if needed. |
My concern would be that the change footprint may be too big on this. It looks pretty risky. |
This definitely carries some breakage risk. I figure such risk is not acceptable at this stage of DRF's life. So I'm going to close this now -- thanks for considering! FWIW, we use an internal fork of DRF with this patch and some others to improve performance, though it also removes some features we don't use which got in the way. My plan is to make the field instances immutable which would also remove the |
I think there are generally two cases to consider:
For the former, I'm not too concerned. We can detect whether the method is a class or an instance method and raise a warning notifying the user of the change. In a lot of these cases, they probably just need to wrap the method in For the latter, there isn't really a good option. If their code is dependent on instance variables, then there isn't an easy migration path to the new classmethods. That said not sure how common this is, and per-instance field modifications could probably be moved to another serializer method. * I actually do have an idea for a clean deprecation path, but whether or not it's a good idea remains to be seen. |
Serialization performance in general and for ModelSerializer specially has been always unsatisfying for a long time, I think going down this road is an important and a great step even if the road wasn't the best one, I think the most would accept a little bit of small road bumps for such a great addition. |
The low risk approach here is to begin this as a third-party package. Once it's shown to work/be stable there's a case to be made for a change
I think that's the key right? (We really can't just break things...) |
Problem
References #2504, #5614.
We have several large projects which use DRF serializers. ModelSerializers are a great feature which saves a lot of time and mistakes. However, as the projects grow and the models grow, the slowness of ModelSerializer starts to show. The usual solution for this problem is "use something else", but we prefer to try and improve ModelSerializer's performance first, if possible.
The following is a somewhat realistic benchmark, comparing serialization of one of our models using a ModelSerializer, a regular serializer (it's just the
str(ModelSerializer())
), and a custom serializer.Benchmark
The results (before this PR):
From the profile, it is clear that the difference between
model
andregular
is due to ModelSerializer re-generating the model fields each time it is instantiated.Solution
This PR only generates the fields once, when the class is defined, using a metaclass. With this change, the benchmark results are
The PR contains breaking changes, in that all of the codes participating in generating the fields now becomes classmethods/class attributes instead of self attributes/methods. This is necessary in order to indicate that the code only runs one, and must not use anything from
self
because it is incidental. Specifically, the fields must be defined on the class now (they almost always are, already):and the following methods become classmethods (breaking for
ModelSerializer
subclasses which override them):Additionally, errors in the ModelSerializer definition (like e.g. not setting
fields = [...]
) are now raised on definition time, not on first instantiation.The DRF tests pass without any changes, except for adapting to when the errors are raised.
Possible further work
On the deserialization side, I believe the validators are generated every time. They can be cached too.
From profiling the benchmark after the changes, the major remaining slowdown is the
copy.deepcopy
of the fields on each instantiation. For reference, changing to a shallow copy (of each field) instead of a deepcopy brings themodel
time to1.3842s
(was attempted already before, #4587). The optimal solution would be to make the fields immutable, thus not requiring copy at all -- but that is a larger breaking change presumably.