Cache/pickle performance for Serializer data (many=True) #2454

alvinchow86 · 2015-01-23T20:02:56Z

I just migrated my project from DRF2.4 to DRF3.0 (exciting stuff!) and noticed slower performance for some endpoints. After extensive debugging and inspection, I think I've narrowed it down. Basically we are caching some large serializer data (using django.core.cache, which uses pickle).

There are two relevant changes in DRF3: one is the use of Python 2.7's OrderedDict instead of Django's SortedDict, and the other is the ReturnDict and ReturnList wrappers for serializer.data. From my own basic experiments, the standard dict is fastest for pickling/unpickling, as expected. SortedDict is a bit slower (~1.4x) and, surprisingly, OrderedDict is much slower (~2.5x)! DRF2.4 was using SortedDict, which would account for why our unpickling (and reading of cached data) was noticeably slower in DRF3.0.
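A rough way to reproduce this kind of comparison is sketched below. It is only an illustration: the row shape, sizes, and the helper names `make_rows` and `time_roundtrip` are made up here, and SortedDict is left out because it needs an old Django install. The ratios quoted above were measured on Python 2.7; on current Python 3, where OrderedDict is implemented in C, the gap may look quite different.

```python
import pickle
import timeit
from collections import OrderedDict


def make_rows(dict_cls, n_rows=1000, n_fields=10):
    """Build a list of mappings roughly shaped like many=True serializer output."""
    return [
        dict_cls(("field_%d" % j, i * j) for j in range(n_fields))
        for i in range(n_rows)
    ]


def time_roundtrip(rows, repeat=5):
    """Return (dump_seconds, load_seconds), taking the best of `repeat` runs."""
    proto = pickle.HIGHEST_PROTOCOL
    payload = pickle.dumps(rows, protocol=proto)
    dump_t = min(timeit.repeat(lambda: pickle.dumps(rows, protocol=proto),
                               number=1, repeat=repeat))
    load_t = min(timeit.repeat(lambda: pickle.loads(payload),
                               number=1, repeat=repeat))
    return dump_t, load_t


# Time the same data stored in each mapping type.
results = {
    cls.__name__: time_roundtrip(make_rows(cls))
    for cls in (dict, OrderedDict)
}

if __name__ == "__main__":
    for name, (dump_t, load_t) in results.items():
        print("%-12s dumps %.5fs  loads %.5fs" % (name, dump_t, load_t))
```

The numbers depend heavily on the interpreter version, pickle protocol, and payload shape, so treat any single run as indicative only.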

Commit e59b3d1 fixed pickling, as referenced in #2360. I believe it also (intentionally or unintentionally) fixed the pickling performance issue, since __reduce__() just converts a ReturnDict into a standard dict. However, the issue remains when instantiating a many=True Serializer, as ReturnList still contains a list of OrderedDict instances.
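To make the difference concrete, here is a minimal sketch using a stand-in class. `PlainPickleDict` is a hypothetical name and is not DRF's ReturnDict; it only mimics the idea of a `__reduce__()` that pickles the object as a standard dict. A plain list of OrderedDict instances, as in the many=True case, gets no such conversion.

```python
import pickle
from collections import OrderedDict


class PlainPickleDict(OrderedDict):
    """Hypothetical stand-in: an ordered mapping that pickles as a plain dict."""

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling dict(<plain copy>).
        return (dict, (dict(self),))


# Single-item case: the wrapper collapses to a standard dict on round-trip.
single = PlainPickleDict([("id", 1), ("name", "a")])
restored_single = pickle.loads(pickle.dumps(single))

# many=True-like case: the items are ordinary OrderedDicts, so they stay
# OrderedDicts after unpickling and pay the OrderedDict (un)pickling cost.
many = [OrderedDict([("id", i)]) for i in range(3)]
restored_many = pickle.loads(pickle.dumps(many))

if __name__ == "__main__":
    print(type(restored_single).__name__)
    print([type(item).__name__ for item in restored_many])
```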

One could argue that if a single item serializer output is pickled as dict, then a list output might as well do the same thing.

Some possible fixes I can think of:

- Allow the user to specify which dictionary class to use, at least for the final serialized data output. For a JSON response, it's not as if order matters anyway.
- Make ReturnList somehow return a list of ReturnDicts, or some other wrapper that changes the pickling behavior.
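The second option could look something like the sketch below. This is not a patch against DRF: `PlainPickleList` is a hypothetical class that only illustrates a list wrapper whose pickled form is a standard list of standard dicts. The conversion is shallow, so nested OrderedDicts (for example from nested serializers) would not be converted by this version.

```python
import pickle
from collections import OrderedDict


class PlainPickleList(list):
    """Hypothetical list wrapper that pickles as a list of plain dicts."""

    def __reduce__(self):
        # Convert each top-level mapping to a standard dict at pickle time;
        # non-mapping items are passed through unchanged.
        plain = [dict(item) if isinstance(item, dict) else item for item in self]
        return (list, (plain,))


rows = PlainPickleList(
    OrderedDict([("id", i), ("name", "item-%d" % i)]) for i in range(3)
)
restored = pickle.loads(pickle.dumps(rows))

if __name__ == "__main__":
    print(type(restored).__name__, [type(r).__name__ for r in restored])
```

In memory the rows are still OrderedDicts, so rendering is unaffected; only the cached (pickled) representation changes.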

(We could also argue we shouldn't use pickle to begin with; JSON may actually be a better and faster alternative. Still, that would mean porting over a lot of code, and a lot of people are probably going to use Django's built-in caching.)

FYI, I am using Python 2.7, and the experiments were done on OS X Yosemite, but the performance issues were observed on a Linux server deployment as well. Happy to share more data from my experiments if it's useful.

tomchristie · 2015-01-23T22:06:36Z

Interesting stuff, thanks!
Will comment more fully in due course.

tomchristie · 2015-01-26T09:26:07Z

See also discussion on #1994.

carltongibson · 2017-11-21T08:13:26Z

I've opened #5614 to investigate Benchmarking and Performance Improvements. I'm going to close this as blocked pending that. As and when we get a decent benchmarking solution in place we will revisit this and the other related performance issues.

tomchristie added the Needs further review label Jan 26, 2015

alvinchow86 mentioned this issue Jan 26, 2015

Document explicit speedup approaches. #1994

Closed

tomchristie mentioned this issue Oct 18, 2016

Serializer - one level copy instead of deepcopy #4587

Closed

carltongibson mentioned this issue Nov 21, 2017

Benchmarking and Performance Improvements #5614

Closed

carltongibson closed this as completed Nov 21, 2017
