[WIP] Protobuf Performance Refactor #230
Conversation
Codecov Report
@@             Coverage Diff             @@
##            master      #230      +/-  ##
===========================================
- Coverage   100.00%    94.90%    -5.10%
===========================================
  Files           20        20
  Lines          862      1158      +296
  Branches       149       238       +89
===========================================
+ Hits           862      1099      +237
- Misses           0        32       +32
- Partials         0        27       +27
Continue to review full report at Codecov.
Force-pushed from 5746a77 to a085789.
Note for code reviewers: There are a few spots in this PR that are pseudo-experiments, and they're decorated with GitHub comments, but please feel free to critique it all!
Some (hopefully) helpful mile-markers for code review.
value = self._set_coercion(value)
value = self._hydrate_dicts(value)

# Warning: `always_commit` is hacky!
As the comment says, this is hacky. Sadly, it is a linchpin for the whole refactor.
def _throw_value_error(proto_type, primitive):
    raise ValueError(f"Unacceptable value {type(primitive)} for type {proto_type}")


def validate_primitives(self, field, primitive):
This function is unfortunate, but it is the only way to recreate immediate validation when attempting something like my_obj.some_string_field = 123.
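A minimal sketch of that idea, assuming a descriptor-based design (names and structure here are illustrative, not the PR's actual implementation): a data descriptor's __set__ can reject a bad primitive the moment it is assigned.

class _FieldDescriptor:
    """Illustrative only: validates primitives as soon as they are assigned."""

    def __init__(self, name, expected_type):
        self.name = name
        self.expected_type = expected_type

    def __set__(self, instance, value):
        # Reject the assignment immediately instead of deferring the error
        # to a later sync with the underlying pb2 object.
        if not isinstance(value, self.expected_type):
            raise ValueError(
                f"Unacceptable value {type(value)} for type {self.expected_type}"
            )
        instance.__dict__[self.name] = value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]


class Example:
    some_string_field = _FieldDescriptor("some_string_field", str)


obj = Example()
obj.some_string_field = "ok"   # accepted
obj.some_string_field = 123    # raises ValueError immediately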
@@ -269,6 +287,14 @@ def __new__(mcls, name, bases, attrs):
def __prepare__(mcls, name, bases, **kwargs):
    return collections.OrderedDict()


def add_to_class(cls, name, value):
Note: This whole contribute_to_class pattern is heavily (or, more like, entirely) borrowed from Django.
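For readers who haven't seen it, here is a rough, self-contained sketch of the Django-style pattern under assumed names (not the code in this PR): the metaclass hands each class attribute to add_to_class, and values that know how to install themselves get to do so.

import collections


class DeclarativeMeta(type):
    """Illustrative metaclass using a Django-style contribute_to_class hook."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Preserve the declaration order of class attributes.
        return collections.OrderedDict()

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, {})
        for attr_name, value in attrs.items():
            add_to_class(cls, attr_name, value)
        return cls


def add_to_class(cls, name, value):
    # Values that know how to install themselves (e.g. field descriptors)
    # get first crack; everything else is a plain setattr.
    if hasattr(value, "contribute_to_class"):
        value.contribute_to_class(cls, name)
    else:
        setattr(cls, name, value)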
Ran out of time today to look at more than proto/fields.py. Hope the review is helpful anyway.
@craiglabenz Thank you for putting this together! @software-dov Could you take a look?
First pass, going to need a few others to fully understand what's going on.
proto/fields.py (outdated)
# For the most part, only primitive values can be returned natively, meaning
# this is either a Message itself, in which case, since we're dealing with the
# underlying pb object, we need to sync all deferred fields.
This comment is a little confusing. The 'either' has no second branch.
Can you provide a link to your benchmark code? I've had a hard time creating good benchmarks that mimic real-world message manipulation patterns.
Second review round, looking great!
class _NoneType:
    def __bool__(self):
        return False

    def __eq__(self, other):
        """All _NoneType instances are equal"""
        return isinstance(other, _NoneType)


_none = _NoneType()
Outrageous nitpicking: we're not checking for None so much as trying to use a unique canary distinct from regular None. This could be condensed into:
_canary = object()
...
value = getattr(instance, self.instance_attr_name, _canary)
if self.field.can_get_natively and value is not _canary:
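To make that concrete, here is a self-contained sketch of the sentinel-plus-caching pattern in isolation, with illustrative names (it is not the PR's actual descriptor): the identity check against a module-level object() distinguishes "not cached yet" from a legitimately cached None or falsy value.

_canary = object()


class CachedAttribute:
    """Illustrative only: caches a computed value per instance and uses a
    unique sentinel (rather than None) to detect 'not cached yet'."""

    def __init__(self, compute, cache_name):
        self.compute = compute          # callable(instance) -> value
        self.cache_name = cache_name    # instance attribute holding the cache

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = getattr(instance, self.cache_name, _canary)
        if value is not _canary:        # identity check; a cached None is fine
            return value
        value = self.compute(instance)
        setattr(instance, self.cache_name, value)
        return value


class Example:
    expensive = CachedAttribute(lambda self: sum(range(1_000_000)), "_expensive_cache")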
proto/marshal/marshal.py (outdated)
max_int_32 = 1<<31 - 1
min_int_32 = max_int_32 * -1
min_int_32 = (max_int_32 * -1) + 1
I'm not sure this is correct. Isn't the correct min (max_int_32 * -1) - 1?
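For reference, the conventional signed 32-bit bounds written with explicit parentheses; note also that in Python the shift operator binds more loosely than subtraction, so 1<<31 - 1 actually evaluates as 1 << 30.

# Signed 32-bit integers span [-2**31, 2**31 - 1].
max_int_32 = (1 << 31) - 1    # 2147483647
min_int_32 = -(1 << 31)       # -2147483648, i.e. (max_int_32 * -1) - 1

assert min_int_32 == -max_int_32 - 1
assert (1 << 31) - 1 != 1 << 31 - 1   # the unparenthesized form is 1 << 30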
proto/message.py (outdated)
if obj and hasattr(obj, '_update_pb'):
    obj._update_pb()
if obj and hasattr(obj, '_update_nested_pb'):
    obj._update_nested_pb()
It took me a few read-throughs to understand why a field value may be a message and not have _update_pb or _update_nested_pb. I bet that if we changed the conditional logic a bit it might be clearer:
if obj and isinstance(obj, Message):
    obj._update_pb()
    obj._update_nested_pb()
I still need to mull over this logic a little bit. It seems like we might be able to skip fields under certain circumstances. Also, I'm not sure if the ordering matters. I'll need to do some experiments and come up with examples and/or counterexamples.
"""Loops over any stale fields and syncs them to the underlying pb2 object. | ||
""" | ||
merge_params = {} | ||
for field_name in getattr(self, '_stale_fields', set()): |
Same here: we can just take the existence of _stale_fields as given.
Likewise here, constructing a set is expensive compared to using an already-created (or interned) value.
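A small sketch of that suggestion, with an assumed module-level constant standing in for whatever the PR would actually reuse: the empty fallback is built once at import time rather than on every call.

# Built once at import time instead of constructing set() per attribute sync.
_NO_STALE_FIELDS = frozenset()


def _update_pb(self):
    """Loops over any stale fields and syncs them to the underlying pb2 object."""
    merge_params = {}
    for field_name in getattr(self, '_stale_fields', _NO_STALE_FIELDS):
        ...  # same body as above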
Closing this PR because it optimizes use cases that are different from what we see in client libraries like python-datastore and python-firestore. Specifically, caching attributes is only helpful if they are read more than once, yet python-datastore converts all proto-plus instances into dictionary-like objects, meaning each field is read exactly once before all future reads are performed against that dictionary. If you find this and would like to know more, reach out.
Aims to speed up naive usage of proto-plus-python as a wrapper around primitive protobuf objects.
This PR replaces attribute misses on proto-plus objects, which previously fell through to the __getattr__ override and queried the hidden protobuf object directly, with descriptors that do the same work behind a caching layer to optimize repeated access. This descriptor/caching layer also aims to avoid unnecessary marshaling altogether when circumstances allow, further reducing the runtime of basic tasks.

The following is a performance chart of several simple tasks executed against the same proto definitions, first as raw protobuf objects and again as proto-plus objects. The two definitions are:
Note: All tasks were run 10,000 times.
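As an illustration of how a micro-benchmark of this shape can be run (a sketch with a hypothetical message definition, not the definitions or harness used for the chart above):

import timeit

import proto


class Thing(proto.Message):
    """Hypothetical proto-plus message used only for this sketch."""
    name = proto.Field(proto.STRING, number=1)


thing = Thing(name="benchmark")

# Time 10,000 repeated reads of the same attribute, mirroring the
# repeated-access pattern the caching layer is meant to speed up.
print(timeit.timeit(lambda: thing.name, number=10_000))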
When used as a dependency in python-datastore, the following proto-plus versions show this performance when hydrating 500 loaded entities (network I/O is excluded; measured via googleapis/python-datastore#155):
Resolves #222