[WIP] Protobuf Performance Refactor #230
Conversation
Codecov Report
@@             Coverage Diff             @@
##            master      #230      +/-  ##
===========================================
- Coverage   100.00%    94.90%    -5.10%
===========================================
  Files           20        20
  Lines          862      1158      +296
  Branches       149       238       +89
===========================================
+ Hits           862      1099      +237
- Misses           0        32       +32
- Partials         0        27       +27
Continue to review full report at Codecov.
Force-pushed from 5746a77 to a085789.
Note for code reviewers: There are a few spots in this PR that are pseudo-experiments, and they're decorated with GitHub comments, but please feel free to critique it all!
Some (hopefully) helpful mile-markers for code review.
value = self._set_coercion(value)
value = self._hydrate_dicts(value)

# Warning: `always_commit` is hacky!
As the comment says, this is hacky. Sadly, it is a linchpin for the whole refactor.
def _throw_value_error(proto_type, primitive):
    raise ValueError(f"Unacceptable value {type(primitive)} for type {proto_type}")


def validate_primitives(self, field, primitive):
This function is unfortunate, but it is the only way to recreate immediate validation when attempting something like my_obj.some_string_field = 123.
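A minimal sketch of that idea, assuming a descriptor-based design (names and structure here are illustrative, not the PR's actual implementation): a data descriptor's __set__ can reject a bad primitive the moment it is assigned.

class _FieldDescriptor:
    """Illustrative only: validates primitives as soon as they are assigned."""

    def __init__(self, name, expected_type):
        self.name = name
        self.expected_type = expected_type

    def __set__(self, instance, value):
        # Reject the assignment immediately instead of deferring the error
        # to a later sync with the underlying pb2 object.
        if not isinstance(value, self.expected_type):
            raise ValueError(
                f"Unacceptable value {type(value)} for type {self.expected_type}"
            )
        instance.__dict__[self.name] = value

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]


class Example:
    some_string_field = _FieldDescriptor("some_string_field", str)


obj = Example()
obj.some_string_field = "ok"   # accepted
obj.some_string_field = 123    # raises ValueError immediately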
@@ -269,6 +287,14 @@ def __new__(mcls, name, bases, attrs):
def __prepare__(mcls, name, bases, **kwargs):
    return collections.OrderedDict()


def add_to_class(cls, name, value):
Note: This whole contribute_to_class pattern is heavily (or, more like, entirely) borrowed from Django.
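For readers who haven't seen it, here is a rough, self-contained sketch of the Django-style pattern under assumed names (not the code in this PR): the metaclass hands each class attribute to add_to_class, and values that know how to install themselves get to do so.

import collections


class DeclarativeMeta(type):
    """Illustrative metaclass using a Django-style contribute_to_class hook."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Preserve the declaration order of class attributes.
        return collections.OrderedDict()

    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, {})
        for attr_name, value in attrs.items():
            add_to_class(cls, attr_name, value)
        return cls


def add_to_class(cls, name, value):
    # Values that know how to install themselves (e.g. field descriptors)
    # get first crack; everything else is a plain setattr.
    if hasattr(value, "contribute_to_class"):
        value.contribute_to_class(cls, name)
    else:
        setattr(cls, name, value)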
Ran out of time today to look at more than proto/fields.py. Hope the review is helpful anyway.
@craiglabenz Thank you for putting this together! @software-dov Could you take a look?
First pass, going to need a few others to fully understand what's going on.
proto/fields.py (outdated)
# For the most part, only primitive values can be returned natively, meaning
# this is either a Message itself, in which case, since we're dealing with the
# underlying pb object, we need to sync all deferred fields.
This comment is a little confusing. The 'either' has no second branch.
Can you provide a link to your benchmark code? I've had a hard time creating good benchmarks that mimic real-world message manipulation patterns.
Second review round, looking great!
class _NoneType:
    def __bool__(self):
        return False

    def __eq__(self, other):
        """All _NoneType instances are equal"""
        return isinstance(other, _NoneType)


_none = _NoneType()
Outrageous nitpicking: we're not checking for None so much as trying to use a unique canary distinct from regular None. This could be condensed into:
_canary = object()
...
value = getattr(instance, self.instance_attr_name, _canary)
if self.field.can_get_natively and value is not _canary:
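To make that concrete, here is a self-contained sketch of the sentinel-plus-caching pattern in isolation, with illustrative names (it is not the PR's actual descriptor): the identity check against a module-level object() distinguishes "not cached yet" from a legitimately cached None or falsy value.

_canary = object()


class CachedAttribute:
    """Illustrative only: caches a computed value per instance and uses a
    unique sentinel (rather than None) to detect 'not cached yet'."""

    def __init__(self, compute, cache_name):
        self.compute = compute          # callable(instance) -> value
        self.cache_name = cache_name    # instance attribute holding the cache

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = getattr(instance, self.cache_name, _canary)
        if value is not _canary:        # identity check; a cached None is fine
            return value
        value = self.compute(instance)
        setattr(instance, self.cache_name, value)
        return value


class Example:
    expensive = CachedAttribute(lambda self: sum(range(1_000_000)), "_expensive_cache")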
proto/marshal/marshal.py (outdated)
max_int_32 = 1<<31 - 1
min_int_32 = max_int_32 * -1
min_int_32 = (max_int_32 * -1) + 1
I'm not sure this is correct. Isn't the correct min (max_int_32 * -1) - 1?
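For reference, the conventional signed 32-bit bounds written with explicit parentheses; note also that in Python the shift operator binds more loosely than subtraction, so 1<<31 - 1 actually evaluates as 1 << 30.

# Signed 32-bit integers span [-2**31, 2**31 - 1].
max_int_32 = (1 << 31) - 1    # 2147483647
min_int_32 = -(1 << 31)       # -2147483648, i.e. (max_int_32 * -1) - 1

assert min_int_32 == -max_int_32 - 1
assert (1 << 31) - 1 != 1 << 31 - 1   # the unparenthesized form is 1 << 30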
proto/message.py (outdated)
if obj and hasattr(obj, '_update_pb'):
    obj._update_pb()
if obj and hasattr(obj, '_update_nested_pb'):
    obj._update_nested_pb()
It took me a few read-throughs to understand why a field value may be a message and not have _update_pb or _update_nested_pb. I bet that if we changed the conditional logic a bit it might be clearer:
if obj and isinstance(obj, Message):
    obj._update_pb()
    obj._update_nested_pb()
I still need to mull over this logic a little bit. It seems like we might be able to skip fields under certain circumstances. Also, I'm not sure if the ordering matters. I'll need to do some experiments and come up with examples and/or counterexamples.
"""Loops over any stale fields and syncs them to the underlying pb2 object. | ||
""" | ||
merge_params = {} | ||
for field_name in getattr(self, '_stale_fields', set()): |
Same here: we can just take the existence of _stale_fields as given.
Likewise here, constructing a set is expensive compared to using an already-created (or interned) value.
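A small sketch of that suggestion, with an assumed module-level constant standing in for whatever the PR would actually reuse: the empty fallback is built once at import time rather than on every call.

# Built once at import time instead of constructing set() per attribute sync.
_NO_STALE_FIELDS = frozenset()


def _update_pb(self):
    """Loops over any stale fields and syncs them to the underlying pb2 object."""
    merge_params = {}
    for field_name in getattr(self, '_stale_fields', _NO_STALE_FIELDS):
        ...  # same body as above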
Closing this PR because it optimizes use cases that are different from what we see in client libraries like python-datastore and python-firestore. Specifically, caching attributes is only helpful if they are read more than once, yet python-datastore converts all proto-plus instances into dictionary-like objects, meaning each field is read exactly once before all future reads are performed against that dictionary. If you find this and would like to know more, reach out.
Aims to speed up naive usage of proto-plus-python as a wrapper around primitive protobuf objects.
This PR replaces attribute misses on proto-plus objects, which previously fell through to the __getattr__ override and queried the hidden protobuf object directly, with descriptors that do the same work behind a caching layer to optimize repeated access. This descriptor/caching layer also aims to avoid unnecessary marshaling altogether when circumstances allow, further reducing the runtime of basic tasks.

The following is a performance chart of several simple tasks executed against the same proto definitions, first as raw protobuf objects and again as proto-plus objects. The two definitions are:
Note: All tasks were run 10,000 times.
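As an illustration of how a micro-benchmark of this shape can be run (a sketch with a hypothetical message definition, not the definitions or harness used for the chart above):

import timeit

import proto


class Thing(proto.Message):
    """Hypothetical proto-plus message used only for this sketch."""
    name = proto.Field(proto.STRING, number=1)


thing = Thing(name="benchmark")

# Time 10,000 repeated reads of the same attribute, mirroring the
# repeated-access pattern the caching layer is meant to speed up.
print(timeit.timeit(lambda: thing.name, number=10_000))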
When used as a dependency in python-datastore, the following proto-plus versions show this performance when hydrating 500 loaded entities (network I/O is excluded; measured via googleapis/python-datastore#155):
Resolves #222