fix: lookup attribute instead of performing a deepcopy #226
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with @googlebot I signed it!
What to do if you already signed the CLA:
Individual signers
Corporate signers
ℹ️ Googlers: Go here for more info.
Codecov Report
@@ Coverage Diff @@
## main #226 +/- ##
========================================
Coverage ? 100.00%
========================================
Files ? 22
Lines ? 1004
Branches ? 227
========================================
Hits ? 1004
Misses ? 0
Partials ? 0
Continue to review the full report at Codecov.
@googlebot I signed it!
It looks like the cpp runtime is causing problems because the type it exposes doesn't have the same python-visible fields. There are a couple of ways we can try solving this problem:
--- a/proto/marshal/marshal.py
+++ b/proto/marshal/marshal.py
@@ -155,12 +155,12 @@ class BaseMarshal:
         # Return a view around it that implements MutableSequence.
         value_type = type(value)  # Minor performance boost over isinstance
         if value_type in compat.repeated_composite_types:
-            return RepeatedComposite(value, marshal=self)
+            return RepeatedComposite(value, marshal=self, proto_type=proto_type)
         if value_type in compat.repeated_scalar_types:
             if isinstance(proto_type, type):
                 return RepeatedComposite(value, marshal=self, proto_type=proto_type)
             else:
-                return Repeated(value, marshal=self)
+                return Repeated(value, marshal=self, proto_type=proto_type)
How about we go with option 2 to unblock folks, and follow up with the protobuf folks to figure out which of 1, 3, or 4 is the best option? Is there a person we can tag in on this PR? I am somewhat uncomfortable with relying on a private attribute.
Also +1 to option 2.
Do we have a sense of what percentage of libraries would benefit from this change, versus trip the try/catch and actually run slower? I certainly don't, but I do know that
As I'm thinking about this further, I suppose the answer is to only bump
I also don't know anything about the cpp layer and would love to learn more.
@craiglabenz The cpp option is opt-in according to the documentation, although I saw some comments on issues that seemed to suggest it was the default. https://developers.google.com/protocol-buffers/docs/reference/python-generated#cpp_impl I think it would be helpful to clarify with a protobuf person who knows for sure.
We could also check for the attr without a try/catch if we are concerned about the cost of that. |
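A minimal sketch of that idea, purely illustrative: the helper name is hypothetical, and it assumes the private `_message_descriptor` attribute discussed later in this thread, falling back to the existing deepcopy approach only when the attribute is absent.

```python
import copy


def element_type(pb):
    # Hypothetical helper: resolve the element type of a repeated
    # composite field without an unconditional deepcopy.
    descriptor = getattr(pb, "_message_descriptor", None)
    if descriptor is not None:
        # Pure-python runtime: the private attribute is present,
        # so this is a cheap attribute lookup.
        return descriptor._concrete_class
    # cpp runtime: the attribute is absent, so fall back to the
    # deepcopy-a-canary approach from the existing code.
    canary = copy.deepcopy(pb).add()
    return type(canary)
```

Using `getattr` with a default avoids the cost of raising and catching an `AttributeError` on every call under the cpp runtime.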
I am increasingly suspicious that this deepcopy is not the guilty party. In testing the implications of this change and the snippet proposed in the comment above, I just discovered that all of python-datastore's slowness must be coming from elsewhere, because my scratch code that recreates said slowness never reaches that deepcopy. |
Responding to a bunch of things all at once.
There are a couple of misconceptions in this question. The python/cpp protobuf runtimes are agnostic of the client libraries themselves: they are determined by platform and environment variables. The cpp runtime is much faster in certain real-world workloads, but I don't have a good idea of what the breakdown in user environments between the cpp and py runtimes is.
This is a little bit complicated. If a cpp runtime for a particular platform exists and is installed, it is now the default version. We can see this via code archaeology in api_implementation.py:40.
That's entirely possible. Designing a good, helpful benchmark that mimics real world uses is tricky. I gave up after about two hours trying to recreate the benchmark linked in the issue for this PR. On the other hand, |
What are people's required timelines for dealing with this performance regression? Clearly sooner is better, but is anything being blocked, or has conversion to the new client libraries been halted, or anything? |
canary = copy.deepcopy(self.pb).add()
return type(canary)
# We have no members in the list, so we get the type from the attributes.
return self.pb._message_descriptor._concrete_class
I ported this change to googleapis/python-datastore and discovered that these attributes are not universally available.
We might need to do something like this, but then assuming an equivalent change, this would LGTM.
if hasattr(self.pb, '_message_descriptor') and hasattr(self.pb._message_descriptor, '_concrete_class'):
return self.pb._message_descriptor._concrete_class
canary = copy.deepcopy(self.pb).add()
return type(canary)
these attributes are not universally available.
This is the difference between the cpp and python protobuf runtimes. The concrete type of self.pb is different depending on which runtime is being used; it is not dependent on the API or the client library itself. The _message_descriptor attribute is apparently considered an implementation detail of the python-based protobuf runtime.
I partially followed that (due to my ongoing ignorance of the cpp layer), but to clarify, are you suggesting that this line of code will work everywhere?
The change will work for all client libraries IFF the application process is using the python protobuf runtime.
This is the relevant chunk of the tech stack:
- User application
- Client library manual layer (optional. Firestore has a manual layer, Texttospeech does not)
- Generated client library, aka GAPIC
- Proto plus runtime, i.e. this repo
- General protobuf runtime, either written in pure python or in cpp as a python extension, which is determined at runtime. Proto plus should be agnostic about which is being used.
The lowest layer in the stack above is what is preventing the general fix from merging. The two different implementations provide different unofficial APIs.
@software-dov can you expand on the pure python vs cpp protobuf runtime? How would I get to each of these? Which one is used by cloud libraries?
Which one is used by cloud libraries?
It is chosen dynamically at runtime based on the platform (linux or macos, amd64 or aarch64), the version of protobuf installed, and environment variables. To be strictly general and not break cloud, any solution must be compatible with both. I would imagine, based on the environments they're running in, that most user applications tend to use the cpp runtime.
The protobuf runtime is responsible for memory layout, serialization and deserialization, and message introspection. It is the code that allocates memory and performs host-to-network and network-to-host bit conversion.
@software-dov I know of at least one instance where a user has reverted to the previous major version due to this regression relative to the monolith generator.
I opened https://groups.google.com/g/protobuf/c/pYcq-UBixqU to get some clarification from the Protobuf folks on the specific situations in which protobuf defaults to the C++ implementation (as well as when the change was made). If you're curious, you can figure out which implementation protobuf is using with this snippet (I get 'cpp' on a fresh install on my linux workstation):

from google.protobuf.internal import api_implementation
print(api_implementation.Type())

Note that protobuf discourages using
Are folks alright with the code in this PR coupled with a fallback to deepcopy to handle the CPP case (
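A rough sketch of what that combined approach might look like (hypothetical names; not the exact code in this PR): prefer a `proto_type` passed in by the marshal, then the python runtime's private attribute, and only deepcopy as a last resort, caching the answer so the cost is paid at most once per view.

```python
import copy


class RepeatedView:
    # Hypothetical stand-in for the Repeated/RepeatedComposite views.
    def __init__(self, pb, proto_type=None):
        self.pb = pb
        self._proto_type = proto_type  # may be supplied by the marshal

    @property
    def _pb_type(self):
        if self._proto_type is None:
            descriptor = getattr(self.pb, "_message_descriptor", None)
            if descriptor is not None:
                # Pure-python runtime: cheap private-attribute lookup.
                self._proto_type = descriptor._concrete_class
            else:
                # cpp runtime: pay the deepcopy once and cache the result.
                self._proto_type = type(copy.deepcopy(self.pb).add())
        return self._proto_type
```

Caching means even the cpp fallback degrades to a single deepcopy per container rather than one per element access.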
Hello guys, what is happening with this PR? If it can be fixed with a simple
Waiting for #312
Closes #224.