Performance regressions in Python in version 3.18.0 #9014
If you run a similar test with the below generated _pb2 you can see an even more pronounced regression:
That is a 61% regression in class initialization and a 157% regression when setting fields.

```python
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: protobuf_fixture.proto
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database

# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()

DESCRIPTOR = _descriptor.FileDescriptor(
    name="protobuf_fixture.proto",
    package="",
    syntax="proto3",
    serialized_options=None,
    create_key=_descriptor._internal_create_key,
    serialized_pb=b'\n\x16protobuf_fixture.proto"\x1f\n\x0fProtobufFixture\x12\x0c\n\x04name\x18\x01 \x01(\tb\x06proto3',
)

_PROTOBUFFIXTURE = _descriptor.Descriptor(
    name="ProtobufFixture",
    full_name="ProtobufFixture",
    filename=None,
    file=DESCRIPTOR,
    containing_type=None,
    create_key=_descriptor._internal_create_key,
    fields=[
        _descriptor.FieldDescriptor(
            name="name",
            full_name="ProtobufFixture.name",
            index=0,
            number=1,
            type=9,
            cpp_type=9,
            label=1,
            has_default_value=False,
            default_value=b"".decode("utf-8"),
            message_type=None,
            enum_type=None,
            containing_type=None,
            is_extension=False,
            extension_scope=None,
            serialized_options=None,
            file=DESCRIPTOR,
            create_key=_descriptor._internal_create_key,
        ),
    ],
    extensions=[],
    nested_types=[],
    enum_types=[],
    serialized_options=None,
    is_extendable=False,
    syntax="proto3",
    extension_ranges=[],
    oneofs=[],
    serialized_start=26,
    serialized_end=57,
)

DESCRIPTOR.message_types_by_name["ProtobufFixture"] = _PROTOBUFFIXTURE
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

ProtobufFixture = _reflection.GeneratedProtocolMessageType(
    "ProtobufFixture",
    (_message.Message,),
    {
        "DESCRIPTOR": _PROTOBUFFIXTURE,
        "__module__": "protobuf_fixture_pb2",
        # @@protoc_insertion_point(class_scope:ProtobufFixture)
    },
)
_sym_db.RegisterMessage(ProtobufFixture)
# @@protoc_insertion_point(module_scope)
```
cc @haberman
Ben, are you using pure Python or the C++ extension? You can print api_implementation.Type() to find out which one is in use.
@anandolee this is the
@BenRKarl could you include the code that is using
@haberman sure thing, assuming you have the above generated _pb2 saved in a separate file:

```python
from protobuf_fixture_pb2 import ProtobufFixture
from time import time

init_start = time()
pplus = ProtobufFixture()
init = time() - init_start

set_start = time()
pplus.name = "Test"
set_time = time() - set_start

print(f"{init * 1000}, {set_time * 1000}")
```
@anandolee and I have been looking into this and we've had trouble reproducing this regression. I modified your script a bit:

```python
from protobuf_fixture_pb2 import ProtobufFixture
import google.protobuf as pb
import timeit

def BM(func):
    timer = timeit.Timer(func, globals=globals())
    iters, time = timer.autorange()
    ns_per_iter = time / iters * (10 ** 9)
    return f"{int(ns_per_iter)} ns/iter"

msg = ProtobufFixture()
create = BM('msg = ProtobufFixture()')
assign = BM('msg.name = "Test"')
print(f"Version: {pb.__version__}, create: {create}, assign: {assign}")
```

I tried this with 3.18.0 and 3.17.3, with both the pure-Python and Python/C++ implementations. My results were:

The differences I'm seeing are minor, nothing approaching the 61%/157% you saw. I'm not sure why we would be getting different results from you. What results do you get from my script?
@haberman , @anandolee , I'm seeing similar results to yours, but I was able to create a test case that demonstrates the differences between versions more definitively. This test requires that you:
Results:
Since
With the larger input file
I was able to reproduce this, thanks. Here are profiles before and after. In 3.18.0 we have a costly

3.17.3:
3.18.0:
@haberman I'm actually continuing to see regressions in our benchmark framework with
I'm running this script on a gLinux machine, using Python 3.7.0 and a new virtual environment where I switch between
Seems like the problem might be that
If we can get this back to defaulting to
Yes, I had also suspected that perhaps you were getting pure-Python parsing. I wonder if the 3.18.1 release did not properly build and upload the C++ extension for some reason. I'll look into this.
If it helps, in my local
Yes that would definitely explain it. It's odd though, because I just tried in a bare venv and it worked for me:
What do you get when running these commands?
Using the same virtual env I was using for testing earlier:
So similar behavior here. I think the main difference is that I'm using
I realized I had
@haberman I ran some additional tests with
The problem is that I can't pin down the pattern that causes the library to default to one implementation or the other; I've tested different language versions and different installation patterns (i.e.
One thing to add, manually specifying
So currently there's no reliable workaround.
What version of protobuf and what language are you using?
Version: 3.18.0
Language: Python (version 3.7.0)
What operating system (Linux, Windows, ...) and version?
gLinux
What did you do?
Run the below script using protobuf version 3.17.3, then run it again using protobuf version 3.18.0 and compare the results. I ran this 30 times per version, each in a fresh Python interpreter, and compared the averages.

What did you expect to see?
I expect the performance to be similar for each version.
What did you see instead?
That is a 21% regression in class initialization and a 14% regression when setting fields.
Anything else we should know about your project / environment
We have an internal performance framework that shows that these regressions are significant at a larger scale as well (i.e. reading and writing several fields, processing larger requests with nested fields), and we began seeing degradations around the time that 3.18.0 was published.