-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protobuf compatibility functionality implementation. #13
Conversation
@hackaugusto please review and provide feedback |
karapace/protobuf/one_of_element.py
Outdated
for field in other.fields: | ||
other_tags[field.tag] = field | ||
|
||
for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this one just result in list(set(other_tags.keys())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this expression means union of copies of two lists without duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it came from the map keys right? map keys are inherently unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we merge two lists of unique keys. Keys in both lists may be the same so direct merge of this two lists like list(keys)+list(otherkeys) will have duplicates.
FEW_FIELDS_CONVERTED_TO_ONE_OF = auto() | ||
|
||
# protobuf compatibility issues is described in at | ||
# https://yokota.blog/2021/08/26/understanding-protobuf-compatibility/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I might be missing something here, I thought schema registry support BACKWARD, FORWARD, and FULL compatibility? The compatible check in the article seems to refer only to BACKWARD compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes You are right. But BACKWARD, FORWARD and FULL are compatibility manners.
compare().is_compatible() operation is non-commutative. So manners define "direction" of comparison
Following pseudocode show how:
old_schema.compare(new_schema).is_compatible # BACKWARD
new_schema.compare(old_schema).is_compatible #FORWARD
old_schema.compare(new_schema).is_compatible && new_schema.compare(old_schema).is_compatible #FULL
See karapace code:
karapace/karapace/compatibility/__init__.py
Lines 135 to 155 in c88847a
elif old_schema.schema_type is SchemaType.PROTOBUF: | |
if compatibility_mode in {CompatibilityModes.BACKWARD, CompatibilityModes.BACKWARD_TRANSITIVE}: | |
result = check_protobuf_compatibility( | |
reader=new_schema.schema, | |
writer=old_schema.schema, | |
) | |
elif compatibility_mode in {CompatibilityModes.FORWARD, CompatibilityModes.FORWARD_TRANSITIVE}: | |
result = check_protobuf_compatibility( | |
reader=old_schema.schema, | |
writer=new_schema.schema, | |
) | |
elif compatibility_mode in {CompatibilityModes.FULL, CompatibilityModes.FULL_TRANSITIVE}: | |
result = check_protobuf_compatibility( | |
reader=new_schema.schema, | |
writer=old_schema.schema, | |
) | |
result = result.merged_with(check_protobuf_compatibility( | |
reader=old_schema.schema, | |
writer=new_schema.schema, | |
)) |
@juha-aiven @hackaugusto would be great to get your feedback on this piece of functionality. Thanks |
@amrutha-shanbhag I did not have time to review the PR this week. Sorry, I will have a look at it in the upcoming days. |
All good, thanks for letting us know. Looking forward for your review. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass, mostly style suggestions. I will have a second read of the code more in-depth. Thanks for the effort on this feature :)
karapace/protobuf/compare_result.py
Outdated
self.modification: Modification = modification | ||
self.path: str = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types of the attributes are inferred from the method's declaration. For example:
class A:
def __init__(self, arg: int) -> None:
self.arg = arg
reveal_type(self.arg)
reveal_type(A(1).arg)
With the example above, both are correctly inferred as builtins.int
. So this can be simply
self.modification: Modification = modification | |
self.path: str = path | |
self.modification = modification | |
self.path = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had side effects without it in my pycharm. But I can remove this definitions now.
karapace/protobuf/compare_result.py
Outdated
|
||
|
||
class CompareResult: | ||
def __init__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to always add the return type of functions/methods, even if it is just -> None
. type checking is opt-in and needs at least one type annotation. To enable type checking here, the following is necessary:
def __init__(self): | |
def __init__(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW which type checker warn on init return type? Interesting investigate reason of such behavior.
karapace/protobuf/enum_element.py
Outdated
for constant in other.constants: | ||
other_tags[constant.tag] = constant | ||
|
||
for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict.keys()
returns a set-like object
Keys views are set-like since their entries are unique and hashable
https://docs.python.org/3/library/stdtypes.html#dict-views
So you don't need to wrap the result with another set:
set(other_tags.keys()) - set(self_tags.keys())
-> other_tags.keys() - self_tags.keys()
(These two items are just suggestions, do as you prefer)
- If you don't care about duplicates, you also don't need to convert it to a list:
for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())): | |
for tag in self_tags.keys() & (other_tags.keys() - self_tags.keys()): |
>>> d1 = {1:1}
>>> d2 = {1:1, 2:2}
>>> d3 = {2:2}
>>> d1.keys() & (d2.keys() - d3.keys())
{1}
- If you do care about duplicates, then you can just chain the iterators:
from itertools import chain
for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())): | |
for tag in chain(self_tags.keys(), other_tags.keys() - self_tags.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like chain idea, will change to it.
karapace/protobuf/enum_element.py
Outdated
other_tags[constant.tag] = constant | ||
|
||
for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())): | ||
result.push_path(tag.__str__()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to call the built-in instead of the dunder method:
result.push_path(tag.__str__()) | |
result.push_path(str(tag)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
@@ -9,6 +10,8 @@ | |||
|
|||
|
|||
class FieldElement: | |||
from karapace.protobuf.compare_type_storage import CompareTypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these imports moved because to fix some circular dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
self.self_types: dict = dict() | ||
self.other_types: dict = dict() | ||
self.locked_messages: list = [] | ||
self.environment: list = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type annotations here are inferred from the right hand side. Here it would be better to narrow the type further or remove the annotation.
This is what I mean by narrowing (I'm not sure what would be the correct types):
self.self_types: dict = dict() | |
self.other_types: dict = dict() | |
self.locked_messages: list = [] | |
self.environment: list = [] | |
self.self_types: Dict[<type>] = dict() | |
self.other_types: Dict[<type>] = dict() | |
self.locked_messages: List[<type>] = [] | |
self.environment: List[<type>] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Type annotation add more issues with circular dependencies. This improvement must be planned as big code refactoring.
key: Optional[FieldElement] = None | ||
value: Optional[FieldElement] = None | ||
for f in type_element.fields: | ||
if f.name == 'key': | ||
key = f | ||
break | ||
for f in type_element.fields: | ||
if f.name == 'value': | ||
value = f | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a bit short if written as:
key: Optional[FieldElement] = None | |
value: Optional[FieldElement] = None | |
for f in type_element.fields: | |
if f.name == 'key': | |
key = f | |
break | |
for f in type_element.fields: | |
if f.name == 'value': | |
value = f | |
break | |
key = next((f for f in type_element.fields if f.name =='key'), None) | |
value = next((f for f in type_element.fields if f.name =='value'), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
from karapace.protobuf.message_element import MessageElement | ||
if isinstance(type_element, MessageElement): # add support of MapEntry messages | ||
if 'map_entry' in type_element.options: | ||
from karapace.protobuf.field_element import FieldElement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move the imports to the top of the function? If the worry here was runtime performance, imports are resolved during function definition, IOW, this doesn't have performance effect when calling the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok will try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be part of next circular dependencies workaround.
return string | ||
return None | ||
|
||
def other_type_name(self, t: ProtoType): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this and self_type_name
are pretty much the same code, maybe add a function to share some code? Something like:
def compute_name(
t: ProtoType,
result_path: list,
package_name: str,
types: dict,
) -> Optional[str]
string = t.string
if string.startswith('.'):
name = string[1:]
if types.get(name):
return name
return None
canonical_name = list(result_path)
if package_name:
canonical_name.insert(0, package_name)
canonical_name.append(string)
while len(canonical_name) > 1:
pretender = ".".join(canonical_name)
t = types.get(pretender)
if t is not None:
return pretender
canonical_name.pop(-2)
if types.get(string) is not None:
return string
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
canonical_name: list = list(self.result.path) | ||
if string[0] == '.': | ||
name = string[1:] | ||
if self.other_types.get(name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused about these checks. On the loop below the check is against None
, here it is against a falsy value, so it also includes the empty string. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes i write as think without simplify. Yes it must be converted to if self.other_types.get(string):
Some notes:
[mypy]
python_version = 3.7
warn_redundant_casts = True
[mypy-tests.unit.almond.flink.*]
ignore_errors = False
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
no_implicit_optional = True
warn_unused_ignores = True
warn_no_return = True
warn_unreachable = True
strict_equality = True There are some warnings, it also seems it caught a few bugs, e.g.:
from typing_extensions import Protocol
class ProtobufElement(Protocol):
def to_schema(self) -> str: ... that can be used as a type instead of the call to
|
Thats awesome, great feedback. Thanks @hackaugusto :) |
Quick update: the PR author @libretto is on leave, and will address the PR comments starting Monday. |
I use mypy in pycharm from beginning but it is not ideal... Following attributes are created on class initialization by static class methods
It is nice example but it is not needed for us yet. I converted Protobuf elements classes from square wire library with respect to classes hierarchy so I see no any advantages in using of Protocols like in example.
|
Hi @hackaugusto , looks like @libretto has addressed all your feedback. Please let us know if you have any final comments before we call this piece of work done. |
|
||
|
||
class CompareTypes: | ||
def __init__(self, self_package_name: str, other_package_name: str, result: CompareResult): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just out of curiosity, so are you going to add return type to all functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is required by PEP 484. but annotation of some functions will be not exact (instead of real class name word 'object' will be used.
in few hours I will commit it there
I'm not entirely sure about what the lint is complaining about, but can we resolve those? |
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
I may try but this bug goes from aiven/karapace repository. So resolving this bug may touch some code which I must not touch... |
if it's too troublesome then I'm fine with leaving it as is |
Before I merge, what's with the failed unit-test 3.9? |
|
Protobuf parser and representation classes received compare and compatibility functionality. We added few unit and integration tests. Also was created additional integration test functionality which allows us to test Karapace HTTP API compatibility endpoint with external test data (for example SchemaRegistry test scenarios).
More tests must be added for better testing coverage.