Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protobuf compatibility functionality implementation. #13

Merged
merged 22 commits into from
Nov 28, 2021

Conversation

libretto
Copy link
Collaborator

@libretto libretto commented Oct 17, 2021

Protobuf parser and representation classes received compare and compatibility functionality. We added few unit and integration tests. Also was created additional integration test functionality which allows us to test Karapace HTTP API compatibility endpoint with external test data (for example SchemaRegistry test scenarios).

More tests must be added for better testing coverage.

@amrutha-shanbhag
Copy link

@hackaugusto please review and provide feedback

for field in other.fields:
other_tags[field.tag] = field

for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this one just result in list(set(other_tags.keys())?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this expression means union of copies of two lists without duplicates.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it came from the map keys right? map keys are inherently unique?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we merge two lists of unique keys. Keys in both lists may be the same so direct merge of this two lists like list(keys)+list(otherkeys) will have duplicates.

FEW_FIELDS_CONVERTED_TO_ONE_OF = auto()

# protobuf compatibility issues is described in at
# https://yokota.blog/2021/08/26/understanding-protobuf-compatibility/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I might be missing something here, I thought schema registry support BACKWARD, FORWARD, and FULL compatibility? The compatible check in the article seems to refer only to BACKWARD compatibility

Copy link
Collaborator Author

@libretto libretto Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes You are right. But BACKWARD, FORWARD and FULL are compatibility manners.
compare().is_compatible() operation is non-commutative. So manners define "direction" of comparison
Following pseudocode show how:

old_schema.compare(new_schema).is_compatible # BACKWARD
new_schema.compare(old_schema).is_compatible #FORWARD
old_schema.compare(new_schema).is_compatible && new_schema.compare(old_schema).is_compatible #FULL

See karapace code:

elif old_schema.schema_type is SchemaType.PROTOBUF:
if compatibility_mode in {CompatibilityModes.BACKWARD, CompatibilityModes.BACKWARD_TRANSITIVE}:
result = check_protobuf_compatibility(
reader=new_schema.schema,
writer=old_schema.schema,
)
elif compatibility_mode in {CompatibilityModes.FORWARD, CompatibilityModes.FORWARD_TRANSITIVE}:
result = check_protobuf_compatibility(
reader=old_schema.schema,
writer=new_schema.schema,
)
elif compatibility_mode in {CompatibilityModes.FULL, CompatibilityModes.FULL_TRANSITIVE}:
result = check_protobuf_compatibility(
reader=new_schema.schema,
writer=old_schema.schema,
)
result = result.merged_with(check_protobuf_compatibility(
reader=old_schema.schema,
writer=new_schema.schema,
))

@amrutha-shanbhag
Copy link

@juha-aiven @hackaugusto would be great to get your feedback on this piece of functionality.

Thanks

@hackaugusto
Copy link

@amrutha-shanbhag I did not have time to review the PR this week. Sorry, I will have a look at it in the upcoming days.

@amrutha-shanbhag
Copy link

@amrutha-shanbhag I did not have time to review the PR this week. Sorry, I will have a look at it in the upcoming days.

All good, thanks for letting us know. Looking forward for your review.

Thanks

Copy link

@hackaugusto hackaugusto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass, mostly style suggestions. I will have a second read of the code more in-depth. Thanks for the effort on this feature :)

Comment on lines 43 to 44
self.modification: Modification = modification
self.path: str = path

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types of the attributes are inferred from the method's declaration. For example:

class A:
    def __init__(self, arg: int) -> None:
        self.arg = arg
        reveal_type(self.arg)

reveal_type(A(1).arg)

With the example above, both are correctly inferred as builtins.int. So this can be simply

Suggested change
self.modification: Modification = modification
self.path: str = path
self.modification = modification
self.path = path

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had side effects without it in my pycharm. But I can remove this definitions now.



class CompareResult:
def __init__(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to always add the return type of functions/methods, even if it is just -> None. type checking is opt-in and needs at least one type annotation. To enable type checking here, the following is necessary:

Suggested change
def __init__(self):
def __init__(self) -> None:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW which type checker warn on init return type? Interesting investigate reason of such behavior.

for constant in other.constants:
other_tags[constant.tag] = constant

for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict.keys() returns a set-like object

Keys views are set-like since their entries are unique and hashable

https://docs.python.org/3/library/stdtypes.html#dict-views

So you don't need to wrap the result with another set:

set(other_tags.keys()) - set(self_tags.keys()) -> other_tags.keys() - self_tags.keys()


(These two items are just suggestions, do as you prefer)

  1. If you don't care about duplicates, you also don't need to convert it to a list:
Suggested change
for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):
for tag in self_tags.keys() & (other_tags.keys() - self_tags.keys()):
>>> d1 = {1:1}
>>> d2 = {1:1, 2:2}
>>> d3 = {2:2}
>>> d1.keys() & (d2.keys() - d3.keys())
{1}
  1. If you do care about duplicates, then you can just chain the iterators:
from itertools import chain
Suggested change
for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):
for tag in chain(self_tags.keys(), other_tags.keys() - self_tags.keys()):

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like chain idea, will change to it.

other_tags[constant.tag] = constant

for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):
result.push_path(tag.__str__())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to call the built-in instead of the dunder method:

Suggested change
result.push_path(tag.__str__())
result.push_path(str(tag))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@@ -9,6 +10,8 @@


class FieldElement:
from karapace.protobuf.compare_type_storage import CompareTypes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these imports moved because to fix some circular dependency?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Comment on lines 13 to 16
self.self_types: dict = dict()
self.other_types: dict = dict()
self.locked_messages: list = []
self.environment: list = []

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type annotations here are inferred from the right hand side. Here it would be better to narrow the type further or remove the annotation.

This is what I mean by narrowing (I'm not sure what would be the correct types):

Suggested change
self.self_types: dict = dict()
self.other_types: dict = dict()
self.locked_messages: list = []
self.environment: list = []
self.self_types: Dict[<type>] = dict()
self.other_types: Dict[<type>] = dict()
self.locked_messages: List[<type>] = []
self.environment: List[<type>] = []

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type annotation add more issues with circular dependencies. This improvement must be planned as big code refactoring.

Comment on lines 30 to 39
key: Optional[FieldElement] = None
value: Optional[FieldElement] = None
for f in type_element.fields:
if f.name == 'key':
key = f
break
for f in type_element.fields:
if f.name == 'value':
value = f
break

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a bit short if written as:

Suggested change
key: Optional[FieldElement] = None
value: Optional[FieldElement] = None
for f in type_element.fields:
if f.name == 'key':
key = f
break
for f in type_element.fields:
if f.name == 'value':
value = f
break
key = next((f for f in type_element.fields if f.name =='key'), None)
value = next((f for f in type_element.fields if f.name =='value'), None)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

from karapace.protobuf.message_element import MessageElement
if isinstance(type_element, MessageElement): # add support of MapEntry messages
if 'map_entry' in type_element.options:
from karapace.protobuf.field_element import FieldElement

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move the imports to the top of the function? If the worry here was runtime performance, imports are resolved during function definition, IOW, this doesn't have performance effect when calling the function

Copy link
Collaborator Author

@libretto libretto Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will try

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be part of next circular dependencies workaround.

return string
return None

def other_type_name(self, t: ProtoType):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this and self_type_name are pretty much the same code, maybe add a function to share some code? Something like:

def compute_name(
    t: ProtoType,
    result_path: list,
    package_name: str,
    types: dict,
) -> Optional[str]
    string = t.string
    if string.startswith('.'):
        name = string[1:]
        if types.get(name):
            return name
        return None

    canonical_name = list(result_path)
    if package_name:
        canonical_name.insert(0, package_name)
    canonical_name.append(string)

    while len(canonical_name) > 1:
        pretender = ".".join(canonical_name)
        t = types.get(pretender)
        if t is not None:
            return pretender
        canonical_name.pop(-2)
    if types.get(string) is not None:
        return string
    return None

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

canonical_name: list = list(self.result.path)
if string[0] == '.':
name = string[1:]
if self.other_types.get(name):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused about these checks. On the loop below the check is against None, here it is against a falsy value, so it also includes the empty string. Is that intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes i write as think without simplify. Yes it must be converted to if self.other_types.get(string):

@hackaugusto
Copy link

hackaugusto commented Oct 28, 2021

Some notes:

  • We are slowly introducing mypy, so I would recommend enabling it for this code. This can be done with the following configuration on a top-level mypy.ini
[mypy]
python_version = 3.7
warn_redundant_casts = True

[mypy-tests.unit.almond.flink.*]
ignore_errors = False
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
no_implicit_optional = True
warn_unused_ignores = True
warn_no_return = True
warn_unreachable = True
strict_equality = True

There are some warnings, it also seems it caught a few bugs, e.g.:

karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "BYTES"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "DOUBLE"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "FLOAT"
  • Some of the existing classes conform to an interface, namely the classes that represent parsed elements have a to_schema method. This can be represented with a Protocol:
from typing_extensions import Protocol
class ProtobufElement(Protocol):
    def to_schema(self) -> str: ...

that can be used as a type instead of the call to try_to_schema

@amrutha-shanbhag
Copy link

I did a first pass, mostly style suggestions. I will have a second read of the code more in-depth. Thanks for the effort on this feature :)

Thats awesome, great feedback. Thanks @hackaugusto :)

@amrutha-shanbhag
Copy link

Quick update: the PR author @libretto is on leave, and will address the PR comments starting Monday.

@libretto
Copy link
Collaborator Author

libretto commented Nov 10, 2021

Some notes:

* We are slowly introducing mypy, so I would recommend enabling it for this code. This can be done with the following configuration on a top-level `mypy.ini`
[mypy]
python_version = 3.7
warn_redundant_casts = True

[mypy-tests.unit.almond.flink.*]
ignore_errors = False
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
no_implicit_optional = True
warn_unused_ignores = True
warn_no_return = True
warn_unreachable = True
strict_equality = True

There are some warnings, it also seems it caught a few bugs, e.g.:

karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "BYTES"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "DOUBLE"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "FLOAT"
* Some of the existing classes conform to an interface, namely the classes that represent parsed elements have a `to_schema` method. This can be represented with a Protocol:
from typing_extensions import Protocol
class ProtobufElement(Protocol):
    def to_schema(self) -> str: ...

that can be used as a type instead of the call to try_to_schema

* There some commented out code that should probably be removed, e.g.:
  https://github.com/instaclustr/karapace/blob/c88847a010494d7b67ba4736dfc3328ceb673c3a/karapace/protobuf/utils.py#L70-L75

* I left some comments on [Protobuf parser library and first part of working tests.  #7](https://github.com/instaclustr/karapace/pull/7)

I use mypy in pycharm from beginning but it is not ideal... Following attributes are created on class initialization by static class methods

karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "BYTES"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "DOUBLE"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "FLOAT"

It is nice example but it is not needed for us yet. I converted Protobuf elements classes from square wire library with respect to classes hierarchy so I see no any advantages in using of Protocols like in example.

from typing_extensions import Protocol
class ProtobufElement(Protocol):
    def to_schema(self) -> str: ...

@amrutha-shanbhag
Copy link

Hi @hackaugusto , looks like @libretto has addressed all your feedback. Please let us know if you have any final comments before we call this piece of work done.



class CompareTypes:
def __init__(self, self_package_name: str, other_package_name: str, result: CompareResult):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just out of curiosity, so are you going to add return type to all functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is required by PEP 484. but annotation of some functions will be not exact (instead of real class name word 'object' will be used.
in few hours I will commit it there

@h3nd24
Copy link

h3nd24 commented Nov 24, 2021

I'm not entirely sure about what the lint is complaining about, but can we resolve those?

@libretto
Copy link
Collaborator Author

I'm not entirely sure about what the lint is complaining about, but can we resolve those?

I may try but this bug goes from aiven/karapace repository. So resolving this bug may touch some code which I must not touch...
This issue is not appear in my local lint. I suppose because some version difference. Maybe git has newer modules or binaries...

@h3nd24
Copy link

h3nd24 commented Nov 24, 2021

if it's too troublesome then I'm fine with leaving it as is

@h3nd24
Copy link

h3nd24 commented Nov 25, 2021

Before I merge, what's with the failed unit-test 3.9?

@libretto
Copy link
Collaborator Author

Before I merge, what's with the failed unit-test 3.9?
github tests often randomly fails... I click "re-run all jobs" and now unit-test 3.9 is passed too. Sometimes few re-run needed. Seems it depends on load their hardware by other projects.

@h3nd24 h3nd24 merged commit fd90560 into master Nov 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants