Protobuf compatibility functionality implementation. #13

libretto · 2021-10-17T17:42:58Z

The Protobuf parser and representation classes now have compare and compatibility functionality. We added a few unit and integration tests. We also created additional integration-test functionality that lets us test the Karapace HTTP API compatibility endpoint with external test data (for example, Schema Registry test scenarios).

More tests still need to be added to improve test coverage.

amrutha-shanbhag · 2021-10-17T22:07:47Z

@hackaugusto please review and provide feedback

h3nd24 · 2021-10-18T00:33:35Z

karapace/protobuf/one_of_element.py

+        for field in other.fields:
+            other_tags[field.tag] = field
+
+        for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):


Doesn't this one just result in list(set(other_tags.keys()))?

No, this expression computes the union of the two key sets without duplicates: all of self's keys, followed by the keys that exist only in other.

But they come from the map keys, right? Map keys are inherently unique.

Sure, each list of keys is unique on its own, but the same key can appear in both maps, so concatenating them directly, as in list(keys) + list(other_keys), would produce duplicates.
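A quick illustration of the point above, using hypothetical tag maps (not taken from the PR): concatenating the two key lists repeats a tag that exists in both maps, while the expression from the diff keeps each tag once.

```python
# Hypothetical tag -> element maps; tag 2 exists in both.
self_tags = {1: "a", 2: "b"}
other_tags = {2: "b2", 3: "c"}

# Direct concatenation keeps the shared tag twice.
naive = list(self_tags.keys()) + list(other_tags.keys())

# Expression from the diff: self's tags, then the tags that only exist in other.
merged = list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys()))

print(naive)   # [1, 2, 2, 3]
print(merged)  # [1, 2, 3]
```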

h3nd24 · 2021-10-18T02:32:34Z

karapace/protobuf/compare_result.py

+    FEW_FIELDS_CONVERTED_TO_ONE_OF = auto()
+
+    # protobuf compatibility issues is described in at
+    # https://yokota.blog/2021/08/26/understanding-protobuf-compatibility/


Sorry, I might be missing something here. I thought Schema Registry supports BACKWARD, FORWARD, and FULL compatibility, but the compatibility check in the article seems to refer only to BACKWARD compatibility?

Yes, you are right. But BACKWARD, FORWARD and FULL are compatibility modes, and the compare().is_compatible() operation is not commutative, so the mode defines the "direction" of the comparison. The following pseudocode shows how:

old_schema.compare(new_schema).is_compatible()  # BACKWARD
new_schema.compare(old_schema).is_compatible()  # FORWARD
old_schema.compare(new_schema).is_compatible() and new_schema.compare(old_schema).is_compatible()  # FULL

See karapace code:

karapace/karapace/compatibility/__init__.py

Lines 135 to 155 in c88847a

elif old_schema.schema_type is SchemaType.PROTOBUF:
    if compatibility_mode in {CompatibilityModes.BACKWARD, CompatibilityModes.BACKWARD_TRANSITIVE}:
        result = check_protobuf_compatibility(
            reader=new_schema.schema,
            writer=old_schema.schema,
        )
    elif compatibility_mode in {CompatibilityModes.FORWARD, CompatibilityModes.FORWARD_TRANSITIVE}:
        result = check_protobuf_compatibility(
            reader=old_schema.schema,
            writer=new_schema.schema,
        )
    elif compatibility_mode in {CompatibilityModes.FULL, CompatibilityModes.FULL_TRANSITIVE}:
        result = check_protobuf_compatibility(
            reader=new_schema.schema,
            writer=old_schema.schema,
        )
        result = result.merged_with(check_protobuf_compatibility(
            reader=old_schema.schema,
            writer=new_schema.schema,
        ))
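To make the direction explicit, here is a self-contained toy model. It is not Karapace code: the schema model, the Mode enum and the check/check_mode functions are hypothetical. Its single rule, "every field the reader requires must be present in the writer's schema", is enough to show why BACKWARD and FORWARD give different answers for the same pair of schemas, and why FULL merges both directions as in the snippet above.

```python
from enum import Enum
from typing import Dict, List

# Toy schema model (hypothetical): field name -> "required" or "optional".
Schema = Dict[str, str]


class Mode(Enum):
    BACKWARD = "BACKWARD"
    FORWARD = "FORWARD"
    FULL = "FULL"


def check(reader: Schema, writer: Schema) -> List[str]:
    """Toy rule: every field the reader requires must be written by the writer."""
    return [
        f"reader requires missing field {name!r}"
        for name, label in reader.items()
        if label == "required" and name not in writer
    ]


def check_mode(mode: Mode, old: Schema, new: Schema) -> List[str]:
    if mode is Mode.BACKWARD:  # the new schema reads data written with the old one
        return check(reader=new, writer=old)
    if mode is Mode.FORWARD:  # the old schema reads data written with the new one
        return check(reader=old, writer=new)
    # FULL: both directions must hold, so merge both results
    return check(reader=new, writer=old) + check(reader=old, writer=new)


old = {"id": "required"}
new = {"id": "required", "email": "required"}  # adds a required field
print(check_mode(Mode.BACKWARD, old, new))  # ["reader requires missing field 'email'"]
print(check_mode(Mode.FORWARD, old, new))   # []
print(check_mode(Mode.FULL, old, new))      # ["reader requires missing field 'email'"]
```

In Schema Registry semantics the _TRANSITIVE variants use the same directions; they differ only in which earlier versions the new schema is checked against.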

amrutha-shanbhag · 2021-10-21T05:47:26Z

@juha-aiven @hackaugusto would be great to get your feedback on this piece of functionality.

Thanks

hackaugusto · 2021-10-22T15:08:05Z

@amrutha-shanbhag I did not have time to review the PR this week. Sorry, I will have a look at it in the upcoming days.

amrutha-shanbhag · 2021-10-24T21:48:07Z

> @amrutha-shanbhag I did not have time to review the PR this week. Sorry, I will have a look at it in the upcoming days.

All good, thanks for letting us know. Looking forward to your review.

Thanks

hackaugusto

I did a first pass, mostly style suggestions. I will have a second read of the code more in-depth. Thanks for the effort on this feature :)

hackaugusto · 2021-10-28T10:31:06Z

karapace/protobuf/compare_result.py

+        self.modification: Modification = modification
+        self.path: str = path


The types of the attributes are inferred from the method's declaration. For example:

class A:
    def __init__(self, arg: int) -> None:
        self.arg = arg
        reveal_type(self.arg)

reveal_type(A(1).arg)

With the example above, both are correctly inferred as builtins.int. So this can simply be:

Suggested change

- self.modification: Modification = modification
- self.path: str = path
+ self.modification = modification
+ self.path = path

Without them I had side effects in PyCharm, but I can remove these annotations now.

hackaugusto · 2021-10-28T10:32:10Z

karapace/protobuf/compare_result.py

+
+
+class CompareResult:
+    def __init__(self):


I'd suggest always adding the return type of functions/methods, even if it is just -> None. Type checking is opt-in: by default mypy only checks the body of a function that has at least one type annotation. To enable type checking here, the following is necessary:

Suggested change

- def __init__(self):
+ def __init__(self) -> None:

BTW, which type checker warns about the __init__ return type? It would be interesting to investigate the reason for this behavior.

hackaugusto · 2021-10-28T10:40:48Z

karapace/protobuf/enum_element.py

+        for constant in other.constants:
+            other_tags[constant.tag] = constant
+
+        for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):


dict.keys() returns a set-like object

Keys views are set-like since their entries are unique and hashable

https://docs.python.org/3/library/stdtypes.html#dict-views

So you don't need to wrap the result with another set:

set(other_tags.keys()) - set(self_tags.keys()) -> other_tags.keys() - self_tags.keys()

(These two items are just suggestions, do as you prefer)

If you don't need a list (or a particular iteration order), the union of the two key views is enough, since a set union already removes duplicates:

Suggested change

- for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):
+ for tag in self_tags.keys() | other_tags.keys():

>>> d1 = {1: 1}
>>> d2 = {1: 1, 2: 2}
>>> d1.keys() | d2.keys()
{1, 2}

If you want to keep the current order (self's tags first, then the tags that only exist in other), you can just chain the iterators:

from itertools import chain

Suggested change

- for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):
+ for tag in chain(self_tags.keys(), other_tags.keys() - self_tags.keys()):

I like the chain idea; I will switch to it.
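A small comparison on hypothetical tag maps of the set union of the key views and the chain() version: the union is a plain set (no duplicates, no guaranteed order), while chain() yields self's tags in insertion order followed by the tags that only exist in other, without building intermediate lists.

```python
from itertools import chain

# Hypothetical tag -> element maps; tag 1 exists in both.
self_tags = {3: "c", 1: "a"}
other_tags = {1: "a2", 5: "e"}

# Key views support set operations directly; the result is a set.
as_set = self_tags.keys() | other_tags.keys()

# chain() keeps self's insertion order, then adds tags that only exist in other.
as_chain = list(chain(self_tags.keys(), other_tags.keys() - self_tags.keys()))

print(sorted(as_set))  # [1, 3, 5]
print(as_chain)        # [3, 1, 5]
```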

hackaugusto · 2021-10-28T10:44:12Z

karapace/protobuf/enum_element.py

+            other_tags[constant.tag] = constant
+
+        for tag in list(self_tags.keys()) + list(set(other_tags.keys()) - set(self_tags.keys())):
+            result.push_path(tag.__str__())


Probably better to call the built-in instead of the dunder method:

Suggested change

- result.push_path(tag.__str__())
+ result.push_path(str(tag))

hackaugusto · 2021-10-28T10:45:18Z

karapace/protobuf/field_element.py

@@ -9,6 +10,8 @@


 class FieldElement:
+    from karapace.protobuf.compare_type_storage import CompareTypes


Were these imports moved to fix a circular dependency?

hackaugusto · 2021-10-28T10:59:45Z

karapace/protobuf/compare_type_storage.py

+        self.self_types: dict = dict()
+        self.other_types: dict = dict()
+        self.locked_messages: list = []
+        self.environment: list = []


The types here are already inferred from the right-hand side, so it would be better to either narrow them further or remove the annotations.

This is what I mean by narrowing (I'm not sure what would be the correct types):

Suggested change

- self.self_types: dict = dict()
- self.other_types: dict = dict()
- self.locked_messages: list = []
- self.environment: list = []
+ self.self_types: Dict[<key type>, <value type>] = dict()
+ self.other_types: Dict[<key type>, <value type>] = dict()
+ self.locked_messages: List[<type>] = []
+ self.environment: List[<type>] = []

Adding type annotations introduces more circular-dependency issues, so this improvement has to be planned as a larger refactoring.
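One common way to add narrower annotations without creating new import cycles is to import the annotated types only under typing.TYPE_CHECKING and refer to them as string annotations. A sketch, assuming the element types shown; CompareTypesSketch is hypothetical, and the real CompareTypes containers may hold other types:

```python
from typing import TYPE_CHECKING, Dict, List

if TYPE_CHECKING:
    # Only the type checker follows these imports; at runtime TYPE_CHECKING is
    # False, so they cannot add a new import cycle.
    from karapace.protobuf.field_element import FieldElement
    from karapace.protobuf.message_element import MessageElement


class CompareTypesSketch:
    """Hypothetical, trimmed-down stand-in for CompareTypes."""

    def __init__(self, self_package_name: str, other_package_name: str) -> None:
        self.self_package_name = self_package_name
        self.other_package_name = other_package_name
        # Annotations inside a function body are never evaluated at runtime.
        # The element types below are assumptions, not confirmed contents.
        self.self_types: Dict[str, "MessageElement"] = {}
        self.other_types: Dict[str, "MessageElement"] = {}
        self.locked_messages: List["MessageElement"] = []
        self.environment: List["FieldElement"] = []
```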

hackaugusto · 2021-10-28T11:03:35Z

karapace/protobuf/compare_type_storage.py

+                key: Optional[FieldElement] = None
+                value: Optional[FieldElement] = None
+                for f in type_element.fields:
+                    if f.name == 'key':
+                        key = f
+                        break
+                for f in type_element.fields:
+                    if f.name == 'value':
+                        value = f
+                        break


This could be a bit shorter if written as:

Suggested change

- key: Optional[FieldElement] = None
- value: Optional[FieldElement] = None
- for f in type_element.fields:
-     if f.name == 'key':
-         key = f
-         break
- for f in type_element.fields:
-     if f.name == 'value':
-         value = f
-         break
+ key = next((f for f in type_element.fields if f.name == 'key'), None)
+ value = next((f for f in type_element.fields if f.name == 'value'), None)
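A runnable sketch of the next()-with-default pattern from the suggestion, assuming a hypothetical FieldSketch in place of FieldElement (only the name attribute matters for the lookup):

```python
from typing import List, NamedTuple, Optional


class FieldSketch(NamedTuple):
    # Hypothetical stand-in for FieldElement, keeping only what the lookup needs.
    name: str
    tag: int


def find_field(fields: List[FieldSketch], name: str) -> Optional[FieldSketch]:
    # next() returns the first matching field, or the default (None) when the
    # generator is exhausted without a match.
    return next((f for f in fields if f.name == name), None)


map_entry_fields = [FieldSketch("key", 1), FieldSketch("value", 2)]
key = find_field(map_entry_fields, "key")
value = find_field(map_entry_fields, "value")
print(key.tag, value.tag)                       # 1 2
print(find_field(map_entry_fields, "missing"))  # None
```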

hackaugusto · 2021-10-28T11:05:32Z

karapace/protobuf/compare_type_storage.py

+        from karapace.protobuf.message_element import MessageElement
+        if isinstance(type_element, MessageElement):  # add support of MapEntry messages
+            if 'map_entry' in type_element.options:
+                from karapace.protobuf.field_element import FieldElement


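Maybe move the imports to the top of the function? If the worry here was runtime performance: an import statement inside a function runs on every call, but after the first import the module is cached in sys.modules, so later calls only pay for a lookup and a name binding. IOW, this has a negligible performance effect when calling the function.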

Ok will try

It will be part of the next circular-dependency workaround.
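A small standard-library check of the performance point above (load_json_lazily and has_import_in_body are hypothetical helpers): the bytecode shows that the import statement is executed inside the function body, i.e. on every call, while sys.modules caching means only the first call actually loads the module.

```python
import dis
import sys


def load_json_lazily(text: str) -> object:
    # This import statement runs on every call. The first call loads the module;
    # later calls find it in sys.modules and only rebind the local name.
    import json
    return json.loads(text)


def has_import_in_body(func) -> bool:
    """Return True if the function's bytecode contains an IMPORT_NAME instruction."""
    return any(ins.opname == "IMPORT_NAME" for ins in dis.get_instructions(func))


print(has_import_in_body(load_json_lazily))  # True: the import is part of the body
print(load_json_lazily('{"a": 1}'))          # {'a': 1}
print("json" in sys.modules)                 # True: cached after the first call
```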

hackaugusto · 2021-10-28T11:19:57Z

karapace/protobuf/compare_type_storage.py

+            return string
+        return None
+
+    def other_type_name(self, t: ProtoType):


It seems this and self_type_name are pretty much the same code; maybe add a function to share it? Something like:

def compute_name(
    t: ProtoType,
    result_path: list,
    package_name: str,
    types: dict,
) -> Optional[str]:
    string = t.string
    if string.startswith('.'):
        name = string[1:]
        if types.get(name):
            return name
        return None
    canonical_name = list(result_path)
    if package_name:
        canonical_name.insert(0, package_name)
    canonical_name.append(string)
    while len(canonical_name) > 1:
        pretender = ".".join(canonical_name)
        t = types.get(pretender)
        if t is not None:
            return pretender
        canonical_name.pop(-2)
    if types.get(string) is not None:
        return string
    return None
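A self-contained sketch of the shared helper with both callers delegating to it. To keep it runnable, it takes the type name as a plain string instead of a ProtoType, uses membership tests consistently, and CompareTypesSketch is a hypothetical stand-in for CompareTypes:

```python
from typing import Dict, List, Optional


def compute_name(
    type_name: str,
    result_path: List[str],
    package_name: str,
    types: Dict[str, object],
) -> Optional[str]:
    # Fully qualified reference (".pkg.Foo"): look it up without scope walking.
    if type_name.startswith("."):
        name = type_name[1:]
        return name if name in types else None
    # Relative reference: try the innermost scope first, then walk outwards.
    canonical_name = list(result_path)
    if package_name:
        canonical_name.insert(0, package_name)
    canonical_name.append(type_name)
    while len(canonical_name) > 1:
        candidate = ".".join(canonical_name)
        if candidate in types:
            return candidate
        canonical_name.pop(-2)  # drop the innermost enclosing scope
    return type_name if type_name in types else None


class CompareTypesSketch:
    """Hypothetical stand-in for CompareTypes: both lookups share compute_name()."""

    def __init__(self, self_package_name: str, other_package_name: str, path: List[str]) -> None:
        self.self_package_name = self_package_name
        self.other_package_name = other_package_name
        self.path = path
        self.self_types: Dict[str, object] = {}
        self.other_types: Dict[str, object] = {}

    def self_type_name(self, type_name: str) -> Optional[str]:
        return compute_name(type_name, self.path, self.self_package_name, self.self_types)

    def other_type_name(self, type_name: str) -> Optional[str]:
        return compute_name(type_name, self.path, self.other_package_name, self.other_types)


storage = CompareTypesSketch("pkg", "pkg", ["Outer", "Inner"])
storage.self_types = {"pkg.Outer.Foo": object(), "pkg.Bar": object()}
print(storage.self_type_name("Foo"))       # pkg.Outer.Foo
print(storage.self_type_name("Bar"))       # pkg.Bar
print(storage.self_type_name(".pkg.Bar"))  # pkg.Bar
print(storage.other_type_name("Foo"))      # None (other_types is empty)
```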

hackaugusto · 2021-10-28T11:22:18Z

karapace/protobuf/compare_type_storage.py

+        canonical_name: list = list(self.result.path)
+        if string[0] == '.':
+            name = string[1:]
+            if self.other_types.get(name):


I'm a little confused about these checks. On the loop below the check is against None, here it is against a falsy value, so it also includes the empty string. Is that intentional?

Sometimes I write code as I think, without simplifying. Yes, it should be converted to if self.other_types.get(string):
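The difference the reviewer points out shows up only for a key that is present but maps to a falsy value. A minimal illustration with a hypothetical lookup table:

```python
# Hypothetical lookup table; "pkg.Empty" is present but maps to a falsy value.
other_types = {"pkg.Empty": "", "pkg.Message": "message"}

for name in ("pkg.Empty", "pkg.Message", "pkg.Missing"):
    value = other_types.get(name)
    # Truthiness treats a present-but-falsy value like a missing key;
    # an explicit None check (or `in`) rejects only keys that are really absent.
    print(name, bool(value), value is not None, name in other_types)
# pkg.Empty False True True
# pkg.Message True True True
# pkg.Missing False False False
```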

hackaugusto · 2021-10-28T15:27:11Z

Some notes:

We are slowly introducing mypy, so I would recommend enabling it for this code. This can be done with the following configuration in a top-level mypy.ini:

[mypy]
python_version = 3.7
warn_redundant_casts = True

[mypy-karapace.protobuf.*]
ignore_errors = False
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
no_implicit_optional = True
warn_unused_ignores = True
warn_no_return = True
warn_unreachable = True
strict_equality = True

There are some warnings, and it also seems to have caught a few bugs, e.g.:

karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "BYTES"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "DOUBLE"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "FLOAT"

Some of the existing classes conform to an interface, namely the classes that represent parsed elements have a to_schema method. This can be represented with a Protocol:

from typing_extensions import Protocol
class ProtobufElement(Protocol):
    def to_schema(self) -> str: ...

This can be used as a type instead of the call to try_to_schema.

There is some commented-out code that should probably be removed, e.g.:

karapace/karapace/protobuf/utils.py

Lines 70 to 75 in c88847a

# class MyInt(int):
#    def is_valid_tag(self) -> bool:
#        return (MIN_TAG_VALUE <= self <= RESERVED_TAG_VALUE_START) or\
#               (RESERVED_TAG_VALUE_END + 1 <= self <= MAX_TAG_VALUE + 1)
# builtins.int = MyInt

I left some comments on Protobuf parser library and first part of working tests. #7
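A runnable sketch of the Protocol suggestion. It uses typing.Protocol and runtime_checkable (Python 3.8+; on 3.7 the same names come from typing_extensions), and ReservedSketch, OptionSketch and render are hypothetical elements, not Karapace classes. Neither element inherits from ProtobufElement; matching the to_schema method is enough.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class ProtobufElement(Protocol):
    def to_schema(self) -> str: ...


class ReservedSketch:
    # Hypothetical element: no base class, it just has a to_schema() method.
    def __init__(self, tag: int) -> None:
        self.tag = tag

    def to_schema(self) -> str:
        return f"reserved {self.tag};"


class OptionSketch:
    def __init__(self, name: str, value: str) -> None:
        self.name = name
        self.value = value

    def to_schema(self) -> str:
        return f"option {self.name} = {self.value};"


def render(elements: List[ProtobufElement]) -> str:
    # A type checker accepts any object with a matching to_schema method here.
    return "\n".join(element.to_schema() for element in elements)


print(render([ReservedSketch(5), OptionSketch("java_package", '"com.example"')]))
# reserved 5;
# option java_package = "com.example";
```

runtime_checkable is only needed if isinstance() checks are wanted; for static checking the plain Protocol suffices.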

amrutha-shanbhag · 2021-10-28T21:57:04Z

> I did a first pass, mostly style suggestions. I will have a second read of the code more in-depth. Thanks for the effort on this feature :)

That's awesome, great feedback. Thanks @hackaugusto :)

amrutha-shanbhag · 2021-11-04T05:57:39Z

Quick update: the PR author @libretto is on leave, and will address the PR comments starting Monday.

libretto · 2021-11-10T17:47:06Z


I have used mypy in PyCharm from the beginning, but it is not ideal. The following attributes are created at class initialization by static class methods, which is why mypy reports them:

karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "BYTES"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "DOUBLE"
karapace/protobuf/proto_type.py:90: error: "ProtoType" has no attribute "FLOAT"

It is a nice example, but we don't need it yet. I converted the Protobuf element classes from the Square Wire library while preserving their class hierarchy, so I don't see any advantage in using Protocols as in the example.

from typing_extensions import Protocol
class ProtobufElement(Protocol):
    def to_schema(self) -> str: ...
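Regarding the ProtoType errors above: if the attributes really are created dynamically, mypy can still be told they exist by declaring them in the class body without assigning values. A sketch, assuming a setattr-based initializer; ProtoTypeSketch and static_init are hypothetical names, not the actual ProtoType code:

```python
from typing import ClassVar


class ProtoTypeSketch:
    # Declared without values in the class body, so a type checker knows these
    # class attributes exist even though they are assigned later.
    BOOL: ClassVar["ProtoTypeSketch"]
    BYTES: ClassVar["ProtoTypeSketch"]
    DOUBLE: ClassVar["ProtoTypeSketch"]
    FLOAT: ClassVar["ProtoTypeSketch"]

    def __init__(self, string: str) -> None:
        self.string = string

    @staticmethod
    def static_init() -> None:
        # Assigned dynamically after the class exists; mypy cannot follow setattr,
        # which is why the declarations above are needed.
        for name in ("bool", "bytes", "double", "float"):
            setattr(ProtoTypeSketch, name.upper(), ProtoTypeSketch(name))


ProtoTypeSketch.static_init()
print(ProtoTypeSketch.BYTES.string)  # bytes
```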

amrutha-shanbhag · 2021-11-17T06:13:57Z

Hi @hackaugusto, it looks like @libretto has addressed all your feedback. Please let us know if you have any final comments before we call this piece of work done.

h3nd24 · 2021-11-23T09:32:51Z

karapace/protobuf/compare_type_storage.py

+
+
+class CompareTypes:
+    def __init__(self, self_package_name: str, other_package_name: str, result: CompareResult):


Just out of curiosity, are you going to add return types to all functions?

Yes, PEP 484 recommends it. But the annotations of some functions will not be exact (the word object will be used instead of the real class name).
I will commit it in a few hours.

h3nd24 · 2021-11-24T22:38:10Z

I'm not entirely sure about what the lint is complaining about, but can we resolve those?


libretto · 2021-11-24T23:05:40Z

> I'm not entirely sure about what the lint is complaining about, but can we resolve those?

I can try, but this issue comes from the upstream aiven/karapace repository, so resolving it may touch code that I should not change.
The issue does not appear when I run the linter locally; I suspect a version difference. Maybe GitHub has newer modules or binaries.

h3nd24 · 2021-11-24T23:08:23Z

If it's too troublesome, then I'm fine with leaving it as is.

h3nd24 · 2021-11-25T23:12:01Z

Before I merge, what's with the failed unit-test 3.9?

libretto · 2021-11-26T10:09:30Z

> Before I merge, what's with the failed unit-test 3.9?

GitHub tests often fail randomly. I clicked "Re-run all jobs" and now unit-test 3.9 passes too. Sometimes a few re-runs are needed; it seems to depend on the load other projects put on their hardware.

libretto added 11 commits October 1, 2021 19:48

backup

a333625

backup compatibility workaround

d56b9dd

Tests and Debug workaround

31a399a

debugging workaround

810b54d

fixup file name

1fc7f39

fixup bugs with tests

cc1266d

integration test workaround backup

008fb07

add more integration tests/ tests workarond

1cead31

add more tests/test driven bugfixes

c638b5a

Merge branch 'master' into protobuf-skeleton

cba78b4

remove debug workaround

c77833b

libretto requested review from mrlika, h3nd24 and amrutha-shanbhag October 17, 2021 17:42

pylint fixup

048964c

h3nd24 reviewed Oct 18, 2021


Merge branch 'master' into protobuf-skeleton

8d314a1

hackaugusto reviewed Oct 28, 2021


libretto added 3 commits November 9, 2021 22:27

style improving workaround

1570b4c

PR/styles fixup workaround

93fc461

fixup

0b14b5b

style workarounds

cb9b72a

h3nd24 reviewed Nov 23, 2021


fixup annotations

9423ba2

libretto and others added 4 commits November 25, 2021 00:40

Update karapace/protobuf/proto_file_element.py

11db3c5

Co-authored-by: Augusto Hack <[email protected]>

Update karapace/protobuf/proto_file_element.py

924f60a

Co-authored-by: Augusto Hack <[email protected]>

fixup

08e66c4

fixup minor bugf

0d22250

h3nd24 merged commit fd90560 into master Nov 28, 2021

