Implement Protobuf support #296

amrutha-shanbhag · 2021-12-20T04:28:12Z

About this change - What it does

Validating, storing, and versioning Protobuf schemas
Protobuf schema evolution and compatibility
Protobuf messages serialization/deserialization

References: #67

Why this way

Design document outlining design decisions: https://github.com/instaclustr/karapace/pull/11/files?short_path=45bec23#diff-45bec23eda89f362ca9cda50a6be0477f6058d4e6b9cc6c5198cdc5f0488df77

* tests/test_schema.py: splitting test_schema(). Split test_schema() to multiple single-purpose tests. No essential functional changes in the tests.
* Added information how to run integration tests against Confluent stack. Instructions in README.rst. Docker Compose file to start the Confluent stack.
* Kafka REST fixed version to 6.1.1 to match Schema Registry
* README.rst: clarified compatibility. Changed the claim that Karapace is compatible to that it aims to be compatible with 6.1.1 and added a list of known incompatibilities.
* Configuration Keys as table
* fixed content table
* Fixed small spelling bugs
* test_schema.py removed assert_schema_versions from test_schema_repost, unrelated
* test_schema.py added -> None to all test method signatures.
* test_schema.py: added annotations to all functions
* test_schema.py duplicate code removal
* test_schema.py moved a comment to an assert message
* test_schema.py removed unneeded f-string wrappings
* utils.py AVRO name compatible (http://avro.apache.org/docs/current/spec.html#names). Must not have '-'.
* test_schema.py test_schema_version_numbering uses 'name' in the Avro to make the schema unique
* test_schema.py: str format() -> f-strings
* test_schema.py no more JSONs as strings, instead dicts that are dumped as JSON strings
* utils.py add create_schema_name_factory, create safer names. For example in Avro field names '-' is not allowed. Using underscore instead.
* test_schema.py: split test_schema_versions into two tests. New ones: test_schema_versions_multiple_subjects_same_schema and test_schema_versions_deleting. The tests use unique schema names.
* test_schema.py: test_schema_remains_constant fixes. Wasn't using a unique schema id. Added doc.
* test_schema.py removed test_enum_schema_compatibility. Essentially a duplicate of test_enum_schema.
* test_schema.py: fix test_schema_repost. Compares JSONs now, not strings.
* test_schema.py test_compatibility_endpoint fix. Now uses a dynamic unique schema name. Was clashing before. Added documentation on what the test does.
* test_schema.py test_record_schema_compatibility_backward split into two. The new ones: test_record_schema_compatibility_backward and test_record_schema_compatibility_forward.
* test_schema_version_number_existing_schema takes version ids from response. Now compatible with SR.
* test_schema.py: test_schema_subject_version_schema fix. Changed to use a proper Avro schema.
* test_schema.py: test_schema_same_subject fix. No longer expects the exact same string schema to be returned. The str parsed as JSON needs to match.
* Handle gracefully if no node is master eligible. Karapace configuration allows configuring node to not be eligible for master. Handle gracefully, i.e. read-only mode, if all nodes are configured non-eligible for master.
* schema_registry: breaking change in an error message. The error message in POST to /subject/<subject> when schema is not specified in the request changed. Fixes test_schema_subject_post_invalid to run in Karapace and against Schema Registry.
* schema_registry: breaking change in subjects/{}/versions/{}. Fixed the error message in subjects/{}/versions/{} to match Schema Registry. Now test_schema_subject_invalid_id works against SR.
* test_schema.py test_version_number_validation fix. Error message check matches the error from SR (was breaking the test). Dynamically fetches the version number. Added description for the test.
* Add some typing, rename eligible master flag for clarification
* schema_registry: breaking change in POST subjects/{subject}/versions. In the case the endpoint is submitted without body, changed the HTTP status code, error_code and message to match the ones in Schema Registry. Made the necessary changes so that Karapace also returns correct values. test_schema.py: test_schema_missing_body fixed accordingly.
* schema_registry: breaking changes in some HTTP error messages. Now HTTP error messages match with the ones coming from Schema Registry. Adjusted test_http_headers in test_schema.py to correctly check the messages.
* schema_registry: breaking change in /schemas/ids/<>/versions. /schemas/ids/<schema_id:path>/versions now returns empty list in case nothing is found. This is the behaviour of SR. Karapace used to fail in this case before this change. The tests test_schema_lifecycle and test_schema_versions_deleting now work against Schema Registry (in addition to Karapace).
* test_schema.py: test_schema_versions_deleting: No unique field. Unique field name not needed, schema name is enough. Using a fixed one.
* readme: clarified and separated readme. Moved documentation about development to the CONTRIBUTING.md file, and tried to make the README.rst a bit more concise.
* Remove explicit master eligibility flag and utilize optional master_url
* CONTRIBUTING.md small fixes. Only minor changes, no essential content change: changed some rst formattings to md, some typos fixed such as karapace -> Karapace, a few small tweaks.
* doc: fixed grammar
* KarapaceAll: startup fix. When started from KarapaceAll, the __init__ of KarapaceSchemaRegistry is not called. schema_lock is initialized in __init__. Thus it's not called when using KarapaceAll. Fix is to move schema_lock init to _init() which gets called also when using KarapaceAll.
* docs: locahost -> localhost

Co-authored-by: Juha Mynttinen <[email protected]>
Co-authored-by: Francesco <[email protected]>
Co-authored-by: Tommi Vainikainen <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>

libretto

@hackaugusto please see my comments... there are a few issues not finished yet.

amrutha-shanbhag · 2022-01-27T23:46:12Z

Hi @hackaugusto ,

@libretto has addressed most of your feedback and left some comments of his own. Please let us know if there is any more feedback.

Thanks

hackaugusto

I added some comments, but those can be implemented as follow-up PRs. Thank you for your contribution.

hackaugusto · 2022-01-22T15:22:41Z

karapace/protobuf/proto_type.py

+        return self.string
+
+    def hash_code(self) -> int:
+        return hash(self.string)


These are the things that come to mind:

Hashable objects should be implemented using the __hash__ method (see the Python data model documentation for object.__hash__).

Hashing and equality should take into account all the attributes of an object; here only self.string is used. They should also include is_map, is_scalar, key_type, and value_type. Without these attributes the result of the equality check is not reliable.

Hashable objects must have a constant hash result, which means they must be immutable. This implementation is on a mutable object.
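
The three points above can be sketched with a frozen dataclass. This is only an illustration, not the code that was merged: ProtoTypeSketch is a hypothetical stand-in, and only the attribute names (string, is_map, is_scalar, key_type, value_type) come from the review comment.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class ProtoTypeSketch:
    """Hypothetical stand-in for ProtoType; only the field names come from the review."""

    string: str
    is_scalar: bool = False
    is_map: bool = False
    key_type: Optional["ProtoTypeSketch"] = None
    value_type: Optional["ProtoTypeSketch"] = None


# frozen=True (with the default eq=True) makes the dataclass generate both
# __eq__ and __hash__ from every field, and blocks attribute assignment.
plain = ProtoTypeSketch("string", is_scalar=True)
same = ProtoTypeSketch("string", is_scalar=True)
as_map = ProtoTypeSketch("string", is_map=True)

assert plain == same and hash(plain) == hash(same)
assert plain != as_map  # same string, different attributes

try:
    plain.string = "int32"  # type: ignore[misc]
except FrozenInstanceError:
    print("instances are immutable, so the hash stays constant")
```

Because equality and hashing are derived from the same fields, such objects behave consistently as set members and dict keys.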

hackaugusto · 2022-01-22T15:25:02Z

karapace/compatibility/protobuf/checks.py

+    log.debug("IS_COMPATIBLE %s", result.is_compatible())
+    if result.is_compatible():
+        return SchemaCompatibilityResult.compatible()
+    # TODO: maybe move incompatibility level raising to ProtoFileElement.compatible() ??


Could you please remove the TODOs that are already implemented and create issues for the ones that need further work?

hackaugusto · 2022-01-22T15:29:50Z

karapace/protobuf/enum_constant_element.py

+    name: str
+    tag: int
+    documentation: str = ""
+    options: List[OptionElement] = []


Mutable objects are not allowed as default values (note the error below is raised by the standard library but not by the attr implementation):

>>> from dataclasses import dataclass
>>>
>>> @dataclass
... class A:
...     f: list[str] = []
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib64/python3.9/dataclasses.py", line 1021, in dataclass
    return wrap(cls)
  File "/usr/lib64/python3.9/dataclasses.py", line 1013, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
  File "/usr/lib64/python3.9/dataclasses.py", line 863, in _process_class
    cls_fields = [_get_field(cls, name, type)
  File "/usr/lib64/python3.9/dataclasses.py", line 863, in <listcomp>
    cls_fields = [_get_field(cls, name, type)
  File "/usr/lib64/python3.9/dataclasses.py", line 747, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'list'> for field f is not allowed: use default_factory

This must be implemented with a default_factory:

Suggested change

-    options: List[OptionElement] = []
+    options: List[OptionElement] = field(default_factory=list)
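
To illustrate why a shared mutable default is a problem, here is a small sketch using only the standard library. SharedDefault and EnumConstantSketch are hypothetical names, and options is typed as a list of strings rather than OptionElement to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import List


class SharedDefault:
    # A plain class attribute: one list object shared by every instance.
    options: List[str] = []


@dataclass
class EnumConstantSketch:
    """Hypothetical stand-in for the reviewed class."""

    name: str
    tag: int
    documentation: str = ""
    # default_factory builds a fresh list for each new instance.
    options: List[str] = field(default_factory=list)


s1, s2 = SharedDefault(), SharedDefault()
s1.options.append("deprecated")
assert s2.options == ["deprecated"]  # the change leaked into s2

a = EnumConstantSketch("A", 0)
b = EnumConstantSketch("B", 1)
a.options.append("deprecated")
assert b.options == []  # each instance owns its own list
```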

hackaugusto · 2022-01-25T14:10:20Z

karapace/kafka_rest_apis/__init__.py

@@ -25,9 +25,9 @@
 RECORD_KEYS = ["key", "value", "partition"]
 PUBLISH_KEYS = {"records", "value_schema", "value_schema_id", "key_schema", "key_schema_id"}
 RECORD_CODES = [42201, 42202]
-KNOWN_FORMATS = {"json", "avro", "binary"}
+KNOWN_FORMATS = {"json", "avro", "protobuf", "binary"}


I know you didn't introduce this variable in your PR, but it is dead code and should be removed.

hackaugusto · 2022-01-25T14:22:44Z

karapace/protobuf/exception.py

+    def __init__(self, fail_msg: str, writer_schema=None, reader_schema=None) -> None:
+        writer_dump = pretty_print_json(str(writer_schema))
+        reader_dump = pretty_print_json(str(reader_schema))


So there may be None in writer_schema or reader_schema.

This exception is only raised here:

https://github.com/aiven/karapace/pull/296/files#diff-382cb7b49ec14a61d73c973d5e7e44e6395d89e9bf0eed062635b69797e52837R75-R77

Neither of the arguments can be None in that function (assuming the type annotations used there are correct)

And that's why we use the str() call, to avoid passing None to pretty_print_json.

This may make the type correct, but it is not valid JSON, as per the snippet in my previous comment.

So the type here shouldn't be Optional.
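
A minimal sketch of the point about str(): the text "None" is a valid str but not valid JSON. The helper pretty_print_json_sketch below is a hypothetical stand-in that assumes the real pretty_print_json parses its input as JSON; the actual implementation may differ.

```python
import json


def pretty_print_json_sketch(obj: str) -> str:
    # Hypothetical stand-in: parse the text as JSON, then re-serialize it indented.
    return json.dumps(json.loads(obj), indent=2)


# A real JSON document round-trips without problems.
ok = pretty_print_json_sketch('{"type": "record"}')

# str(None) is the four-character text "None", which is not a JSON value.
try:
    pretty_print_json_sketch(str(None))
    failed = False
except json.JSONDecodeError:
    failed = True

print(failed)
```

Dropping the None defaults from the signature removes this case entirely, which is what the last comment asks for.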

hackaugusto · 2022-01-27T20:14:17Z

karapace/protobuf/proto_file_element.py

+        return ProtoFileElement(Location.get(path))
+
+    # TODO: there maybe be faster comparison workaround
+    def __eq__(self, other: 'ProtoFileElement') -> bool:  # type: ignore


Please remove the type ignore; the type of other should be object (this is part of the language, since any two objects can be compared).

Suggested change

-    def __eq__(self, other: 'ProtoFileElement') -> bool:  # type: ignore
+    def __eq__(self, other: object) -> bool:

hackaugusto · 2022-01-27T20:15:30Z

karapace/protobuf/proto_file_element.py

+        a = self.to_schema()
+        b = other.to_schema()
+
+        return a == b


This needs to check the type of other:

Suggested change

-        a = self.to_schema()
-        b = other.to_schema()
-
-        return a == b
+        if not isinstance(other, ProtoFileElement):
+            return False
+        return self.to_schema() == other.to_schema()
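
Put together, the two suggestions on __eq__ (accept object, then check the type) could look like the sketch below. ProtoFileElementSketch is a hypothetical stand-in whose to_schema() simply returns a stored string.

```python
class ProtoFileElementSketch:
    """Hypothetical stand-in; to_schema() here just returns a stored string."""

    def __init__(self, schema: str) -> None:
        self._schema = schema

    def to_schema(self) -> str:
        return self._schema

    def __eq__(self, other: object) -> bool:
        # Any object may be compared with this one, so check the type first.
        if not isinstance(other, ProtoFileElementSketch):
            return False
        return self.to_schema() == other.to_schema()


a = ProtoFileElementSketch('syntax = "proto3";')
b = ProtoFileElementSketch('syntax = "proto3";')

assert a == b
assert a != 'syntax = "proto3";'  # a str has no to_schema(); no AttributeError
assert a != None  # noqa: E711
```

Returning NotImplemented instead of False for foreign types is a common alternative, since it lets Python try the other operand's comparison. Note also that a class defining __eq__ without __hash__ becomes unhashable.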

hackaugusto · 2022-01-27T20:20:43Z

karapace/protobuf/proto_type.py

+        return self.string[dot + 1:]
+
+    @classmethod
+    def static_init(cls) -> None:


At least my pylint still accepts it without any messages

This is an anti-pattern for tools like pylint. pylint is a static analyzer; it can't run the code for analysis, so code like this results in false negatives. Here is an example that pylint doesn't catch, because the @static_init annotation is missing and it can't know that:

class R:
    @classmethod
    def static_init(cls) -> None:
        cls.A = "a"

print(R().A)

So I prefer to add this change to the next stage of the development cycle.

Fair enough 👍
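
The difference discussed in this thread can be sketched as follows. The static_init decorator here is a hypothetical reimplementation written for illustration (the project's own decorator may differ), and the three class names are invented.

```python
def static_init(cls):
    """Hypothetical class decorator: run cls.static_init() once at definition time."""
    cls.static_init()
    return cls


class Forgotten:
    # The decorator is missing, so A never exists; only running the code reveals it.
    @classmethod
    def static_init(cls) -> None:
        cls.A = "a"


@static_init
class Decorated:
    @classmethod
    def static_init(cls) -> None:
        cls.A = "a"


class Declared:
    # Declared in the class body: visible to static analyzers without running anything.
    A = "a"


assert not hasattr(Forgotten, "A")
assert Decorated.A == "a"
assert Declared.A == "a"
```

Declaring the attributes directly in the class body keeps them discoverable by tools that only read the source.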

hackaugusto · 2022-01-27T20:53:10Z

karapace/protobuf/proto_file_element.py

+
+        if self.package_name != other.package_name:
+            result.add_modification(Modification.PACKAGE_ALTER)
+        # TODO: do we need syntax check?


Is this TODO necessary?

hackaugusto · 2022-01-27T22:41:03Z

karapace/protobuf/proto_parser.py

+                )
+            declaration = self.read_declaration(documentation, Context.FILE)
+            if isinstance(declaration, TypeElement):
+                # TODO: add check for exception?


Which exceptions? Maybe remove the TODO?

amrutha-shanbhag · 2022-02-04T06:07:05Z

I added some comments, but those can be implemented as follow-up PRs. Thank you for your contribution.

thank you @hackaugusto :)

libretto and others added 30 commits March 30, 2021 11:58

protbuf support skeleton

ba5c9c1

Revert "protbuf support skeleton"

6e782ca

This reverts commit ba5c9c1.

Merge pull request #3 from instaclustr/revert-direct-master-commits

3e5f83b

Revert "protbuf support skeleton"

Merge pull request #4 from aiven/master

c7cf56f

Sync fork

Add protobuf skeleton

d3aff39

Add skeleton files

46e23c7

remove unfinished tests

f6be627

fixup lint errors

4b2fdb7

Changed project structure, and added one test and debugged issues for…

e28b735

… PR #1

fixup lint issues

3063f58

protobuf parser draft save

62e56eb

beta version of prot_parser class (no dependencies)

da4fa3b

lint issues fuxup

8ffba9c

fixup by @hackaugusto suggestions

9ca84d8

Merge branch 'aiven:master' into master

99dfb79

Ported first part of tests from Wire project. ProtoParser code debugg…

adb145c

…ed by this tests

Merge remote-tracking branch 'origin/master' into protobuf-parser

e37d3e8

fixup lint problem

ad490fb

Add protobuf skeleton (#6)

4b88525

* Add protobuf skeleton * Add skeleton files * remove unfinished tests * fixup lint errors * Changed project structure, and added one test and debugged issues for PR #1 * fixup lint issues * fixup by @hackaugusto suggestions

Merge branch 'master' into protobuf-parser

0677729

fixup lint issues after conflict remove

af054c7

another part of tests added

fcfabb3

change project to pytest and add references to square/wire project files

e460350

Merge pull request #7 from instaclustr/protobuf-parser

a9abaee

Protobuf parser library and first part of working tests.

merge for sync with aiven/karapace

620759f

sync with latest dev branch

26928e4

finished porting of test_proto_parser module, ported test_proto_file_…

a9dda6f

…element module

add next part of unittests for protoparser library

39f5b08

fixup lint issues

29eb67f

libretto and others added 18 commits January 21, 2022 13:00

Update karapace/protobuf/proto_type.py

8c90d06

Co-authored-by: Augusto Hack <[email protected]>

Update karapace/protobuf/proto_type.py

3789079

Co-authored-by: Augusto Hack <[email protected]>

fixups of styles

64bb4b0

Merge branch 'master' of https://github.com/instaclustr/karapace

340ca13

Update karapace/protobuf/proto_type.py

dcb72ac

Co-authored-by: Augusto Hack <[email protected]>

Update karapace/protobuf/proto_type.py

00fe7d8

Co-authored-by: Augusto Hack <[email protected]>

Update karapace/protobuf/proto_type.py

34f4ccc

Co-authored-by: Augusto Hack <[email protected]>

Update karapace/protobuf/proto_type.py

ca4ce17

Co-authored-by: Augusto Hack <[email protected]>

fixup by review

a7bed1f

Update karapace/protobuf/syntax_reader.py

fa07747

Co-authored-by: Augusto Hack <[email protected]>

fixup by review

f68fcbc

Merge branch 'master' of https://github.com/instaclustr/karapace

e626424

Update karapace/protobuf/type_element.py

9436285

Co-authored-by: Augusto Hack <[email protected]>

Merge branch 'master' of https://github.com/instaclustr/karapace

344ca20

Update karapace/schema_reader.py

1780139

Co-authored-by: Augusto Hack <[email protected]>

fixup by review

9cac2aa

Merge branch 'master' of https://github.com/instaclustr/karapace

b4dd33a

fixup by review

9371d07

libretto reviewed Jan 21, 2022

libretto and others added 7 commits January 24, 2022 12:54

Update checks.py

3f22da7

Update checks.py

153bbb1

Update compare_result.py

42a365a

Update group_element.py

8c9b0ee

Update karapace/protobuf/io.py

8e2f2d3

Co-authored-by: Augusto Hack <[email protected]>

fixup minor issues

7967d49

Merge branch 'master' of https://github.com/instaclustr/karapace

d50be12

hackaugusto approved these changes Jan 28, 2022

hackaugusto merged commit 6111c6d into Aiven-Open:master Jan 28, 2022
