Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Protobuf support #296

Merged
merged 209 commits into from
Jan 28, 2022
Merged

Implement Protobuf support #296

merged 209 commits into from
Jan 28, 2022

Conversation

amrutha-shanbhag
Copy link
Contributor

About this change - What it does

  • Validating, storing, and versioning Protobuf schemas
  • Protobuf schema evolution and compatibility
  • Protobuf messages serialization/deserialization

References: #67

Why this way

Design document outlining design decisions: https://github.com/instaclustr/karapace/pull/11/files?short_path=45bec23#diff-45bec23eda89f362ca9cda50a6be0477f6058d4e6b9cc6c5198cdc5f0488df77

libretto and others added 30 commits March 30, 2021 11:58
* Add protobuf skeleton

* Add skeleton files

* remove unfinished tests

* fixup lint errors

* Changed project structure, and added one test and debugged issues for PR #1

* fixup lint issues

* fixup by @hackaugusto suggestions
Protobuf parser library and first part of working tests.
* tests/test_schema.py: splitting test_schema()

Split test_schema() to multiple single-purpose tests
No essential functional changes in the tests

* Added information how to run integration tests against Confluence stack

Instructions in README.rst
Docker Compose file to start the Confluence stack

* Kafka REST fixed version to 6.1.1 to match Schema Registry

* README.rst: clarified compatibility

Changed the claim that Karapace is compatible to that aims to be compatible with 6.1.1 and added a list of known incompabilities.

* Configuration Keys as table

* fixed content table

* Fixed small spelling bugs

* test_schema.py removed assert_schema_versions from test_schema_repost, unrelated

* test_schema.py added -> None to all test method signatures.

* test_schema.py: added annotations to all functions

* test_schema.py duplicate code removal

* test_schema.py moved a comment to a an assert message

* test_schema.py removed unneeded f-string wrappings

* utils.py AVRO name compatible (http://avro.apache.org/docs/current/spec.html#names). Must not have '-'.

* test_schema.py test_schema_version_numbering uses 'name' in the Avro to make the schema unique

* test_schema.py: str format() -> f-strings

* test_schema.py no more JSONs as strings, instead dicts that are dumped as JSON strings

* utils.py add create_schema_name_factory, create safer names

For example in Avro field names '-' is not allowed. Using underscore instead.

* test_schema.py: split test_schema_versions into two tests

New ones: test_schema_versions_multiple_subjects_same_schema and test_schema_versions_deleting

The tests use unique schema names

* test_schema.py: test_schema_remains_constant fixes

Wasn't using a unique schema id.
Added doc

* test_schema.py removed test_enum_schema_compatibility

Essentially a duplicate of test_enum_schema

* test_schema.py: fix test_schema_repost

Compares JSONs now, not strings.

* test_schema.py test_compatibility_endpoint fix

Now uses a dynamic unique schema name. Was clashing before.
Added documentation on what the test does.

* test_schema.py test_record_schema_compatibility_backward split into two

The new ones: test_record_schema_compatibility_backward and test_record_schema_compatibility_forward

* test_schema_version_number_existing_schema takes version ids from response

Now compatible with SR

* test_schema.py: test_schema_subject_version_schema fix

Changed to use a proper Avro schema

* test_schema.py: test_schema_same_subject fix

No longer expects the exact same string schema to be returned.
The str parsed as JSON needs to match.

* Handle gracefully if no node is master eligible

Karapace configuration allows configuring node to not be eligible for
master.  Handle gracefully ie. read-only mode if all nodes are
configured non-eligible for master.

* schema_registry: breaking change in an error message

The error message in POST to /subject/<subject> when schema is not specified in the request changed.

Fixes test_schema_subject_post_invalid to run in Karapace and against Schema Registry

* schema_registry: breaking change in subjects/{}/versions/{}

Fixed the error message in subjects/{}/versions/{} to match Schema Registry

Now test_schema_subject_invalid_id works against SR

* test_schema.py test_version_number_validation fix

Error message check matches the error from SR (was breaking the test)
Dynamically fetches the version number
Added description for the test

* Add some typing, rename eligible master flag for clarification

* schema_registry: breaking change in POST subjects/{subject}/versions

In the case the endpoint is submitted without body, changed
the HTTP status code, error_code and message match the ones in Schema Registry.
Made the necessary changes so that Karapace also returns correct values.

test_schema.py: test_schema_missing_body fixed accordingly.

* schema_registry: breaking changes in some HTTP error messages

Now HTTP error messages match with the ones coming from Schema Registry.
Adjusted test_http_headers in test_schema.py to correctly check the messages.

* schema_registry: breaking change in /schemas/ids/<>/versions

/schemas/ids/<schema_id:path>/versions now returns empty list in case nothing is found.
This is the behaviour of SR. Karapace used to fail in this case before this change.

The tests test_schema_lifecycle and test_schema_versions_deleting now works against Schema Registry (in addition to Karapace)

* test_schema.py: test_schema_versions_deleting: No unique field

Unique field name not needed, schema name is enough. Using a fixed one.

* readme: clarified and separated readme

moved documentation about development to the CONTRIBUTING.md file, and
tried to make the README.rst a bit more concise.

* Remove explicit master eligibility flag and utilize optional master_url

* CONTRIBUTING.md small fixes

Only minor changes, no essential content change:

Changed some rst formattings to md
Some typos fixed such as karapace -> Karapace
A few small tweaks

* doc: fixed grammar

* KarapaceAll: startup fix

When started from KarapaceAll, the __init__ of KarapaceSchemaRegistry is not called.
schema_lock is initialized in __init__. Thus it's not called when using KarapaceAll.

Fix is to move schema_lock init to _init() which gets called also when using KarapaceAll.

* docs: locahost -> localhost

Co-authored-by: Juha Mynttinen <[email protected]>
Co-authored-by: Francesco <[email protected]>
Co-authored-by: Tommi Vainikainen <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Copy link
Contributor

@libretto libretto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hackaugusto please see my comments... there are a few issues not finished yet.

@amrutha-shanbhag
Copy link
Contributor Author

Hi @hackaugusto ,

@libretto has addressed most of your feedback and left some comments of his own. Please let us know if there is any more feedback.

Thanks

Copy link
Contributor

@hackaugusto hackaugusto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments, but those can be implemented as follow up PRs. Thank you for you contribution.

return self.string

def hash_code(self) -> int:
return hash(self.string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the things that come up to mind:

  • Hashable objects should be implemented using the __hash__ method ref
  • Hashing and equality should take into account all the attributes of an object, here only self.string is being used. It should also include is_map, is_scalar, key_type, and value_type. Without these attributes the result of the equality is not reliable
  • Hashable objects must have a constant hash result, that means they must be immutable. This implementation is on a mutable object.

log.debug("IS_COMPATIBLE %s", result.is_compatible())
if result.is_compatible():
return SchemaCompatibilityResult.compatible()
# TODO: maybe move incompatibility level raising to ProtoFileElement.compatible() ??
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please remove the TODOs that are implemented and create issues for the ones that need further work?

name: str
tag: int
documentation: str = ""
options: List[OptionElement] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutable objects are not allowed as default values (note the error bellow is raised by the standard library but not by the attr implementation):

>>> from dataclasses import dataclass
>>> 
>>> @dataclass
... class A:
...     f: list[str] = []
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib64/python3.9/dataclasses.py", line 1021, in dataclass
    return wrap(cls)
  File "/usr/lib64/python3.9/dataclasses.py", line 1013, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
  File "/usr/lib64/python3.9/dataclasses.py", line 863, in _process_class
    cls_fields = [_get_field(cls, name, type)
  File "/usr/lib64/python3.9/dataclasses.py", line 863, in <listcomp>
    cls_fields = [_get_field(cls, name, type)
  File "/usr/lib64/python3.9/dataclasses.py", line 747, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'list'> for field f is not allowed: use default_factory

this must be implement with a field_factory:

Suggested change
options: List[OptionElement] = []
options: List[OptionElement] = field(default_factory=list)

@@ -25,9 +25,9 @@
RECORD_KEYS = ["key", "value", "partition"]
PUBLISH_KEYS = {"records", "value_schema", "value_schema_id", "key_schema", "key_schema_id"}
RECORD_CODES = [42201, 42202]
KNOWN_FORMATS = {"json", "avro", "binary"}
KNOWN_FORMATS = {"json", "avro", "protobuf", "binary"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you didn't introduce this variable in your PR, but this is dead code, should be removed.

Comment on lines +41 to +43
def __init__(self, fail_msg: str, writer_schema=None, reader_schema=None) -> None:
writer_dump = pretty_print_json(str(writer_schema))
reader_dump = pretty_print_json(str(reader_schema))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there may be None in writer_schema or reader_schema.

This exception is only raised here:

https://github.com/aiven/karapace/pull/296/files#diff-382cb7b49ec14a61d73c973d5e7e44e6395d89e9bf0eed062635b69797e52837R75-R77

Neither of the arguments can be None in that function (assuming the type annotations used there are correct)

And that's why we use str() call to avoid passing None to pretty_print_json

This may make the type correct, but it is not valid JSON as per the snippet on my previous comment.

So the type here shouldn't be Optional

return ProtoFileElement(Location.get(path))

# TODO: there maybe be faster comparison workaround
def __eq__(self, other: 'ProtoFileElement') -> bool: # type: ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the type ignore, the type of other should be object (this is part of the language, since any two objects can be compared)

Suggested change
def __eq__(self, other: 'ProtoFileElement') -> bool: # type: ignore
def __eq__(self, other: object) -> bool:

Comment on lines +91 to +94
a = self.to_schema()
b = other.to_schema()

return a == b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to check the type of other:

Suggested change
a = self.to_schema()
b = other.to_schema()
return a == b
if not isinstance(other, ProtoFileElement):
return False
return self.to_schema() == other.to_schema()

return self.string[dot + 1:]

@classmethod
def static_init(cls) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least my pylint still accepts it without any messages

This is an anti pattern for tools like pylint. pylint is a static analyzer, it can't run the code for analysis, so code like this results in false negatives. Here is an example that pylint doesn't caught because the @static_init annotation is missing and it can't know that:

class R:
    @classmethod
    def static_init(cls) -> None:
        cls.A = "a"


print(R().A)

So i prefer to add this change to the next stage of a development cycle.

Fair enough 👍


if self.package_name != other.package_name:
result.add_modification(Modification.PACKAGE_ALTER)
# TODO: do we need syntax check?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO necessary?

)
declaration = self.read_declaration(documentation, Context.FILE)
if isinstance(declaration, TypeElement):
# TODO: add check for exception?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which exceptions? maybe remove the todo?

@hackaugusto hackaugusto merged commit 6111c6d into Aiven-Open:master Jan 28, 2022
@amrutha-shanbhag
Copy link
Contributor Author

I added some comments, but those can be implemented as follow up PRs. Thank you for you contribution.

thank you @hackaugusto :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants