-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Protobuf support #296
Conversation
This reverts commit ba5c9c1.
Revert "protbuf support skeleton"
Sync fork
* Add protobuf skeleton * Add skeleton files * remove unfinished tests * fixup lint errors * Changed project structure, and added one test and debugged issues for PR #1 * fixup lint issues * fixup by @hackaugusto suggestions
Protobuf parser library and first part of working tests.
* tests/test_schema.py: splitting test_schema() Split test_schema() to multiple single-purpose tests No essential functional changes in the tests * Added information how to run integration tests against Confluence stack Instructions in README.rst Docker Compose file to start the Confluence stack * Kafka REST fixed version to 6.1.1 to match Schema Registry * README.rst: clarified compatibility Changed the claim that Karapace is compatible to that aims to be compatible with 6.1.1 and added a list of known incompabilities. * Configuration Keys as table * fixed content table * Fixed small spelling bugs * test_schema.py removed assert_schema_versions from test_schema_repost, unrelated * test_schema.py added -> None to all test method signatures. * test_schema.py: added annotations to all functions * test_schema.py duplicate code removal * test_schema.py moved a comment to a an assert message * test_schema.py removed unneeded f-string wrappings * utils.py AVRO name compatible (http://avro.apache.org/docs/current/spec.html#names). Must not have '-'. * test_schema.py test_schema_version_numbering uses 'name' in the Avro to make the schema unique * test_schema.py: str format() -> f-strings * test_schema.py no more JSONs as strings, instead dicts that are dumped as JSON strings * utils.py add create_schema_name_factory, create safer names For example in Avro field names '-' is not allowed. Using underscore instead. * test_schema.py: split test_schema_versions into two tests New ones: test_schema_versions_multiple_subjects_same_schema and test_schema_versions_deleting The tests use unique schema names * test_schema.py: test_schema_remains_constant fixes Wasn't using a unique schema id. Added doc * test_schema.py removed test_enum_schema_compatibility Essentially a duplicate of test_enum_schema * test_schema.py: fix test_schema_repost Compares JSONs now, not strings. * test_schema.py test_compatibility_endpoint fix Now uses a dynamic unique schema name. Was clashing before. Added documentation on what the test does. * test_schema.py test_record_schema_compatibility_backward split into two The new ones: test_record_schema_compatibility_backward and test_record_schema_compatibility_forward * test_schema_version_number_existing_schema takes version ids from response Now compatible with SR * test_schema.py: test_schema_subject_version_schema fix Changed to use a proper Avro schema * test_schema.py: test_schema_same_subject fix No longer expects the exact same string schema to be returned. The str parsed as JSON needs to match. * Handle gracefully if no node is master eligible Karapace configuration allows configuring node to not be eligible for master. Handle gracefully ie. read-only mode if all nodes are configured non-eligible for master. * schema_registry: breaking change in an error message The error message in POST to /subject/<subject> when schema is not specified in the request changed. Fixes test_schema_subject_post_invalid to run in Karapace and against Schema Registry * schema_registry: breaking change in subjects/{}/versions/{} Fixed the error message in subjects/{}/versions/{} to match Schema Registry Now test_schema_subject_invalid_id works against SR * test_schema.py test_version_number_validation fix Error message check matches the error from SR (was breaking the test) Dynamically fetches the version number Added description for the test * Add some typing, rename eligible master flag for clarification * schema_registry: breaking change in POST subjects/{subject}/versions In the case the endpoint is submitted without body, changed the HTTP status code, error_code and message match the ones in Schema Registry. Made the necessary changes so that Karapace also returns correct values. test_schema.py: test_schema_missing_body fixed accordingly. * schema_registry: breaking changes in some HTTP error messages Now HTTP error messages match with the ones coming from Schema Registry. Adjusted test_http_headers in test_schema.py to correctly check the messages. * schema_registry: breaking change in /schemas/ids/<>/versions /schemas/ids/<schema_id:path>/versions now returns empty list in case nothing is found. This is the behaviour of SR. Karapace used to fail in this case before this change. The tests test_schema_lifecycle and test_schema_versions_deleting now works against Schema Registry (in addition to Karapace) * test_schema.py: test_schema_versions_deleting: No unique field Unique field name not needed, schema name is enough. Using a fixed one. * readme: clarified and separated readme moved documentation about development to the CONTRIBUTING.md file, and tried to make the README.rst a bit more concise. * Remove explicit master eligibility flag and utilize optional master_url * CONTRIBUTING.md small fixes Only minor changes, no essential content change: Changed some rst formattings to md Some typos fixed such as karapace -> Karapace A few small tweaks * doc: fixed grammar * KarapaceAll: startup fix When started from KarapaceAll, the __init__ of KarapaceSchemaRegistry is not called. schema_lock is initialized in __init__. Thus it's not called when using KarapaceAll. Fix is to move schema_lock init to _init() which gets called also when using KarapaceAll. * docs: locahost -> localhost Co-authored-by: Juha Mynttinen <[email protected]> Co-authored-by: Francesco <[email protected]> Co-authored-by: Tommi Vainikainen <[email protected]> Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
Co-authored-by: Augusto Hack <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hackaugusto please see my comments... there are a few issues not finished yet.
Co-authored-by: Augusto Hack <[email protected]>
Hi @hackaugusto , @libretto has addressed most of your feedback and left some comments of his own. Please let us know if there is any more feedback. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments, but those can be implemented as follow up PRs. Thank you for you contribution.
return self.string | ||
|
||
def hash_code(self) -> int: | ||
return hash(self.string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the things that come up to mind:
- Hashable objects should be implemented using the
__hash__
method ref - Hashing and equality should take into account all the attributes of an object, here only
self.string
is being used. It should also includeis_map
,is_scalar
,key_type
, andvalue_type
. Without these attributes the result of the equality is not reliable - Hashable objects must have a constant hash result, that means they must be immutable. This implementation is on a mutable object.
log.debug("IS_COMPATIBLE %s", result.is_compatible()) | ||
if result.is_compatible(): | ||
return SchemaCompatibilityResult.compatible() | ||
# TODO: maybe move incompatibility level raising to ProtoFileElement.compatible() ?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please remove the TODOs that are implemented and create issues for the ones that need further work?
name: str | ||
tag: int | ||
documentation: str = "" | ||
options: List[OptionElement] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutable objects are not allowed as default values (note the error bellow is raised by the standard library but not by the attr
implementation):
>>> from dataclasses import dataclass
>>>
>>> @dataclass
... class A:
... f: list[str] = []
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/lib64/python3.9/dataclasses.py", line 1021, in dataclass
return wrap(cls)
File "/usr/lib64/python3.9/dataclasses.py", line 1013, in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
File "/usr/lib64/python3.9/dataclasses.py", line 863, in _process_class
cls_fields = [_get_field(cls, name, type)
File "/usr/lib64/python3.9/dataclasses.py", line 863, in <listcomp>
cls_fields = [_get_field(cls, name, type)
File "/usr/lib64/python3.9/dataclasses.py", line 747, in _get_field
raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'list'> for field f is not allowed: use default_factory
this must be implement with a field_factory:
options: List[OptionElement] = [] | |
options: List[OptionElement] = field(default_factory=list) |
@@ -25,9 +25,9 @@ | |||
RECORD_KEYS = ["key", "value", "partition"] | |||
PUBLISH_KEYS = {"records", "value_schema", "value_schema_id", "key_schema", "key_schema_id"} | |||
RECORD_CODES = [42201, 42202] | |||
KNOWN_FORMATS = {"json", "avro", "binary"} | |||
KNOWN_FORMATS = {"json", "avro", "protobuf", "binary"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you didn't introduce this variable in your PR, but this is dead code, should be removed.
def __init__(self, fail_msg: str, writer_schema=None, reader_schema=None) -> None: | ||
writer_dump = pretty_print_json(str(writer_schema)) | ||
reader_dump = pretty_print_json(str(reader_schema)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there may be None in writer_schema or reader_schema.
This exception is only raised here:
Neither of the arguments can be None
in that function (assuming the type annotations used there are correct)
And that's why we use str() call to avoid passing None to pretty_print_json
This may make the type correct, but it is not valid JSON as per the snippet on my previous comment.
So the type here shouldn't be Optional
return ProtoFileElement(Location.get(path)) | ||
|
||
# TODO: there maybe be faster comparison workaround | ||
def __eq__(self, other: 'ProtoFileElement') -> bool: # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the type ignore, the type of other should be object (this is part of the language, since any two objects can be compared)
def __eq__(self, other: 'ProtoFileElement') -> bool: # type: ignore | |
def __eq__(self, other: object) -> bool: |
a = self.to_schema() | ||
b = other.to_schema() | ||
|
||
return a == b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to check the type of other
:
a = self.to_schema() | |
b = other.to_schema() | |
return a == b | |
if not isinstance(other, ProtoFileElement): | |
return False | |
return self.to_schema() == other.to_schema() |
return self.string[dot + 1:] | ||
|
||
@classmethod | ||
def static_init(cls) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least my pylint still accepts it without any messages
This is an anti pattern for tools like pylint. pylint is a static analyzer, it can't run the code for analysis, so code like this results in false negatives. Here is an example that pylint doesn't caught because the @static_init annotation is missing and it can't know that:
class R:
@classmethod
def static_init(cls) -> None:
cls.A = "a"
print(R().A)
So i prefer to add this change to the next stage of a development cycle.
Fair enough 👍
|
||
if self.package_name != other.package_name: | ||
result.add_modification(Modification.PACKAGE_ALTER) | ||
# TODO: do we need syntax check? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO necessary?
) | ||
declaration = self.read_declaration(documentation, Context.FILE) | ||
if isinstance(declaration, TypeElement): | ||
# TODO: add check for exception? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which exceptions? maybe remove the todo?
thank you @hackaugusto :) |
About this change - What it does
References: #67
Why this way
Design document outlining design decisions: https://github.com/instaclustr/karapace/pull/11/files?short_path=45bec23#diff-45bec23eda89f362ca9cda50a6be0477f6058d4e6b9cc6c5198cdc5f0488df77