Design - Protobuf support in Aiven Karapace #11
Design - Protobuf support in Aiven Karapace
Requirements
Karapace is a 1-to-1 replacement for Confluent’s Schema Registry but it lacks Google’s Protocol Buffers (Protobuf) support.
The aim of the project is to add Protobuf support to Aiven’s Karapace tool.
The following functionality should be implemented for MVP:
Out of scope (not required for MVP):
Since version 5.5, Schema Registry supports schema references for the Avro, JSON and Protobuf formats. In Protobuf, references are external “.proto” files named in “import” statements.
Unfortunately, Karapace does not support references. Before Protobuf “import” support can be implemented, common functionality for managing references across all formats must be designed and implemented.
Known dependencies are predefined “.proto” files specified in “import” statements: Well-Known Types (part of the Protobuf standard), Google Common Types, and two Confluent .proto files. They are specific to Protobuf and differ from custom references: they do not affect public APIs, should be bundled with the distribution, and are not stored in Kafka. Custom references functionality is not required to implement this feature.
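To illustrate the distinction between known dependencies and custom references, here is a minimal sketch that splits the imports of a schema into the two groups. The names `KNOWN_DEPENDENCIES` and `split_imports` are hypothetical, and the listed paths are only a small illustrative sample of the bundled files, not the complete set.

```python
# Illustrative sample only; the real bundled set would be larger.
KNOWN_DEPENDENCIES = frozenset(
    {
        "google/protobuf/any.proto",        # Well-Known Types
        "google/protobuf/duration.proto",
        "google/protobuf/timestamp.proto",
        "google/type/date.proto",           # Google Common Types
        "confluent/meta.proto",             # Confluent files
        "confluent/type/decimal.proto",
    }
)


def split_imports(imports):
    """Split import paths into (known, custom), preserving order.

    Known dependencies can be resolved from files shipped with the
    distribution; custom ones would need the references feature.
    """
    known = [path for path in imports if path in KNOWN_DEPENDENCIES]
    custom = [path for path in imports if path not in KNOWN_DEPENDENCIES]
    return known, custom


known, custom = split_imports(
    ["google/protobuf/timestamp.proto", "customer.proto"]
)
print(known)   # imports resolvable from bundled files
print(custom)  # imports that would require custom references
```

Under this assumption, a schema whose `custom` list is non-empty would be rejected in the MVP, since custom references are out of scope.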
Within the scope of this MVP, only the required tests should be implemented. The community can improve test coverage over time.
Schema Registry has its own benchmark framework (modules and tests) for performance testing. These performance tests will not be provided with this MVP.
Solution Design
Validating and Storing Protobuf Schemas
Overview
Detail
Schema Registry's handling of the Protobuf schema format (parsing, validating, storing, comparing, etc.) relies on the Square Wire Kotlin library (https://github.com/square/wire).
To provide 1-to-1 compatibility with Schema Registry, a subset of Wire's Protobuf schema support has already been ported to Python.
The following functionality has already been implemented and a PR has been created:
Protobuf Schema Evolution and Compatibility
Overview
Detail
Schema Registry compares schemas when a new version of a schema is about to be stored. It decides whether the schemas are compatible by computing a compatibility level.
Similar functionality is already implemented for Avro schemas in Karapace. Comparison of Protobuf schemas will be implemented based on Schema Registry's Protobuf comparison code.
The implementation will be compatible with Schema Registry but will follow Karapace's approaches and reuse its common functionality (as with Avro).
Comparison of Wire ProtoFileElement fields will be implemented inside the ported Wire classes. Other logic will be implemented in the ProtobufSchema class. The code will be written from scratch, and part of the functionality will be added to the ported Square Wire classes.
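As a rough illustration of what field-level comparison involves, the sketch below diffs the fields of two versions of a message by tag number. The `Field` class, the `diff_fields` function and the change labels are hypothetical and are not taken from Wire, Schema Registry or Karapace; deciding which changes break compatibility is left to the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    tag: int
    type: str


def diff_fields(old, new):
    """Return a list of (change, tag) tuples between two field lists."""
    old_by_tag = {field.tag: field for field in old}
    new_by_tag = {field.tag: field for field in new}
    changes = []
    for tag in sorted(old_by_tag.keys() | new_by_tag.keys()):
        before = old_by_tag.get(tag)
        after = new_by_tag.get(tag)
        if before is None:
            changes.append(("FIELD_ADDED", tag))
        elif after is None:
            changes.append(("FIELD_REMOVED", tag))
        elif before.type != after.type:
            changes.append(("FIELD_TYPE_CHANGED", tag))
        elif before.name != after.name:
            changes.append(("FIELD_NAME_CHANGED", tag))
    return changes


old_fields = [Field("id", 1, "int32"), Field("name", 2, "string")]
new_fields = [Field("id", 1, "int64"), Field("email", 3, "string")]
print(diff_fields(old_fields, new_fields))
```

Fields are matched by tag rather than by name because the tag is what identifies a field in the Protobuf binary encoding.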
Endpoints which will have compatibility support and will be directly affected by it
The interface of the endpoints must remain unchanged; only support for the Protobuf schema type will be added.
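For example, registering a Protobuf schema through the existing Schema Registry endpoint differs from the Avro case only in the `schemaType` value of the request body. The sketch below builds such a request without sending it; the helper name `build_register_request` and the subject name are hypothetical.

```python
import json

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_register_request(subject, proto_source):
    """Build (path, headers, body) for POST /subjects/{subject}/versions."""
    path = f"/subjects/{subject}/versions"
    headers = {"Content-Type": CONTENT_TYPE}
    # Same body shape as for other formats; only schemaType changes.
    body = json.dumps({"schemaType": "PROTOBUF", "schema": proto_source})
    return path, headers, body


proto_source = 'syntax = "proto3";\nmessage Customer {\n  int32 id = 1;\n}\n'
path, headers, body = build_register_request("customer-value", proto_source)
print(path)
print(body)
```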
Protobuf Message Serialization/Deserialization
Overview
Detail
Confluent REST Proxy provides serialization and deserialization functionality, which Karapace partially implements for the Avro and JSON formats in the “kafka_rest_apis” folder of the repository.
Suggested approach
Offered deserialization procedure
Caching modules
Different strategies are applicable. Modules can be kept in the file system and reused, or removed from the file system right after being imported into Python.
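Both strategies can be sketched with the standard `importlib` machinery. The example below is a minimal sketch, not Karapace code: the function name `load_generated_module`, the in-memory cache and the file naming are assumptions, and a plain Python source string stands in for the output of protoc so that the example runs without it.

```python
import hashlib
import importlib.util
import os
import shutil
import tempfile

_module_cache = {}  # hypothetical in-process cache keyed by source hash


def load_generated_module(source, keep_file=False):
    """Import generated module source, reusing an earlier import if any."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key in _module_cache:
        return _module_cache[key]
    directory = tempfile.mkdtemp(prefix="proto_modules_")
    name = f"m_{key[:16]}_pb2"
    path = os.path.join(directory, f"{name}.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(source)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not keep_file:
        # Second strategy: remove the files right after importing.
        shutil.rmtree(directory)
    _module_cache[key] = module
    return module


# Stand-in for protoc output, used only to keep the sketch runnable.
first = load_generated_module("VALUE = 42\n")
second = load_generated_module("VALUE = 42\n")
print(first is second)
```

Keeping the files allows reuse across process restarts, while removing them keeps the file system clean at the cost of regenerating modules after a restart.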
Memory usage
Python is a dynamic language, and its memory usage is controlled by a garbage collector that releases resources that are no longer referenced. The Python modules generated by protoc contain classes, and these classes depend on the Python google.protobuf library.
During import, the classes are registered with and referenced by the google.protobuf library. It is not possible to unload the modules from the Python process because the registering/deregistering methods in google.protobuf are undocumented.
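The effect can be reproduced without google.protobuf. In the sketch below, a plain dictionary named `REGISTRY` stands in for the library's internal registry (an assumption made for illustration only), and a weak reference is used to observe whether the class is still alive after its module has been removed from `sys.modules`.

```python
import gc
import sys
import types
import weakref

REGISTRY = {}  # hypothetical stand-in for a library-side registry


def import_fake_module(name):
    """Create a module with one class and register that class."""
    module = types.ModuleType(name)
    exec("class Message:\n    pass\n", module.__dict__)
    sys.modules[name] = module
    REGISTRY[name] = module.Message  # the registry keeps a strong reference
    return weakref.ref(module.Message)


message_ref = import_fake_module("fake_pb2")

# Dropping the module from sys.modules does not release the class,
# because the registry still refers to it.
del sys.modules["fake_pb2"]
gc.collect()
still_alive = message_ref() is not None
print(still_alive)
```

The class can be collected only after the registry drops its reference, which is the step that has no documented API in the real library.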
Advantages
Public Interfaces
Protobuf support should be added as a new schema format only to existing API endpoints and should not affect existing (Avro and JSON) schema format functionality.
After Protobuf integration is added, the API endpoints for all schema formats should remain 1-to-1 compatible with Schema Registry.
Test plan
Each module/class will have a minimal set of unit tests. In addition, high-level functionality should have at least a few integration tests.
Changes to Karapace should not affect most of the existing tests.
Required tests
Alternatives considered
Message serialization/deserialization could be implemented with alternative Python libraries that can parse Protobuf messages, but so far we have not found a good library with such functionality.
BlackBox Protobuf Burp Extension
Serialization/deserialization alternative
Advantages
Disadvantages