Add DECIMAL support to KSQL #842
Some quick thoughts... Decimal support would first require:
The main issue with Avro's decimal support comes from the fact that decimal is a logical type in Avro, not a first-class type. A logical type is a way of interpreting another type; in the case of decimal, the underlying type is either bytes or fixed. Scale and precision are not stored within the serialized data, only within the schema.

With Avro you can write and read data using different schemas. Avro will check that the schemas are compatible and throw an exception if they are not, and it also provides compatibility-checking methods that can be used by tools like the Schema Registry. Unfortunately, as of the time of writing, Avro's schema evolution / compatibility checks do not validate logical types. The upshot is that you can write a decimal of "12.34" (schema has a scale of two), later read it with a schema where the scale is different, say three, and Avro will happily deserialize the value as "1.234", i.e. Avro is not protecting against data corruption. This is covered by AVRO-2078 and AVRO-1721.

Until these Jiras are fixed we'd need special handling of decimals in the Schema Registry to ensure correct evolution checks are in place, and potentially additional checks in Avro deserializers to ensure write/read schema compatibility. IMHO, not representing the scale in the actual data in some way is a bug.

Even if Avro were to fix the above Jiras and implement correct schema compatibility / evolution checks, it would then simply never be possible to change the scale (and possibly precision) of a decimal. It would be possible to implement readers that could accept different scales, e.g. read with a scale of 3 what was written with a scale of 2, but it would not be possible to evolve an Avro schema in a way that changes the scale; doing so would result in data corruption.
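To make the failure mode concrete, here is a minimal sketch of the scale mismatch using Avro's Java `Conversions.DecimalConversion` (the class name and values below are illustrative, and the conversion is exercised directly rather than through a full writer/reader pipeline):

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class DecimalScaleMismatchSketch {
  public static void main(String[] args) {
    Conversions.DecimalConversion conversion = new Conversions.DecimalConversion();

    // Writer schema: decimal(4, 2) layered on bytes.
    Schema writerSchema = LogicalTypes.decimal(4, 2)
        .addToSchema(Schema.create(Schema.Type.BYTES));
    // Reader schema: same underlying bytes type, but scale 3.
    Schema readerSchema = LogicalTypes.decimal(4, 3)
        .addToSchema(Schema.create(Schema.Type.BYTES));

    // Serializing 12.34 only writes the unscaled value (1234); the scale lives in the schema.
    ByteBuffer serialized = conversion.toBytes(
        new BigDecimal("12.34"), writerSchema, writerSchema.getLogicalType());

    // Reading the same bytes with the reader schema re-interprets them with scale 3.
    BigDecimal result = conversion.fromBytes(
        serialized, readerSchema, readerSchema.getLogicalType());

    System.out.println(result); // 1.234 -- no error, the value has quietly changed
  }
}
```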
I've created AVRO-2164 to get community feedback on potentially promoting Decimal to a first-class type.
Thanks for the additional info, Andy. One option we have, in the first iteration of this functionality, is to not support decimals when using Avro. This would reduce the dependency of the first iteration to just Kafka Connect.
@big-andy-coates Correction to your note -- Connect does support decimals, as well as other logical types; it just does so in a similar way to Avro. Decimals should translate into Avro decimals just fine (the Schema Registry doesn't have any code specifically for handling Decimal, but the Avro Converter does).
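For reference, a minimal sketch of Connect's decimal handling via `org.apache.kafka.connect.data.Decimal` (class and variable names are illustrative; assumes connect-api is on the classpath):

```java
import java.math.BigDecimal;

import org.apache.kafka.connect.data.Decimal;
import org.apache.kafka.connect.data.Schema;

public class ConnectDecimalSketch {
  public static void main(String[] args) {
    // Connect models decimal as a logical type over BYTES, with the scale kept as a schema parameter.
    Schema amountSchema = Decimal.schema(2);

    // Only the unscaled value (1234) is serialized, as big-endian two's-complement bytes.
    byte[] wireBytes = Decimal.fromLogical(amountSchema, new BigDecimal("12.34"));

    // The scale is re-applied from the schema when converting back.
    BigDecimal roundTrip = Decimal.toLogical(amountSchema, wireBytes);
    System.out.println(roundTrip); // 12.34
  }
}
```

The AvroConverter maps this Connect logical type onto Avro's decimal logical type, which is why the two line up as described above.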
I have a Connect job that uses the JDBC Source and pulls data from Oracle. Unfortunately, many of the fields use the Oracle NUMBER "variant" type, which I wrap in my SQL in the Connect job to reduce the decimal size, but I end up with a schema like this for a column:
Is there a workaround for decimal data types as bytes with Avro? It limits the usability of KSQL with Avro if decimals are not usable.
We are experimenting with KSQL, trying to create STREAMs and TABLEs on a topic, and found that all the Oracle NUMBER(14,3) data types were converted to bytes with the decimal logical type.
The Schema Registry data type for FIN_AMT is {"name":"FIN_AMT","type":["null",{"type":"bytes","scale":3,"precision":64,"connect.version":1,"connect.parameters":{"scale":"3"},"connect.name":"org.apache.kafka.connect.data.Decimal","logicalType":"decimal"}],"default":null}
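For anyone hitting this: the bytes in a field like FIN_AMT carry only the unscaled value; the scale comes from the schema's parameters ("scale":"3" above). A minimal sketch of decoding such a value in plain Java (the sample value is made up):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecodeDecimalBytesSketch {
  public static void main(String[] args) {
    // Illustrative payload: unscaled value 123456 as big-endian two's-complement bytes.
    byte[] finAmtBytes = new BigInteger("123456").toByteArray();

    // The scale (3) is not in the payload; it comes from the schema's "scale" parameter.
    BigDecimal finAmt = new BigDecimal(new BigInteger(finAmtBytes), 3);

    System.out.println(finAmt); // 123.456
  }
}
```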
Would this ticket also address the other logical types? Or should I open a separate ticket for that? Example:
{
"type": "record",
"name": "some_event",
"fields": [
{
"name": "event_id",
"type": "string"
},
{
"name": "event_timestamp",
"type": {"type": "long", "logicalType": "timestamp-millis"}
},
{
"name": "receiving_timestamp",
"type": {"type": "long", "logicalType": "timestamp-millis"}
}
]
}
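For context, timestamp-millis is (like decimal) a logical type layered on an Avro primitive, long in this case, so it raises the same handling question for KSQL. A minimal sketch of declaring the schema above with Avro's Java SchemaBuilder (class name is illustrative):

```java
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class LogicalTypeSchemaSketch {
  public static void main(String[] args) {
    // timestamp-millis is a logical type annotating the primitive long type.
    Schema timestampMillis = LogicalTypes.timestampMillis()
        .addToSchema(Schema.create(Schema.Type.LONG));

    Schema someEvent = SchemaBuilder.record("some_event").fields()
        .requiredString("event_id")
        .name("event_timestamp").type(timestampMillis).noDefault()
        .name("receiving_timestamp").type(timestampMillis).noDefault()
        .endRecord();

    System.out.println(someEvent.toString(true)); // prints the record schema shown above
  }
}
```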
Any resolution for this error? I have a byte[] data type from Debezium CDC that is not understood by KSQL or Kafka Streams.
This would be a great feature. At the moment we cannot use KSQL for things where
@sebastianvoss From what I understand, the problem lies with Avro and its support for the data type. Once Avro supports it, changes can be made in Kafka to support it.
@KieranDevvs, no, you misunderstood what @big-andy-coates wrote in his second post. To rephrase: Avro has a decimal type, but Kafka Connect has its own type system, which has no decimal support. KSQL reuses that type system and depends on Kafka Connect implementing it first. It's also not clear to Andy what the impact on the Schema Registry is (although I can't see an issue, tbh).
Any update on this? |
Refactor some code to help introduce DECIMAL types in #842. This refactor is needed to avoid disabling the CyclomaticComplexity Checkstyle rule.
Should we do #3593 as part of the rollout of DECIMALs?
Update on this - KSQL will support decimals in the next release! I will close this ticket out, but please open specific bug reports/feature requests as the feature is still a little rough around the edges. |
Could anyone please share an example of how to handle the DECIMAL data type in a KSQL table?
Without decimal support, it's not possible to use KSQL to perform arithmetic operations on decimals, which are important in financial data, metrics coming from IoT devices, and similar domains.