-
Notifications
You must be signed in to change notification settings - Fork 20
Conversation
thanks - i should be able to look at this soon after we get 1.0 of the kafka client out the door (I have been focussed on that). I notice that there are more than 32000 lines of code though, most of which shouldn't be here, so at the very least i'll need to tidy it up and open again (which would loose attribution to you). is that ok, or do you want to tidy it up? Also, I noticed we don't have the contributor license agreement bot set up on this repo, I think you can sign it by going here: http://clabot.confluent.io/cla (that just allows us to confidently do what we want with the code, probably ultimately merging upstream to the apache foundation). |
It looks like most of those 32K lines are |
I've also just complete the CLA. |
Around ~1K lines now. That seems more like it. :-) |
great, thanks! i hope I can get this in 1.0.1. |
There were a few missing switch branches for testing the processing of a |
/// <param name="logicalValue">The logical value to convert.</param> | ||
/// <param name="schema">The schema that represents the target of the conversion.</param> | ||
/// <returns>An object representing the encoded value of the base type.</returns> | ||
public override object ConvertToBaseValue(object logicalValue, LogicalSchema schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timjroberts I've been diving into avro decimal serialisation in a recent project, and I believe your conversion to/from decimals & bytes doesn't correctly account for 1) endianness and 2) the specific nature of 'valBits[3]' in the decimal type's data structure (according to the docs https://docs.microsoft.com/en-us/dotnet/api/system.decimal it contains a sign flag & the scale).
I've implemented some conversion code here: https://github.com/petersilverwood/avro-decimal-tools/tree/master/lang/csharp with some test fixtures to verify the same serialisation in C# and in java. There are probably some edge cases to handle and I haven't tested performance, but it should be good starting point.
Is this useful to integrate into your PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the implementations of the logical types were naive and likely not fully meeting with the current Avro specification. It was the one area where I needed more focused review. I'll take a look at your referenced. Many thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have some scripts in a local branch for bringing up an integration test environment in docker (multi broker kafka cluster, schema registry, monitoring etc). once that is in, it will be easy to get java <-> .net integration tests going. i have been thinking that is a pre-requisite for getting this PR merged. this is somewhere high on my priority list, but not at the top.
@mhowlett - have you had any time to look at the interop scenarios that you mentioned above? Is there anything you need me to look at? I'm planning on looking at @petersilverwood's Decimal implementation this week - it's the logical type implementation that I'm most naive with. |
I've now reworked the cc/ @petersilverwood |
[TestCase("123456789123456789.56")] | ||
[TestCase("-123456789123456789.56")] | ||
[TestCase("000000000000000001.01")] | ||
[TestCase("-000000000000000001.01")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest adding test cases for Decimal.MinValue and Decimal.MaxValue
|
||
private static byte[] GetUnscaledDecimalValue(decimal value, out int scale) | ||
{ | ||
var buffer = new byte[12]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest making this new byte[13]
. The internals of the decimal are an unsigned integer value, but BigInteger constructor is expecting a signed value. Therefore, easiest way to get BigInteger to interpret the resultant byte[] from this method is to ensure the final bye is 0x00. This should fix the Decimal.MinValue and Decimal.MaxValue test cases.
|
||
var paddedBuffer = new byte[12]; | ||
|
||
Buffer.BlockCopy(buffer, 0, paddedBuffer, 0, Math.Min(buffer.Length, 12)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If buffer.Length > 12 we can't correctly read the avro data into a decimal - so would it be better to raise an exception in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timjroberts - thinking on this point some more - would it make sense to create a new type AvroDecimal
which wraps a BigInteger and scale? This class could then include fromDecimal() and toDecimal() methods to bridge to the native decimal type for the majority of use cases, but still allow use with arbitrary size decimals as per the spec?
BitConverter.ToInt32(paddedBuffer, 0), | ||
BitConverter.ToInt32(paddedBuffer, 4), | ||
BitConverter.ToInt32(paddedBuffer, 8) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure of this ground, but this is an area I think we have to treat big-endian architectures differently, e.g.:
if (BitConverter.IsLittleEndian)
{
// as above
}
else
{
// On a BigEndian architecture, we need the byte order to be bigendian also before
// interpreting as an int32. However, the array of int32[] passed to the constructor
// of decimal should still be least significant byte first
Array.Reverse(paddedBuffer);
valBits = new int[]
{
BitConverter.ToInt32(paddedBuffer, 8),
BitConverter.ToInt32(paddedBuffer, 4),
BitConverter.ToInt32(paddedBuffer, 0)
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be correct. Given where these bytes are now coming from their endianess will need to be considered. Good catch. Will update.
Also ensure that the buffer size is large enough to support decimal.MinValue and decimal.MaxValue
I've added a new /cc @petersilverwood |
So after a little reading, I think that because .NET |
Quick observation - this doesn't look to support schemas of mixed primitive and logical types, is that intended? |
@dcrdev - not quite sure what you mean, but you can define Records that use a mix of types (primitive, complex, and now logical). For example, the following schema uses a mix of primitive and the {
"type": "record",
"namespace": "WordCountEntities",
"name": "WordCount",
"fields": [
{
"name": "TotalWordCount",
"type": "int"
},
{
"name": "WordCounts",
"type": { "type": "map", "values": "int" }
},
{
"name": "TopWord",
"type": "string"
},
{
"name": "ExecutionCost",
"type": {
"type": "bytes",
"logicalType": "decimal",
"precision": 8,
"scale": 2
}
}
]
} |
Any planned movement on this inclusion? |
there's plenty of demand for this (it came up again yesterday with a customer), but it's not at the top of the list. note: there's movement on getting rid of our fork, so this work will be done against the apache repo: confluentinc/confluent-kafka-dotnet#1033 |
Any update on supporting this? |
@shawnallen85 - there's movement on this over here apache#492 |
Do we have any movement on this tool? |
Confluent.Apache.Avro.AvroGen 1.7.7.5 generates schema file (.avsc) data type based on "type" instead of "logicaltype". Below example generates datatype as Nullable int instead of Nullable DateTime . Could you please help. |
hi, is there any plan to include this in a formal release? |
This is causing our Jenkins CI to break due to the "unknown repository" - This needs to be closed to get around that. If this change is still required, please re-do it in a fork or branch. |
This PR brings in C# support for logical types.
The code was originally based on the
apache/avro
repository (apache#492), but I've also applied the patch here for your consideration.