-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-2359: Support Logical Types in C# #492
Conversation
df61070
to
fddc0fd
Compare
On first glance, this is looking great to me. I'm going to review it carefully over in the @confluentinc fork confluentinc#14 and get it merged there as we see a lot of demand for this. side note: @confluentinc started a fork of this project since it allowed us to move fast on avro support in the .net kafka client. the big changes we've made relate to changing all the project files / references / etc for modern .NET, which would cause problems for any existing users of this library. That said, I believe it would be best to get our fork merged up into the official avro repo for the greatest net benefit and also ideally, @confluentinc would prefer to be not maintaining a separate fork (we'd prefer to contribute back to the official repo). |
Fantastic. I think the code around the Schema changes and the reader/writer side is OK. I think the implementation of the logical types themselves will need more careful review because I may be a little naive with the implementations compared to what the 1.8 spec says. I've just noticed that I haven't done |
0af22f7
to
611f4b8
Compare
This sounds like a great addition, however I have almost zero C# experience, so I'm afraid I can't provide meaningful feedback about the implementation. Unfortunately I don't know who'd be able to review this PR from the Avro community, who is maintaining the C# part. Would you mind sending an email to the dev list to (hopefully) get more attention and call for review? Also, I see that there are several test failures, which looks related to this PR, could you please have a look at those? |
Thanks @nandorKollar. These changes are now being reviewed via the Confluent fork: confluentinc#14 |
FWIW, not all Avro durations can be represented as C# TimeSpan because the duration includes a number of months. |
You're right @CurtHagenlocher. The Avro specification for a |
@timjroberts is the intention still to merge this PR? |
I'd like to see it merged @adprit. |
Any update on supporting this? |
I'm open to merge this, but currently there are merge conflicts that need to be resolved. Also, I have very little experience with C# or .Net, so I'd like to have @blachniet his opinion on this :-) Also, it would be great to give the code a try and check if it works for you as is. |
So I'm not a GitHub expert, but I believe I reconciled the differences here: https://github.com/shawnallen85/avro ... what needs to be done to bring them together? |
To provide some additional bits -- the differences were mainly around syntactic sugar, XML documentation, and the Logical schema implementation. The repo I listed above takes this PR, updates the documentation bits and syntactic sugar. |
This will be a great addition to the C# library! I'll start taking a look at this during the coming weekend. |
I'd look at confluentinc#14. This is where I completed the logical type work after realizing that we were using the Confluent fork of Avro. It is my understanding that Confluent are now looking at merging their fork back into Apache's and presumably that would include my PR. |
@timjroberts update on this - @confluentinc would ideally like to migrate to the official apache avro packages, and discontinue our fork. this is under discussion here: confluentinc/confluent-kafka-dotnet#1033 |
I may some time over the weekend. I'll pull down the latest Avro branch, and then look at replaying the commits that were made in the Confluent fork over the top. Hopefully as suggested, any conflicts will be easily resolvable. |
611f4b8
to
2070469
Compare
This PR is now identical to confluentinc#14. |
Thanks @timjroberts for the contribution. Great to see Confluent merging back to Avro master. However, the CI is still sad. It looks like there is a file without a (proper) license:
|
I've added the missing license. However, it looks like the solution has 'Warnings as Errors' turned on. I'll need to go through the |
9f2fbe9
to
29397bf
Compare
/// </summary> | ||
/// <param name="value">The <see cref="AvroDecimal"/>.</param> | ||
/// <returns>A byte.</returns> | ||
public static byte ToByte(AvroDecimal value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why expose all these static methods (in addition to the cast operators). Is there any precedent in the std lib? Doesn't seem consistent with how other numeric types work afaict. It seems to me it would be better to consolidate all of these into just the cast operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So these are the implementation of IConvertible
. VB.NET doesn't invoke the conversion operators and instead relies on the type implementing this interface.
/// </summary> | ||
/// <param name="value">The byte.</param> | ||
/// <returns>An <see cref="AvroDecimal"/>.</returns> | ||
public static AvroDecimal FromByte(byte value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the same with the froms (just have the cast operators), unless there's a compelling reason to have them I can't think of. as a general rule, smaller api surface area is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. If I remove them you get CA2225.
Happy to suppress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to you i think.
this is looking good to me now. the last time I looked through this i did it quite closely and I think that feedback has all been covered off now. what i'm going to do next (hopefully get it done over the next few days) is to integrate this with the confluent kafka serdes, and do some integration tests with java. i'm just going to do this on an adhoc basis since multiple languages are going to make it difficult to set up properly. after that, i'll give my thumbs up, fwiw. |
Would be great @mhowlett, please keep us updated! |
{ TimeMillisecond.LogicalTypeName, new TimeMillisecond() }, | ||
{ TimeMicrosecond.LogicalTypeName, new TimeMicrosecond() }, | ||
{ TimestampMillisecond.LogicalTypeName, new TimestampMillisecond() }, | ||
{ TimestampMicrosecond.LogicalTypeName, new TimestampMicrosecond() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pity there's no support for fixed
types (but the gaps shouldn't prevent this pr being merged).
public override object ConvertToBaseValue(object logicalValue, LogicalSchema schema) | ||
{ | ||
var date = ((DateTime)logicalValue).ToUniversalTime(); | ||
return (long)((date - UnixEpochDateTime).TotalMilliseconds * 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looses more precision than required. Better to use Ticks
in this calculation which have 0.1ms precision. It's a pity using DateTime
necessitates loosing microsecond precision, but if someone really wants that they can use the underlying type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry! the problem is not here, it's in ConvretToLogicalValue
.
/// <inheritdoc/> | ||
public override object ConvertToLogicalValue(object baseValue, LogicalSchema schema) | ||
{ | ||
var noMs = (long)baseValue / 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looses precision. the way to do this i think is also take baseValue
module 1000, work out the fractional ms from that and then add this to noMs. if you use double
there may be precision issues.
if (time > _maxTime) | ||
throw new ArgumentOutOfRangeException(nameof(logicalValue), "A 'time-millis' value can only have the range '00:00:00' to '23:59:59'."); | ||
|
||
return (int)(time - UnixEpochDateTime.TimeOfDay).TotalMilliseconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the time-millis logical type represents a time of day, with no reference to a particular calendar, time zone or date.
by subtracting unix time-of-day you're implying both that:
- the stored time of day is in reference to universal time.
- the passed in timespan is in reference to the timezone.
I think subtracting unix time of day here is both very confusing and misinterpreting the spec.
I assume/hope the java implementation doesn't work like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that UnixEpochDateTime.TimeOfDay
is 00:00:00
so the value won't be incorrect, but the logic is.
if (time > _maxTime) | ||
throw new ArgumentOutOfRangeException(nameof(logicalValue), "A 'time-micros' value can only have the range '00:00:00' to '23:59:59'."); | ||
|
||
return (long)(time - UnixEpochDateTime.TimeOfDay).TotalMilliseconds * 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment in TimeMillisecond
.
/// <summary> | ||
/// The date and time of the Unix Epoch. | ||
/// </summary> | ||
protected static readonly DateTime UnixEpochDateTime = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be DateTimeKind.Unspecified
.
The spec says:
The timestamp-millis logical type represents an instant on the global timeline, independent of a particular time zone or calendar,
by specifying a kind of UTC, the returned values are explicitly not independent of a particular timezone.
the spec does go on to say:
where the long stores the number of milliseconds from the unix epoch, 1 January 1970 00:00:00.000 UTC
(i.e. includes time zone in the epoch), but I don't think that is intended or what we want.
I've now sanity checked this by reading data written with java (via kafka) and vice versa for each of the logical types (including some edge cases). A few more comments related to time types to address. It seems using the 'fixed' underlying type is not supported yet. I suspect people will want to use the In the interests of time, I didn't set my tests up in a reproducible way, but I'm satisfied enough that we'll be happy to link to this in Confluent.Kafka instead of our fork once released. Logical type support in .NET would have to be the most requested feature we see at Confluent - if there's any chance of getting this in 1.9.2 we'd be very happy :-). |
Thanks for the effort @timjroberts. Let's move this PR forward as is then. Thanks for the testing and review @mhowlett. |
* AVRO-2359: Support Logical Types in CSharp * fix more warnings * remove timezone change in TestTimestamp tests * update logical Date and Time tests to be local time zone aware * ensure TimeZone is set relative to a source TimeZone * reparse Dates assuming local if they're not UTC * try setting specific TimeZone offset * actions for code review comments * seperate the Timestamp* tests * add missing license to AvroDecimalTests * modify LogicalType to return .NET type rather than string * add Register method to LogicalTypeFactory * Fixup code-review comments * Make Sign internal and add xmldocs Co-authored-by: Fokko Driesprong <[email protected]>
I've cherry-picked this onto the 1.9.2 branch @blachniet @RyanSkraba |
thanks @Fokko ! @timjroberts - if you want to do a PR addressing the last round of comments, that'd be great (else I'm happy to do it). i think they should be addressed, but don't need to get them in 1.9.2. |
Ryan published the first RC of 1.9.2: http://mail-archives.apache.org/mod_mbox/avro-dev/202002.mbox/browser Would be great if y'all could check the .Net artifacts :) |
@mhowlett I'm happy for you to pick those final pieces up if it'll move things along quicker. I can't commit to things in the evenings at the moment. Please do let me know if you need anything from me though. Thanks for all the help in getting this. Much appreciated. |
can confirm I've successfully serialized and deserialized a record with a logical type field using the nuget package from the above reference. re: additional changes, I'll open a PR in due course. |
Is there some trick to using logical types as part of a record? The testing works when the entire schema consists of, for instance, a int/date type, but when it's just one field of many, the schema parser consistently ignores the logicalType annotation and treats the field as a pure int. Looked in the code, should Schema.cs#206 be passing jtok instead of jtype to LogicalSchema.NewInstance()? |
EDIT - False alarm, sorry! Regenerated our classes with v1.11.3 and then somehow reverted the nuget update, so I was trying to serialize with v1.9.1 writer. @mhowlett - I'm trying to update to Apache.Avro 1.11.3 in our C# project and am getting serialization issues with the logical type changes. A record that uses date now has its C# generated class with a DateTime instead of an int. When I go to serialize it, I get the error I feel like I might be missing something, but I'm not sure what. The generated schema has: Any thoughts or help is appreciated. |
@DannyBoyk can you post a sample program that hits the exception? |
Well, have to apologize, this was a PEBKAC error. I somehow managed to revert the new Avro nuget update after I did a code gen with the new package, so I was trying to serialize v1.11.3 C# Avro objects with a v1.9.1 writer. After reverting the revert, everything is working as it should. Sorry for the false alarm! |
This PR brings in C# support for logical types.
The Schema, Readers and CodeGen elements have been completed, and the
"date"
logical type has an implementation. As a reference, the logical types implemented in the Java implementation will be moved across.