Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

README section for Logical Types Conversion #5

Closed
soumabrata-chakraborty opened this issue May 5, 2017 · 13 comments
Closed

README section for Logical Types Conversion #5

soumabrata-chakraborty opened this issue May 5, 2017 · 13 comments
Assignees
Labels

Comments

@soumabrata-chakraborty
Copy link
Contributor

Hi,

While working with logical types, I noticed that Confluent have changed their AvroConverter to rely on the property "logicalType" instead of "connect.name".
This is with reference to this commit confluentinc/schema-registry@da4d548

So, for example, for Date logical type, I can now mention "logicalType":"date" in the relevant schema field instead of "connect.name":"org.apache.kafka.connect.data.Date"

The README explicitly advises to use the connect.name instead of logicalType. Is there some aspect I am missing?

@hpgrahsl hpgrahsl self-assigned this May 5, 2017
@hpgrahsl
Copy link
Owner

hpgrahsl commented May 5, 2017

That's a great question.

When I implemented logical types support I first tried to go the standard AVRO way of defining the schema of course. However, it always resulted in errors at least for the then current kafka versions (0.10.0+0.10.1). Was pretty hard to find out how to define the schema to get it working thus I decided to explicitly point this out in the README to save others some headaches :)

It may well be that in the upcoming version (the commit you're referring to is actually for confluent platform 3.2.0-rc1) the standard AVRO way by just using the logical type is fine. We'll see...

If you find/found a way to get it working without the fully-qualified class name I'm happy to receive a pull request to change the code and/or readme.

Cheers and thx again!

@soumabrata-chakraborty
Copy link
Contributor Author

soumabrata-chakraborty commented May 5, 2017

I did manage to get it working. I was using the Kafka Connect docker image confluentinc/cp-kafka-connect:3.2.0

I have only verified the byte array to connect record conversion by using logicalType property for date.
I have not yet verified other logical types and not yet verified connect record to byte array conversion

This is how I defined the field in my Avro Schema and it worked fine - i.e. I ended up with a Date on Mongo.
{"name":"anyName", "type": {"type":"int", "logicalType":"date"}}

@hpgrahsl
Copy link
Owner

hpgrahsl commented May 5, 2017

Ok I see. It seems to me that this most likely became possible with version 0.10.2 because it didn't work in the past for me. Haven't tried it after upgrading dependencies to be honest.

Can you confirm that at the moment both ways are working for you? Or is it now only possible with the logical type name instead of the class name?

@soumabrata-chakraborty
Copy link
Contributor Author

My observation is that it is only possible if you have "logicalType" set in the field schema. Setting the logicalType is now mandatory for the AvroConverter to work properly.

Setting the "connect.name" property is now optional.
If you choose to set it -- then there is an additional check that the AvroConverter does to verify that the Schema.name() derived from the logicalType matches with the full classname provied in the connect.name property. If the validation fails then it throws an exception for Mismatched Names.

I have only worked with the Date logical type so far - but I believe my observation holds true for all logical types. You may refer to the toConnectSchema method here https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroData.java

@hpgrahsl
Copy link
Owner

hpgrahsl commented May 6, 2017

I was just investigating matters with confluent platform 3.2.0 and think that we are maybe talking about different things here. My local demo scenario is as follows:

  1. I use a simple Java producer app with Avro Serialization.
  2. the custom value class I use results from avro code generation which is based on an .avsc file containing amongst simple types also logical types like the following:

... {"name": "date","type": { "type": "int", "connect.version": 1, "connect.name": "org.apache.kafka.connect.data.Date" } } ...

  1. I then use the generated class definition to create an object for the value and send it to the kafka topic using the producer app

All of this works as intended. Data is serialized correctly and stored in the kafka topic as expected.

Now, if I were to change the .avsc AVRO schema definition as you suggest, i.e. leaving out the connect.name property and renaming the nested type property to logicalType the AVRO code generator fails complaining about "...no such type".

@soumabrata-chakraborty
Copy link
Contributor Author

soumabrata-chakraborty commented May 6, 2017

Ok. You should not rename the nested type property to logicalType -- just mention the "logicalType" in addition to the "type".

So converting your example to use logicalType should look like:

... {"name": "date","type": { "type": "int", "logicalType": "date" } } ...

Is that how you tried it?

@hpgrahsl
Copy link
Owner

hpgrahsl commented May 6, 2017

No I didn't. Have tried it as you suggested and at least code generation works with that. However, interestingly enough, the resulting Java class definition changed and thus broke my producer app code :)

Instead of the date field resulting in Java type int/Integer it becomes a Joda LocalDate with your definition... very strange to me. Not so for the other three logical types (decimal, time, timestamp) which are still mapped to the same Java types as they were with my schema definition.

Many thx anyway for pointing this out. Unsure how to proceed on this because it feels wrong to me to accept a Joda LocalDate for this. Maybe I'm wrong and it's the way to go.

@soumabrata-chakraborty
Copy link
Contributor Author

soumabrata-chakraborty commented May 7, 2017

Here is how I was doing it:
(1) I created an Avro schema where the date field looked like ... {"name": "myDate","type": { "type": "int", "logicalType": "date" } }
(2) I create a generic record using that schema ... GenericRecord genericRecord = new GenericData.Record(schema) ;
(3) I populate the "myDate" field with an int ... genericRecord.put("myDate", new Long(someJavaDate.getTime()/86400000).intValue()) ; ... So the GenericRecord field indeed carried an int and not Joda LocalDate
(4) I have this sent out to my topic from a kafka-streams app using a GenericAvroSerde. Something on the lines of the example here https://github.com/JohnReedLOL/kafka-streams/blob/master/src/main/java/io/confluent/examples/streams/utils/GenericAvroSerde.java

Now from my kafka-connect app which listens to this topic, I have
(5) The converter was io.confluent.connect.avro.AvroConverter and I had the mongo connector successfully write a Date to the Mongo Collection.

@soumabrata-chakraborty
Copy link
Contributor Author

soumabrata-chakraborty commented May 7, 2017

I just had a look at the Avro Repository and I think what you observed might be the intended behavior in Avro.

It seems for a date logicalType -- if you are working with a GenericRecord then you just populate the field as an int as I had mentioned in my earlier comment.
i.e. genericRecord.put("myDate", someIntValue) ; //Here someIntValue is an int/Integer

However, when you work with Custom value class (generated from the Avro Schema) then the instance variable in the custom value class is a Joda LocalDate.

Refer the DateConversion inner class from https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/data/TimeConversions.java

@soumabrata-chakraborty
Copy link
Contributor Author

soumabrata-chakraborty commented May 7, 2017

I used avro-tools-1.8.0 to generate a class for me from an avsc file.
I saw the following logical types to java type conversion where the java type was from Joda Time:
date -> org.joda.time.LocalDate
time-millis -> org.joda.time.LocalTime
timestamp-millis -> org.joda.time.DateTime

While converting to byte arrays, Avro has converters that encode date and time-millis as int and timestamp-millis as long.

The io.confluent.connect.avro.AvroConverter will treat these logicalTypes properly as long as the schema has the "logicalType" property in addition to the "type" property

From a documentation perspective for the mongo connector, we are more concerned about the conversion of the Avro Schema to a Connect Schema as part of creating a ConnectRecord. The io.confluent.connect.avro.AvroConverter will do this Avro Schema --> Connect Schema conversion properly only if the Avro schema has the correct "logicalType" configured.

@hpgrahsl
Copy link
Owner

hpgrahsl commented May 7, 2017

Thx for checking this stuff. I'll adapt the readme to reflect the main aspects of this discussion.

I can confirm the type mapping by getting the following code generation result with avro-tools-1.8.1

... { "name": "date", "type": { "type": "int", "logicalType": "date" } }, { "name": "timemillis", "type": { "type": "int", "logicalType": "time-millis" } }, { "name": "timemicros", "type": { "type": "long", "logicalType": "time-micros" } }, { "name": "timestampmillis", "type": { "type": "long", "logicalType": "timestamp-millis" } }, { "name": "timestampmicros", "type": { "type": "long", "logicalType": "timestamp-micros" } } ...

mapped to Java Types:

private org.joda.time.LocalDate date;
private org.joda.time.LocalTime timemillis;
private long timemicros;
private org.joda.time.DateTime timestampmillis;
private long timestampmicros;

hpgrahsl added a commit that referenced this issue May 7, 2017
based on this discussion #5
@hpgrahsl hpgrahsl closed this as completed May 7, 2017
@soumabrata-chakraborty
Copy link
Contributor Author

Thanks @hpgrahsl

@krishkir
Copy link

It is working for me .Many thanks Soumabrata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants