Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avro 0.16 wrongly fails on date examples #61

Closed
lerouxrgd opened this issue Jan 15, 2024 · 28 comments
Closed

Avro 0.16 wrongly fails on date examples #61

lerouxrgd opened this issue Jan 15, 2024 · 28 comments

Comments

@lerouxrgd
Copy link
Owner

          Interestingly 0.16 cannot parse this schema anymore:
{
  "type": "record",
  "name": "DateLogicalType",
  "fields": [ {
    "name": "birthday",
    "type": {"type": "int", "logicalType": "date"},
    "default": 1681601653
  } ]
}

It says:
default's value type of field "birthday" in "DateLogicalType" must be "{\"type\":\"int\",\"logicalType\":\"date\"}"

However this one is fine:

{
  "type": "record",
  "name": "DateLogicalType",
  "fields": [ {
    "name": "release_datetime_micro",
    "type": {"type": "long", "logicalType": "timestamp-micros"},
    "default": 1570903062000000
  } ]
}

@martin-g Do you know where I should report this ?

Originally posted by @lerouxrgd in #60 (comment)

@erikvullings
Copy link

@lerouxrgd Indeed, it seems that the apache-avro project on GitHub does not use the issue tracking system of GitHub. After some clicking around, it seems that all issues are managed in Jira by the Apache Software Foundation ASF, which can be found here. At the top of that page, you can find a self serve sign up link. After you have an account, I hope you can submit that issue there.

I've requested an account, but it seems to take some time...

FYI I had a similar issue as the one you've described. Below schema, although fine in Kafka, could not be parsed in rsgen-avro. The culprit is the logicalType, presumably because the field can also be null.

{
    "type": "record",
    "name": "TimeManagement",
    "namespace": "eu.driver.model.sim.config",
    "doc": "The time management message can be used for informing connected applications on time progression and changes. *Copyright (C) 2019-2020 XVR Simulation B.V., Delft, The Netherlands, Martijn Hendriks <hendriks @ xvrsim.com>. This file is licensed under the MIT license : https://github.com/DRIVER-EU/avro-schemas/blob/master/LICENSE*",
    "fields": [
        {
            "name": "state",
            "type": {
                "type": "enum",
                "name": "TimeState",
                "doc": "Initialization – preparing for the actual start of the simulation time; Started – the simulation time is started; Paused – the simulation time is paused; Stopped – the simulation time is stopped; Reset – the simulation time is reset",
                "symbols": [
                    "Initialization",
                    "Started",
                    "Paused",
                    "Stopped",
                    "Reset"
                ]
            },
            "doc": "State the time is currently in"
        },
        {
            "name": "tags",
            "type": [
                "null",
                {
                    "type": "map",
                    "values": "string"
                }
            ],
            "doc": "Optional map containing session time specific information: key – unique name of the specific property; value – value of that property",
            "default": null
        },
        {
            "name": "timestamp",
            "type": [
                "null",
                "long"
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
            "default": null,
            "logicalType": "timestamp-millis"
        },
        {
            "name": "simulationTime",
            "type": [
                "null",
                "long"
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the fictive date and time the session should run with",
            "default": null,
            "logicalType": "timestamp-millis"
        },
        {
            "name": "simulationSpeed",
            "type": [
                "null",
                "float"
            ],
            "doc": "Optional speed factor this session wants to run a simulation. The range of this speed factor is [0, infinity)",
            "default": null
        }
    ]
}

@lerouxrgd
Copy link
Owner Author

Hello,
I have actually opened an issue on their Jira:
https://issues.apache.org/jira/browse/AVRO-3928
As far as I know @martin-g fixed it already, now we just have to wait for a new release of the Rust SDK.

@martin-g
Copy link
Contributor

martin-g commented Feb 20, 2024 via email

@erikvullings
Copy link

@martin-g Do you expect that this fix will also solve my issue?

@martin-g
Copy link
Contributor

martin-g commented Feb 21, 2024

Please test your use case against main branch and report an issue if it does not work!
The easiest way is to use git dependency!
Or clone the repo and test it as a unit test.

@lerouxrgd
Copy link
Owner Author

I have tried to test it by updating Cargo.toml as follows:

[dependencies]
apache-avro = { version = "0.17", features = ["derive"] }

[patch.crates-io]
apache-avro = { git = "https://github.com/apache/avro.git" }

However there are some breaking changes to figure out first.

@lerouxrgd
Copy link
Owner Author

Giving it a quick try, it looks like BigDecimal is not re-exported as part of the public API (contrarily to say apache_avro::Duration) , I think it should be re-exported so that consumers can actually match against it etc.

@lerouxrgd
Copy link
Owner Author

lerouxrgd commented Feb 24, 2024

@martin-g If BigDecimal gets re-exposed through apache_avro::BigDecimal then everything should be fine. For now I have pushed my fixes to the branch apache-avro-0.17, I will merge it into master when apache-avro gets updated.

@erikvullings As for your example it now generates the following Rust code:

/// Initialization – preparing for the actual start of the simulation time; Started – the simulation time is started; Paused – the simulation time is paused; Stopped – the simulation time is stopped; Reset – the simulation time is reset
#[derive(Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Clone, serde::Deserialize, serde::Serialize)]
pub enum TimeState {
    Initialization,
    Started,
    Paused,
    Stopped,
    Reset,
}

/// The time management message can be used for informing connected applications on time progression and changes. *Copyright (C) 2019-2020 XVR Simulation B.V., Delft, The Netherlands, Martijn Hendriks <hendriks @ xvrsim.com>. This file is licensed under the MIT license : https://github.com/DRIVER-EU/avro-schemas/blob/master/LICENSE*
#[derive(Debug, PartialEq, Clone, serde::Deserialize, serde::Serialize)]
pub struct TimeManagement {
    pub state: TimeState,
    #[serde(default = "default_timemanagement_tags")]
    pub tags: Option<::std::collections::HashMap<String, String>>,
    #[serde(default = "default_timemanagement_timestamp")]
    pub timestamp: Option<i64>,
    #[serde(rename = "simulationTime")]
    #[serde(default = "default_timemanagement_simulation_time")]
    pub simulation_time: Option<i64>,
    #[serde(rename = "simulationSpeed")]
    #[serde(default = "default_timemanagement_simulation_speed")]
    pub simulation_speed: Option<f32>,
}

#[inline(always)]
fn default_timemanagement_tags() -> Option<::std::collections::HashMap<String, String>> { None }

#[inline(always)]
fn default_timemanagement_timestamp() -> Option<i64> { None }

#[inline(always)]
fn default_timemanagement_simulation_time() -> Option<i64> { None }

#[inline(always)]
fn default_timemanagement_simulation_speed() -> Option<f32> { None }

Does that look good to you ?

@lerouxrgd
Copy link
Owner Author

By the way, I think that the timestamp fields should be defined as:

        {
            "name": "timestamp",
            "type": [
                "null",
                {"type": "long", "logicalType": "timestamp-millis"}
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
            "default": null
        }

Instead of what you currently have, that is:

        {
            "name": "timestamp",
            "type": [
                "null",
                "long"
            ],
            "doc": "Optional UNIX Epoch time in milliseconds marking the time the update was or needs to be performed",
            "default": null,
            "logicalType": "timestamp-millis"
        }

@martin-g
Copy link
Contributor

@lerouxrgd Is there a failing test in branch apache-avro-0.17 ? The build and tests pass here.
Could you please add the problematic code as a test case, so I can use it to verify the fix in apache_avro! Thanks!

@lerouxrgd
Copy link
Owner Author

I am trying with the following schema:

{
  "type": "record",
  "name": "Decimals",
  "fields": [ {
    "name": "a_decimal",
    "type": {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
  }, {
    "name": "a_big_decimal",
    "type": ["null", {"type": "bytes", "logicalType": "big-decimal"}],
    "default": null
  } ]
}

Which for now generates the following Rust code:

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
    pub a_decimal: apache_avro::Decimal,
    #[serde(default = "default_decimals_a_big_decimal")]
    pub a_big_decimal: Option<apache_avro::BigDecimal>,
}

#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }

However there are a few things that prevent it from compiling:

  • As mentioned earlier apache_avro::BigDecimal does not exist
  • apache_avro::Decimal is not Eq, however the underlying num_bigint::BigInt is Eq
  • Surprisingly, to me at least, apache_avro::Decimal is not Serialize nor Deserialize

@martin-g
Copy link
Contributor

What do you think about adding examples/ to rsgen-avro that uses a build.rs to generate the Rust types and a dummy main.rs that just uses them ?
It could be used by the rsgen-avro users as a blueprint how to use it as a library and it will expose such kind of problems.
I could provide a PR if you like the idea!

@martin-g
Copy link
Contributor

I am not sure whether examples/ could use a build.rs ... But if it doesn't then we could add an internal/private sub-crate instead.

@lerouxrgd
Copy link
Owner Author

That's pretty much what is done in the tests/schemas dir already. Currently I added a test there as follows:

diff --git a/tests/schemas/decimals.avsc b/tests/schemas/decimals.avsc
new file mode 100644
index 0000000..d3c09a5
--- /dev/null
+++ b/tests/schemas/decimals.avsc
@@ -0,0 +1,12 @@
+{
+  "type": "record",
+  "name": "Decimals",
+  "fields": [ {
+    "name": "a_decimal",
+    "type": {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+  }, {
+    "name": "a_big_decimal",
+    "type": ["null", {"type": "bytes", "logicalType": "big-decimal"}],
+    "default": null
+  } ]
+}
diff --git a/tests/schemas/decimals.rs b/tests/schemas/decimals.rs
new file mode 100644
index 0000000..974d3c4
--- /dev/null
+++ b/tests/schemas/decimals.rs
@@ -0,0 +1,10 @@
+
+#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
+pub struct Decimals {
+    pub a_decimal: apache_avro::Decimal,
+    #[serde(default = "default_decimals_a_big_decimal")]
+    pub a_big_decimal: Option<apache_avro::BigDecimal>,
+}
+
+#[inline(always)]
+fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
\ No newline at end of file
diff --git a/tests/schemas/mod.rs b/tests/schemas/mod.rs
index a7a6ee8..7767d34 100644
--- a/tests/schemas/mod.rs
+++ b/tests/schemas/mod.rs
@@ -1,4 +1,5 @@
 pub mod complex;
+pub mod decimals;
 pub mod enums;
 pub mod enums_casing;
 pub mod enums_multiline_doc;

@martin-g
Copy link
Contributor

OK! Thanks!
I will take a deeper look tomorrow!

If you want you can push this failing test to apache-avro-0.17 branch! Otherwise I will apply it locally to test my fixes!

@lerouxrgd
Copy link
Owner Author

Sure, I have force pushed it to apache-avro-0.17 branch.

@martin-g
Copy link
Contributor

BigDecimal re-export:

@martin-g
Copy link
Contributor

@martin-g
Copy link
Contributor

martin-g commented Feb 27, 2024

I think I fixed all issues!
But now it seems the new test has its own problem:


failures:
    gen_decimals

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 27 filtered out; finished in 0.20s


--- STDERR:              rsgen-avro::generation gen_decimals ---
thread 'gen_decimals' panicked at tests/generation.rs:19:5:
assertion failed: `(left == right)`: 

>>>>>>>>>>>>>>>>> Expected: 

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
    pub a_decimal: apache_avro::Decimal,
    #[serde(default = "default_decimals_a_big_decimal")]
    pub a_big_decimal: Option<apache_avro::BigDecimal>,
}

#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
>>>>>>>>>>>>>>>>> But generated: 

#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct Decimals {
    pub a_decimal: apache_avro::Decimal,
    #[serde(default = "default_decimals_a_big_decimal")]
    pub a_big_decimal: Option<apache_avro::BigDecimal>,
}

#[inline(always)]
fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }


Diff < left / right > :
 
 #[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
 pub struct Decimals {
     pub a_decimal: apache_avro::Decimal,
     #[serde(default = "default_decimals_a_big_decimal")]
     pub a_big_decimal: Option<apache_avro::BigDecimal>,
 }
 
 #[inline(always)]
 fn default_decimals_a_big_decimal() -> Option<apache_avro::BigDecimal> { None }
>

@lerouxrgd
Copy link
Owner Author

Thanks for the fixes @martin-g !
It all looks good now (I've fixed the extra \n in my unit test).

Besides, I was thinking that maybe apache-avro should re-export all the types it uses it its public API, which means also apache_avro::Uuid, what do you think ?

@martin-g
Copy link
Contributor

I think it makes a perfect sense!
Let me do it!

martin-g added a commit to apache/avro that referenced this issue Feb 28, 2024
martin-g added a commit to apache/avro that referenced this issue Feb 28, 2024
Requested-at: lerouxrgd/rsgen-avro#61 (comment)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
(cherry picked from commit 64c1198)
@martin-g
Copy link
Contributor

Done!

What about the time/date/duration types ?
Currently they are backed by plain integers but maybe we should use Instant/Duration/unix_ts types ?!

@lerouxrgd
Copy link
Owner Author

Awesome, thanks !
Actually I think all other types are re-exported already, I mostly looked at the types used in apache_avro::types::Value. So it all looks good to me.

@erikvullings
Copy link

@lerouxrgd Thanks for the feedback on my schema! Indeed, when rewriting the schema as you suggested (which also made complete sense to me), it already works with the current rsgen-avro implementation. So that's great. Thank you and @martin-g for helping me out! Really appreciated.

@lerouxrgd
Copy link
Owner Author

Done!

What about the time/date/duration types ? Currently they are backed by plain integers but maybe we should use Instant/Duration/unix_ts types ?!

@martin-g Actually I could think of one more re-export that would be useful, that is serde_bytes.

In #38 we discussed using serde_bytes for Value::Fixed as it is already done for Value::Bytes (following GH-36). It was blocked on serde-rs/bytes#26 but now it is merged so we can implement it.

Ideally rsgen-avro would use it as #[serde(with = "apache_avro::serde_bytes")] instead of the current #[serde(with = "serde_bytes")]. What do you think about it ?

@martin-g
Copy link
Contributor

I will take a look soon!

@martin-g
Copy link
Contributor

Ideally rsgen-avro would use it as #[serde(with = "apache_avro::serde_bytes")] instead of the current #[serde(with = "serde_bytes")]. What do you think about it ?

Looks good!
But I think there is more work to add proper support for serde_bytes to apache_avro.
As you may have seen apache/avro#1900 is not finished ...
I don't remember now what was the actual issue with deserialization... I will need to refresh my memory but I won't have time for it in the next week or so.
Any help is welcome!

RanbirK pushed a commit to RanbirK/avro that referenced this issue May 13, 2024
@lerouxrgd
Copy link
Owner Author

With release 0.14.0 the following have been fixed:

  • Fix date example parsing (fixed in apache-avro)
  • Support for BigDecimal (re-exported from apache-avro)
  • Support for Uuid (re-exported from apache-avro)
  • Better fixed/byte values support (through apache-avro specialized serde functions)

martin-g added a commit to apache/avro-rs that referenced this issue Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants