Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-2992: Gate LocalTimestamp references in AvroSchemaConverter #2993

Merged
merged 2 commits into from
Aug 21, 2024

Conversation

clairemcginty
Copy link
Contributor

Rationale for this change

See linked issue; AvroSchemaConverter references Avro classes and methods that don't exist before Avro 1.10

What changes are included in this PR?

Performs an Avro version check to silo references to local timestamp classes.

Are these changes tested?

I've tested them locally on one of my projects using Avro 1.8... unfortunately there isn't a great way to unit-test this :/

Are there any user-facing changes?

This should preserve the existing behavior for Avro 1.8 users while unlocking the new LocalTimestamp types for 1.10+ users.

Closes #2992

@clairemcginty
Copy link
Contributor Author

cc @RustedBones , can you take a look at this fix as well?

@@ -605,4 +611,14 @@ private static String namespace(String name, Map<String, Integer> names) {
Integer nameCount = names.merge(name, 1, (oldValue, value) -> oldValue + 1);
return nameCount > 1 ? name + nameCount : null;
}

/* Avro <= 1.9 does not support conversions to LocalTimestamp{Micros, Millis} classes */
private static boolean avroVersionSupportsLocalTimestampTypes() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of mocking this method to capture the behavior of supporting and not-supporting local timestamps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a try with Powermock!

I might start a thread on the mailing list this week discussing Avro version support more broadly, to formalize which Avro versions are currently supported/any planned EOL dates? I've been thinking it would also be good to have some automated testing of parquet-avro on different Avro versions... it would just be a little complicated to set up since we'd need either a separate Maven module per supported Avro version, or some very clever classloading :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @clairemcginty, that's a good idea!

What do we consider the lower-bound? :D Avro 1.8.2 is May 2017, so that's pretty old, but I'm aware of some breaking API changes because we exposed some Jackson classes in the public API.

To do cross-testing, we can do something similar to we do with Hadoop 2, you can get some inspiration there:

parquet-java/pom.xml

Lines 638 to 643 in d4384d3

<profile>
<id>hadoop2</id>
<properties>
<hadoop.version>2.7.3</hadoop.version>
</properties>
</profile>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 1.8 is a reasonable lower bound! for context on our use case, we use the Beam library, which until very recently was tightly coupled with 1.8. Things are now more flexible, but there's a lot of inertia surrounding 1.8 -- the breaking changes surrounding logical types make it a non-trivial upgrade 😅

I'll take a look at the hadoop setup! Avro is a bit tricky because we need to both set the Avro-compiler version, and load the Avro core library, for whatever version of Avro we want to test...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context, that helps. I see that 1.8.2 is still in there, so that sounds reasonable to me 👍

@Fokko Fokko merged commit 312a15f into apache:master Aug 21, 2024
9 checks passed
Fokko pushed a commit to Fokko/parquet-mr that referenced this pull request Aug 21, 2024
…pache#2993)

* PARQUET-2992: Gate LocalTimestamp references in AvroSchemaConverter

* apacheGH-2992: Test logical type conversion for different Avro versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet 1.14.2-RC1 breaks compat with Avro 1.7, 1.8, 1.9
2 participants