
KAFKA-15473: Hide duplicate plugins in /connector-plugins #14398

Merged

Conversation

gharris1727 (Contributor)

In #14089, the PluginDesc::compareTo method was changed so that plugins with the same name and version but different classloaders could be given a consistent order. This caused compareTo to return non-zero values where it previously returned zero.

The compareTo method is used by TreeSet to determine uniqueness, in lieu of the equals() used by other Set implementations. This means that the PluginScanResult now stores multiple copies of a plugin with the same "name", which is beneficial for the connect-plugin-path script, where the distinction is valuable. DelegatingClassLoader::computePluginLoaders already disambiguates among multiple PluginDescs for the same plugin name by picking the "last" one, as defined by compareTo.

However, the ConnectorPluginsResource does not disambiguate in the same way, and instead just forwards the list of plugins as PluginInfo objects, which contain only the class, type, and version of the plugin. This means that when a plugin is provided by multiple sources but with the exact same class, type, and version, the /connector-plugins endpoint contains confusing duplicates.

Instead of showing duplicates without any information to distinguish them, we should hide the trivial duplicates (those where PluginInfo::equals is true). To verify this, ConnectorPluginsResourceTest is changed to assert on the returned List rather than a Set, so that duplicates are detected.
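The deduplication idea can be sketched as follows. This is a minimal illustration, not the actual Kafka Connect code: PluginInfo here is a simplified stand-in whose equality is defined by class name, type, and version, and a LinkedHashSet drops trivial duplicates while preserving order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class PluginInfoDedupDemo {
    // Simplified stand-in for Connect's PluginInfo: equality is defined
    // by class name, type, and version only (no classloader/location).
    static final class PluginInfo {
        final String className, type, version;
        PluginInfo(String className, String type, String version) {
            this.className = className;
            this.type = type;
            this.version = version;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof PluginInfo)) return false;
            PluginInfo p = (PluginInfo) o;
            return className.equals(p.className) && type.equals(p.type)
                    && Objects.equals(version, p.version);
        }
        @Override public int hashCode() { return Objects.hash(className, type, version); }
        @Override public String toString() { return className + ":" + type + ":" + version; }
    }

    // Hide trivial duplicates (those where PluginInfo::equals is true),
    // keeping the first occurrence and the original order.
    static List<PluginInfo> dedupe(List<PluginInfo> plugins) {
        return new ArrayList<>(new LinkedHashSet<>(plugins));
    }

    public static void main(String[] args) {
        List<PluginInfo> scanned = Arrays.asList(
                new PluginInfo("o.a.k.c.storage.StringConverter", "converter", "3.6.0"),
                // Same plugin scanned from a second location: a trivial duplicate.
                new PluginInfo("o.a.k.c.storage.StringConverter", "converter", "3.6.0"),
                new PluginInfo("o.a.k.c.storage.StringConverter", "header_converter", "3.6.0"));
        System.out.println(dedupe(scanned).size()); // prints 2
    }
}
```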

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@gharris1727 gharris1727 marked this pull request as ready for review September 18, 2023 21:49
@gharris1727 (Contributor, Author)

@yashmayya Could you take a look at this?

I'd also appreciate your opinion on whether this warrants blocking the 3.6.0 release, or if this is just a cosmetic improvement that can wait for a future release.

@yashmayya (Contributor) left a comment

@gharris1727 thanks for the fix and the great explanation here. The change looks good to me, and I think it does qualify as a release blocker because there could definitely exist clients that don't expect duplicate plugins in the GET /connector-plugins response (and fail to handle it gracefully). While such clients may be unlikely to exist in the wild, we cannot guarantee their non-existence, and we should do our best not to cause any breakage.

Edit: I just saw this comment, which states that clients would already need to handle duplicate plugins. However, the duplicates introduced in #14089 (which this PR fixes) would be far more numerous for most installations of Kafka Connect. I'm now on the fence about whether this qualifies as a release blocker.


I was initially concerned with the discrepancy in the implementations of the compareTo and equals methods in the PluginDesc class and wondered if this could lead to other unexpected issues or bugs in the future. However, it looks like the contract for Comparable::compareTo does allow for such a discrepancy:

It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that implements the Comparable interface and violates this condition should clearly indicate this fact. The recommended language is "Note: this class has a natural ordering that is inconsistent with equals."

Could we add such a note to the PluginDesc class for posterity?

I'd also like to note that we're using SortedSet<PluginDesc> in the PluginScanner and PluginScanResult classes but Set<PluginDesc> in the Plugins class. For instance, PluginScanResult::sinkConnectors returns a SortedSet<PluginDesc<SinkConnector>> but Plugins::sinkConnectors returns a Set<PluginDesc<SinkConnector>>. From the SortedSet Javadoc:

Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

I think this could lead to at least some confusion, if not new bugs, in the future, and I wonder if we should revisit it.
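The quoted caveat can be seen with a toy type whose ordering is inconsistent with equals. Desc here is a hypothetical stand-in, not the real PluginDesc: it orders by name then location, but defines equality by name only, so a TreeSet and a HashSet disagree on how many elements it holds.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class OrderingDemo {
    // Hypothetical type: compareTo uses (name, location), equals uses name only,
    // so its natural ordering is inconsistent with equals.
    static final class Desc implements Comparable<Desc> {
        final String name, location;
        Desc(String name, String location) { this.name = name; this.location = location; }
        @Override public int compareTo(Desc o) {
            int byName = name.compareTo(o.name);
            return byName != 0 ? byName : location.compareTo(o.location);
        }
        @Override public boolean equals(Object o) {
            return o instanceof Desc && name.equals(((Desc) o).name); // name only
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    public static void main(String[] args) {
        Desc a = new Desc("plugin", "/a.jar");
        Desc b = new Desc("plugin", "/b.jar");
        // TreeSet uses compareTo: the two descriptors are distinct.
        Set<Desc> sorted = new TreeSet<>(Arrays.asList(a, b));
        // HashSet uses equals: the two descriptors collapse into one.
        Set<Desc> plain = new HashSet<>(Arrays.asList(a, b));
        System.out.println(sorted.size()); // prints 2
        System.out.println(plain.size());  // prints 1
    }
}
```

This is exactly the "well-defined but fails to obey the general contract of the Set interface" behavior the SortedSet Javadoc warns about.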

@vamossagar12 (Contributor)

vamossagar12 commented Sep 19, 2023

Thanks @gharris1727 for the quick fix. While the changes are in place, IMO it is ok not to qualify this as a release blocker, since it looks like duplicate plugins could already be seen in other ways. But I do agree with Yash that we might see a larger number of duplicates.

@hgeraldino (Contributor) left a comment

LGTM

@gharris1727 (Contributor, Author)

However, the duplicates introduced in #14089 (that is being fixed here) would be a lot more in number for most installations of Kafka Connect. I'm now on the fence about whether or not this qualifies as a release blocker.

Yes, the number of duplicates is much higher and might actually be noticed now. Previously it was really only the dual converter/header-converters which caused duplicates, and there would just be pairs of them. Now, if I install every public plugin, some plugins are duplicated many times:

  16 "{\"class\":\"io.confluent.connect.avro.AvroConverter\",\"type\":\"converter\"}"
   5 "{\"class\":\"io.confluent.connect.cloud.storage.source.format.ByteArrayConverter\",\"type\":\"converter\"}"
   2 "{\"class\":\"io.confluent.connect.jms.JmsSourceConnector\",\"type\":\"source\",\"version\":\"5.5.1\"}"
   2 "{\"class\":\"io.confluent.connect.protobuf.ProtobufConverter\",\"type\":\"converter\"}"
  13 "{\"class\":\"io.confluent.connect.storage.tools.SchemaSourceConnector\",\"type\":\"source\",\"version\":\"3.6.0\"}"
   4 "{\"class\":\"io.debezium.converters.BinaryDataConverter\",\"type\":\"converter\"}"
   4 "{\"class\":\"io.debezium.converters.BinaryDataConverter\",\"type\":\"header_converter\"}"
   4 "{\"class\":\"io.debezium.converters.ByteArrayConverter\",\"type\":\"converter\"}"
   4 "{\"class\":\"io.debezium.converters.ByteArrayConverter\",\"type\":\"header_converter\"}"
   6 "{\"class\":\"io.debezium.transforms.ByLogicalTableRouter\",\"type\":\"transformation\"}"
   5 "{\"class\":\"io.debezium.transforms.ExtractNewRecordState\",\"type\":\"transformation\"}"
   5 "{\"class\":\"io.debezium.transforms.outbox.EventRouter\",\"type\":\"transformation\"}"
   5 "{\"class\":\"io.debezium.transforms.tracing.ActivateTracingSpan\",\"type\":\"transformation\"}"
   5 "{\"class\":\"org.apache.kafka.connect.storage.SimpleHeaderConverter\",\"type\":\"header_converter\"}"
   5 "{\"class\":\"org.apache.kafka.connect.storage.StringConverter\",\"type\":\"converter\"}"
   5 "{\"class\":\"org.apache.kafka.connect.storage.StringConverter\",\"type\":\"header_converter\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.Cast$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.Cast$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.ExtractField$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.ExtractField$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.Filter\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.Flatten$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.Flatten$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.HoistField$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.HoistField$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.InsertField$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.InsertField$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.MaskField$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.MaskField$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.RegexRouter\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.ReplaceField$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.ReplaceField$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.SetSchemaMetadata$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.SetSchemaMetadata$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.TimestampConverter$Key\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.TimestampConverter$Value\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.TimestampRouter\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.ValueToKey\",\"type\":\"transformation\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.predicates.HasHeaderKey\",\"type\":\"predicate\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.predicates.RecordIsTombstone\",\"type\":\"predicate\"}"
   2 "{\"class\":\"org.apache.kafka.connect.transforms.predicates.TopicNameMatches\",\"type\":\"predicate\"}"

@gharris1727 (Contributor, Author)

I was initially concerned with the discrepancy in the implementations of the compareTo and equals methods in the PluginDesc class and wondered if this could lead to other unexpected issues or bugs in the future. However, it looks like the contract for Comparable::compareTo does allow for such a discrepancy:

I don't believe we have this discrepancy, so we don't need this caveat. Because PluginDesc::equals uses klass.equals, and PluginDesc::compareTo uses name.compareTo and location.compareTo, they should be consistent. We could change the equals method to use name.equals and location.equals to make this more obvious, but that would mostly be a readability change. (The same class name loaded from different classloaders produces Class objects for which equals is false.)
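The parenthetical about classloaders can be demonstrated directly. This is a self-contained sketch, assuming the demo is compiled in the default package with its class files available on the classpath at runtime; IsolatingLoader and Payload are hypothetical names, not part of Kafka Connect.

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassIdentityDemo {
    // A loader with no parent delegation, so each instance defines its own
    // copy of the class instead of reusing the application loader's copy.
    static final class IsolatingLoader extends ClassLoader {
        IsolatingLoader() { super(null); }
        @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = "/" + name.replace('.', '/') + ".class";
            try (InputStream in = ClassIdentityDemo.class.getResourceAsStream(path)) {
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException | NullPointerException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Each call uses a fresh loader, yielding a distinct Class object.
    static Class<?> loadIsolated() throws Exception {
        return new IsolatingLoader().loadClass("ClassIdentityDemo$Payload");
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        System.out.println(a.getName().equals(b.getName())); // true: same class name
        System.out.println(a.equals(b)); // false: distinct classloaders, distinct Class objects
    }

    static final class Payload {}
}
```

Since Class does not override equals, identity comparison applies, and the two copies are not equal even though their names match.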

@yashmayya (Contributor)

Ah, good point, I totally missed the implication of the Objects.equals(klass, that.klass) check here! Thanks for the correction!

@satishd (Member)

satishd commented Sep 19, 2023

@gharris1727 Please reach out to other committers in the Kafka Connect area to expedite the review. It will be helpful to get this reviewed and accepted sooner if we determine it is a blocker for 3.6.0.

@gharris1727 (Contributor, Author)

cc @mimaison and @C0urante please see the context for this solution in the ticket: https://issues.apache.org/jira/browse/KAFKA-15473

Thanks!

@C0urante (Contributor) left a comment

LGTM, thanks for the quick fix Greg!

@C0urante C0urante merged commit b088307 into apache:trunk Sep 19, 2023
@C0urante (Contributor)

@satishd I'd like to backport this to 3.6 so that, even if we determine this is not a release blocker, it can be picked up in the event that we generate a new release candidate for 3.6.0. Is that acceptable to you?

@satishd (Member)

satishd commented Sep 19, 2023

@satishd I'd like to backport this to 3.6 so that, even if we determine this is not a release blocker, it can be picked up in the event that we generate a new release candidate for 3.6.0. Is that acceptable to you?

That seems reasonable to me. We can push this into the 3.6 branch and make sure a follow-up 3.6.x release includes the fix.

satishd pushed a commit that referenced this pull request Sep 19, 2023
k-wall pushed a commit to k-wall/kafka that referenced this pull request Nov 21, 2023
mjsax pushed a commit to confluentinc/kafka that referenced this pull request Nov 22, 2023
Cerchie pushed a commit to Cerchie/kafka that referenced this pull request Feb 22, 2024