Adding Multimodal and nominal domain

@Borda released this 30 Nov 16:19

We are happy to announce that Torchmetrics v0.11 is now publicly available. In Torchmetrics v0.11 we have primarily focused on cleaning up after the large classification refactor from v0.10 and on adding new metrics. With v0.11 we are crossing 90+ metrics in Torchmetrics, nearing the milestone of 100+ metrics.

New domains

In Torchmetrics we are not only looking to expand with new metrics in already established metric domains such as classification or regression, but also to open up new domains. We are therefore happy to report that v0.11 includes two new domains: multimodal and nominal.

Multimodal

If there is one topic within machine learning that is hot right now, it is generative models, and in particular generative models that connect images and text. Just recently, Stable Diffusion v2 was released, able to create even more photorealistic images from a single text prompt than ever.

In Torchmetrics v0.11 we are adding a new domain called multimodal to support the evaluation of such models. For now, we are starting out with a single metric, the CLIPScore from this paper, which can be used to evaluate image-to-text models. According to the paper, CLIPScore achieves the highest correlation with human judgment among the metrics it was compared against, so a high CLIPScore for an image-text pair means that it is highly plausible that the caption and the image are related to each other.
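To make the score more concrete, here is a minimal sketch of the kind of computation behind it: a rescaled cosine similarity between an image embedding and a caption embedding, clipped at zero. It assumes the embeddings have already been produced by a CLIP model. The toy vectors, the helper names, and the scale factor of 100 are assumptions made for illustration; this is not the Torchmetrics implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score_from_embeddings(image_emb, text_emb, scale=100.0):
    """Rescaled cosine similarity, clipped so it is never negative.

    `scale` is an assumed rescaling constant, chosen here for readability.
    """
    return max(scale * cosine_similarity(image_emb, text_emb), 0.0)


# Made-up 2-d "embeddings"; real CLIP embeddings have hundreds of dimensions.
image_emb = [1.0, 0.0]
matching_caption_emb = [2.0, 0.0]    # points in the same direction as the image
unrelated_caption_emb = [0.0, 1.0]   # orthogonal to the image
opposite_caption_emb = [-1.0, 0.0]   # negative similarity, clipped to 0

matching = clip_score_from_embeddings(image_emb, matching_caption_emb)
unrelated = clip_score_from_embeddings(image_emb, unrelated_caption_emb)
opposite = clip_score_from_embeddings(image_emb, opposite_caption_emb)
print(matching, unrelated, opposite)
```

The closer the caption embedding points in the same direction as the image embedding, the higher the score; unrelated or opposed pairs end up at zero.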

Nominal

If you have ever taken a course in statistics or an introduction to machine learning, you have hopefully heard that data attributes come in different types: nominal, ordinal, interval, and ratio. This essentially refers to how data can be compared. For example, nominal data cannot be ordered and cannot be measured. An example would be data that describes the color of your car: blue, red, or green. It does not make sense to rank the different values. Ordinal data can be ordered, but the values have no relative meaning. An example would be the safety rating of a car: 1, 2, or 3. We can say that 3 is better than 1, but the actual numerical value does not mean anything.

In v0.11 of Torchmetrics, we are adding support for classic metrics on nominal data. In fact, four new metrics have already been added to this domain:

  • CramersV
  • PearsonsContingencyCoefficient
  • TschuprowsT
  • TheilsU

All metrics are measures of association between two nominal variables, giving a value between 0 and 1, with 1 meaning that there is a perfect association between the variables.
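As an illustration of what such a measure of association computes, here is a minimal sketch of the textbook formulas for Cramér's V and Tschuprow's T, both derived from the chi-squared statistic of a contingency table. The function names and the example tables are hypothetical, and the sketch omits any bias correction, so it should not be read as the Torchmetrics implementation.

```python
import math


def chi_squared(table):
    """Chi-squared statistic and sample count of a contingency table.

    `table[i][j]` counts samples with category i of the first variable and
    category j of the second. Assumes no row or column sums to zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V without bias correction: sqrt(chi2 / (n * min(r-1, c-1)))."""
    chi2, n = chi_squared(table)
    r, c = len(table), len(table[0])
    return math.sqrt(chi2 / (n * min(r - 1, c - 1)))


def tschuprows_t(table):
    """Tschuprow's T without bias correction: sqrt(chi2 / (n * sqrt((r-1)(c-1))))."""
    chi2, n = chi_squared(table)
    r, c = len(table), len(table[0])
    return math.sqrt(chi2 / (n * math.sqrt((r - 1) * (c - 1))))


# Each category of one variable maps to exactly one category of the other.
perfect = [[10, 0], [0, 10]]
# Knowing one variable tells us nothing about the other.
independent = [[5, 5], [5, 5]]

print(cramers_v(perfect), cramers_v(independent))
print(tschuprows_t(perfect), tschuprows_t(independent))
```

For square tables the two coefficients coincide; they differ only when the two variables have different numbers of categories.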

Small improvements

In addition to metrics within the two new domains, v0.11 of Torchmetrics contains other smaller changes and fixes:

  • TotalVariation metric has been added to the image package, which measures the complexity of an image with respect to its spatial variation.

  • MulticlassExactMatch metric has been added to the classification package, which for example can be used to measure sentence-level accuracy, where all tokens need to match for a sentence to be counted as correct.

  • KendallRankCorrCoef has been added to the regression package for measuring the overall rank correlation between two variables.

  • LogCoshError has been added to the regression package for measuring the residual error between two variables. It is similar to the mean squared error close to 0 but similar to the mean absolute error away from 0.
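Two of the descriptions above are easy to check by hand. The following is a minimal sketch, written from the definitions rather than taken from the library: a log-cosh error that behaves like half the squared error near 0 and like the absolute error (minus log 2) far from 0, and a sentence-level exact match where a sample only counts if every token agrees. The function names are hypothetical.

```python
import math


def log_cosh(x):
    """log(cosh(x)), rewritten as |x| + log(1 + exp(-2|x|)) - log(2) to avoid overflow."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def log_cosh_error(preds, target):
    """Mean of log(cosh(pred - target)) over all samples."""
    return sum(log_cosh(p - t) for p, t in zip(preds, target)) / len(preds)


def exact_match(preds, target):
    """Fraction of samples whose predicted sequence matches the target at every position."""
    correct = sum(1 for p, t in zip(preds, target) if list(p) == list(t))
    return correct / len(preds)


# Small residual: close to x**2 / 2, i.e. squared-error-like behaviour.
print(log_cosh(0.01), 0.01 ** 2 / 2)
# Large residual: close to |x| - log(2), i.e. absolute-error-like behaviour.
print(log_cosh(20.0), 20.0 - math.log(2.0))

# Two "sentences" of three tokens each; the second differs in one token.
pred_tokens = [[1, 2, 3], [4, 5, 6]]
target_tokens = [[1, 2, 3], [4, 0, 6]]
print(exact_match(pred_tokens, target_tokens))
```

Note that a single wrong token makes the whole second sentence count as incorrect, even though five of the six tokens overall are right.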


Finally, Torchmetrics now only supports v1.8 and higher of PyTorch. It was necessary to raise the minimum from v1.3 because we were running into compatibility issues with older versions of PyTorch. We strive to support as many versions of PyTorch as possible, but for the best experience, we always recommend keeping PyTorch and Torchmetrics up to date.


[0.11.0] - 2022-11-30

Added

  • Added MulticlassExactMatch to classification metrics (#1343)
  • Added TotalVariation to image package (#978)
  • Added CLIPScore to new multimodal package (#1314)
  • Added regression metrics:
    • KendallRankCorrCoef (#1271)
    • LogCoshError (#1316)
  • Added new nominal metrics:
    • CramersV
    • PearsonsContingencyCoefficient
    • TschuprowsT
    • TheilsU
  • Added option to pass distributed_available_fn to metrics to allow checks for custom communication backend for making dist_sync_fn actually useful (#1301)
  • Added normalize argument to Inception, FID, KID metrics (#1246)

Changed

  • Changed minimum Pytorch version to be 1.8 (#1263)
  • Changed interface for all functional and modular classification metrics after refactor (#1252)

Removed

  • Removed deprecated BinnedAveragePrecision, BinnedPrecisionRecallCurve, RecallAtFixedPrecision (#1251)
  • Removed deprecated LabelRankingAveragePrecision, LabelRankingLoss and CoverageError (#1251)
  • Removed deprecated KLDivergence and AUC (#1251)

Fixed

  • Fixed precision bug in pairwise_euclidean_distance (#1352)

Contributors

@Borda, @justusschock, @ragavvenkatesan, @shenoynikhil, @SkafteNicki, @stancld

If we forgot someone due to not matching commit email with GitHub account, let us know :]