Contribution: Add new audio/speech metrics for generative audio #2464

Open
d-caviedes opened this issue Mar 20, 2024 · 7 comments
Labels
enhancement New feature or request good first issue Good for newcomers New metric topic: Audio

Comments

@d-caviedes

🚀 Feature

Add new audio metrics for generative audio processing

Motivation

The evaluation of speech processing (denoising, dereverberation, and enhancement in general) depends heavily on audio metrics. Generative AI is now widely used for speech/audio enhancement and is becoming the new SOTA. However, evaluating generative speech enhancement requires reference-free (target-free) metrics that correlate highly with MOS (Mean Opinion Score). The currently implemented metrics do not allow a proper assessment of generative speech enhancement algorithms (e.g. those based on diffusion or GANs) because they rely heavily on reference/target audio.

Newer metrics such as DNSMOS, NISQA, CDPAM, and WARPQ allow a well-founded assessment of such algorithms (they are either reference-free or designed for generative methods). In addition, they have been shown to correlate better with MOS than traditional metrics (PESQ, STOI, ...).
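To make "correlates with MOS" concrete, here is a minimal sketch of how such a comparison is commonly set up: score a set of clips with a metric, collect the listeners' MOS for the same clips, and compute the Pearson correlation between the two. The `pearson` helper and every number below are assumptions made up for illustration; they are not measurements of DNSMOS, NISQA, CDPAM, WARPQ, PESQ, or STOI.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder values invented for this sketch, NOT real measurements.
mos = [1.5, 2.0, 3.0, 4.0, 4.5]        # hypothetical listener MOS per clip
metric_a = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical metric that tracks MOS closely
metric_b = [3.0, 1.0, 4.0, 2.0, 3.5]   # hypothetical metric that tracks MOS poorly

print("metric_a vs MOS:", pearson(mos, metric_a))
print("metric_b vs MOS:", pearson(mos, metric_b))
```

With these toy values, `metric_a` ends up with the higher correlation. In a real comparison the MOS values would come from a listening test, and rank correlation (Spearman) is often reported alongside Pearson.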

Pitch

It would be great to have these metrics included, as they are currently available only in scattered repositories:
WARPQ
DNSMOS
CDPAM
NISQA

Alternatives

I cannot think of any.

@d-caviedes d-caviedes added the "enhancement" (New feature or request) label Mar 20, 2024

Hi! Thanks for your contribution! Great first issue!

@SkafteNicki
Member

Hi @d-caviedes,
Thanks for wanting to contribute to torchmetrics. Feel free to contribute any metric within the audio domain that you can :)
In general, we are looking to add any metric that is used by researchers or companies on a regular basis.
We welcome both partial and full implementations, and we will of course help you with specific implementation details to get the metric into the torchmetrics library.

@SkafteNicki SkafteNicki added this to the future milestone Mar 24, 2024
@d-caviedes
Author

Hi @SkafteNicki

Cool. Should I just work on my branch and open a pull request afterwards?

@Borda
Member

Borda commented Mar 28, 2024

> Cool. Should I just work on my branch and open a pull request afterwards?

Yes, as soon as you feel you want to share your work or need some guidance, please open a draft PR :)

@Borda Borda changed the title from "Contribution: Add new audio/speech metrics for generative audio. (I can help!)" to "Contribution: Add new audio/speech metrics for generative audio" Mar 28, 2024
@Borda Borda pinned this issue Mar 28, 2024
@Borda Borda added the "good first issue" (Good for newcomers) and "topic: Audio" labels Mar 28, 2024
@Borda Borda unpinned this issue Aug 5, 2024
@Borda
Member

Borda commented Aug 5, 2024

Hello @d-caviedes, are you still interested in contributing? Do you need some more guidance?

@d-caviedes
Author

Hi, yes, I would like to. Actually, I have what I think is a functional WARPQ implementation in my local torchmetrics, but I would need a bit more guidance, yes :)

@Borda
Member

Borda commented Aug 8, 2024

> but I would need a bit more guidance, yes :)

Sure, just ping me on Slack or Discord 🦩

@philgzl philgzl mentioned this issue Oct 21, 2024