how to do pronunciation evaluation with pocketsphinx? #357
Replies: 4 comments
-
Hello,
-
Hi, thanks for your quick reply. I have tried: $ ./pocketsphinx -phone_align yes align 1.wav "these kids will be really disappointed if their trip gets cancelled"
If I align the same audio file against a completely different, wrong sentence, I still get high "p" values, which is not what I want: $ ./pocketsphinx -phone_align yes align 1.wav "hello thank you thank you very much"
I am trying https://github.com/jsalsman/featex, which is mentioned in https://cmusphinx.github.io/wiki/pocketsphinx_pronunciation_evaluation/. What do you think of this repo? Or do you have any advice on making pocketsphinx more precise? Thanks very much!
-
Hi Yangang, I'm the author of the page you mentioned and the repo to which you linked. Sadly both are out of date. The "p" value is not a confidence score, and to the extent that it is, each one has its own different relative scale. The approach of trying to normalize them (including everything called "goodness of pronunciation" scores) is literally a dead end; the easiest way to understand why is to carefully read the part in https://www.isca-speech.org/archive/pdfs/interspeech_2015/loukina15_interspeech.pdf explaining that miscomprehensions are only due to pronunciation errors 14% of the time; the figure for how often such errors cause miscomprehensions is similarly very small. When you combine that with the fact that in the Common European Framework of Reference for Languages assessment criteria for "overall phonological control," intelligibility outweighs formally correct pronunciation at all levels, it is unavoidable that you want to measure intelligibility instead of any other measurement of pronunciation quality in any educational context except trying to train someone to exactly match one specific, invariable accent. If you do not, you are wasting the vast majority of the learner's time and will end up getting reviews such as you see at 2 minutes into https://www.youtube.com/watch?v=sKo6POdNyBI&t=120s -- Rosetta Stone's speech recognition lab director Barrett Davis apparently took my advice that intelligibility assessment is necessary to heart in 2018 when I was pitching him a contract; I am not sure whether or not I have yet convinced @lenzo-ka. 
So, I have in the past recommended our more recent work at https://arxiv.org/abs/1709.01713 and, since February of this year, the excellent articulatory feature extraction system at https://github.com/articulatory/articulatory, which you can use to supplement the model parameters at https://github.com/jsalsman/featex. However, this approach requires that you have a relatively large quantity of learners' attempts at pronouncing words and phrases, along with blind transcriptions of those utterances. There is no avoiding this data collection task, although you can use blinded transcriptions by the learners themselves, so it need not be expensive. This is my current approach leveraging such learner transcriptions: https://i.ibb.co/ryTdVZ3/Screenshot-2023-06-06-7-19-00-AM.png (diagram caption: Learners begin by answering questions in full sentences, then confirm the text of what they said. They then attempt to transcribe sentences spoken by other learners. The system identifies consequential mispronunciations from these transcriptions, repeating this process until sufficient data is gathered for generalization. The identified mispronunciations are then generalized by articulatory features over phonemes, diphones, syllables, and word(s), allowing for targeted remediation exercises. These exercises utilize natural spoken feedback, remixed to emphasize error locations. The system tracks the learner's progress throughout, recording their improvement and the areas that still need work. The process repeats, with the system selecting additional questions intended to elicit answers with specific words including the phones and segments in need of improvement for transcription by other learners.)
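As a rough illustration of the transcription-comparison step in that caption, here is a minimal sketch, not the actual system: it assumes that a word-level diff between the text the speaker confirmed and another learner's blind transcription is a reasonable first pass at flagging words that were not understood. The function names and the example "heard" sentence are hypothetical.

```python
from difflib import SequenceMatcher

PUNCTUATION = ".,!?;:\"'"


def normalize(text):
    """Lowercase and strip surrounding punctuation so only word identity is compared."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def misheard_words(intended, transcribed):
    """Return (intended_words, heard_words) pairs where the listener's
    blind transcription diverges from the text the speaker confirmed."""
    ref = normalize(intended)
    hyp = normalize(transcribed)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete, or insert
            diffs.append((ref[i1:i2], hyp[j1:j2]))
    return diffs


# Hypothetical data: what the speaker confirmed vs. what another learner typed.
confirmed = "These kids will be really disappointed if their trip gets cancelled"
heard = "these kids will be really disappointed if the tree gets cancelled"
print(misheard_words(confirmed, heard))
```

Spans that many independent listeners get wrong would then be the candidates to generalize over phonemes and articulatory features; a single divergent transcription on its own says little.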
I can say with certainty that this approach is far more effective per time on task for intelligibility remediation than any current commercial or free products (like Google Search's and Microsoft's pronunciation assessment and remediation tools), and anything I have seen in the academic literature and patents, after a relatively exhaustive recent search. I hope this helps. Please do not hesitate to follow up with more questions! Best regards,
-
I am very honored to receive your quick and precise reply, @jsalsman! I will read the material you recommended. Thanks,
-
Hi dear maintainer, I found this: https://cmusphinx.github.io/wiki/pocketsphinx_pronunciation_evaluation/
But pocketsphinx_continuous has disappeared. Can I use pocketsphinx_batch to achieve the same thing? Or can you give me an example of how to use pocketsphinx_batch? I don't even know how to set the input file.
Thanks!