Are you planning to add Uyghur to the supported language list? #1889

neouyghur · 2023-12-11T04:55:58Z

Hello, I'm interested in knowing whether there are plans to include Uyghur in the list of supported languages. Additionally, I am curious about the reason behind the absence of a tokenizer for Uyghur. Thank you!

Abdurahman-Amat · 2023-12-18T10:37:35Z

@neouyghur You are totally right! The publicly available Uyghur dataset from CommonVoice, https://commonvoice.mozilla.org/ug/datasets (dated 14.09.2023, with much more added since then), should be enough for Uyghur to be listed and added to the OpenAI Whisper tokenizer. This Uyghur dataset is already being used by Speechmatics https://speechmatics.com/, and there is also a demo video from a Uyghur computer YouTube channel https://www.youtube.com/watch?v=JnxOuaJINwM (speech-to-text is demonstrated at 2:07 in the video).
Since this demo video was uploaded, the dataset has grown significantly (see the screenshot below, from CommonVoice).
So, please add Uyghur to the Whisper tokenizer.
