Come and try out the AI voice generation services on our website!
We used neural network models (GlowTTS, HiFi-GAN, MLP) with the KSS dataset and a preprocessed Taeyeon voice dataset to create an optimized model. The newly trained model synthesizes the voice and converts input text to speech. It also lets users listen to Taeyeon sing other singers' songs, e.g. songs by Hyo Shin Park.
Backend: Flask
Frontend: React, Next.js, TypeScript, jQuery, Redux, Redux-Saga, styled-components
Middleware: Gunicorn
etc: Nginx, Docker, MySQL, Colaboratory, Google Cloud Storage, PyTorch, Swagger
git clone --recursive https://github.com/SiliconWildCat/SiliconWildCat.git
docker-compose up -d
- Frontend
http://localhost:80
- Backend
http://localhost:8000
Frontend: http://localhost:3000
Backend: http://localhost:5000
This website provides two features: Text To Speech and Singing Voice Synthesis.
1) Singing Voice Synthesis: provides music clips in the style of our source voice (Taeyeon), covering songs originally by other singers.
2) Text To Speech: offers two voices that read out a given text.
- Enter the text you want to convert and select a voice; the text is then played back in that voice.
- Text To Speech uses GlowTTS and HiFi-GAN.
- The audio dataset is converted to mel spectrograms and used to train a GlowTTS neural network, which learns the tone and pronunciation of the voice.
- A HiFi-GAN neural network then reduces noise and makes the synthesized voice sound closer to the actual speaker.
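To make the mel spectrogram step a little more concrete: the mel scale warps frequency so that bands are narrow at low frequencies and wide at high ones, roughly following how pitch is perceived. The sketch below uses the common HTK-style formula and only the Python standard library. It is an illustration, not the project's preprocessing code; the function names, the 80 bands, and the 8 kHz upper limit are all assumptions chosen for the example.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> list:
    """Center frequencies (Hz) of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Illustrative values only (assumed, not taken from the project config).
centers = mel_band_centers(80, 0.0, 8000.0)
```

Each row of a mel spectrogram corresponds to one such band, so the model sees finer frequency detail where the voice carries most of its pitch information.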
- This feature synthesizes songs in singer Taeyeon's voice.
- Singing Voice Synthesis uses an MLP neural network and HiFi-GAN.
- The MLP-based model is built from three files (a text file, a MIDI file, and a vocal file) to create a mel spectrogram. The text and MIDI files are used to extract the phonemes and pitch from which the mel spectrogram is generated.
- A HiFi-GAN neural network then reduces noise and makes the synthesized voice sound closer to the actual speaker.
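As a rough illustration of the pitch and phoneme extraction described above, the sketch below turns note-level data into per-frame inputs of the kind a mel-spectrogram model could consume. The MIDI-to-frequency formula is the standard equal-temperament conversion (A4 = note 69 = 440 Hz); everything else is hypothetical: the `frame_features` helper, the 12.5 ms frame length, and the already-aligned `(phoneme, note, duration)` triples. Reading the actual text and MIDI files is omitted.

```python
def midi_to_hz(note: int) -> float:
    """Equal-temperament conversion: MIDI note 69 (A4) is 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def frame_features(notes, frame_sec=0.0125):
    """Expand (phoneme, midi_note, duration_sec) triples into per-frame
    (phoneme, pitch_hz) pairs. Each note is repeated for
    round(duration / frame_sec) frames, with a minimum of one frame."""
    frames = []
    for phoneme, note, duration in notes:
        n_frames = max(1, round(duration / frame_sec))
        frames.extend([(phoneme, midi_to_hz(note))] * n_frames)
    return frames


# Hypothetical, already-aligned input: phoneme, MIDI note, duration in seconds.
score = [("a", 69, 0.05), ("n", 72, 0.025)]
frames = frame_features(score)
```

In this sketch the vocal file plays no role; in training it would supply the target mel spectrogram that the per-frame inputs are mapped to.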
How to Initialize
> when you use npm
npm i && npm run build && npm start
> when you use yarn
yarn && yarn build && yarn start
About Installation
1. yarn : installs the node modules into
./frontend/node_modules
2. yarn build : generates the Next.js build files in
./frontend/
3. yarn start : runs the web page
About Pages
When you start the web page, you will see the SVS (Singing Voice Synthesis) page first.
Click the button to switch between the two pages.
Enjoy it!
Directory Structure
frontend
┣ components
┃ ┣ Music
┃ ┃ ┣ Music.tsx
┃ ┃ ┗ music.scss
┃ ┣ Tts.tsx
┃ ┗ musicPlayer.tsx
┣ hooks
┃ ┣ createRequestSaga.ts
┃ ┗ useSelector.tsx
┣ interface
┃ ┣ counter.ts
┃ ┣ loading.ts
┃ ┗ tts.ts
┣ lib
┃ ┗ api
┃ ┃ ┣ api.ts
┃ ┃ ┗ client.ts
┣ modules
┃ ┣ index.ts
┃ ┣ loading.ts
┃ ┗ tts.ts
┣ pages
┃ ┣ _app.tsx
┃ ┣ _document.tsx
┃ ┗ index.tsx
How to Initialize
docker exec -it backend /bin/bash
python3 run.py
About
Enter the text you want to convert and choose a voice. The project provides two voices, trained on the Taeyeon and KSS datasets. When you select a voice and press the 'say it' button, the audio file is saved to the path below.
>> /app/audio.wav
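To check the saved file without playing it, the standard-library `wave` module can report its basic properties. This is only a sketch under assumptions: `wav_info` is a hypothetical helper, it expects a PCM WAV file, and the demo below writes a stand-in 22050 Hz tone to a temporary directory so it runs anywhere. The sample rate of the real output is not stated here, so treat 22050 as a placeholder.

```python
import math
import os
import struct
import tempfile
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_sec) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


# Stand-in file for the demo; inside the backend container you would pass
# "/app/audio.wav" to wav_info instead.
demo_path = os.path.join(tempfile.mkdtemp(), "audio.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)       # mono
    wav.setsampwidth(2)       # 16-bit samples
    wav.setframerate(22050)   # assumed rate, for illustration only
    samples = [int(12000 * math.sin(2 * math.pi * 440 * i / 22050))
               for i in range(22050)]  # one second of a 440 Hz tone
    wav.writeframes(struct.pack("<%dh" % len(samples), *samples))

rate, channels, duration = wav_info(demo_path)
```

A duration of zero or an error from `wave.open` would suggest that synthesis did not finish writing the file.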
Directory Structure
backend
┣ web
┃ ┣ TTS (submodule)
┃ ┣ g2pK (submodule)
┃ ┣ glowtts-v2 (Text to Mel spectrogram Model)
┃ ┃ ┣ KSS
┃ ┃ ┗ TaeYeon
┃ ┣ hifigan-v2 (Mel spectrogram to Audio Model)
┃ ┃ ┣ KSS
┃ ┃ ┗ TaeYeon
┃ ┣ config.py (database configuration)
┃ ┣ inference.py (TTS synthesis)
┃ ┣ run.py
┃ ┗ saveText.py (save text to DB)
┣ Dockerfile
┗ requirements.txt
Submodule
g2pK : g2p module that converts graphemes to phonemes for the Korean language
TTS : library for advanced Text-to-Speech generation