
Commit
Readme, code structure, refactoring.
Tomiinek committed May 26, 2020
1 parent 861c6e9 commit ca00959
Showing 5 changed files with 101 additions and 724 deletions.
62 changes: 62 additions & 0 deletions CODE.md
@@ -0,0 +1,62 @@
## Code Structure

```
Root
╠═ train.py <- main file used for training of models
╠═ synthesize.py <- file for synthesis of spectrograms using checkpoints
╠═ gta.py <- script for generating ground-truth-aligned spectrograms
╠═ utils
║ ├── __init__.py <- various useful routines: build model, move to GPU, mask by lengths
║ ├── audio.py <- functions for audio processing, e.g., loading, spectrograms, mfcc, ...
║ ├── logging.py <- Tensorboard logger, logs spectrograms, alignments, audios, texts, ...
║ ├── samplers.py <- batch samplers producing balanced batches w.r.t. languages and speakers
║ └── text.py <- text routines: conversion to IDs, punctuation stripping, phonemicization
╠═ params
║ ├── params <- definition of default hyperparameters and their description
║ ├── singles <- multiple files with parameter settings for training of monolingual models
║ └── ... <- multiple files with parameter settings for training of multilingual models
╠═ notebooks
║ ├── analyze.ipynb <- basic dataset analysis for inspection of its properties and data distributions, with plots
║ ├── audio_test.ipynb <- experiments with audio processing and synthesis
║ ├── encoder_analyze.ipynb <- analysis of encoder outputs, speaker and language embeddings with plots
║ ├── code_switching_demo.ipynb <- code-switching synthesis demo
║ └── multi_training_demo.ipynb <- multilingual training demo
╠═ modules
║ ├── attention.py <- attention modules: location-sensitive att., forward att., and base class
║ ├── cbhg.py <- CBHG module known from Tacotron 1 with a simple highway layer (not convolutional)
║ ├── classifier.py <- adversarial classifier with gradient reversal layer, cosine similarity classifier
║ ├── encoder.py <- multiple encoder architectures: convolutional, recurrent, generated, separate, shared
║ ├── generated.py <- meta-generated layers: 1d convolution, batch normalization
║ ├── layers.py <- regularized LSTMs (dropout, zoneout), convolutional block and highway convolutional blocks
║ └── tacotron2.py <- implementation of Tacotron 2 with all its modules and loss functions
╠═ evaluation
║ ├── code-switched <- code-switching evaluation sentences
║ ├── in-domain <- in-domain (i.e., from CSS10) monolingual evaluation sentences
║ ├── out-domain <- out-of-domain (i.e., from Wikipedia) monolingual evaluation sentences in ten languages
║ ├── asr_request.py <- script for obtaining transcriptions of given audios from Google Cloud ASR
║ ├── cer_computer.py <- script for calculating the character error rate between pairs of transcripts
║ └── mcd_request.py <- script for computing the mel cepstral distortion between two spectrograms (includes DTW)
╠═ dataset_prepare
║ ├── mecab_convertor.py <- romanization of Japanese script
║ ├── pinyin_convertor.py <- romanization of Chinese script
║ ├── normalize_comvoi.sh <- basic shell script for downloading, extracting, and cleaning some Common Voice data
║ ├── normalize_css10.sh <- set of regular expressions for cleaning CSS10 dataset transcripts
║ └── normalize_mailabs.sh <- likely incomplete set of regular expressions for cleaning M-AILABS dataset transcripts
╠═ dataset
║ ├── dataset.py <- TTS dataset, contains mel and linear spec., texts, phonemes, speaker and language IDs and a
║ │ function for generating proper meta-files and spectrograms for some datasets (see loaders.py)
║ └── loaders.py <- methods for loading popular TTS datasets into a standardized Python list (see dataset.py above)
╚═ data
├── comvoi_clean
│ ├── all.txt <- prepared meta-file for cleaned Common Voice dataset
│ └── silence.sh <- script for removal of leading or trailing "silence" of Common Voice audios
├── css10
│ ├── train.txt <- prepared meta-file for training set of cleaned CSS10 dataset
│ └── val.txt <- prepared meta-file for validation set of cleaned CSS10 dataset
├── css_comvoi
│ ├── train.txt <- prepared meta-file for the training set of the mixed CSS10 + Common Voice dataset
│ └── val.txt <- prepared meta-file for the validation set of the mixed CSS10 + Common Voice dataset
└── prepare_css_spectrograms.py <- ad-hoc script for generating linear and mel spectrograms, see README.md
```
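The dataset meta-files listed above (e.g. `data/css10/train.txt`) are read line by line. As a rough illustration only, here is a minimal sketch of a reader for the pipe-separated `id|text|speaker|language` layout that appears in the README's synthesis example; the real files and `dataset/loaders.py` may carry additional columns, so the field names and the `Utterance`/`read_meta_file` helpers are assumptions, not the repository's actual loader.

```python
# Minimal sketch of a meta-file reader (NOT the repository's actual loader).
# Assumes the pipe-separated id|text|speaker|language layout seen in the
# README's synthesis example; real meta-files may contain more columns.

from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    utterance_id: str
    text: str
    speaker: str
    language: str

def read_meta_file(path: str) -> List[Utterance]:
    """Parse pipe-separated meta-file lines into Utterance records."""
    utterances = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            fields = line.split("|")
            utterances.append(Utterance(*fields[:4]))
    return utterances

# Example line: "01|Dies ist ein Beispieltext.|00-fr|de"
```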
13 changes: 9 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

<p align="center">
<a href="https://colab.research.google.com/github/Tomiinek/Multilingual_Text_to_Speech/blob/master/notebooks/code_switching_demo.ipynb"><b>Interactive synthesis demo</b></a><br>
<a href="http://tts.neqindi.cz"><b>Website with samples</b></a>
<a href="https://tomiinek.github.io/multilingual_speech_samples/"><b>Website with samples</b></a>
</p>

<p>&nbsp;</p>
@@ -27,9 +27,9 @@ We provide synthesized samples, training and evaluation data, source code, and pre-trained models.

**Interactive demos** introducing code-switching abilities and joint multilingual training of the generated model (trained on an enhanced CSS10 dataset) are available [here](https://colab.research.google.com/github/Tomiinek/Multilingual_Text_to_Speech/blob/master/notebooks/code_switching_demo.ipynb) and [here](https://github.com/Tomiinek/Multilingual_Text_to_Speech/blob/master/notebooks/multi_training_demo.ipynb), respectively.

-Many **samples synthesized using the three compared models** are at [this website](http://tts.neqindi.cz). It contains also a few samples synthesized by a monolingual vanilla Tacotron trained on LJ Speech with the Griffin-Lim vocoder (a sanity check of our implementation).
+Many **samples synthesized using the three compared models** are at [this website](https://tomiinek.github.io/multilingual_speech_samples/). It also contains a few samples synthesized by a monolingual vanilla Tacotron trained on LJ Speech with the Griffin-Lim vocoder (a sanity check of our implementation).

-Our best model supporting code-switching or voice-cloning can be downloaded [here](https://www.dropbox.com/s/hjrlg5d11er0u0c/generated_switching.pyt) and the best model trained on the whole CSS10 dataset without the ambition to do voice-cloning is available [here](https://www.dropbox.com/s/0vlz1fu2c6k1zfy/generated_training.pyt).
+Our best model supporting code-switching or voice cloning can be downloaded [here](https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/generated_switching.pyt), and the best model trained on the whole CSS10 dataset, not aimed at voice cloning, is available [here](https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/generated_training.pyt).
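After downloading a checkpoint, something like the following sketch should get you started. It assumes that `synthesize.py` reads pipe-separated meta-lines on stdin and accepts a `--checkpoint` argument, as the usage example later in this README suggests; the `synthesize` helper and the exact flags are assumptions, so adjust them to your checkout.

```python
# Sketch of driving synthesize.py from Python. Assumptions: stdin input of
# "id|text|speaker|language" lines and a --checkpoint flag, mirroring the
# shell usage example later in this README; this is not an official API.

import subprocess

def synthesize(meta_lines, checkpoint_path):
    """Pipe meta-lines into synthesize.py, as in the README's shell example."""
    payload = "\n".join(meta_lines) + "\n"
    subprocess.run(
        ["python3", "synthesize.py", "--checkpoint", checkpoint_path],
        input=payload.encode("utf-8"),
        check=True,
    )

# German text spoken by a French speaker, using the code-switching checkpoint:
synthesize(["01|Dies ist ein Beispieltext.|00-fr|de"], "generated_switching.pyt")
```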

<p>&nbsp;</p>

@@ -142,5 +142,10 @@ echo "01|Dies ist ein Beispieltext.|00-fr|de" | python3 synthesize.py --checkpoi

## Vocoding

-We used the WaveRNN model for vocoding. You can download [WaveRNN weights](https://www.dropbox.com/s/ydep8fdzbplaamu/wavernn_weight.pyt) pre-trained on the whole CSS10 dataset.
+We used the WaveRNN model for vocoding. You can download [WaveRNN weights](https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/wavernn_weight.pyt) pre-trained on the whole CSS10 dataset.
For examples of usage, visit our interactive demos ([here](https://colab.research.google.com/github/Tomiinek/Multilingual_Text_to_Speech/blob/master/notebooks/code_switching_demo.ipynb) and [here](https://github.com/Tomiinek/Multilingual_Text_to_Speech/blob/master/notebooks/multi_training_demo.ipynb)) or [this repository](https://github.com/Tomiinek/WaveRNN).


## Code Structure

Please see [this file](https://github.com/Tomiinek/Multilingual_Text_to_Speech/blob/master/CODE.md) for more details about the source code and its structure.
