This repository has been archived by the owner on May 28, 2019. It is now read-only.

Recreating blizzard baseline #55

Open
ghostcow opened this issue May 8, 2018 · 0 comments
ghostcow commented May 8, 2018

Hi all,

I'm trying to recreate the blizzard baseline model myself, via the following steps:

  1. Downloaded the Blizzard 2011 data, then trimmed the wav files using librosa.effects.trim with top_db=15
  2. Ran extract_feats.py to extract features. Split off 1000 random samples for the validation set.
  3. Trained a model using the following training scheme:
python train.py --data data/nancy_orig_feat --noise 1 --expName nancy_init --seq-len 10 --max-seq-len 1600 --nspk 1 --lr 1e-5 --epochs 10 --visualize && \
python train.py --data data/nancy_orig_feat --noise 1 --expName nancy --seq-len 1000 --max-seq-len 1000 --nspk 1 --lr 1e-4 --epochs 90 --visualize --checkpoint checkpoints/nancy_init/bestmodel.pth
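For reference, the trimming in step 1 can be sketched roughly as below. This is a minimal NumPy approximation of the top_db thresholding that librosa.effects.trim performs; the frame/hop sizes and the exact dB reference here are assumptions for illustration, not librosa's internals, so exact trim points may differ slightly from the library's:

```python
import numpy as np

def trim_silence(y, top_db=15, frame_length=2048, hop_length=512):
    """Drop leading/trailing frames whose RMS energy falls more than
    `top_db` below the signal's peak RMS (roughly what
    librosa.effects.trim does)."""
    # frame-wise RMS energy
    n_frames = max(1, 1 + (len(y) - frame_length) // hop_length)
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    # dB relative to the peak frame; keep frames above -top_db
    db = 20 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    keep = np.nonzero(db > -top_db)[0]
    if len(keep) == 0:
        return y[:0]
    start = keep[0] * hop_length
    end = min(len(y), keep[-1] * hop_length + frame_length)
    return y[start:end]

# example: half a second of silence, a 1 s tone, half a second of silence
sr = 16000
sig = np.concatenate([
    np.zeros(sr // 2),
    0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr),
    np.zeros(sr // 2),
])
trimmed = trim_silence(sig, top_db=15)
print(len(sig), len(trimmed))  # trimmed is noticeably shorter
```

In practice the actual call was librosa.effects.trim(y, top_db=15), which returns the trimmed signal plus the (start, end) sample indices.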

Note: this scheme was devised by looking at your published args.pth, because the scheme in your README.md did not converge.

The result is a model clearly inferior to your uploaded pretrained one. I've attached 4 samples demonstrating the issue. These are the sentences used:

"Generative adversarial network or variational auto-encoder.",
"Basilar membrane and otolaryngology are not auto-correlations.",
"He has read the whole thing.",
"He reads books."

The samples: samples.zip

What could be wrong? Please help me recreate your baseline.

Thanks
