[examples] Add SRE16 recipe. (#177)
* init sre recipe

* [examples] add sre recipe

* Update README.md

* [examples] add sre recipe, delete used files

* merge from master

* update specify sample number each epoch

* remove trailing whitespace

* fix the repeat read dataset problem in the evaluation process and update results

* fix the repeat read dataset problem in the evaluation process and update results

* update as hongji mentioned

* Update README.md

* Update sre recipe README.md

* Update recipe part in  README.md
czy97 authored Jul 14, 2023
1 parent 326b871 commit 2d52982
Showing 23 changed files with 1,240 additions and 20 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -35,6 +35,7 @@ pip3 install wespeakerruntime
```

## 🔥 News
* 2023.07.14: Support the [NIST SRE16 recipe](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016), see [#177](https://github.com/wenet-e2e/wespeaker/pull/177).
* 2023.07.10: Support the [Self-Supervised Learning recipe](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxceleb/v3) on Voxceleb, including [DINO](https://openaccess.thecvf.com/content/ICCV2021/papers/Caron_Emerging_Properties_in_Self-Supervised_Vision_Transformers_ICCV_2021_paper.pdf), [MoCo](https://openaccess.thecvf.com/content_CVPR_2020/papers/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.pdf) and [SimCLR](http://proceedings.mlr.press/v119/chen20j/chen20j.pdf), see [#180](https://github.com/wenet-e2e/wespeaker/pull/180).

* 2023.06.30: Support the [SphereFace2](https://ieeexplore.ieee.org/abstract/document/10094954) loss function, with better performance and noise robustness than the ArcMargin Softmax, see [#173](https://github.com/wenet-e2e/wespeaker/pull/173).
@@ -44,14 +45,16 @@ pip3 install wespeakerruntime
## Recipes

* [VoxCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxceleb): Speaker Verification recipe on the [VoxCeleb dataset](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/)
    * 🔥 UPDATE 2023.07.10: We support a self-supervised learning recipe on VoxCeleb, achieving **2.627%** (ECAPA_TDNN_GLOB_c1024) EER on the vox1-O-clean test set without any labels.
    * 🔥 UPDATE 2022.10.31: We support deep r-vector up to the 293-layer version, achieving **0.447%/0.043** EER/minDCF on the vox1-O-clean test set.
    * 🔥 UPDATE 2022.07.19: We apply the same setups as the CNCeleb recipe and obtain SOTA performance among open-source systems.
        - EER/minDCF on the vox1-O-clean test set are **0.723%/0.069** (ResNet34) and **0.728%/0.099** (ECAPA_TDNN_GLOB_c1024), after LM fine-tuning and AS-Norm
* [CNCeleb](https://github.com/wenet-e2e/wespeaker/tree/master/examples/cnceleb/v2): Speaker Verification recipe on the [CnCeleb dataset](http://cnceleb.org/)
    * 🔥 UPDATE 2022.10.31: 221-layer ResNet achieves **5.655%/0.330** EER/minDCF
    * 🔥 UPDATE 2022.07.12: We migrate the winner system of CNSRC 2022 [report](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082.pdf) [slides](https://aishell-cnsrc.oss-cn-hangzhou.aliyuncs.com/T082-ZhengyangChen.pdf)
        - EER/minDCF reduction from 8.426%/0.487 to **6.492%/0.354** after large margin fine-tuning and AS-Norm
* [NIST SRE16](https://github.com/wenet-e2e/wespeaker/tree/master/examples/sre/v2): Speaker Verification recipe for the [2016 NIST Speaker Recognition Evaluation Plan](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016). A similar recipe can be found in [Kaldi](https://github.com/kaldi-asr/kaldi/tree/master/egs/sre16).
    * 🔥 UPDATE 2023.07.14: We support the NIST SRE16 recipe. After PLDA adaptation, we achieve 6.608%, 10.01%, and 2.974% EER on the Pooled, Tagalog, and Cantonese trials, respectively.
* [VoxConverse](https://github.com/wenet-e2e/wespeaker/tree/master/examples/voxconverse): Diarization recipe on the [VoxConverse dataset](https://www.robots.ox.ac.uk/~vgg/data/voxconverse/)

## Support List:
21 changes: 21 additions & 0 deletions examples/sre/v2/README.md
@@ -0,0 +1,21 @@
## Results for SRE16

* Setup: fbank40, num_frms200, epoch150, Softmax, aug_prob0.6
* Scoring: cosine & PLDA & PLDA Adaptation
* Metric: EER(%)

Without PLDA training data augmentation:
| Model | Params | Backend | Pooled | Tagalog | Cantonese |
|:------|:------:|:------------:|:------------:|:------------:|:------------:|
| ResNet34-TSTP-emb256 | 6.63M | Cosine | 15.4 | 19.82 | 10.39 |
| | | PLDA | 9.36 | 14.26 | 4.513 |
| | | Adapt PLDA | 6.608 | 10.01 | 2.974 |

With PLDA training data augmentation:
| Model | Params | Backend | Pooled | Tagalog | Cantonese |
|:------|:------:|:------------:|:------------:|:------------:|:------------:|
| ResNet34-TSTP-emb256 | 6.63M | Cosine | 15.4 | 19.82 | 10.39 |
| | | PLDA | 8.944 | 13.54 | 4.462 |
| | | Adapt PLDA | 6.543 | 9.666 | 3.254 |

* 🔥 UPDATE 2023.07.14: Support the [NIST SRE16 recipe](https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016), see [#177](https://github.com/wenet-e2e/wespeaker/pull/177).
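
The tables above report EER(%). As a reminder of what that metric measures, here is a minimal pure-Python sketch that sweeps a decision threshold over trial scores and returns the operating point where the false-acceptance and false-rejection rates are closest. The function name `compute_eer` and the toy scores are illustrative assumptions; this is not the scoring code used by the recipe.

```python
def compute_eer(target_scores, nontarget_scores):
    """Return the equal error rate (as a fraction) for two lists of scores.

    Higher scores are assumed to mean "more likely the same speaker".
    """
    # Every observed score is tried as a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for thr in thresholds:
        # False rejection: target trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: non-target trials scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy scores, made up for illustration only.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(f"EER = {100 * compute_eer(targets, nontargets):.1f}%")
```

With real trial lists a sorted, cumulative implementation would be faster; the quadratic loop here is kept only for readability.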
81 changes: 81 additions & 0 deletions examples/sre/v2/conf/resnet.yaml
@@ -0,0 +1,81 @@
### train configuration

exp_dir: exp/ResNet34-TSTP-emb256-fbank40-num_frms200-aug0.6-spFalse-saFalse-Softmax-SGD-epoch150
gpus: "[0,1]"
num_avg: 10
enable_amp: False # whether to enable automatic mixed precision training

seed: 42
num_epochs: 150
save_epoch_interval: 5 # save model every 5 epochs
log_batch_interval: 100 # log every 100 batches

dataloader_args:
  batch_size: 256
  num_workers: 16
  pin_memory: False
  prefetch_factor: 8
  drop_last: True

dataset_args:
  # the number of samples traversed within one epoch; if the value equals 0,
  # the utterance number in the dataset will be used as the sample_num_per_epoch.
  sample_num_per_epoch: 780000
  shuffle: True
  shuffle_args:
    shuffle_size: 1500
  filter: True
  filter_args:
    min_num_frames: 100
    max_num_frames: 300
  resample_rate: 8000
  speed_perturb: False
  num_frms: 200
  aug_prob: 0.6 # prob to add reverb & noise aug per sample
  fbank_args:
    num_mel_bins: 40
    frame_shift: 10
    frame_length: 25
    dither: 1.0
  spec_aug: False
  spec_aug_args:
    num_t_mask: 1
    num_f_mask: 1
    max_t: 10
    max_f: 8
    prob: 0.6

model: ResNet34 # ResNet18, ResNet34, ResNet50, ResNet101, ResNet152
model_init: null
model_args:
  feat_dim: 40
  embed_dim: 256
  pooling_func: "TSTP" # TSTP, ASTP, MQMHASTP
  two_emb_layer: False
projection_args:
  project_type: "softmax" # add_margin, arc_margin, sphere, softmax, arc_margin_intertopk_subcenter

margin_scheduler: MarginScheduler
margin_update:
  initial_margin: 0.0
  final_margin: 0.2
  increase_start_epoch: 20
  fix_start_epoch: 40
  update_margin: True
  increase_type: "exp" # exp, linear

loss: CrossEntropyLoss
loss_args: {}

optimizer: SGD
optimizer_args:
  momentum: 0.9
  nesterov: True
  weight_decay: 0.0001

scheduler: ExponentialDecrease
scheduler_args:
  initial_lr: 0.1
  final_lr: 0.00005
  warm_up_epoch: 6
  warm_from_zero: True
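
The `scheduler_args` above name an `ExponentialDecrease` schedule with `initial_lr`, `final_lr`, `warm_up_epoch` and `warm_from_zero`. One plausible reading of those settings is sketched below: the rate is interpolated exponentially from `initial_lr` to `final_lr` over the whole run and multiplied by a linear warm-up from zero during the first `warm_up_epoch` epochs. The function `exp_decrease_lr` and the value of `steps_per_epoch` are hypothetical, and the scheduler shipped with WeSpeaker may differ in its details.

```python
import math


def exp_decrease_lr(step, total_steps, warm_up_steps,
                    initial_lr=0.1, final_lr=0.00005):
    """Learning rate at `step` under an assumed exponential-decay schedule."""
    # Exponential interpolation: initial_lr at step 0, final_lr at total_steps.
    progress = step / total_steps
    lr = initial_lr * math.exp(progress * math.log(final_lr / initial_lr))
    # Linear warm-up from zero (warm_from_zero: True) over the first steps.
    if step < warm_up_steps:
        lr *= step / warm_up_steps
    return lr


steps_per_epoch = 100  # hypothetical; depends on data size and batch size
num_epochs, warm_up_epoch = 150, 6
total = num_epochs * steps_per_epoch
warm = warm_up_epoch * steps_per_epoch
for epoch in (0, 3, 6, 75, 150):
    lr = exp_decrease_lr(epoch * steps_per_epoch, total, warm)
    print(f"epoch {epoch:3d}: lr = {lr:.6f}")
```

Under these assumptions the rate starts at zero, peaks at the end of warm-up, and then decays smoothly to `final_lr` at the last step.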
95 changes: 95 additions & 0 deletions examples/sre/v2/local/extract_sre.sh
@@ -0,0 +1,95 @@
#!/bin/bash

# Copyright (c) 2022 Hongji Wang ([email protected])
# 2023 Zhengyang Chen ([email protected])
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

exp_dir=''
model_path=''
nj=4
gpus="[0,1]"
data_type="shard" # shard/raw/feat
data=data
reverb_data=data/rirs/lmdb
noise_data=data/musan/lmdb
aug_plda_data=0

. tools/parse_options.sh
set -e

if [ $aug_plda_data = 0 ];then
  sre_plda_data=sre
else
  sre_plda_data=sre_aug
fi

data_name_array=(
  "${sre_plda_data}"
  "sre16_major"
  "sre16_eval_enroll"
  "sre16_eval_test"
)
data_list_path_array=(
  "${data}/${sre_plda_data}/${data_type}.list"
  "${data}/sre16_major/${data_type}.list"
  "${data}/sre16_eval_enroll/${data_type}.list"
  "${data}/sre16_eval_test/${data_type}.list"
)
data_scp_path_array=(
  "${data}/${sre_plda_data}/wav.scp"
  "${data}/sre16_major/wav.scp"
  "${data}/sre16_eval_enroll/wav.scp"
  "${data}/sre16_eval_test/wav.scp"
) # to count the number of wavs
nj_array=($nj $nj $nj $nj)
batch_size_array=(1 1 1 1) # batch_size of test set must be 1 !!!
num_workers_array=(1 1 1 1)
if [ $aug_plda_data = 0 ];then
  aug_prob_array=(0.0 0.0 0.0 0.0)
else
  aug_prob_array=(0.67 0.0 0.0 0.0)
fi
count=${#data_name_array[@]}

for i in $(seq 0 $(($count - 1))); do
  wavs_num=$(wc -l ${data_scp_path_array[$i]} | awk '{print $1}')
  bash tools/extract_embedding.sh --exp_dir ${exp_dir} \
    --model_path $model_path \
    --data_type ${data_type} \
    --data_list ${data_list_path_array[$i]} \
    --wavs_num ${wavs_num} \
    --store_dir ${data_name_array[$i]} \
    --batch_size ${batch_size_array[$i]} \
    --num_workers ${num_workers_array[$i]} \
    --aug_prob ${aug_prob_array[$i]} \
    --reverb_data ${reverb_data} \
    --noise_data ${noise_data} \
    --nj ${nj_array[$i]} \
    --gpus $gpus
done

wait

echo "mean vector of enroll"
python tools/vector_mean.py \
  --spk2utt ${data}/sre16_eval_enroll/spk2utt \
  --xvector_scp $exp_dir/embeddings/sre16_eval_enroll/xvector.scp \
  --spk_xvector_ark $exp_dir/embeddings/sre16_eval_enroll/enroll_spk_xvector.ark

mkdir -p ${exp_dir}/embeddings/eval
cat ${exp_dir}/embeddings/sre16_eval_enroll/enroll_spk_xvector.scp \
  ${exp_dir}/embeddings/sre16_eval_test/xvector.scp \
  > ${exp_dir}/embeddings/eval/xvector.scp

echo "Embedding dir is (${exp_dir}/embeddings)."
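
The "mean vector of enroll" step in the script above turns the utterance-level embeddings of each enrollment speaker into a single speaker-level vector. The idea can be sketched in plain Python as follows, with in-memory dictionaries standing in for the `spk2utt` file and `xvector.scp`. The helper `mean_enroll_vectors` is hypothetical and is not the implementation of `tools/vector_mean.py`.

```python
def mean_enroll_vectors(spk2utt, utt2vec):
    """Average the utterance embeddings of each enrollment speaker.

    spk2utt: dict mapping speaker id -> list of utterance ids
    utt2vec: dict mapping utterance id -> embedding (list of floats)
    """
    spk2vec = {}
    for spk, utts in spk2utt.items():
        # Skip utterances for which no embedding was extracted.
        vecs = [utt2vec[u] for u in utts if u in utt2vec]
        if not vecs:
            continue
        dim = len(vecs[0])
        # Element-wise mean over all embeddings of this speaker.
        spk2vec[spk] = [sum(v[d] for v in vecs) / len(vecs)
                        for d in range(dim)]
    return spk2vec


# Toy 2-dimensional embeddings, made up for illustration only.
spk2utt = {"spk1": ["u1", "u2"], "spk2": ["u3"]}
utt2vec = {"u1": [1.0, 0.0], "u2": [3.0, 2.0], "u3": [0.5, 0.5]}
print(mean_enroll_vectors(spk2utt, utt2vec))
```

The resulting speaker-level vectors are what the script then concatenates with the test-side `xvector.scp` to build the evaluation list.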
36 changes: 36 additions & 0 deletions examples/sre/v2/local/filter_utt_accd_dur.py
@@ -0,0 +1,36 @@
# Copyright (c) 2023 Zhengyang Chen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import fire


def main(wav_scp, utt2voice_dur, filter_wav_scp, dur_thres=5.0):

    utt2voice_dur_dict = {}
    with open(utt2voice_dur, "r") as f:
        for line in f:
            utt, dur = line.strip().split()
            utt2voice_dur_dict[utt] = float(dur)

    with open(wav_scp, "r") as f, open(filter_wav_scp, "w") as fw:
        for line in f:
            utt = line.strip().split()[0]
            if utt in utt2voice_dur_dict:
                if utt2voice_dur_dict[utt] > dur_thres:
                    fw.write(line)


if __name__ == "__main__":
    fire.Fire(main)
57 changes: 57 additions & 0 deletions examples/sre/v2/local/generate_sre_aug.py
@@ -0,0 +1,57 @@
# Copyright (c) 2023 Zhengyang Chen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import os
import fire


def main(ori_dir, aug_dir, aug_copy_num=2):

    if not os.path.exists(aug_dir):
        os.makedirs(aug_dir)

    read_wav_scp = os.path.join(ori_dir, 'wav.scp')
    aug_wav_scp = os.path.join(aug_dir, 'wav.scp')
    read_utt2spk = os.path.join(ori_dir, 'utt2spk')
    aug_utt2spk = os.path.join(aug_dir, 'utt2spk')
    read_vad = os.path.join(ori_dir, 'vad')
    store_vad = os.path.join(aug_dir, 'vad')

    with open(read_wav_scp, 'r') as f, open(aug_wav_scp, 'w') as wf:
        for line in f:
            line = line.strip().split()
            utt, other_info = line[0], ' '.join(line[1:])
            for i in range(aug_copy_num + 1):
                wf.write(utt + '_copy-' + str(i) + ' ' + other_info + '\n')

    with open(read_utt2spk, 'r') as f, open(aug_utt2spk, 'w') as wf:
        for line in f:
            line = line.strip().split()
            utt, spk = line[0], line[1]
            for i in range(aug_copy_num + 1):
                wf.write(utt + '_copy-' + str(i) + ' ' + spk + '\n')

    with open(read_vad, 'r') as f, open(store_vad, 'w') as wf:
        for line in f:
            line = line.strip().split()
            seg, utt, vad = line[0], line[1], ' '.join(line[2:])
            for i in range(aug_copy_num + 1):
                new_seg = seg + '_copy-' + str(i)
                new_utt = utt + '_copy-' + str(i)
                wf.write(new_seg + ' ' + new_utt + ' ' + vad + '\n')


if __name__ == "__main__":
    fire.Fire(main)
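
To make the naming scheme of `generate_sre_aug.py` concrete: with the default `aug_copy_num=2`, every utterance is listed `aug_copy_num + 1 = 3` times under the suffixes `_copy-0`, `_copy-1` and `_copy-2`. The small sketch below reproduces only that id expansion; `copy_ids` is a hypothetical helper and `utt001` a made-up utterance id. A plausible reading, not stated in the source, is that the `0.67` augmentation probability used for `sre_aug` in `extract_sre.sh` makes roughly two of the three copies augmented on average.

```python
def copy_ids(utt, aug_copy_num=2):
    """Expand one utterance id into its aug_copy_num + 1 copy ids."""
    return [f"{utt}_copy-{i}" for i in range(aug_copy_num + 1)]


# "utt001" is a made-up utterance id for illustration.
for new_id in copy_ids("utt001"):
    print(new_id)
```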