Pipeline for Diarization and ASR

A pipeline that applies speaker diarization and automatic speech recognition (ASR) to WAV files of political speech.

Designed to run on a Slurm cluster.

Work in progress ☕

The pipeline currently has three steps:

Diarization: split each WAV file by speaker turn, so that every resulting WAV file contains a single speaker.
Segmentation: split the diarized WAV files into segments of 15-30 seconds, a length that works well with Wav2Vec2.
ASR: transcribe each segment with Wav2Vec2.
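
The segmentation step can be sketched as follows. This is a minimal illustration, not the code in this repository: the function name `chunk_bounds`, its parameters, and the equal-length splitting strategy are all assumptions. It takes the start and end time of one speaker turn (in seconds) and returns boundaries for chunks of 15-30 seconds. Turns that are already 30 seconds or shorter are returned unchanged, even if they are shorter than 15 seconds.

```python
import math


def chunk_bounds(start, end, min_len=15.0, max_len=30.0):
    """Split the interval [start, end] into consecutive equal-length chunks.

    Hypothetical helper: every chunk of a turn longer than max_len ends up
    between min_len and max_len seconds (this requires max_len >= 2 * min_len).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    if max_len < 2 * min_len:
        raise ValueError("max_len must be at least twice min_len")

    duration = end - start
    if duration <= max_len:
        # Short turns are kept whole (assumption: they are not discarded).
        return [(start, end)]

    # Smallest number of chunks such that none exceeds max_len.
    n_chunks = math.ceil(duration / max_len)
    step = duration / n_chunks

    bounds = []
    for i in range(n_chunks):
        chunk_start = start + i * step
        # Pin the final boundary to `end` to avoid floating-point drift.
        chunk_end = end if i == n_chunks - 1 else start + (i + 1) * step
        bounds.append((chunk_start, chunk_end))
    return bounds
```

For example, `chunk_bounds(0.0, 70.0)` yields three chunks of roughly 23.3 seconds each. The resulting boundaries could then be used to cut the diarized audio with whichever audio library the pipeline uses.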

