From ac58218a14ebb5ead3e6a167260ca265239dfee8 Mon Sep 17 00:00:00 2001 From: Taejin Park Date: Mon, 21 Nov 2022 10:55:08 -0800 Subject: [PATCH] Standalone diarization+ASR evaluation script (#5439) * first commit on eval_diar_with_asr.py Signed-off-by: Taejin Park * Add a standalone diarization-ASR evaluation transcript Signed-off-by: Taejin Park * Fixed examples in docstrings Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed staticmethod error Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added description on eval modes Signed-off-by: Taejin Park * adding diar_infer_general.yaml Signed-off-by: Taejin Park * fix msdd_model in general yaml file Signed-off-by: Taejin Park * fixed errors in yaml file Signed-off-by: Taejin Park * combine into 1 commit Signed-off-by: Taejin Park * Added description on eval modes Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MoE support for T5 model (w/o expert parallel) (#5409) * clean Signed-off-by: Abhinav Khattar * kwarg ref Signed-off-by: Abhinav Khattar * fix Signed-off-by: Abhinav Khattar * fix Signed-off-by: Abhinav Khattar * test Signed-off-by: Abhinav Khattar * test Signed-off-by: Abhinav Khattar * test Signed-off-by: Abhinav Khattar * test Signed-off-by: Abhinav Khattar * test Signed-off-by: Abhinav Khattar * test Signed-off-by: Abhinav Khattar * extra args Signed-off-by: Abhinav Khattar * test Signed-off-by: Abhinav Khattar * rm prints Signed-off-by: Abhinav Khattar * style Signed-off-by: Abhinav Khattar * review comments Signed-off-by: Abhinav Khattar * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * review comments Signed-off-by: Abhinav Khattar * review comments Signed-off-by: Abhinav Khattar * fix 
Signed-off-by: Abhinav Khattar Signed-off-by: Abhinav Khattar Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix args (#5410) (#5416) Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: Sandeep Subramanian * Fix for concat map dataset (#5133) * change for concat map dataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Exhaust longest dataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: 1-800-BAD-CODE <> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev Co-authored-by: Sandeep Subramanian * Add temporary fix for CUDA issue in Dockerfile (#5421) (#5422) Signed-off-by: Yu Yao Signed-off-by: Yu Yao Signed-off-by: Yu Yao Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> * Fix GPT generation when using sentencepiece tokenizer (#5413) (#5428) * Fix Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: Yi Dong Co-authored-by: Oleksii Kuchaiev Signed-off-by: MaximumEntropy Co-authored-by: Sandeep Subramanian Co-authored-by: Yi Dong Co-authored-by: Oleksii Kuchaiev * Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339) * Initial refactor Signed-off-by: MaximumEntropy * Resolve config before passing to load_from_checkpoint Signed-off-by: MaximumEntropy * Fixes for model parallel and nemo restore Signed-off-by: MaximumEntropy * Fixes for eval Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert config changes Signed-off-by: MaximumEntropy * Refactor Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Fix typo Signed-off-by: MaximumEntropy * Remove comments Signed-off-by: MaximumEntropy * Minor Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix validation reconfiguration Signed-off-by: MaximumEntropy * Remove old comment Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes for test_ds Signed-off-by: MaximumEntropy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: MaximumEntropy Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Revert "Add temporary fix for CUDA issue in Dockerfile (#5421)" (#5431) (#5432) This reverts commit 0718b17a8e1f89ee7e167698d1a5d5acad3f1b2a. Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> * [ITN] fix year date graph, cardinals extension for hundreds (#5435) * wip Signed-off-by: ekmb * add lociko's hundreds extension for cardinals Signed-off-by: ekmb * add optional end Signed-off-by: ekmb * restart ci Signed-off-by: ekmb Signed-off-by: ekmb * update doc in terms of get_label for lang id model (#5366) * reflect PR 5278 ion doc Signed-off-by: fayejf * reflect comment Signed-off-by: fayejf Signed-off-by: fayejf * Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (#5420) (#5433) * Revert workers workaround Signed-off-by: MaximumEntropy * Fix in config Signed-off-by: MaximumEntropy * Fix Signed-off-by: MaximumEntropy Signed-off-by: MaximumEntropy Co-authored-by: Oleksii Kuchaiev Signed-off-by: MaximumEntropy Co-authored-by: Sandeep Subramanian Co-authored-by: Oleksii Kuchaiev * Fixed bug in notebook (#5382) (#5394) Signed-off-by: Virginia Adams Signed-off-by: Virginia Adams Signed-off-by: Virginia Adams Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> * Fixing 
bug in Megatron BERT when loss mask is all zeros (#5424) * Fixing bug when loss mask is fully zero Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update megatron_bert_model.py Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update dataset_utils.py Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update dataset_utils.py Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update dataset_utils.py Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sandeep Subramanian * Use updated API for overlapping grad sync with pipeline parallelism (#5236) Signed-off-by: Tim Moon Signed-off-by: Tim Moon * support to disable sequence length + 1 input tokens for each sample in MegatronGPT (#5363) * support to disable sequence length + 1 input tokens for MegatronGPT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Co-authored-by: Anmol Gupta Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sandeep Subramanian * [TTS] Create script for processing TTS training audio (#5262) * Create script for processing TTS training audio * Update VAD trimming logic * Remove unused import Signed-off-by: Ryan * [TTS] remove useless logic for set_tokenizer. 
(#5430) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * Fix setting up of `ReduceLROnPlateau` learning rate scheduler (#5444) * Fix tests Signed-off-by: PeganovAnton * Add accidentally lost changes Signed-off-by: PeganovAnton Signed-off-by: PeganovAnton * Create codeql.yml (#5445) Signed-off-by: Somshubra Majumdar Signed-off-by: Somshubra Majumdar * Fix for getting tokenizer in character-based ASR models when using tarred dataset (#5442) Signed-off-by: Jonghwan Hyeon Signed-off-by: Jonghwan Hyeon * Combine 5 commits adding diar_infer_general.yaml Signed-off-by: Taejin Park Update codeql.yml Signed-off-by: Somshubra Majumdar Update codeql.yml Signed-off-by: Somshubra Majumdar fix msdd_model in general yaml file Signed-off-by: Taejin Park fixed errors in yaml file Signed-off-by: Taejin Park * moved eval_der function and fixed tqdm options Signed-off-by: Taejin Park * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changed minor error in docstrings Signed-off-by: Taejin Park * removed score_labels and changed leave=True Signed-off-by: Taejin Park Signed-off-by: Taejin Park Signed-off-by: Abhinav Khattar Signed-off-by: MaximumEntropy Signed-off-by: Yu Yao Signed-off-by: ekmb Signed-off-by: fayejf Signed-off-by: Virginia Adams Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Signed-off-by: Tim Moon Signed-off-by: Ryan Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: PeganovAnton Signed-off-by: Somshubra Majumdar Signed-off-by: Jonghwan Hyeon Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Abhinav Khattar Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sandeep Subramanian Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev 
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Yi Dong Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com> Co-authored-by: Anmol Gupta Co-authored-by: Ryan Langman Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: PeganovAnton Co-authored-by: Somshubra Majumdar Co-authored-by: Jonghwan Hyeon Signed-off-by: shane carroll --- .../offline_diar_with_asr_infer.py | 29 +- .../conf/inference/diar_infer_general.yaml | 90 + nemo/collections/asr/metrics/der.py | 48 + .../asr/models/clustering_diarizer.py | 4 +- .../asr/parts/utils/diarization_utils.py | 269 ++- .../asr/parts/utils/speaker_utils.py | 2 +- nemo/collections/asr/parts/utils/vad_utils.py | 12 +- scripts/speaker_tasks/eval_diar_with_asr.py | 243 ++ .../ASR_with_SpeakerDiarization.ipynb | 1222 +--------- .../Speaker_Diarization_Inference.ipynb | 2019 +---------------- 10 files changed, 749 insertions(+), 3189 deletions(-) create mode 100644 examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml create mode 100644 scripts/speaker_tasks/eval_diar_with_asr.py diff --git a/examples/speaker_tasks/diarization/clustering_diarizer/offline_diar_with_asr_infer.py b/examples/speaker_tasks/diarization/clustering_diarizer/offline_diar_with_asr_infer.py index f1bc7c81c0a8..da9671a2fc43 100644 --- a/examples/speaker_tasks/diarization/clustering_diarizer/offline_diar_with_asr_infer.py +++ b/examples/speaker_tasks/diarization/clustering_diarizer/offline_diar_with_asr_infer.py @@ -64,9 +64,32 @@ def main(cfg): # If RTTM is provided and DER evaluation if 
diar_score is not None: metric, mapping_dict, _ = diar_score - der_results = asr_diar_offline.gather_eval_results(metric, mapping_dict, trans_info_dict) - wer_results = asr_diar_offline.evaluate(trans_info_dict) - asr_diar_offline.print_errors(der_results, wer_results) + + # Get session-level diarization error rate and speaker counting error + der_results = OfflineDiarWithASR.gather_eval_results( + diar_score=diar_score, + audio_rttm_map_dict=asr_diar_offline.AUDIO_RTTM_MAP, + trans_info_dict=trans_info_dict, + root_path=asr_diar_offline.root_path, + ) + + # Calculate WER and cpWER if reference CTM files exist + wer_results = OfflineDiarWithASR.evaluate( + hyp_trans_info_dict=trans_info_dict, + audio_file_list=asr_diar_offline.audio_file_list, + ref_ctm_file_list=asr_diar_offline.ctm_file_list, + ) + + # Print average DER, WER and cpWER + OfflineDiarWithASR.print_errors(der_results=der_results, wer_results=wer_results) + + # Save detailed session-level evaluation results in `root_path`. + OfflineDiarWithASR.write_session_level_result_in_csv( + der_results=der_results, + wer_results=wer_results, + root_path=asr_diar_offline.root_path, + csv_columns=asr_diar_offline.csv_columns, + ) if __name__ == '__main__': diff --git a/examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml b/examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml new file mode 100644 index 000000000000..8036e2882b71 --- /dev/null +++ b/examples/speaker_tasks/diarization/conf/inference/diar_infer_general.yaml @@ -0,0 +1,90 @@ +# This YAML file is created for all types of offline speaker diarization inference tasks in the `/examples/speaker_tasks/diarization` folder. +# The inference parameters for VAD, speaker embedding extractor, clustering module, MSDD module and ASR decoder are all included in this YAML file. 
+# All the keys under the `diarizer` key (`vad`, `speaker_embeddings`, `clustering`, `msdd_model`, `asr`) can be used selectively, each for its own purpose, and can be ignored if the module is not used. +# The configurations in this YAML file are optimized to show balanced performance across various types of domains. VAD is optimized on multilingual ASR datasets and the diarizer is optimized on the DIHARD3 development set. +# An example line in an input manifest file (`.json` format): +# {"audio_filepath": "/path/to/audio_file", "offset": 0, "duration": null, "label": "infer", "text": "-", "num_speakers": null, "rttm_filepath": "/path/to/rttm/file", "uem_filepath": "/path/to/uem/file"} +name: &name "ClusterDiarizer" + +num_workers: 1 +sample_rate: 16000 +batch_size: 64 + +diarizer: + manifest_filepath: ??? + out_dir: ??? + oracle_vad: False # If True, uses RTTM files provided in the manifest file to get speech activity (VAD) timestamps + collar: 0.25 # Collar value for scoring + ignore_overlap: True # Consider or ignore overlap segments while scoring + + vad: + model_path: vad_multilingual_marblenet # .nemo local model path or pretrained VAD model name + external_vad_manifest: null # This option is provided to use external vad and provide its speech activity labels for speaker embeddings extraction. 
Only one of model_path or external_vad_manifest should be set + + parameters: # Tuned by detection error rate (false alarm + miss) on multilingual ASR evaluation datasets + window_length_in_sec: 0.63 # Window length in sec for VAD context input + shift_length_in_sec: 0.08 # Shift length in sec for generating frame-level VAD predictions + smoothing: False # False or type of smoothing method (e.g., median) + overlap: 0.5 # Overlap ratio for overlapped mean/median smoothing filter + onset: 0.5 # Onset threshold for detecting the beginning and end of a speech + offset: 0.3 # Offset threshold for detecting the end of a speech + pad_onset: 0.2 # Adding durations before each speech segment + pad_offset: 0.2 # Adding durations after each speech segment + min_duration_on: 0.5 # Threshold for small non_speech deletion + min_duration_off: 0.5 # Threshold for short speech segment deletion + filter_speech_first: True + + speaker_embeddings: + model_path: titanet_large # .nemo local model path or pretrained model name (titanet_large, ecapa_tdnn or speakerverification_speakernet) + parameters: + window_length_in_sec: [1.9,1.2,0.5] # Window length(s) in sec (floating-point number). Either a number or a list, e.g. 1.5 or [1.5,1.0,0.5] + shift_length_in_sec: [0.95,0.6,0.25] # Shift length(s) in sec (floating-point number). Either a number or a list, e.g. 0.75 or [0.75,0.5,0.25] + multiscale_weights: [1,1,1] # Weight for each scale. Should be null (for single scale) or a list matched with window/shift scale count, e.g. [0.33,0.33,0.33] + save_embeddings: True # If True, save speaker embeddings in pickle format. This should be True if clustering result is used for other models, such as `msdd_model`. + + clustering: + parameters: + oracle_num_speakers: False # If True, use num of speakers value provided in manifest file. + max_num_speakers: 8 # Max number of speakers for each recording. If an oracle number of speakers is passed, this value is ignored. 
+ enhanced_count_thres: 80 # If the number of segments is lower than this number, enhanced speaker counting is activated. + max_rp_threshold: 0.25 # Determines the range of p-value search: 0 < p <= max_rp_threshold. + sparse_search_volume: 10 # The higher the number, the more values will be examined with more time. + maj_vote_spk_count: False # If True, take a majority vote on multiple p-values to estimate the number of speakers. + + msdd_model: + model_path: null # .nemo local model path or pretrained model name for multiscale diarization decoder (MSDD) + parameters: + use_speaker_model_from_ckpt: True # If True, use speaker embedding model in checkpoint. If False, the provided speaker embedding model in config will be used. + infer_batch_size: 25 # Batch size for MSDD inference. + sigmoid_threshold: [0.7] # Sigmoid threshold for generating binarized speaker labels. The smaller the more generous on detecting overlaps. + seq_eval_mode: False # If True, use oracle number of speaker and evaluate F1 score for the given speaker sequences. Default is False. + split_infer: True # If True, break the input audio clip to short sequences and calculate cluster average embeddings for inference. + diar_window_length: 50 # The length of split short sequence when split_infer is True. + overlap_infer_spk_limit: 5 # If the estimated number of speakers are larger than this number, overlap speech is not estimated. + + asr: + model_path: null # Provide NGC cloud ASR model name. stt_en_conformer_ctc_* models are recommended for diarization purposes. + parameters: + asr_based_vad: False # if True, speech segmentation for diarization is based on word-timestamps from ASR inference. + asr_based_vad_threshold: 1.0 # Threshold (in sec) that caps the gap between two words when generating VAD timestamps using ASR based VAD. + asr_batch_size: null # Batch size can be dependent on each ASR model. Default batch sizes are applied if set to null. + decoder_delay_in_sec: null # Native decoder delay. 
null is recommended to use the default values for each ASR model. + word_ts_anchor_offset: null # Offset to set a reference point from the start of the word. Recommended range of values is [-0.05 0.2]. + word_ts_anchor_pos: "start" # Select which part of the word timestamp we want to use. The options are: 'start', 'end', 'mid'. + fix_word_ts_with_VAD: False # Fix the word timestamp using VAD output. You must provide a VAD model to use this feature. + colored_text: False # If True, use colored text to distinguish speakers in the output transcript. + print_time: True # If True, the start and end time of each speaker turn is printed in the output transcript. + break_lines: False # If True, the output transcript breaks the line to fix the line width (default is 90 chars) + + ctc_decoder_parameters: # Optional beam search decoder (pyctcdecode) + pretrained_language_model: null # KenLM model file: .arpa model file or .bin binary file. + beam_width: 32 + alpha: 0.5 + beta: 2.5 + + realigning_lm_parameters: # Experimental feature + arpa_language_model: null # Provide a KenLM language model in .arpa format. + min_number_of_words: 3 # Min number of words for the left context. + max_number_of_words: 10 # Max number of words for the right context. + logprob_diff_threshold: 1.2 # The threshold for the difference between two log probability values from two hypotheses. 
+ diff --git a/nemo/collections/asr/metrics/der.py b/nemo/collections/asr/metrics/der.py index 000f05f10d11..78553032bc01 100644 --- a/nemo/collections/asr/metrics/der.py +++ b/nemo/collections/asr/metrics/der.py @@ -108,6 +108,54 @@ def score_labels( return None +def evaluate_der(audio_rttm_map_dict, all_reference, all_hypothesis, diar_eval_mode='all'): + """ + Evaluate with a selected diarization evaluation scheme + + audio_rttm_map_dict (dict): + Dictionary containing information provided from the manifest file + all_reference (list[uniq_name,annotation]): + reference annotations for score calculation + all_hypothesis (list[uniq_name,annotation]): + hypothesis annotations for score calculation + diar_eval_mode (str): + Diarization evaluation modes + + diar_eval_mode == "full": + DIHARD challenge style evaluation, the strictest way of evaluating diarization + (collar, ignore_overlap) = (0.0, False) + diar_eval_mode == "fair": + Evaluation setup used in VoxSRC challenge + (collar, ignore_overlap) = (0.25, False) + diar_eval_mode == "forgiving": + Traditional evaluation setup + (collar, ignore_overlap) = (0.25, True) + diar_eval_mode == "all": + Compute all three modes (default); the returned score is the one from the last mode, "forgiving" + """ + eval_settings = [] + if diar_eval_mode == "full": + eval_settings = [(0.0, False)] + elif diar_eval_mode == "fair": + eval_settings = [(0.25, False)] + elif diar_eval_mode == "forgiving": + eval_settings = [(0.25, True)] + elif diar_eval_mode == "all": + eval_settings = [(0.0, False), (0.25, False), (0.25, True)] + else: + raise ValueError("`diar_eval_mode` variable contains an unsupported value") + + for collar, ignore_overlap in eval_settings: + diar_score = score_labels( + AUDIO_RTTM_MAP=audio_rttm_map_dict, + all_reference=all_reference, + all_hypothesis=all_hypothesis, + collar=collar, + ignore_overlap=ignore_overlap, + ) + return diar_score + + def calculate_session_cpWER_bruteforce(spk_hypothesis: List[str], spk_reference: List[str]) -> Tuple[float, str, str]: """ Calculate cpWER 
with actual permutations in brute-force way when LSA algorithm cannot deliver the correct result. diff --git a/nemo/collections/asr/models/clustering_diarizer.py b/nemo/collections/asr/models/clustering_diarizer.py index a7c6b2e5a1f9..5b690f65e649 100644 --- a/nemo/collections/asr/models/clustering_diarizer.py +++ b/nemo/collections/asr/models/clustering_diarizer.py @@ -213,7 +213,7 @@ def _run_vad(self, manifest_file): data.append(get_uniqname_from_filepath(file)) status = get_vad_stream_status(data) - for i, test_batch in enumerate(tqdm(self._vad_model.test_dataloader(), desc='vad', leave=False)): + for i, test_batch in enumerate(tqdm(self._vad_model.test_dataloader(), desc='vad', leave=True)): test_batch = [x.to(self._device) for x in test_batch] with autocast(): log_probs = self._vad_model(input_signal=test_batch[0], input_signal_length=test_batch[1]) @@ -342,7 +342,7 @@ def _extract_embeddings(self, manifest_file: str, scale_idx: int, num_scales: in all_embs = torch.empty([0]) for test_batch in tqdm( - self._speaker_model.test_dataloader(), desc=f'[{scale_idx}/{num_scales}] extract embeddings', leave=False + self._speaker_model.test_dataloader(), desc=f'[{scale_idx+1}/{num_scales}] extract embeddings', leave=True ): test_batch = [x.to(self._device) for x in test_batch] audio_signal, audio_signal_len, labels, slices = test_batch diff --git a/nemo/collections/asr/parts/utils/diarization_utils.py b/nemo/collections/asr/parts/utils/diarization_utils.py index 19a8c2fe4632..f0b951eb89d3 100644 --- a/nemo/collections/asr/parts/utils/diarization_utils.py +++ b/nemo/collections/asr/parts/utils/diarization_utils.py @@ -187,6 +187,78 @@ def convert_word_dict_seq_to_ctm( return ctm_lines +def get_total_result_dict( + der_results: Dict[str, Dict[str, float]], wer_results: Dict[str, Dict[str, float]], csv_columns: List[str], +): + """ + Merge WER results and DER results into a single dictionary variable. 
+ + Args: + der_results (dict): + Dictionary containing FA, MISS, CER and DER values for both aggregated amount and + each session. + wer_results (dict): + Dictionary containing session-by-session WER and cpWER. `wer_results` only + exists when CTM files are provided. + + Returns: + total_result_jsons (list): + List of per-session dictionaries containing both DER and WER results. Each dictionary holds the + unique ID of a session with its (cp)WER and DER/CER/Miss/FA values; the `total` key is skipped. + """ + total_result_dict = {} + for uniq_id in der_results.keys(): + if uniq_id == 'total': + continue + total_result_dict[uniq_id] = {x: "-" for x in csv_columns} + total_result_dict[uniq_id]["uniq_id"] = uniq_id + if uniq_id in der_results: + total_result_dict[uniq_id].update(der_results[uniq_id]) + if uniq_id in wer_results: + total_result_dict[uniq_id].update(wer_results[uniq_id]) + total_result_jsons = list(total_result_dict.values()) + return total_result_jsons + + +def get_audacity_label(word: str, stt_sec: float, end_sec: float, speaker: str) -> str: + """ + Get a string formatted line for Audacity label. + + Args: + word (str): + A decoded word + stt_sec (float): + Start timestamp of the word + end_sec (float): + End timestamp of the word + speaker (str): Speaker label in string type + + Returns: + (str): Tab-separated Audacity label line containing start time, end time and the speaker-tagged word + """ + spk = speaker.split('_')[-1] + return f'{stt_sec}\t{end_sec}\t[{spk}] {word}' + + +def get_num_of_spk_from_labels(labels: List[str]) -> int: + """ + Count the number of speakers in a segment label list. + Args: + labels (list): + List containing segment start and end timestamp and speaker labels. + + Example: + >>> labels = ["15.25 21.82 speaker_0", "21.18 29.51 speaker_1", ... ] + + Returns: + n_spk (int): + The number of speakers in the list `labels` + + """ + spk_set = [x.split(' ')[-1].strip() for x in labels] + return len(set(spk_set)) + + +class OfflineDiarWithASR: """ A class designed for performing ASR and diarization together. 
@@ -248,7 +320,12 @@ def __init__(self, cfg_diarizer): self.make_file_lists() - self.color_palette = { + self.color_palette = self.get_color_palette() + self.csv_columns = self.get_csv_columns() + + @staticmethod + def get_color_palette() -> Dict[str, str]: + return { 'speaker_0': '\033[1;32m', 'speaker_1': '\033[1;34m', 'speaker_2': '\033[1;30m', @@ -262,7 +339,9 @@ def __init__(self, cfg_diarizer): 'white': '\033[0;37m', } - self.csv_columns = [ + @staticmethod + def get_csv_columns() -> List[str]: + return [ 'uniq_id', 'DER', 'CER', @@ -347,15 +426,20 @@ def _save_VAD_labels_list(self, word_ts_dict: Dict[str, Dict[str, List[float]]]) """ self.VAD_RTTM_MAP = {} for idx, (uniq_id, word_timestamps) in enumerate(word_ts_dict.items()): - speech_labels_float = self._get_speech_labels_from_decoded_prediction(word_timestamps) - speech_labels = self._get_str_speech_labels(speech_labels_float) + speech_labels_float = self.get_speech_labels_from_decoded_prediction( + word_timestamps, self.nonspeech_threshold + ) + speech_labels = self.get_str_speech_labels(speech_labels_float) output_path = os.path.join(self.root_path, 'pred_rttms') if not os.path.exists(output_path): os.makedirs(output_path) filename = labels_to_rttmfile(speech_labels, uniq_id, output_path) self.VAD_RTTM_MAP[uniq_id] = {'audio_filepath': self.audio_file_list[idx], 'rttm_filepath': filename} - def _get_speech_labels_from_decoded_prediction(self, input_word_ts: List[float]) -> List[float]: + @staticmethod + def get_speech_labels_from_decoded_prediction( + input_word_ts: List[float], nonspeech_threshold: float, + ) -> List[float]: """ Extract speech labels from the ASR output (decoded predictions) @@ -375,7 +459,7 @@ def _get_speech_labels_from_decoded_prediction(self, input_word_ts: List[float]) count = len(word_ts) - 1 while count > 0: if len(word_ts) > 1: - if word_ts[count][0] - word_ts[count - 1][1] <= self.nonspeech_threshold: + if word_ts[count][0] - word_ts[count - 1][1] <= nonspeech_threshold: 
trangeB = word_ts.pop(count) trangeA = word_ts.pop(count - 1) word_ts.insert(count - 1, [trangeA[0], trangeB[1]]) @@ -445,8 +529,13 @@ def _get_frame_level_VAD(self, vad_processing_dir, smoothing_type=False): frame_vad_float_list.append(float(line.strip())) self.frame_VAD[uniq_id] = frame_vad_float_list + @staticmethod def gather_eval_results( - self, metric, mapping_dict: Dict[str, str], trans_info_dict: Dict[str, Dict[str, float]], decimals: int = 4 + diar_score, + audio_rttm_map_dict: Dict[str, Dict[str, str]], + trans_info_dict: Dict[str, Dict[str, float]], + root_path: str, + decimals: int = 4, ) -> Dict[str, Dict[str, float]]: """ Gather diarization evaluation results from pyannote DiarizationErrorRate metric object. @@ -466,22 +555,22 @@ def gather_eval_results( der_results (dict): Dictionary containing scores for each audio file along with aggregated results """ + metric, mapping_dict, _ = diar_score results = metric.results_ der_results = {} count_correct_spk_counting = 0 for result in results: key, score = result - pred_rttm = os.path.join(self.root_path, 'pred_rttms', key + '.rttm') + if 'hyp_rttm_filepath' in audio_rttm_map_dict[key]: + pred_rttm = audio_rttm_map_dict[key]['hyp_rttm_filepath'] + else: + pred_rttm = os.path.join(root_path, 'pred_rttms', key + '.rttm') pred_labels = rttm_to_labels(pred_rttm) - ref_rttm = self.AUDIO_RTTM_MAP[key]['rttm_filepath'] + ref_rttm = audio_rttm_map_dict[key]['rttm_filepath'] ref_labels = rttm_to_labels(ref_rttm) - ref_n_spk = self.get_num_of_spk_from_labels(ref_labels) - est_n_spk = self.get_num_of_spk_from_labels(pred_labels) - - if self.cfg_diarizer['oracle_vad']: - score['missed detection'] = 0 - score['false alarm'] = 0 + ref_n_spk = get_num_of_spk_from_labels(ref_labels) + est_n_spk = get_num_of_spk_from_labels(pred_labels) _DER, _CER, _FA, _MISS = ( (score['confusion'] + score['false alarm'] + score['missed detection']) / score['total'], @@ -783,7 +872,7 @@ def _make_json_output( sentences, terms_list = [], 
[] sentence = {'speaker': speaker, 'start_time': start_point, 'end_time': end_point, 'text': ''} - n_spk = self.get_num_of_spk_from_labels(diar_labels) + n_spk = get_num_of_spk_from_labels(diar_labels) logging.info(f"Creating results for Session: {uniq_id} n_spk: {n_spk} ") session_trans_dict = self._init_session_trans_dict(uniq_id=uniq_id, n_spk=n_spk) gecko_dict = self._init_session_gecko_dict() @@ -817,7 +906,7 @@ def _make_json_output( # add current word to sentence sentence['text'] += word.strip() + ' ' - audacity_label_words.append(self.get_audacity_label(word, stt_sec, end_sec, speaker)) + audacity_label_words.append(get_audacity_label(word, stt_sec, end_sec, speaker)) prev_speaker = speaker session_trans_dict['words'] = word_dict_seq_list @@ -831,8 +920,7 @@ def _make_json_output( session_trans_dict['transcription'] = ' '.join(word_seq_list) # add sentences to transcription information dict session_trans_dict['sentences'] = sentences - - self.write_and_log(uniq_id, session_trans_dict, audacity_label_words, gecko_dict, sentences) + self._write_and_log(uniq_id, session_trans_dict, audacity_label_words, gecko_dict, sentences) return session_trans_dict def _get_realignment_ranges(self, k: int, word_seq_len: int) -> Tuple[int, int]: @@ -950,14 +1038,28 @@ def realign_words_with_lm(self, word_dict_seq_list: List[Dict[str, float]]) -> L realigned_list.append(line_dict) return realigned_list - def evaluate(self, trans_info_dict: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]: + @staticmethod + def evaluate( + audio_file_list: List[str], + hyp_trans_info_dict: Dict[str, Dict[str, float]], + hyp_ctm_file_list: List[str] = None, + ref_ctm_file_list: List[str] = None, + ) -> Dict[str, Dict[str, float]]: """ Evaluate the result transcripts based on the provided CTM file. WER and cpWER are calculated to assess the performance of ASR system and diarization at the same time. 
Args: - trans_info_dict (dict): - Dictionary containing overall results of diarization and ASR inference from all sessions. + audio_file_list (list): + List containing file path to the input audio files. + hyp_trans_info_dict (dict): + Dictionary containing the hypothesis transcriptions for all sessions. + hyp_ctm_file_list (list): + List containing file paths of the hypothesis transcriptions in CTM format for all sessions. + ref_ctm_file_list (list): + List containing file paths of the reference transcriptions in CTM format for all sessions. + + Note: Either `hyp_trans_info_dict` or `hyp_ctm_file_list` should be provided. Returns: wer_results (dict): @@ -965,15 +1067,30 @@ def evaluate(self, trans_info_dict: Dict[str, Dict[str, float]]) -> Dict[str, Di """ wer_results = {} - if self.ctm_exists: + if ref_ctm_file_list is not None: spk_hypotheses, spk_references = [], [] mix_hypotheses, mix_references = [], [] WER_values, uniq_id_list = [], [] - for (audio_file_path, ctm_file_path) in zip(self.audio_file_list, self.ctm_file_list): + for k, (audio_file_path, ctm_file_path) in enumerate(zip(audio_file_list, ref_ctm_file_list)): uniq_id = get_uniqname_from_filepath(audio_file_path) uniq_id_list.append(uniq_id) - spk_hypothesis, mix_hypothesis = convert_word_dict_seq_to_text(trans_info_dict[uniq_id]['words']) + if uniq_id != get_uniqname_from_filepath(ctm_file_path): + raise ValueError("audio_file_list has mismatch in uniq_id with ctm_file_path") + + # Either hypothesis CTM file or hyp_trans_info_dict should be provided + if hyp_ctm_file_list is not None: + if uniq_id == get_uniqname_from_filepath(hyp_ctm_file_list[k]): + spk_hypothesis, mix_hypothesis = convert_ctm_to_text(hyp_ctm_file_list[k]) + else: + raise ValueError("Hypothesis CTM files are provided but uniq_id is mismatched") + elif hyp_trans_info_dict is not None and uniq_id in hyp_trans_info_dict: + spk_hypothesis, mix_hypothesis = convert_word_dict_seq_to_text( + hyp_trans_info_dict[uniq_id]['words'] + ) + 
else: + raise ValueError("Hypothesis information is not provided in the correct format.") + spk_reference, mix_reference = convert_ctm_to_text(ctm_file_path) spk_hypotheses.append(spk_hypothesis) @@ -999,7 +1116,8 @@ def evaluate(self, trans_info_dict: Dict[str, Dict[str, float]]) -> Dict[str, Di return wer_results - def _get_str_speech_labels(self, speech_labels_float: List[List[float]]) -> List[str]: + @staticmethod + def get_str_speech_labels(speech_labels_float: List[List[float]]) -> List[str]: """ Convert floating point speech labels list to a list containing string values. @@ -1014,8 +1132,13 @@ def _get_str_speech_labels(self, speech_labels_float: List[List[float]]) -> List speech_labels.append("{:.3f} {:.3f} speech".format(start, end)) return speech_labels + @staticmethod def write_session_level_result_in_csv( - self, der_results: Dict[str, Dict[str, float]], wer_results: Dict[str, Dict[str, float]] + der_results: Dict[str, Dict[str, float]], + wer_results: Dict[str, Dict[str, float]], + root_path: str, + csv_columns: List[str], + csv_file_name: str = "ctm_eval.csv", ): """ This function is for development use when a CTM file is provided. @@ -1026,50 +1149,19 @@ def write_session_level_result_in_csv( Dictionary containing session-by-session results of ASR and diarization in terms of WER and cpWER. 
""" - target_path = f"{self.root_path}/pred_rttms/ctm_eval.csv" - logging.info(f"Writing {target_path}") - total_result_jsons = self.get_total_result_dict(der_results, wer_results) + target_path = f"{root_path}/pred_rttms" + os.makedirs(target_path, exist_ok=True) + logging.info(f"Writing {target_path}/{csv_file_name}") + total_result_jsons = get_total_result_dict(der_results, wer_results, csv_columns) try: - with open(target_path, 'w') as csvfile: - writer = csv.DictWriter(csvfile, fieldnames=self.csv_columns) + with open(f"{target_path}/{csv_file_name}", 'w') as csvfile: + writer = csv.DictWriter(csvfile, fieldnames=csv_columns) writer.writeheader() for data in total_result_jsons: writer.writerow(data) except IOError: logging.info("I/O error has occurred while writing a csv file.") - def get_total_result_dict( - self, der_results: Dict[str, Dict[str, float]], wer_results: Dict[str, Dict[str, float]] - ): - """ - Merge WER results and DER results into a single dictionary variable. - - Args: - der_results (dict): - Dictionary containing FA, MISS, CER and DER values for both aggregated amount and - each session. - wer_results (dict): - Dictionary containing session-by-session WER and cpWER. `wer_results` only - exists when CTM files are provided. - - Returns: - total_result_dict (dict): - Dictionary containing both DER and WER results. This dictionary contains unique-IDs of - each session and `total` key that includes average (cp)WER and DER/CER/Miss/FA values. 
- """ - total_result_dict = {} - for uniq_id in der_results.keys(): - if uniq_id == 'total': - continue - total_result_dict[uniq_id] = {x: "-" for x in self.csv_columns} - total_result_dict[uniq_id]["uniq_id"] = uniq_id - if uniq_id in der_results: - total_result_dict[uniq_id].update(der_results[uniq_id]) - if uniq_id in wer_results: - total_result_dict[uniq_id].update(wer_results[uniq_id]) - total_result_jsons = list(total_result_dict.values()) - return total_result_jsons - def _break_lines(self, string_out: str, max_chars_in_line: int = 90) -> str: """ Break the lines in the transcript. @@ -1102,7 +1194,7 @@ def _break_lines(self, string_out: str, max_chars_in_line: int = 90) -> str: return_string_out = '\n'.join(return_string_out) return return_string_out - def write_and_log( + def _write_and_log( self, uniq_id: str, session_trans_dict: Dict[str, Dict[str, float]], @@ -1131,13 +1223,16 @@ def write_and_log( string_out = self._break_lines(string_out) session_trans_dict["status"] = "success" + ctm_lines_list = convert_word_dict_seq_to_ctm(session_trans_dict['words']) dump_json_to_file(f'{self.root_path}/pred_rttms/{uniq_id}.json', session_trans_dict) dump_json_to_file(f'{self.root_path}/pred_rttms/{uniq_id}_gecko.json', gecko_dict) + write_txt(f'{self.root_path}/pred_rttms/{uniq_id}.ctm', '\n'.join(ctm_lines_list)) write_txt(f'{self.root_path}/pred_rttms/{uniq_id}.txt', string_out.strip()) write_txt(f'{self.root_path}/pred_rttms/{uniq_id}.w.label', '\n'.join(audacity_label_words)) - def print_errors(self, der_results: Dict[str, Dict[str, float]], wer_results: Dict[str, Dict[str, float]]): + @staticmethod + def print_errors(der_results: Dict[str, Dict[str, float]], wer_results: Dict[str, Dict[str, float]]): """ Print a slew of error metrics for ASR and Diarization. @@ -1154,7 +1249,7 @@ def print_errors(self, der_results: Dict[str, Dict[str, float]], wer_results: Di \nMISS : {der_results['total']['MISS']:.4f} \ \nCER : {der_results['total']['CER']:.4f} \ \nSpk. 
counting acc. : {der_results['total']['spk_counting_acc']:.4f}" - if self.ctm_exists: + if wer_results is not None and len(wer_results) > 0: logging.info( DER_info + f"\ncpWER : {wer_results['total']['average_cpWER']:.4f} \ @@ -1162,7 +1257,6 @@ def print_errors(self, der_results: Dict[str, Dict[str, float]], wer_results: Di ) else: logging.info(DER_info) - self.write_session_level_result_in_csv(der_results, wer_results) def print_sentences(self, sentences: List[Dict[str, float]]): """ @@ -1210,42 +1304,3 @@ def print_sentences(self, sentences: List[Dict[str, float]]): string_out += f'{color}{time_str}{speaker}: {text}\n' return string_out - - @staticmethod - def get_audacity_label(word: str, stt_sec: float, end_sec: float, speaker: str) -> str: - """ - Get a string formatted line for Audacity label. - - Args: - word (str): - A decoded word - stt_sec (float): - Start timestamp of the word - end_sec (float): - End timestamp of the word - - Returns: - speaker (str): - Speaker label in string type - """ - spk = speaker.split('_')[-1] - return f'{stt_sec}\t{end_sec}\t[{spk}] {word}' - - @staticmethod - def get_num_of_spk_from_labels(labels: List[str]) -> int: - """ - Count the number of speakers in a segment label list. - Args: - labels (list): - List containing segment start and end timestamp and speaker labels. - - Example: - >>> labels = ["15.25 21.82 speaker_0", "21.18 29.51 speaker_1", ... 
] - - Returns: - n_spk (int): - The number of speakers in the list `labels` - - """ - spk_set = [x.split(' ')[-1].strip() for x in labels] - return len(set(spk_set)) diff --git a/nemo/collections/asr/parts/utils/speaker_utils.py b/nemo/collections/asr/parts/utils/speaker_utils.py index cae43d779f5e..f5cb7bce60b7 100644 --- a/nemo/collections/asr/parts/utils/speaker_utils.py +++ b/nemo/collections/asr/parts/utils/speaker_utils.py @@ -435,7 +435,7 @@ def perform_clustering(embs_and_timestamps, AUDIO_RTTM_MAP, out_rttm_dir, cluste speaker_clustering = torch.jit.script(speaker_clustering) torch.jit.save(speaker_clustering, 'speaker_clustering_script.pt') - for uniq_id, audio_rttm_values in tqdm(AUDIO_RTTM_MAP.items(), desc='clustering', leave=False): + for uniq_id, audio_rttm_values in tqdm(AUDIO_RTTM_MAP.items(), desc='clustering', leave=True): uniq_embs_and_timestamps = embs_and_timestamps[uniq_id] if clustering_params.oracle_num_speakers: diff --git a/nemo/collections/asr/parts/utils/vad_utils.py b/nemo/collections/asr/parts/utils/vad_utils.py index 0def54871c63..f3e64472d21d 100644 --- a/nemo/collections/asr/parts/utils/vad_utils.py +++ b/nemo/collections/asr/parts/utils/vad_utils.py @@ -87,13 +87,13 @@ def prepare_manifest(config: dict) -> str: p.imap(write_vad_infer_manifest_star, inputs), total=len(input_list), desc='splitting manifest', - leave=False, + leave=True, ) ) else: results = [ write_vad_infer_manifest(input_el, args_func) - for input_el in tqdm(input_list, desc='splitting manifest', leave=False) + for input_el in tqdm(input_list, desc='splitting manifest', leave=True) ] if os.path.exists(manifest_vad_input): @@ -282,12 +282,12 @@ def generate_overlap_vad_seq( p.imap(generate_overlap_vad_seq_per_file_star, inputs), total=len(frame_filepathlist), desc='generating preds', - leave=False, + leave=True, ) ) else: - for frame_filepath in tqdm(frame_filepathlist, desc='generating preds', leave=False): + for frame_filepath in tqdm(frame_filepathlist, 
desc='generating preds', leave=True): generate_overlap_vad_seq_per_file(frame_filepath, per_args) return overlap_out_dir @@ -731,12 +731,12 @@ def generate_vad_segment_table( p.imap(generate_vad_segment_table_per_file_star, inputs), total=len(vad_pred_filepath_list), desc='creating speech segments', - leave=False, + leave=True, ) ) else: - for vad_pred_filepath in tqdm(vad_pred_filepath_list, desc='creating speech segments', leave=False): + for vad_pred_filepath in tqdm(vad_pred_filepath_list, desc='creating speech segments', leave=True): generate_vad_segment_table_per_file(vad_pred_filepath, per_args) return table_out_dir diff --git a/scripts/speaker_tasks/eval_diar_with_asr.py b/scripts/speaker_tasks/eval_diar_with_asr.py new file mode 100644 index 000000000000..9fc651e953cd --- /dev/null +++ b/scripts/speaker_tasks/eval_diar_with_asr.py @@ -0,0 +1,243 @@ +# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import argparse +import json +import os + +from nemo.collections.asr.metrics.der import evaluate_der +from nemo.collections.asr.parts.utils.diarization_utils import OfflineDiarWithASR +from nemo.collections.asr.parts.utils.manifest_utils import read_file +from nemo.collections.asr.parts.utils.speaker_utils import ( + get_uniqname_from_filepath, + labels_to_pyannote_object, + rttm_to_labels, +) + + +""" +Evaluation script for diarization with ASR. 
+Calculates Diarization Error Rate (DER) with RTTM files and WER and cpWER with CTM files. +In the output ctm_eval.csv file in the output folder, +session-level DER, WER, cpWER and speaker counting accuracies are evaluated. + +- Evaluation mode + +diar_eval_mode == "full": + DIHARD challenge style evaluation, the most strict way of evaluating diarization + (collar, ignore_overlap) = (0.0, False) +diar_eval_mode == "fair": + Evaluation setup used in VoxSRC challenge + (collar, ignore_overlap) = (0.25, False) +diar_eval_mode == "forgiving": + Traditional evaluation setup + (collar, ignore_overlap) = (0.25, True) +diar_eval_mode == "all": + Compute all three modes (default) + + +Use CTM files to calculate WER and cpWER +``` +python eval_diar_with_asr.py \ + --hyp_rttm_list="/path/to/hypothesis_rttm_filepaths.list" \ + --ref_rttm_list="/path/to/reference_rttm_filepaths.list" \ + --hyp_ctm_list="/path/to/hypothesis_ctm_filepaths.list" \ + --ref_ctm_list="/path/to/reference_ctm_filepaths.list" \ + --root_path="/path/to/output/directory" +``` + +Use .json files to calculate WER and cpWER +``` +python eval_diar_with_asr.py \ + --hyp_rttm_list="/path/to/hypothesis_rttm_filepaths.list" \ + --ref_rttm_list="/path/to/reference_rttm_filepaths.list" \ + --hyp_json_list="/path/to/hypothesis_json_filepaths.list" \ + --ref_ctm_list="/path/to/reference_ctm_filepaths.list" \ + --root_path="/path/to/output/directory" +``` + +Only use RTTMs to calculate DER +``` +python eval_diar_with_asr.py \ + --hyp_rttm_list="/path/to/hypothesis_rttm_filepaths.list" \ + --ref_rttm_list="/path/to/reference_rttm_filepaths.list" \ + --root_path="/path/to/output/directory" +``` + +""" + + +def get_pyannote_objs_from_rttms(rttm_file_path_list): + """Generate PyAnnote objects from RTTM file list + """ + pyannote_obj_list = [] + for rttm_file in rttm_file_path_list: + rttm_file = rttm_file.strip() + if rttm_file is not None and os.path.exists(rttm_file): + uniq_id = get_uniqname_from_filepath(rttm_file) + 
ref_labels = rttm_to_labels(rttm_file) + reference = labels_to_pyannote_object(ref_labels, uniq_name=uniq_id) + pyannote_obj_list.append([uniq_id, reference]) + return pyannote_obj_list + + +def make_meta_dict(hyp_rttm_list, ref_rttm_list): + """Create a temporary `audio_rttm_map_dict` for evaluation + """ + meta_dict = {} + for k, rttm_file in enumerate(ref_rttm_list): + uniq_id = get_uniqname_from_filepath(rttm_file) + meta_dict[uniq_id] = {"rttm_filepath": rttm_file.strip()} + if hyp_rttm_list is not None: + hyp_rttm_file = hyp_rttm_list[k] + meta_dict[uniq_id].update({"hyp_rttm_filepath": hyp_rttm_file.strip()}) + return meta_dict + + +def make_trans_info_dict(hyp_json_list_path): + """Create `trans_info_dict` from the `.json` files + """ + trans_info_dict = {} + for json_file in hyp_json_list_path: + json_file = json_file.strip() + with open(json_file) as jsf: + json_data = json.load(jsf) + uniq_id = get_uniqname_from_filepath(json_file) + trans_info_dict[uniq_id] = json_data + return trans_info_dict + + +def read_file_path(list_path): + """Read file path and strip to remove line change symbol + """ + return sorted([x.strip() for x in read_file(list_path)]) + + +def main( + hyp_rttm_list_path: str, + ref_rttm_list_path: str, + hyp_ctm_list_path: str, + ref_ctm_list_path: str, + hyp_json_list_path: str, + diar_eval_mode: str = "all", + root_path: str = "./", +): + + # Read filepath list files + hyp_rttm_list = read_file_path(hyp_rttm_list_path) if hyp_rttm_list_path else None + ref_rttm_list = read_file_path(ref_rttm_list_path) if ref_rttm_list_path else None + hyp_ctm_list = read_file_path(hyp_ctm_list_path) if hyp_ctm_list_path else None + ref_ctm_list = read_file_path(ref_ctm_list_path) if ref_ctm_list_path else None + hyp_json_list = read_file_path(hyp_json_list_path) if hyp_json_list_path else None + + audio_rttm_map_dict = make_meta_dict(hyp_rttm_list, ref_rttm_list) + + trans_info_dict = make_trans_info_dict(hyp_json_list) if hyp_json_list else None + + 
all_hypothesis = get_pyannote_objs_from_rttms(hyp_rttm_list) + all_reference = get_pyannote_objs_from_rttms(ref_rttm_list) + + diar_score = evaluate_der( + audio_rttm_map_dict=audio_rttm_map_dict, + all_reference=all_reference, + all_hypothesis=all_hypothesis, + diar_eval_mode=diar_eval_mode, + ) + + # Get session-level diarization error rate and speaker counting error + der_results = OfflineDiarWithASR.gather_eval_results( + diar_score=diar_score, + audio_rttm_map_dict=audio_rttm_map_dict, + trans_info_dict=trans_info_dict, + root_path=root_path, + ) + + if ref_ctm_list is not None: + # Calculate WER and cpWER if reference CTM files exist + if hyp_ctm_list is not None: + wer_results = OfflineDiarWithASR.evaluate( + audio_file_list=hyp_rttm_list, + hyp_trans_info_dict=None, + hyp_ctm_file_list=hyp_ctm_list, + ref_ctm_file_list=ref_ctm_list, + ) + elif hyp_json_list is not None: + wer_results = OfflineDiarWithASR.evaluate( + audio_file_list=hyp_rttm_list, + hyp_trans_info_dict=trans_info_dict, + hyp_ctm_file_list=None, + ref_ctm_file_list=ref_ctm_list, + ) + else: + raise ValueError("Hypothesis information is not provided in the correct format.") + else: + wer_results = {} + + # Print average DER, WER and cpWER + OfflineDiarWithASR.print_errors(der_results=der_results, wer_results=wer_results) + + # Save detailed session-level evaluation results in `root_path`. 
+ OfflineDiarWithASR.write_session_level_result_in_csv( + der_results=der_results, + wer_results=wer_results, + root_path=root_path, + csv_columns=OfflineDiarWithASR.get_csv_columns(), + ) + return None + + +if __name__ == "__main__": + parser = argparse.ArgumentParser() + parser.add_argument( + "--hyp_rttm_list", help="path to the filelist of hypothesis RTTM files", type=str, required=True, default=None + ) + parser.add_argument( + "--ref_rttm_list", help="path to the filelist of reference RTTM files", type=str, required=True, default=None + ) + parser.add_argument( + "--hyp_ctm_list", help="path to the filelist of hypothesis CTM files", type=str, required=False, default=None + ) + parser.add_argument( + "--ref_ctm_list", help="path to the filelist of reference CTM files", type=str, required=False, default=None + ) + parser.add_argument( + "--hyp_json_list", + help="(Optional) path to the filelist of hypothesis JSON files", + type=str, + required=False, + default=None, + ) + parser.add_argument( + "--diar_eval_mode", + help='evaluation mode: "all", "full", "fair", "forgiving"', + type=str, + required=False, + default="all", + ) + parser.add_argument( + "--root_path", help='directory for saving result files', type=str, required=False, default="./" + ) + + args = parser.parse_args() + + main( + args.hyp_rttm_list, + args.ref_rttm_list, + args.hyp_ctm_list, + args.ref_ctm_list, + args.hyp_json_list, + args.diar_eval_mode, + args.root_path, + ) diff --git a/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb b/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb index 70e1c3b60fd7..08d41c9736a2 100644 --- a/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb +++ b/tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb @@ -69,24 +69,9 @@ }, { "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - 
"/home/taejinp/anaconda3/lib/python3.9/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 4.0.0-unsupported is an invalid version and will not be supported in a future release\n", - " warnings.warn(\n", - "/home/taejinp/anaconda3/lib/python3.9/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 4.0.0-unsupported is an invalid version and will not be supported in a future release\n", - " warnings.warn(\n", - "[NeMo W 2022-11-10 16:20:15 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.\n", - "[NeMo W 2022-11-10 16:20:27 nemo_logging:349] /home/taejinp/anaconda3/lib/python3.9/site-packages/torch/jit/annotations.py:296: UserWarning: TorchScript will treat type annotations of Tensor dtype-specific subtypes as if they are normal Tensors. dtype constraints are not enforced in compilation either.\n", - " warnings.warn(\"TorchScript will treat type annotations of Tensor \"\n", - " \n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import nemo.collections.asr as nemo_asr\n", "import numpy as np\n", @@ -112,35 +97,9 @@ }, { "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Input audio file list: \n", - " ['/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.wav']\n" - ] - }, - { - "data": { - "text/html": [ - "\n", - " \n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "ROOT = os.getcwd()\n", "data_dir = os.path.join(ROOT,'data')\n", @@ -168,7 +127,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -211,22 +170,9 @@ }, { "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - 
"image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "display_waveform(signal)" ] @@ -241,110 +187,9 @@ }, { "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "name: ClusterDiarizer\n", - "num_workers: 1\n", - "sample_rate: 16000\n", - "batch_size: 64\n", - "diarizer:\n", - " manifest_filepath: ???\n", - " out_dir: ???\n", - " oracle_vad: false\n", - " collar: 0.25\n", - " ignore_overlap: true\n", - " vad:\n", - " model_path: vad_multilingual_marblenet\n", - " external_vad_manifest: null\n", - " parameters:\n", - " window_length_in_sec: 0.63\n", - " shift_length_in_sec: 0.01\n", - " smoothing: false\n", - " overlap: 0.5\n", - " onset: 0.9\n", - " offset: 0.5\n", - " pad_onset: 0\n", - " pad_offset: 0\n", - " min_duration_on: 0\n", - " min_duration_off: 0.6\n", - " filter_speech_first: true\n", - " speaker_embeddings:\n", - " model_path: titanet_large\n", - " parameters:\n", - " window_length_in_sec:\n", - " - 3.0\n", - " - 2.5\n", - " - 2.0\n", - " - 1.5\n", - " - 1.0\n", - " - 0.5\n", - " shift_length_in_sec:\n", - " - 1.5\n", - " - 1.25\n", - " - 1.0\n", - " - 0.75\n", - " - 0.5\n", - " - 0.25\n", - " multiscale_weights:\n", - " - 1\n", - " - 1\n", - " - 1\n", - " - 1\n", - " - 1\n", - " - 1\n", - " save_embeddings: true\n", - " clustering:\n", - " parameters:\n", - " oracle_num_speakers: false\n", - " max_num_speakers: 8\n", - " enhanced_count_thres: 80\n", - " max_rp_threshold: 0.25\n", - " sparse_search_volume: 30\n", - " maj_vote_spk_count: false\n", - " msdd_model:\n", - " model_path: null\n", - " parameters:\n", - " use_speaker_model_from_ckpt: true\n", - " infer_batch_size: 25\n", - " sigmoid_threshold:\n", - " - 0.7\n", - " seq_eval_mode: false\n", - " split_infer: true\n", - " diar_window_length: 50\n", - " overlap_infer_spk_limit: 5\n", - " 
asr:\n", - " model_path: stt_en_conformer_ctc_large\n", - " parameters:\n", - " asr_based_vad: false\n", - " asr_based_vad_threshold: 1.0\n", - " asr_batch_size: null\n", - " lenient_overlap_WDER: true\n", - " decoder_delay_in_sec: null\n", - " word_ts_anchor_offset: null\n", - " word_ts_anchor_pos: start\n", - " fix_word_ts_with_VAD: false\n", - " colored_text: false\n", - " print_time: true\n", - " break_lines: false\n", - " ctc_decoder_parameters:\n", - " pretrained_language_model: null\n", - " beam_width: 32\n", - " alpha: 0.5\n", - " beta: 2.5\n", - " realigning_lm_parameters:\n", - " arpa_language_model: null\n", - " min_number_of_words: 3\n", - " max_number_of_words: 10\n", - " logprob_diff_threshold: 1.2\n", - "\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from omegaconf import OmegaConf\n", "import shutil\n", @@ -389,17 +234,9 @@ }, { "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"audio_filepath\": \"/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.wav\", \"offset\": 0, \"duration\": null, \"label\": \"infer\", \"text\": \"-\", \"num_speakers\": null, \"rttm_filepath\": null, \"uem_filepath\": null}\r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Create a manifest file for input with below format. 
\n", "# {\"audio_filepath\": \"/path/to/audio_file\", \"offset\": 0, \"duration\": null, \"label\": \"infer\", \"text\": \"-\", \n", @@ -432,7 +269,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -467,126 +304,9 @@ }, { "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:28 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:20:28 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/stt_en_conformer_ctc_large/afb212c5bcf904e326b5e5751e7c7465/stt_en_conformer_ctc_large.nemo.\n", - "[NeMo I 2022-11-10 16:20:28 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/stt_en_conformer_ctc_large/afb212c5bcf904e326b5e5751e7c7465/stt_en_conformer_ctc_large.nemo\n", - "[NeMo I 2022-11-10 16:20:28 common:911] Instantiating model from pre-trained checkpoint\n", - "[NeMo I 2022-11-10 16:20:29 mixins:170] Tokenizer SentencePieceTokenizer initialized with 128 tokens\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:20:29 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath:\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket1/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket2/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket3/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket4/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket5/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket6/tarred_audio_manifest.json\n", - " - - 
/data2/nemo_asr/nemo_asr_set_3.0/bucket7/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket8/tarred_audio_manifest.json\n", - " sample_rate: 16000\n", - " batch_size: 1\n", - " shuffle: true\n", - " num_workers: 4\n", - " pin_memory: true\n", - " use_start_end_token: false\n", - " trim_silence: false\n", - " max_duration: 20.0\n", - " min_duration: 0.1\n", - " is_tarred: true\n", - " tarred_audio_filepaths:\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket1/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket2/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket3/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket4/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket5/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket6/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket7/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket8/audio__OP_0..8191_CL_.tar\n", - " shuffle_n: 2048\n", - " bucketing_strategy: synced_randomized\n", - " bucketing_batch_size:\n", - " - 34\n", - " - 30\n", - " - 26\n", - " - 22\n", - " - 18\n", - " - 16\n", - " - 12\n", - " - 8\n", - " \n", - "[NeMo W 2022-11-10 16:20:29 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath:\n", - " - /manifests/librispeech/librivox-dev-other.json\n", - " - /manifests/librispeech/librivox-dev-clean.json\n", - " - /manifests/librispeech/librivox-test-other.json\n", - " - /manifests/librispeech/librivox-test-clean.json\n", - " sample_rate: 16000\n", - " batch_size: 32\n", - " shuffle: false\n", - " num_workers: 8\n", - " pin_memory: true\n", - " use_start_end_token: false\n", - " \n", - "[NeMo W 2022-11-10 16:20:29 modelPT:155] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", - " Test config : \n", - " manifest_filepath:\n", - " - /manifests/librispeech/librivox-dev-other.json\n", - " - /manifests/librispeech/librivox-dev-clean.json\n", - " - /manifests/librispeech/librivox-test-other.json\n", - " - /manifests/librispeech/librivox-test-clean.json\n", - " sample_rate: 16000\n", - " batch_size: 32\n", - " shuffle: false\n", - " num_workers: 8\n", - " pin_memory: true\n", - " use_start_end_token: false\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:29 features:225] PADDING: 0\n", - "[NeMo I 2022-11-10 16:20:33 save_restore_connector:243] Model EncDecCTCModelBPE was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/stt_en_conformer_ctc_large/afb212c5bcf904e326b5e5751e7c7465/stt_en_conformer_ctc_large.nemo.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:20:33 decoder_timestamps_utils:66] `ctc_decode` was set to True. 
Note that this is ignored.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:33 features:225] PADDING: 0\n", - "[NeMo I 2022-11-10 16:20:33 features:225] PADDING: 0\n", - "[NeMo I 2022-11-10 16:20:33 decoder_timestamps_utils:640] Running ASR model stt_en_conformer_ctc_large\n", - "[NeMo I 2022-11-10 16:20:33 decoder_timestamps_utils:644] [1/1] FrameBatchASR: /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.wav\n", - "Decoded word output dictionary: \n", - " ['eleven', 'twenty', 'seven', 'fifty', 'seven', 'october', 'twenty', 'fourth', 'nineteen', 'seventy']\n", - "Word-level timestamps dictionary: \n", - " [[0.36, 0.68], [0.92, 1.28], [1.4, 1.64], [1.92, 2.28], [2.36, 2.6], [3.08, 3.52], [3.6, 3.84], [3.88, 4.12], [4.4, 4.72], [4.84, 5.16]]\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.parts.utils.decoder_timestamps_utils import ASRDecoderTimeStamps\n", "asr_decoder_ts = ASRDecoderTimeStamps(cfg.diarizer)\n", @@ -606,17 +326,9 @@ }, { "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:35 speaker_utils:92] Number of files to diarize: 1\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.parts.utils.diarization_utils import OfflineDiarWithASR\n", "asr_diar_offline = OfflineDiarWithASR(cfg.diarizer)\n", @@ -646,321 +358,9 @@ }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:35 clustering_diarizer:129] Loading pretrained vad_multilingual_marblenet model from NGC\n", - "[NeMo I 2022-11-10 16:20:35 cloud:56] Found existing object 
/home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.\n", - "[NeMo I 2022-11-10 16:20:35 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo\n", - "[NeMo I 2022-11-10 16:20:35 common:911] Instantiating model from pre-trained checkpoint\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:20:35 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: /manifests/ami_train_0.63.json,/manifests/freesound_background_train.json,/manifests/freesound_laughter_train.json,/manifests/fisher_2004_background.json,/manifests/fisher_2004_speech_sampled.json,/manifests/google_train_manifest.json,/manifests/icsi_all_0.63.json,/manifests/musan_freesound_train.json,/manifests/musan_music_train.json,/manifests/musan_soundbible_train.json,/manifests/mandarin_train_sample.json,/manifests/german_train_sample.json,/manifests/spanish_train_sample.json,/manifests/french_train_sample.json,/manifests/russian_train_sample.json\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 256\n", - " shuffle: true\n", - " is_tarred: false\n", - " tarred_audio_filepaths: null\n", - " tarred_shard_strategy: scatter\n", - " augmentor:\n", - " shift:\n", - " prob: 0.5\n", - " min_shift_ms: -10.0\n", - " max_shift_ms: 10.0\n", - " white_noise:\n", - " prob: 0.5\n", - " min_level: -90\n", - " max_level: -46\n", - " norm: true\n", - " noise:\n", - " prob: 0.5\n", - " manifest_path: /manifests/noise_0_1_musan_fs.json\n", - " min_snr_db: 0\n", - " max_snr_db: 30\n", - " max_gain_db: 300.0\n", - " norm: true\n", - " gain:\n", - " prob: 0.5\n", - " 
min_gain_dbfs: -10.0\n", - " max_gain_dbfs: 10.0\n", - " norm: true\n", - " num_workers: 16\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:20:35 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). \n", - " Validation config : \n", - " manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 256\n", - " shuffle: false\n", - " val_loss_idx: 0\n", - " num_workers: 16\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:20:35 modelPT:155] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", - " Test config : \n", - " manifest_filepath: null\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 128\n", - " shuffle: false\n", - " test_loss_idx: 0\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:35 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:20:35 save_restore_connector:243] Model EncDecClassificationModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.\n", - "[NeMo I 2022-11-10 16:20:35 clustering_diarizer:156] Loading pretrained titanet_large model from NGC\n", - "[NeMo I 
2022-11-10 16:20:35 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo.\n", - "[NeMo I 2022-11-10 16:20:35 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo\n", - "[NeMo I 2022-11-10 16:20:35 common:911] Instantiating model from pre-trained checkpoint\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:20:36 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/train.json\n", - " sample_rate: 16000\n", - " labels: null\n", - " batch_size: 64\n", - " shuffle: true\n", - " time_length: 3\n", - " is_tarred: false\n", - " tarred_audio_filepaths: null\n", - " tarred_shard_strategy: scatter\n", - " augmentor:\n", - " noise:\n", - " manifest_path: /manifests/noise/rir_noise_manifest.json\n", - " prob: 0.5\n", - " min_snr_db: 0\n", - " max_snr_db: 15\n", - " speed:\n", - " prob: 0.5\n", - " sr: 16000\n", - " resample_type: kaiser_fast\n", - " min_speed_rate: 0.95\n", - " max_speed_rate: 1.05\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:20:36 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/dev.json\n", - " sample_rate: 16000\n", - " labels: null\n", - " batch_size: 128\n", - " shuffle: false\n", - " time_length: 3\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:36 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:20:36 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:20:37 save_restore_connector:243] Model EncDecSpeakerLabelModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo.\n", - "[NeMo I 2022-11-10 16:20:37 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:20:37 clustering_diarizer:303] Split long audio file to avoid CUDA memory issue\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:38 vad_utils:100] The prepared manifest file exists. 
Overwriting!\n", - "[NeMo I 2022-11-10 16:20:38 classification_models:247] Perform streaming frame-level VAD\n", - "[NeMo I 2022-11-10 16:20:38 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:20:38 collections:300] # 1 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:38 clustering_diarizer:258] Converting frame level prediction to speech/no-speech segment in start and end times format.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:38 clustering_diarizer:281] Subsegmentation for embedding extraction: scale0, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/subsegments_scale0.json\n", - "[NeMo I 2022-11-10 16:20:38 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:20:38 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:20:38 collections:300] # 3 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:38 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:20:38 clustering_diarizer:281] Subsegmentation for embedding extraction: scale1, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/subsegments_scale1.json\n", - "[NeMo I 2022-11-10 16:20:38 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:20:38 collections:296] Filtered duration for loading 
collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:20:38 collections:300] # 4 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:281] Subsegmentation for embedding extraction: scale2, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/subsegments_scale2.json\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:20:39 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:20:39 collections:300] # 5 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " \r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:281] Subsegmentation for embedding extraction: scale3, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/subsegments_scale3.json\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:20:39 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:20:39 collections:300] # 6 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:39 
clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:281] Subsegmentation for embedding extraction: scale4, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/subsegments_scale4.json\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:20:39 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:20:39 collections:300] # 10 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:281] Subsegmentation for embedding extraction: scale5, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/subsegments_scale5.json\n", - "[NeMo I 2022-11-10 16:20:39 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:20:39 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:20:39 collections:300] # 20 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:40 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/speaker_outputs/embeddings\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:20:42 der:105] Check if each ground truth RTTMs were present in the 
provided manifest file. Skipping calculation of Diariazation Error Rate\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:42 clustering_diarizer:455] Outputs are saved in /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data directory\n", - "Diarization hypothesis output: \n", - " ['0.07 2.695 speaker_0', '2.695 5.199999999999999 speaker_1']\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "diar_hyp, diar_score = asr_diar_offline.run_diarization(cfg, word_ts_hyp)\n", "print(\"Diarization hypothesis output: \\n\", diar_hyp['an4_diarize_test'])" @@ -975,47 +375,9 @@ }, { "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[ 'SPEAKER an4_diarize_test 1 0.070 2.625 speaker_0 ',\n", - " 'SPEAKER an4_diarize_test 1 2.695 2.505 speaker_1 ']\n" - ] - }, - { - "data": { - "text/html": [ - "\n", - " \n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "def read_file(path_to_file):\n", " with open(path_to_file) as f:\n", @@ -1046,18 +408,9 @@ }, { "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:43 diarization_utils:787] Creating results for Session: an4_diarize_test n_spk: 2 \n", - "[NeMo I 2022-11-10 16:20:43 diarization_utils:660] Diarization with ASR output files are saved in: /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/pred_rttms\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "trans_info_dict = asr_diar_offline.get_transcript_with_speaker_labels(diar_hyp, word_hyp, word_ts_hyp)" ] @@ -1071,18 +424,9 @@ }, { "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[ '[00:00.07 - 00:02.60] speaker_0: eleven twenty seven fifty seven',\n", - " '[00:03.08 - 00:05.16] speaker_1: october twenty fourth nineteen seventy']\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "transcription_path_to_file = f\"{data_dir}/pred_rttms/an4_diarize_test.txt\"\n", "transcript = read_file(transcription_path_to_file)\n", @@ -1100,99 +444,9 @@ }, { "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[ '{',\n", - " ' \"status\": \"success\",',\n", - " ' \"session_id\": \"an4_diarize_test\",',\n", - " ' \"transcription\": \"eleven twenty seven fifty seven october twenty '\n", - " 'fourth nineteen seventy\",',\n", - " ' \"speaker_count\": 2,',\n", - " ' \"words\": [',\n", - " ' {',\n", - " ' \"word\": \"eleven\",',\n", - " ' \"start_time\": 0.36,',\n", - " ' 
\"end_time\": 0.68,',\n", - " ' \"speaker\": \"speaker_0\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"twenty\",',\n", - " ' \"start_time\": 0.92,',\n", - " ' \"end_time\": 1.28,',\n", - " ' \"speaker\": \"speaker_0\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"seven\",',\n", - " ' \"start_time\": 1.4,',\n", - " ' \"end_time\": 1.64,',\n", - " ' \"speaker\": \"speaker_0\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"fifty\",',\n", - " ' \"start_time\": 1.92,',\n", - " ' \"end_time\": 2.28,',\n", - " ' \"speaker\": \"speaker_0\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"seven\",',\n", - " ' \"start_time\": 2.36,',\n", - " ' \"end_time\": 2.6,',\n", - " ' \"speaker\": \"speaker_0\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"october\",',\n", - " ' \"start_time\": 3.08,',\n", - " ' \"end_time\": 3.52,',\n", - " ' \"speaker\": \"speaker_1\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"twenty\",',\n", - " ' \"start_time\": 3.6,',\n", - " ' \"end_time\": 3.84,',\n", - " ' \"speaker\": \"speaker_1\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"fourth\",',\n", - " ' \"start_time\": 3.88,',\n", - " ' \"end_time\": 4.12,',\n", - " ' \"speaker\": \"speaker_1\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"nineteen\",',\n", - " ' \"start_time\": 4.4,',\n", - " ' \"end_time\": 4.72,',\n", - " ' \"speaker\": \"speaker_1\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"word\": \"seventy\",',\n", - " ' \"start_time\": 4.84,',\n", - " ' \"end_time\": 5.16,',\n", - " ' \"speaker\": \"speaker_1\"',\n", - " ' }',\n", - " ' ],',\n", - " ' \"sentences\": [',\n", - " ' {',\n", - " ' \"speaker\": \"speaker_0\",',\n", - " ' \"start_time\": \"0.07\",',\n", - " ' \"end_time\": 2.6,',\n", - " ' \"text\": \"eleven twenty seven fifty seven\"',\n", - " ' },',\n", - " ' {',\n", - " ' \"speaker\": \"speaker_1\",',\n", - " ' \"start_time\": 3.08,',\n", - " ' \"end_time\": 5.16,',\n", - " ' \"text\": \"october twenty fourth nineteen 
seventy\"',\n", - " ' }',\n", - " ' ]',\n", - " '}']\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "transcription_path_to_file = f\"{data_dir}/pred_rttms/an4_diarize_test.json\"\n", "json_contents = read_file(transcription_path_to_file)\n", @@ -1227,7 +481,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1259,20 +513,11 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": { "scrolled": true }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "word_seq_lists:\n", - " [{'word': 'eleven', 'start_time': 0.36, 'end_time': 0.68, 'speaker': 'speaker_0'}, {'word': 'twenty', 'start_time': 0.92, 'end_time': 1.28, 'speaker': 'speaker_0'}, {'word': 'seven', 'start_time': 1.4, 'end_time': 1.64, 'speaker': 'speaker_0'}, {'word': 'fifty', 'start_time': 1.92, 'end_time': 2.28, 'speaker': 'speaker_0'}, {'word': 'seven', 'start_time': 2.36, 'end_time': 2.6, 'speaker': 'speaker_0'}, {'word': 'october', 'start_time': 3.08, 'end_time': 3.52, 'speaker': 'speaker_1'}, {'word': 'twenty', 'start_time': 3.6, 'end_time': 3.84, 'speaker': 'speaker_1'}, {'word': 'fourth', 'start_time': 3.88, 'end_time': 4.12, 'speaker': 'speaker_1'}, {'word': 'nineteen', 'start_time': 4.4, 'end_time': 4.72, 'speaker': 'speaker_1'}, {'word': 'seventy', 'start_time': 4.84, 'end_time': 5.16, 'speaker': 'speaker_1'}]\n" - ] - } - ], + "outputs": [], "source": [ "from nemo.collections.asr.parts.utils.speaker_utils import get_uniqname_from_filepath\n", "\n", @@ -1301,7 +546,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1327,7 +572,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1351,21 +596,9 @@ }, { "cell_type": "code", - "execution_count": 20, - "metadata": 
{}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "spk_hypothesis: ['eleven twenty seven fifty seven', 'october twenty fourth nineteen seventy']\n", - "mix_hypothesis: eleven twenty seven fifty seven october twenty fourth nineteen seventy\n", - "\n", - "spk_reference: ['eleven twenty seven fifty seven', 'october twenty fourth nineteen seventy']\n", - "mix_reference: eleven twenty seven fifty seven october twenty fourth nineteen seventy\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.metrics.der import concat_perm_word_error_rate\n", "from nemo.collections.asr.metrics.wer import word_error_rate\n", @@ -1396,20 +629,9 @@ }, { "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cpWER: 0.0\n", - "WER: 0.0\n", - "concat_hyp: eleven twenty seven fifty seven october twenty fourth nineteen seventy\n", - "concat_ref: eleven twenty seven fifty seven october twenty fourth nineteen seventy\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.metrics.der import concat_perm_word_error_rate \n", "from nemo.collections.asr.metrics.wer import word_error_rate\n", @@ -1434,23 +656,9 @@ }, { "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "spk_hypothesis: ['eleven twenty seven fifty seven', 'october twenty fourth nineteen seventy']\n", - "mix_hypothesis: eleven twenty seven fifty seven october twenty fourth nineteen seventy\n", - "\n", - "spk_reference: ['eleven twenty seven fifty seven', 'october twenty fourth nineteen seventy']\n", - "mix_reference: eleven twenty seven fifty seven october twenty fourth nineteen seventy\n", - "cpWER: 0.0\n", - "WER: 0.0\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], 
"source": [ "from nemo.collections.asr.parts.utils.diarization_utils import convert_word_dict_seq_to_text\n", "\n", @@ -1499,18 +707,9 @@ }, { "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"audio_filepath\": \"/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.wav\", \"offset\": 0, \"duration\": null, \"label\": \"infer\", \"text\": \"-\", \"num_speakers\": 2, \"rttm_filepath\": null, \"ctm_filepath\": \"/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.ctm\", \"uem_filepath\": null}\n", - "[NeMo I 2022-11-10 16:20:43 speaker_utils:92] Number of files to diarize: 1\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Create a new manifest file for input with the reference CTM file. \n", "meta = {\n", @@ -1545,23 +744,15 @@ }, { "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:20:43 diarization_utils:787] Creating results for Session: an4_diarize_test n_spk: 2 \n", - "[NeMo I 2022-11-10 16:20:43 diarization_utils:660] Diarization with ASR output files are saved in: /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/pred_rttms\n", - "cpWER: 0.0\n", - "WER: 0.0\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ + "from nemo.collections.asr.parts.utils.diarization_utils import OfflineDiarWithASR\n", "trans_info_dict = asr_diar_offline.get_transcript_with_speaker_labels(diar_hyp, word_hyp, word_ts_hyp)\n", - "session_result_dict = asr_diar_offline.evaluate(trans_info_dict)\n", + "session_result_dict = OfflineDiarWithASR.evaluate(hyp_trans_info_dict=trans_info_dict,\n", + " audio_file_list=asr_diar_offline.audio_file_list,\n", + " ref_ctm_file_list=asr_diar_offline.ctm_file_list)\n", 
"session_result_dict['an4_diarize_test']\n", "\n", "print(\"cpWER:\", session_result_dict['an4_diarize_test']['cpWER'])\n", @@ -1580,25 +771,9 @@ }, { "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: pyctcdecode in /home/taejinp/anaconda3/lib/python3.9/site-packages (0.4.0)\n", - "Requirement already satisfied: pygtrie<3.0,>=2.1 in /home/taejinp/anaconda3/lib/python3.9/site-packages (from pyctcdecode) (2.5.0)\n", - "Requirement already satisfied: hypothesis<7,>=6.14 in /home/taejinp/anaconda3/lib/python3.9/site-packages (from pyctcdecode) (6.56.3)\n", - "Requirement already satisfied: numpy<2.0.0,>=1.15.0 in /home/taejinp/anaconda3/lib/python3.9/site-packages (from pyctcdecode) (1.21.1)\n", - "Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /home/taejinp/anaconda3/lib/python3.9/site-packages (from hypothesis<7,>=6.14->pyctcdecode) (1.0.0rc9)\n", - "Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /home/taejinp/anaconda3/lib/python3.9/site-packages (from hypothesis<7,>=6.14->pyctcdecode) (2.4.0)\n", - "Requirement already satisfied: attrs>=19.2.0 in /home/taejinp/anaconda3/lib/python3.9/site-packages (from hypothesis<7,>=6.14->pyctcdecode) (21.2.0)\n", - "Collecting https://github.com/kpu/kenlm/archive/master.zip\n", - " Using cached https://github.com/kpu/kenlm/archive/master.zip\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "!pip install pyctcdecode\n", "!pip install https://github.com/kpu/kenlm/archive/master.zip" @@ -1613,17 +788,9 @@ }, { "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "100% [........................................................................] 
99823907 / 99823907" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import gzip\n", "import shutil\n", @@ -1647,7 +814,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1664,133 +831,9 @@ }, { "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:21:04 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:21:04 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/stt_en_conformer_ctc_large/afb212c5bcf904e326b5e5751e7c7465/stt_en_conformer_ctc_large.nemo.\n", - "[NeMo I 2022-11-10 16:21:04 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/stt_en_conformer_ctc_large/afb212c5bcf904e326b5e5751e7c7465/stt_en_conformer_ctc_large.nemo\n", - "[NeMo I 2022-11-10 16:21:04 common:911] Instantiating model from pre-trained checkpoint\n", - "[NeMo I 2022-11-10 16:21:05 mixins:170] Tokenizer SentencePieceTokenizer initialized with 128 tokens\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:21:05 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath:\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket1/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket2/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket3/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket4/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket5/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket6/tarred_audio_manifest.json\n", - " - - 
/data2/nemo_asr/nemo_asr_set_3.0/bucket7/tarred_audio_manifest.json\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket8/tarred_audio_manifest.json\n", - " sample_rate: 16000\n", - " batch_size: 1\n", - " shuffle: true\n", - " num_workers: 4\n", - " pin_memory: true\n", - " use_start_end_token: false\n", - " trim_silence: false\n", - " max_duration: 20.0\n", - " min_duration: 0.1\n", - " is_tarred: true\n", - " tarred_audio_filepaths:\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket1/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket2/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket3/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket4/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket5/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket6/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket7/audio__OP_0..8191_CL_.tar\n", - " - - /data2/nemo_asr/nemo_asr_set_3.0/bucket8/audio__OP_0..8191_CL_.tar\n", - " shuffle_n: 2048\n", - " bucketing_strategy: synced_randomized\n", - " bucketing_batch_size:\n", - " - 34\n", - " - 30\n", - " - 26\n", - " - 22\n", - " - 18\n", - " - 16\n", - " - 12\n", - " - 8\n", - " \n", - "[NeMo W 2022-11-10 16:21:05 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath:\n", - " - /manifests/librispeech/librivox-dev-other.json\n", - " - /manifests/librispeech/librivox-dev-clean.json\n", - " - /manifests/librispeech/librivox-test-other.json\n", - " - /manifests/librispeech/librivox-test-clean.json\n", - " sample_rate: 16000\n", - " batch_size: 32\n", - " shuffle: false\n", - " num_workers: 8\n", - " pin_memory: true\n", - " use_start_end_token: false\n", - " \n", - "[NeMo W 2022-11-10 16:21:05 modelPT:155] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", - " Test config : \n", - " manifest_filepath:\n", - " - /manifests/librispeech/librivox-dev-other.json\n", - " - /manifests/librispeech/librivox-dev-clean.json\n", - " - /manifests/librispeech/librivox-test-other.json\n", - " - /manifests/librispeech/librivox-test-clean.json\n", - " sample_rate: 16000\n", - " batch_size: 32\n", - " shuffle: false\n", - " num_workers: 8\n", - " pin_memory: true\n", - " use_start_end_token: false\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:21:05 features:225] PADDING: 0\n", - "[NeMo I 2022-11-10 16:21:07 save_restore_connector:243] Model EncDecCTCModelBPE was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/stt_en_conformer_ctc_large/afb212c5bcf904e326b5e5751e7c7465/stt_en_conformer_ctc_large.nemo.\n", - "[NeMo I 2022-11-10 16:21:07 decoder_timestamps_utils:380] Loading language model : /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/4gram_big.arpa\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Loading the LM will be faster if you build a binary file.\n", - "Reading /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/4gram_big.arpa\n", - 
"----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100\n", - "****************************************************************************************************\n", - "Unigrams and labels don't seem to agree.\n", - "[NeMo W 2022-11-10 16:21:14 decoder_timestamps_utils:66] `ctc_decode` was set to True. Note that this is ignored.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:21:14 features:225] PADDING: 0\n", - "[NeMo I 2022-11-10 16:21:14 features:225] PADDING: 0\n", - "[NeMo I 2022-11-10 16:21:14 decoder_timestamps_utils:640] Running ASR model stt_en_conformer_ctc_large\n", - "[NeMo I 2022-11-10 16:21:14 decoder_timestamps_utils:644] [1/1] FrameBatchASR: /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.wav\n", - "[NeMo I 2022-11-10 16:21:14 decoder_timestamps_utils:656] Running beam-search decoder with LM /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/4gram_big.arpa\n", - "Decoded word output dictionary: \n", - " ['eleven', 'twenty', 'seven', 'fifty', 'seven', 'october', 'twenty', 'four', 'nineteen', 'seventy']\n", - "Word-level timestamps dictionary: \n", - " [[0.27, 0.59], [0.83, 1.19], [1.31, 1.55], [1.83, 2.19], [2.27, 2.51], [2.99, 3.43], [3.51, 3.75], [3.79, 3.95], [4.27, 4.67], [4.75, 5.07]]\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import importlib\n", "import nemo.collections.asr.parts.utils.decoder_timestamps_utils as decoder_timestamps_utils\n", @@ -1832,17 +875,9 @@ }, { "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: arpa in /home/taejinp/anaconda3/lib/python3.9/site-packages (0.1.0b4)\r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "!pip install arpa" ] @@ -1859,17 +894,9 @@ }, { 
"cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:21:18 speaker_utils:92] Number of files to diarize: 1\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "arpa_model_path = os.path.join(data_dir, '4gram_big.arpa')\n", "cfg.diarizer.asr.realigning_lm_parameters.arpa_language_model = arpa_model_path\n", @@ -1893,103 +920,20 @@ }, { "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:21:18 diarization_utils:308] Loading LM for realigning: /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/4gram_big.arpa\n", - "[NeMo I 2022-11-10 16:22:12 diarization_utils:787] Creating results for Session: an4_diarize_test n_spk: 2 \n", - "[NeMo I 2022-11-10 16:22:12 diarization_utils:660] Diarization with ASR output files are saved in: /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/pred_rttms\n" - ] - }, - { - "data": { - "text/plain": [ - "{'an4_diarize_test': OrderedDict([('status', 'success'),\n", - " ('session_id', 'an4_diarize_test'),\n", - " ('transcription',\n", - " 'eleven twenty seven fifty seven october twenty four nineteen seventy'),\n", - " ('speaker_count', 2),\n", - " ('words',\n", - " [{'word': 'eleven',\n", - " 'start_time': 0.27,\n", - " 'end_time': 0.59,\n", - " 'speaker': 'speaker_0'},\n", - " {'word': 'twenty',\n", - " 'start_time': 0.83,\n", - " 'end_time': 1.19,\n", - " 'speaker': 'speaker_0'},\n", - " {'word': 'seven',\n", - " 'start_time': 1.31,\n", - " 'end_time': 1.55,\n", - " 'speaker': 'speaker_0'},\n", - " {'word': 'fifty',\n", - " 'start_time': 1.83,\n", - " 'end_time': 2.19,\n", - " 'speaker': 'speaker_0'},\n", - " {'word': 'seven',\n", - " 'start_time': 2.27,\n", - " 'end_time': 2.51,\n", - " 'speaker': 'speaker_0'},\n", - " {'word': 'october',\n", - 
" 'start_time': 2.99,\n", - " 'end_time': 3.43,\n", - " 'speaker': 'speaker_1'},\n", - " {'word': 'twenty',\n", - " 'start_time': 3.51,\n", - " 'end_time': 3.75,\n", - " 'speaker': 'speaker_1'},\n", - " {'word': 'four',\n", - " 'start_time': 3.79,\n", - " 'end_time': 3.95,\n", - " 'speaker': 'speaker_1'},\n", - " {'word': 'nineteen',\n", - " 'start_time': 4.27,\n", - " 'end_time': 4.67,\n", - " 'speaker': 'speaker_1'},\n", - " {'word': 'seventy',\n", - " 'start_time': 4.75,\n", - " 'end_time': 5.07,\n", - " 'speaker': 'speaker_1'}]),\n", - " ('sentences',\n", - " [{'speaker': 'speaker_0',\n", - " 'start_time': '0.07',\n", - " 'end_time': 2.51,\n", - " 'text': 'eleven twenty seven fifty seven'},\n", - " {'speaker': 'speaker_1',\n", - " 'start_time': 2.99,\n", - " 'end_time': 5.07,\n", - " 'text': 'october twenty four nineteen seventy'}])])}" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "asr_diar_offline.get_transcript_with_speaker_labels(diar_hyp, word_hyp, word_ts_hyp)" ] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "metadata": { "scrolled": true }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[ '[00:00.07 - 00:02.51] speaker_0: eleven twenty seven fifty seven',\n", - " '[00:02.99 - 00:05.07] speaker_1: october twenty four nineteen seventy']\n" - ] - } - ], + "outputs": [], "source": [ "transcription_path_to_file = f\"{data_dir}/pred_rttms/an4_diarize_test.txt\"\n", "transcript = read_file(transcription_path_to_file)\n", diff --git a/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb b/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb index dfaed8090931..64ceb49d7d64 100644 --- a/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb +++ b/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb @@ -210,7 +210,7 @@ }, { "cell_type": "code", - 
"execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -238,40 +238,9 @@ }, { "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " " - ], - "text/plain": [ - "" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import IPython\n", "import matplotlib.pyplot as plt\n", @@ -303,24 +272,9 @@ }, { "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/taejinp/anaconda3/lib/python3.9/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 4.0.0-unsupported is an invalid version and will not be supported in a future release\n", - " warnings.warn(\n", - "/home/taejinp/anaconda3/lib/python3.9/site-packages/pkg_resources/__init__.py:123: PkgResourcesDeprecationWarning: 4.0.0-unsupported is an invalid version and will not be supported in a future release\n", - " warnings.warn(\n", - "[NeMo W 2022-11-10 16:14:22 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.\n", - "[NeMo W 2022-11-10 16:14:22 nemo_logging:349] /home/taejinp/anaconda3/lib/python3.9/site-packages/torch/jit/annotations.py:296: UserWarning: TorchScript will treat type annotations of Tensor dtype-specific subtypes as if they are normal Tensors. 
dtype constraints are not enforced in compilation either.\n", - " warnings.warn(\"TorchScript will treat type annotations of Tensor \"\n", - " \n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.parts.utils.speaker_utils import rttm_to_labels, labels_to_pyannote_object" ] @@ -334,18 +288,9 @@ }, { "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SPEAKER an4 1 0.298981 2.47133 A \r\n", - "SPEAKER an4 1 3.163901 1.98311 B \r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# view the sample rttm file\n", "!cat {an4_rttm}" @@ -353,28 +298,9 @@ }, { "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "['0.299 2.77 A', '3.164 5.147 B']\n" - ] - }, - { - "data": { - "image/png": "[base64-encoded PNG data omitted]\n", - "text/plain": [ - "" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "labels = rttm_to_labels(an4_rttm)\n", "reference = labels_to_pyannote_object(labels)\n", @@ -410,17 +336,9 @@ }, { "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"audio_filepath\": \"/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.wav\", \"offset\": 0, \"duration\": null, \"label\": \"infer\", \"text\": \"-\", \"num_speakers\": 2, \"rttm_filepath\": \"/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/data/an4_diarize_test.rttm\", \"uem_filepath\": null}\r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Create a manifest for input with below format. 
\n", "# {'audio_filepath': /path/to/audio_file, 'offset': 0, 'duration':None, 'label': 'infer', 'text': '-', \n", @@ -474,107 +392,9 @@ }, { "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "name: ClusterDiarizer\n", - "num_workers: 4\n", - "sample_rate: 16000\n", - "batch_size: 64\n", - "diarizer:\n", - " manifest_filepath: ???\n", - " out_dir: ???\n", - " oracle_vad: false\n", - " collar: 0.25\n", - " ignore_overlap: true\n", - " vad:\n", - " model_path: ???\n", - " external_vad_manifest: null\n", - " parameters:\n", - " window_length_in_sec: 0.15\n", - " shift_length_in_sec: 0.01\n", - " smoothing: median\n", - " overlap: 0.5\n", - " onset: 0.1\n", - " offset: 0.1\n", - " pad_onset: 0.1\n", - " pad_offset: 0\n", - " min_duration_on: 0\n", - " min_duration_off: 0.2\n", - " filter_speech_first: true\n", - " speaker_embeddings:\n", - " model_path: titanet_large\n", - " parameters:\n", - " window_length_in_sec:\n", - " - 1.5\n", - " - 1.25\n", - " - 1.0\n", - " - 0.75\n", - " - 0.5\n", - " shift_length_in_sec:\n", - " - 0.75\n", - " - 0.625\n", - " - 0.5\n", - " - 0.375\n", - " - 0.25\n", - " multiscale_weights:\n", - " - 1\n", - " - 1\n", - " - 1\n", - " - 1\n", - " - 1\n", - " save_embeddings: true\n", - " clustering:\n", - " parameters:\n", - " oracle_num_speakers: false\n", - " max_num_speakers: 8\n", - " enhanced_count_thres: 80\n", - " max_rp_threshold: 0.25\n", - " sparse_search_volume: 30\n", - " maj_vote_spk_count: false\n", - " msdd_model:\n", - " model_path: ???\n", - " parameters:\n", - " use_speaker_model_from_ckpt: true\n", - " infer_batch_size: 25\n", - " sigmoid_threshold:\n", - " - 0.7\n", - " seq_eval_mode: false\n", - " split_infer: true\n", - " diar_window_length: 50\n", - " overlap_infer_spk_limit: 5\n", - " asr:\n", - " model_path: ???\n", - " parameters:\n", - " asr_based_vad: false\n", - " asr_based_vad_threshold: 0.05\n", - " asr_batch_size: 
null\n", - " lenient_overlap_WDER: true\n", - " decoder_delay_in_sec: null\n", - " word_ts_anchor_offset: null\n", - " word_ts_anchor_pos: start\n", - " fix_word_ts_with_VAD: false\n", - " colored_text: false\n", - " print_time: true\n", - " break_lines: false\n", - " ctc_decoder_parameters:\n", - " pretrained_language_model: null\n", - " beam_width: 32\n", - " alpha: 0.5\n", - " beta: 2.5\n", - " realigning_lm_parameters:\n", - " arpa_language_model: null\n", - " min_number_of_words: 3\n", - " max_number_of_words: 10\n", - " logprob_diff_threshold: 1.2\n", - "\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from omegaconf import OmegaConf\n", "MODEL_CONFIG = os.path.join(data_dir,'diar_infer_telephonic.yaml')\n", @@ -597,7 +417,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -628,93 +448,9 @@ }, { "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:23 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'external_vad_manifest': None, 'parameters': {'window_length_in_sec': 0.15, 'shift_length_in_sec': 0.01, 'smoothing': 'median', 'overlap': 0.5, 'onset': 0.1, 'offset': 0.1, 'pad_onset': 0.1, 'pad_offset': 0, 'min_duration_on': 0, 'min_duration_off': 0.2, 'filter_speech_first': True}}\n", - " Reason: Missing mandatory value: diarizer.vad.model_path\n", - " full_key: diarizer.vad.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:14:23 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'use_speaker_model_from_ckpt': True, 'infer_batch_size': 25, 'sigmoid_threshold': [0.7], 'seq_eval_mode': False, 'split_infer': True, 'diar_window_length': 50, 'overlap_infer_spk_limit': 5}}\n", - " Reason: Missing mandatory value: 
diarizer.msdd_model.model_path\n", - " full_key: diarizer.msdd_model.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:14:23 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'asr_based_vad': False, 'asr_based_vad_threshold': 0.05, 'asr_batch_size': None, 'lenient_overlap_WDER': True, 'decoder_delay_in_sec': None, 'word_ts_anchor_offset': None, 'word_ts_anchor_pos': 'start', 'fix_word_ts_with_VAD': False, 'colored_text': False, 'print_time': True, 'break_lines': False}, 'ctc_decoder_parameters': {'pretrained_language_model': None, 'beam_width': 32, 'alpha': 0.5, 'beta': 2.5}, 'realigning_lm_parameters': {'arpa_language_model': None, 'min_number_of_words': 3, 'max_number_of_words': 10, 'logprob_diff_threshold': 1.2}}\n", - " Reason: Missing mandatory value: diarizer.asr.model_path\n", - " full_key: diarizer.asr.model_path\n", - " object_type=dict.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:23 clustering_diarizer:156] Loading pretrained titanet_large model from NGC\n", - "[NeMo I 2022-11-10 16:14:23 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo.\n", - "[NeMo I 2022-11-10 16:14:23 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo\n", - "[NeMo I 2022-11-10 16:14:23 common:911] Instantiating model from pre-trained checkpoint\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:24 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/train.json\n", - " sample_rate: 16000\n", - " labels: 
null\n", - " batch_size: 64\n", - " shuffle: true\n", - " time_length: 3\n", - " is_tarred: false\n", - " tarred_audio_filepaths: null\n", - " tarred_shard_strategy: scatter\n", - " augmentor:\n", - " noise:\n", - " manifest_path: /manifests/noise/rir_noise_manifest.json\n", - " prob: 0.5\n", - " min_snr_db: 0\n", - " max_snr_db: 15\n", - " speed:\n", - " prob: 0.5\n", - " sr: 16000\n", - " resample_type: kaiser_fast\n", - " min_speed_rate: 0.95\n", - " max_speed_rate: 1.05\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:24 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). \n", - " Validation config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/dev.json\n", - " sample_rate: 16000\n", - " labels: null\n", - " batch_size: 128\n", - " shuffle: false\n", - " time_length: 3\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:24 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:14:24 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:27 save_restore_connector:243] Model EncDecSpeakerLabelModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo.\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.models import ClusteringDiarizer\n", "oracle_vad_clusdiar_model = ClusteringDiarizer(cfg=config)" @@ -722,137 +458,9 @@ }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "[NeMo I 2022-11-10 16:14:27 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:27 clustering_diarizer:281] Subsegmentation for embedding extraction: scale0, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale0.json\n", - "[NeMo I 2022-11-10 16:14:27 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:27 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:27 collections:300] # 5 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:29 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:29 clustering_diarizer:281] Subsegmentation for embedding extraction: scale1, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale1.json\n", - "[NeMo I 2022-11-10 16:14:29 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:29 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:29 collections:300] # 6 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:29 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:29 clustering_diarizer:281] Subsegmentation for embedding extraction: scale2, 
/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale2.json\n", - "[NeMo I 2022-11-10 16:14:29 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:29 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:29 collections:300] # 7 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:30 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:30 clustering_diarizer:281] Subsegmentation for embedding extraction: scale3, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale3.json\n", - "[NeMo I 2022-11-10 16:14:30 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:30 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:30 collections:300] # 11 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:30 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:30 clustering_diarizer:281] Subsegmentation for embedding extraction: scale4, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale4.json\n", - "[NeMo I 2022-11-10 16:14:30 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:30 collections:296] Filtered duration for 
loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:30 collections:300] # 37 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:31 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:33 nemo_logging:349] /home/taejinp/anaconda3/lib/python3.9/site-packages/pyannote/metrics/utils.py:200: UserWarning: 'uem' was approximated by the union of 'reference' and 'hypothesis' extents.\n", - " warnings.warn(\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:33 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap True: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:33 clustering_diarizer:455] Outputs are saved in /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad directory\n" - ] - }, - { - "data": { - "text/plain": [ - "(,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0))" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# And lets diarize\n", "oracle_vad_clusdiar_model.diarize()" @@ -867,18 +475,9 @@ }, { "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SPEAKER an4_diarize_test 1 0.299 2.471 speaker_1 \r\n", - "SPEAKER an4_diarize_test 1 3.164 1.983 speaker_0 \r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "!cat 
{output_dir}/pred_rttms/an4_diarize_test.rttm" ] @@ -903,7 +502,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -913,76 +512,9 @@ }, { "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:33 msdd_models:1081] Loading pretrained diar_msdd_telephonic model from NGC\n", - "[NeMo I 2022-11-10 16:14:33 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/diar_msdd_telephonic/9c319f27168dc4980b8ba9a4ddd711bc/diar_msdd_telephonic.nemo.\n", - "[NeMo I 2022-11-10 16:14:33 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/diar_msdd_telephonic/9c319f27168dc4980b8ba9a4ddd711bc/diar_msdd_telephonic.nemo\n", - "[NeMo I 2022-11-10 16:14:33 common:911] Instantiating model from pre-trained checkpoint\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:34 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: null\n", - " emb_dir: null\n", - " sample_rate: 16000\n", - " num_spks: 2\n", - " soft_label_thres: 0.5\n", - " labels: null\n", - " batch_size: 15\n", - " emb_batch_size: 0\n", - " shuffle: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:34 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath: null\n", - " emb_dir: null\n", - " sample_rate: 16000\n", - " num_spks: 2\n", - " soft_label_thres: 0.5\n", - " labels: null\n", - " batch_size: 15\n", - " emb_batch_size: 0\n", - " shuffle: false\n", - " \n", - "[NeMo W 2022-11-10 16:14:34 modelPT:155] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", - " Test config : \n", - " manifest_filepath: null\n", - " emb_dir: null\n", - " sample_rate: 16000\n", - " num_spks: 2\n", - " soft_label_thres: 0.5\n", - " labels: null\n", - " batch_size: 15\n", - " emb_batch_size: 0\n", - " shuffle: false\n", - " seq_eval_mode: false\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:34 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:34 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:14:34 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:35 save_restore_connector:243] Model EncDecDiarLabelModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/diar_msdd_telephonic/9c319f27168dc4980b8ba9a4ddd711bc/diar_msdd_telephonic.nemo.\n", - "[NeMo I 2022-11-10 16:14:35 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:14:35 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:35 speaker_utils:92] Number of files to diarize: 1\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.models.msdd_models import NeuralDiarizer\n", "oracle_vad_msdd_model = NeuralDiarizer(cfg=config)" @@ -1003,234 +535,9 @@ }, { "cell_type": "code", - "execution_count": 15, - "metadata": {}, 
- "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:35 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'external_vad_manifest': None, 'parameters': {'window_length_in_sec': 0.15, 'shift_length_in_sec': 0.01, 'smoothing': 'median', 'overlap': 0.5, 'onset': 0.1, 'offset': 0.1, 'pad_onset': 0.1, 'pad_offset': 0, 'min_duration_on': 0, 'min_duration_off': 0.2, 'filter_speech_first': True}}\n", - " Reason: Missing mandatory value: diarizer.vad.model_path\n", - " full_key: diarizer.vad.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:14:35 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'asr_based_vad': False, 'asr_based_vad_threshold': 0.05, 'asr_batch_size': None, 'lenient_overlap_WDER': True, 'decoder_delay_in_sec': None, 'word_ts_anchor_offset': None, 'word_ts_anchor_pos': 'start', 'fix_word_ts_with_VAD': False, 'colored_text': False, 'print_time': True, 'break_lines': False}, 'ctc_decoder_parameters': {'pretrained_language_model': None, 'beam_width': 32, 'alpha': 0.5, 'beta': 2.5}, 'realigning_lm_parameters': {'arpa_language_model': None, 'min_number_of_words': 3, 'max_number_of_words': 10, 'logprob_diff_threshold': 1.2}}\n", - " Reason: Missing mandatory value: diarizer.asr.model_path\n", - " full_key: diarizer.asr.model_path\n", - " object_type=dict.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:35 msdd_models:855] Multiscale Weights: [1, 1, 1, 1, 1]\n", - "[NeMo I 2022-11-10 16:14:35 msdd_models:856] Clustering Parameters: {\n", - " \"oracle_num_speakers\": false,\n", - " \"max_num_speakers\": 8,\n", - " \"enhanced_count_thres\": 80,\n", - " \"max_rp_threshold\": 0.25,\n", - " \"sparse_search_volume\": 30,\n", - " \"maj_vote_spk_count\": false\n", - " }\n", - "[NeMo I 2022-11-10 16:14:35 speaker_utils:92] Number of files to diarize: 1\n", - 
"[NeMo I 2022-11-10 16:14:35 clustering_diarizer:281] Subsegmentation for embedding extraction: scale0, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale0.json\n", - "[NeMo I 2022-11-10 16:14:35 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:35 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:35 collections:300] # 5 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:281] Subsegmentation for embedding extraction: scale1, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale1.json\n", - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:36 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:36 collections:300] # 6 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:281] Subsegmentation for embedding extraction: scale2, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale2.json\n", - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:336] Extracting 
embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:36 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:36 collections:300] # 7 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:281] Subsegmentation for embedding extraction: scale3, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale3.json\n", - "[NeMo I 2022-11-10 16:14:36 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:36 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:36 collections:300] # 11 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:37 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:37 clustering_diarizer:281] Subsegmentation for embedding extraction: scale4, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale4.json\n", - "[NeMo I 2022-11-10 16:14:37 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:37 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:37 collections:300] # 37 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": 
"stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:37 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:41 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap True: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:41 clustering_diarizer:455] Outputs are saved in /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad directory\n", - "[NeMo I 2022-11-10 16:14:41 msdd_models:951] Loading embedding pickle file of scale:0 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings/subsegments_scale0_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:41 msdd_models:951] Loading embedding pickle file of scale:1 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings/subsegments_scale1_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:41 msdd_models:951] Loading embedding pickle file of scale:2 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings/subsegments_scale2_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:41 msdd_models:951] Loading embedding pickle file of scale:3 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings/subsegments_scale3_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:41 msdd_models:951] Loading embedding pickle file of scale:4 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/embeddings/subsegments_scale4_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:41 msdd_models:929] Loading 
cluster label file from /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad/speaker_outputs/subsegments_scale4_cluster.label\n", - "[NeMo I 2022-11-10 16:14:41 collections:611] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:41 collections:614] Total 1 session files loaded accounting to # 1 audio clips\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " 0%| | 0/1 [00:00,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0017961383026493304,\n", - " 0.0,\n", - " 0.0015716210148181671,\n", - " 0.00022451728783116318))],\n", - " [(,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0017961383026493304,\n", - " 0.0,\n", - " 0.0015716210148181671,\n", - " 0.00022451728783116318))]]" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "oracle_vad_msdd_model.diarize()" ] @@ -1244,18 +551,9 @@ }, { "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SPEAKER an4_diarize_test 1 0.300 2.470 speaker_1 \r\n", - "SPEAKER an4_diarize_test 1 3.160 1.990 speaker_0 \r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "!cat {output_dir}/pred_ovl_rttms/an4_diarize_test.rttm" ] @@ -1276,28 +574,9 @@ }, { "cell_type": "code", - "execution_count": 17, - "metadata": {}, 
- "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Clustering Diarizer Result\n" - ] - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAABG0AAACtCAYAAAAKyYJgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAMHUlEQVR4nO3de6xld13G4felraLlotBKqlUnClYRErSTJrWGqCkEmYa2SANqERITIdGIEuMlGh2MF4pEIIgJRjA1hWJI5RIaSxssYrFQZkqvlIsxbQJUa21MLQFF+/OPWU0vzLQz7Tms3+55nmRn77P22nu++6ysZPI5a63dMUYAAAAAmMtj1h4AAAAAgK8l2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBos2j78rZ/tsXv+aS2l7X93HL/rVv5/o9W27Qtzml7Y9u72+7eyvcGAACA7SDabIG2Rx3iqd9M8qExxtOSfGj5mW30INvihiQvTPKRr+M4AAAA8LBtVLRpe2zbi9te2/aGti9ue3Pb89petdyeuqx7fNuL2n5iuZ22LD+l7T+1/eRyf9JB/p09ba9se1zb5y6Pr2777raPW9a5ue3vtr0iyTmHGPnMJOcvj89PctZW/07WsmnbYoxx0xjjM9v4KwEAAIAtdfQjefFbz7xgb5Lf25pRkiSvecX7zt37IM8/L8kXxxh7kqTtE5Ocl+TOMcYpbX8uyRuTnJHkTUneMMa4ou13Jflgkh9I8ukkzx5j/G/b05P8UZKfuucfaHt2klcneX6So5L8TpLTxxhfavsby3O/v6z+lTHGjz7IvE8ZY9yaJGOMW9t+2xH8Lg7bC967Z2+2eDu8/6yL9z7EOpu2LQAAAGCjPKJos4Lrk7y+7XlJPjDG+Me2SXLh8vyFSd6wPD49ydOX55PkCW0fn+SJSc5v+7QkI8kx93n/H0+yO8lzxxh3tj0jydOTfHR5n29IcuV91v+bLf58m8S2AAAAgG20UdFmjPHZtifnwJEXf9z20nueuu9qy/1jkpw6xvjyfd+j7ZuTXD7GOLvtriQfvs/T/5Lke5J8X5J9SZrksjHGTx9ipC89xMj/1vaE5SibE5Lc9hDrb4wN3BYAAACwUR5RtFlOZdq7JZMchrbfnuSOMcYFbe9K8vLlqRcnee1yf8/RF5cm+aUkf7K89lljjGty4OiOLyzr3PP6e9yS5NeSvKftOUk+luQtbZ86xvjntt+c5MQxxmcPc+T3J3nZMtvLkrzv8D/t4VtOZdq7He99KBu4LQAAAGCjbNSFiJM8M8lVba9J8ttJ/mBZ/o1tP57kVUl+dVn2y0l2t72u7aeSvHJZ/rocODLkozlwnZT7WS5W+7NJ3p3kCTkQEy5se10OhIPvP4J5X5vkOW0/l+Q5y8+PFhu1Ldqe3fbzSU5NcnHbDx7BZwUAAICvu44xHnqtibW9OcnuMcbta8+y09kWAAAAsHU27UgbAAAAgB1h44+0mUHbtyQ57QGL3zTG+Ks15tnJbAsAAAAeLUQbAAAAgAk5PQoAAABgQqINAAAAwISOPpKVjzvuuLFr165tGgUAAABg59m/f//tY4zjH7j8iKLNrl27sm/fvq2bCgAAAGCHa3vLwZY7PQoAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqIN
AAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmNARRZs7vnLHds0BALCad970jrVH2BH2XXjt2iMAwEY5wmjzH9s1BwDAat71mXeuPcKOsP9d1689AgBsFKdHAQAAAExItAEAAACY0NFH+oIXvHfPdswBAMAO8NYzL1h7BADYGI60AQAAAJiQaAMAAAAwoSM+Per9Z128HXMAAKzG6d9fP69437lrjwAA03llX3rQ5Y60AQAAAJiQaAMAAAAwIdEGAAAAYEJHFG2e9Ngnb9ccAACreclJP7P2CDvCyS955tojAMBG6RjjsFfevXv32Ldv3zaOAwAAALCztN0/xtj9wOVOjwIAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJtQxxuGv3P57klu2bxx4SMcluX3tIYD7sV/CfOyXMCf7Jsxnlv3yu8cYxz9w4RFFG1hb231jjN1rzwHcy34J87FfwpzsmzCf2fdLp0cBAAAATEi0AQAAAJiQaMOm+Yu1BwC+hv0S5mO/hDnZN2E+U++XrmkDAAAAMCFH2gAAAABMSLRhI7R9e9vb2t6w9izAAW2/s+3lbW9qe2PbV609E+x0bR/b9qq21y775WvWngk4oO1RbT/Z9gNrzwIkbW9ue33ba9ruW3ueQ3F6FBuh7bOT3JXkr8cYz1h7HiBpe0KSE8YYV7d9fJL9Sc4aY3xq5dFgx2rbJMeOMe5qe0ySK5K8aozxsZVHgx2v7auT7E7yhDHGGWvPAztd25uT7B5j3L72LA/GkTZshDHGR5LcsfYcwL3GGLeOMa5eHv9XkpuSfMe6U8HONg64a/nxmOXmL3SwsrYnJtmT5C/XngXYLKINAI9Y211JfijJx1ceBXa85RSMa5LcluSyMYb9Etb3xiS/nuTulecA7jWSXNp2f9tfWHuYQxFtAHhE2j4uyUVJfmWMcefa88BON8b4vzHGs5KcmOSUtk4rhhW1PSPJbWOM/WvPAtzPaWOMH07yk0l+cbkkx3REGwAetuWaGRcleccY42/Xnge41xjjP5N8OMnz1p0EdrzTkrxguX7Gu5L8RNsL1h0JGGN8cbm/Lcl7kpyy7kQHJ9oA8LAsFzx9W5Kbxhh/uvY8QNL2+Lbfsjz+piSnJ/n0qkPBDjfG+K0xxoljjF1JXpLk78cY5648FuxobY9dvkgjbY9N8twkU35TsWjDRmh7YZIrk5zU9vNtf37tmYCcluSlOfAXw2uW2/PXHgp2uBOSXN72uiSfyIFr2vh6YQC4v6ckuaLttUmuSnLxGOOSlWc6KF/5DQAAADAhR9oAAAAATEi0AQAAAJiQaAMAAAAwIdEGAAAAYEKiDQAAAMCE
RBsAYHptn3yfr5b/17ZfWB7f1fbP154PAGA7+MpvAGCjtN2b5K4xxuvXngUAYDs50gYA2Fhtf6ztB5bHe9ue3/bStje3fWHb17W9vu0lbY9Z1ju57T+03d/2g21PWPdTAAAcnGgDADyafG+SPUnOTHJBksvHGM9M8uUke5Zw8+YkLxpjnJzk7Un+cK1hAQAezNFrDwAAsIX+bozx1bbXJzkqySXL8uuT7EpyUpJnJLmsbZZ1bl1hTgCAhyTaAACPJv+dJGOMu9t+ddx78b67c+D/PU1y4xjj1LUGBAA4XE6PAgB2ks8kOb7tqUnS9pi2P7jyTAAAByXaAAA7xhjjf5K8KMl5ba9Nck2SH1l1KACAQ/CV3wAAAAATcqQNAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQv8Pmvq//WhXoMQAAAAASUVORK5CYII=\n", - "text/plain": [ - "" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(\"Clustering Diarizer Result\")\n", "pred_labels_clus = rttm_to_labels(f'{output_dir}/pred_rttms/an4_diarize_test.rttm')\n", @@ -1307,28 +586,9 @@ }, { "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Neural Diarizer Result\n" - ] - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABG0AAACtCAYAAAAKyYJgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAMH0lEQVR4nO3df+yudV3H8ddLoCz8UQo5iuqsNMpkszhjI5qrhs48TMB0UmG6taVbLcq1fqxWx9YPMZc6s82WNhqKzZHKYAnMMMNQPEdBVPzRGm4qRcQa4bQsPv1xLsYPz4Fz5Pv1+tzn+3hs9+77e93XfZ/3/b12bWfP73Vdd8cYAQAAAGAuj1p7AAAAAAC+mmgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCos2i7Uva/tkWv+cT2l7T9jPL/bdu5fsfrbZpW7yg7cfb3tN291a+NwAAAGwH0WYLtD3mEE/9ZpL3jDGekuQ9y89so4fYFh9L8rwk7/s6jgMAAABfs42KNm2Pb3tl25vafqztC9ve2vaitjcstycv657Y9rK2H1puZy7LT2/7T20/styfcpB/Z0/b69ue0PZZy+MPt31728cs69za9nfbXpfkBYcY+ZwkFy+PL05y7lb/TtayadtijHHLGONT2/grAQAAgC117CN58RvPuWRvkt/bmlGSJK946bsu2PsQzz87yRfGGHuSpO3jk1yU5K4xxultfy7Ja5OcneR1SV4zxriu7XcluSrJDyT5ZJJnjDH+t+1ZSf4oyU/d+w+0PS/Jy5M8J8kxSX4nyVljjC+2/Y3lud9fVv/yGONHH2LeJ40xbkuSMcZtbb/tCH4Xh+2579yzN1u8HS4/98q9D7POpm0LAAAA2CiPKNqs4OYkr257UZIrxhj/2DZJLl2evzTJa5bHZyV56vJ8kjyu7WOTPD7JxW2fkmQkOe5+7//jSXYnedYY4662Zyd5apL3L+/zDUmuv9/6f7PFn2+T2BYAAACwjTYq2owxPt32tBw48uKP215971P3X225f1SSM8YYX7r/e7R9fZJrxxjntd2V5L33e/pfknxPku9Lsi9Jk1wzxvjpQ4z0xYcZ+d/anrQcZXNSktsfZv2NsYHbAgAAADbKI4o2y6lMe7dkksPQ9tuT3DnGuKTt3Ulesjz1wiSvXO7vPfri6iS/lORPltc+fYxxYw4c3fH5ZZ17X3+vzyb5tSTvaPuCJB9I8oa2Tx5j/HPbb05y8hjj04c58uVJXrzM9uIk7zr8T3v4llOZ9m7Hex/KBm4LAAAA2CgbdSHiJKcmuaHtjUl+O8kfLMu/se0Hk1yY5FeXZb+cZHfbj7b9RJKXLctflQNHhrw/B66T8gDLxWp/NsnbkzwuB2LCpW0/mgPh4PuPYN5XJnlm288keeby89Fio7ZF2/Pafi7JGUmubHvVEXxWAAAA+LrrGOPh15pY21uT7B5j3LH2LDudbQEAAABbZ9OOtAEAAADYETb+SJsZtH1DkjMftPh1Y4y/WmOency2AAAA4Ggh2gAAAABMyOlRAAAAABMSbQAAAAAmdOyRrHzCCSeMXbt2bdMoAAAAADvP/v377xhjnPjg5UcUbXbt2pV9+/Zt3VQAAAAAO1zbzx5sudOjAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACY
k2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADAh0QYAAABgQqINAAAAwIREGwAAAIAJHVG0ufPLd27XHAAAq3nrLW9Ze4Sj3r5Lb1p7BADYOEcYbf5ju+YAAFjN2z711rVHOOrtf9vNa48AABvH6VEAAAAAExJtAAAAACZ07JG+4Lnv3LMdcwAAcJR74zmXrD0CAGwUR9oAAAAATEi0AQAAAJjQEZ8edfm5V27HHAAAq3H699fHS991wdojAMCUXtYXHXS5I20AAAAAJiTaAAAAAExItAEAAACY0BFFmyc8+onbNQcAwGrOP+Vn1h7hqHfa+aeuPQIAbJyOMQ575d27d499+/Zt4zgAAAAAO0vb/WOM3Q9e7vQoAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCHWMc/srtvyf57PaNAw/rhCR3rD0E8AD2S5iP/RLmZN+E+cyyX373GOPEBy88omgDa2u7b4yxe+05gPvYL2E+9kuYk30T5jP7fun0KAAAAIAJiTYAAAAAExJt2DR/sfYAwFexX8J87JcwJ/smzGfq/dI1bQAAAAAm5EgbAAAAgAmJNmyEtm9ue3vbj609C3BA2+9se23bW9p+vO2Fa88EO13bR7e9oe1Ny375irVnAg5oe0zbj7S9Yu1ZgKTtrW1vbntj231rz3MoTo9iI7R9RpK7k/z1GONpa88DJG1PSnLSGOPDbR+bZH+Sc8cYn1h5NNix2jbJ8WOMu9sel+S6JBeOMT6w8miw47V9eZLdSR43xjh77Xlgp2t7a5LdY4w71p7loTjSho0wxnhfkjvXngO4zxjjtjHGh5fH/5XkliTfse5UsLONA+5efjxuufkLHays7clJ9iT5y7VnATaLaAPAI9Z2V5IfSvLBlUeBHW85BePGJLcnuWaMYb+E9b02ya8nuWflOYD7jCRXt93f9hfWHuZQRBsAHpG2j0lyWZJfGWPctfY8sNONMf5vjPH0JCcnOb2t04phRW3PTnL7GGP/2rMAD3DmGOOHk/xkkl9cLskxHdEGgK/Zcs2My5K8ZYzxt2vPA9xnjPGfSd6b5NnrTgI73plJnrtcP+NtSX6i7SXrjgSMMb6w3N+e5B1JTl93ooMTbQD4miwXPH1TklvGGH+69jxA0vbEtt+yPP6mJGcl+eSqQ8EON8b4rTHGyWOMXUnOT/L3Y4wLVh4LdrS2xy9fpJG2xyd5VpIpv6lYtGEjtL00yfVJTmn7ubY/v/ZMQM5M8qIc+IvhjcvtOWsPBTvcSUmubfvRJB/KgWva+HphAHigJyW5ru1NSW5IcuUY490rz3RQvvIbAAAAYEKOtAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgDA9No+8X5fLf+vbT+/PL677Z+vPR8AwHbwld8AwEZpuzfJ3WOMV689CwDAdnKkDQCwsdr+WNsrlsd7217c9uq2t7Z9XttXtb257bvbHresd1rbf2i7v+1VbU9a91MAABycaAMAHE2+N8meJOc
kuSTJtWOMU5N8KcmeJdy8PsnzxxinJXlzkj9ca1gAgIdy7NoDAABsob8bY3yl7c1Jjkny7mX5zUl2JTklydOSXNM2yzq3rTAnAMDDEm0AgKPJfyfJGOOetl8Z9128754c+H9Pk3x8jHHGWgMCABwup0cBADvJp5Kc2PaMJGl7XNsfXHkmAICDEm0AgB1jjPE/SZ6f5KK2NyW5McmPrDoUAMAh+MpvAAAAgAk50gYAAABgQqINAAAAwIREGwAAAIAJiTYAAAAAExJtAAAAACYk2gAAAABMSLQBAAAAmJBoAwAAADCh/we8Cr/9oYmSHAAAAABJRU5ErkJggg==\n", - "text/plain": [ - "" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(\"Neural Diarizer Result\")\n", "pred_labels_neural = rttm_to_labels(f'{output_dir}/pred_ovl_rttms/an4_diarize_test.rttm')\n", @@ -1338,30 +598,11 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": { "scrolled": true }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Ground-truth Speaker Label\n" - ] - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAABG0AAACsCAYAAADBlVHFAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAKIElEQVR4nO3dT6jl91nH8c/TJNCStgidIQyT0AtdVDFibIfSNCKliK2maJUsWqirgFAEq1kE6iYTQfyDxoLgQm3B0GI31U2LkxZMlEBjnUknHduotJhSx9oxVmkHgpbkcTFHJkknTSa5N9/nzHm94DLnHs6Fz138YHjf35/q7gAAAAAwyytWDwAAAADge4k2AAAAAAOJNgAAAAADiTYAAAAAA4k2AAAAAAOJNgAAAAADiTYAAAAAA4k2AAAAAAOJNgAAAAADiTYAAAAAA21ltKmqn6+qrqofXL1lP1XVk1V1uqoeqaqHq+ptqzcBAAAAa2xltEnyviQPJnnv6iH77Inuvqm7fzTJh5L81upBAAAAwBpbF22q6tVJbklye668aPN0r03yX6tHAAAAAGtcvXrAi/CeJCe6+5+r6ltV9abufnj1qH3yqqo6neSVSY4kecfaOQAAAMAqLynanD16w/Ekd+3PlCTJ3UfPfv3483zmfUk+vHn9ic33+x5t3nrXfcezz7/bQ3e/8/jzfOaJ7r4pSarq5iT3VtWN3d37uAMAAADYAlt1pk1VvS4Xzj65sao6yVVJuqruvNLCRnd/rqoOJTmc5NzqPQAAAMDLa9vuaXNbknu7+/XdvdfdNyT5lyQ/vnjXvts8GeuqJP+5egsAAADw8qttOkGlqh5I8tvdfeJp7/1Kkh/q7g8sG7ZPqurJJGf+/9skv97dn144CQAAAFhkq6INAAAAwK7YtsujAAAAAHaCaAMAAAAwkGgDAAAAMJBoAwAAADCQaAMAAAAw0NWX8+FDhw713t7eAU0BAAAA2D2nTp16vLsPP/v9y4o2e3t7OXny5P6tAgAAANhxVfW1S73v8igAAACAgUQbA
AAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIEuK9o8+c1vHtQOAIBlvv3796yesBP+5P6vrJ4AAFvlsqLNU6INAHAF+s49f7B6wk74yANfXT0BALaKy6MAAAAABhJtAAAAAAa6+nJ/4OzRGw5iBwAAO+Ctd923egIAbA1n2gAAAAAMJNoAAAAADHTZl0cdPfv1g9gBALCMy79fPg/d/c7VEwBgnPqNS7/vTBsAAACAgUQbAAAAgIFEGwAAAICBLivavOK66w5qBwDAMq+549dWT9gJt7/9DasnAMBWqe5+wR8+duxYnzx58gDnAAAAAOyWqjrV3cee/b7LowAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABqrufuEfrvqPJF87uDnwvA4leXz1COAZHJcwj+MSZnJswjxTjsvXd/fhZ795WdEGVquqk919bPUO4CLHJczjuISZHJswz/Tj0uVRAAAAAAOJNgAAAAADiTZsmz9ePQD4Ho5LmMdxCTM5NmGe0cele9oAAAAADORMGwAAAICBRBu2QlV9tKrOVdU/rN4CXFBVN1TV/VX1aFV9qao+uHoT7LqqemVVfb6qHtkcl3ev3gRcUFVXVdUXqupTq7cASVU9VlVnqup0VZ1cvee5uDyKrVBVP5HkfJJ7u/vG1XuApKqOJDnS3Q9X1WuSnErynu7+8uJpsLOqqpJc293nq+qaJA8m+WB3P7R4Guy8qrojybEkr+3ud6/eA7uuqh5Lcqy7H1+95ftxpg1bobv/Nsm3Vu8ALurub3T3w5vX30nyaJKja1fBbusLzm++vWbz5S90sFhVXZ/k1iR/unoLsF1EGwBesqraS/JjSf5u8RTYeZtLME4nOZfks93tuIT1PpzkziRPLd4BXNRJPlNVp6rql1aPeS6iDQAvSVW9Osknk/xqd3979R7Ydd39ZHfflOT6JG+pKpcVw0JV9e4k57r71OotwDPc0t1vSvLTSX55c0uOcUQbAF60zT0zPpnk4939F6v3ABd1938neSDJu9YugZ13S5Kf3dw/4xNJ3lFVH1s7Cejuf9v8ey7JXyZ5y9pFlybaAPCibG54+pEkj3b3Pav3AElVHa6qH9i8flWSn0zyj0tHwY7r7g919/XdvZfkvUn+urvfv3gW7LSqunbzII1U1bVJfirJyCcVizZshar68ySfS/LGqvrXqrp99SYgtyT5xVz4i
+HpzdfPrB4FO+5Ikvur6otJ/j4X7mnj8cIA8EzXJXmwqh5J8vkkn+7uE4s3XZJHfgMAAAAM5EwbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAYLyqet3THi3/71V1dvP6fFX90ep9AAAHwSO/AYCtUlXHk5zv7t9bvQUA4CA50wYA2FpV9faq+tTm9fGq+rOq+kxVPVZVv1BVv1tVZ6rqRFVds/ncm6vqb6rqVFXdV1VH1v4WAACXJtoAAFeSNyS5NcnPJflYkvu7+0eSPJHk1k24+cMkt3X3m5N8NMlvrhoLAPD9XL16AADAPvqr7v5uVZ1JclWSE5v3zyTZS/LGJDcm+WxVZfOZbyzYCQDwvEQbAOBK8j9J0t1PVdV3++LN+57Khf/3VJIvdffNqwYCALxQLo8CAHbJPyU5XFU3J0lVXVNVP7x4EwDAJYk2AMDO6O7/TXJbkt+pqkeSnE7ytqWjAACeg0d+AwAAAAzkTBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIH+D9ZmsbQRn7DhAAAAAElFTkSuQmCC\n", - "text/plain": [ - "" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "print(\"Ground-truth Speaker Label\")\n", "reference" @@ -1391,108 +632,9 @@ }, { "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "name: ClusterDiarizer\n", - "num_workers: 4\n", - "sample_rate: 16000\n", - "batch_size: 64\n", - "diarizer:\n", - " manifest_filepath: data/input_manifest.json\n", - " out_dir: /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/oracle_vad\n", - " oracle_vad: true\n", - " collar: 0.25\n", - " ignore_overlap: true\n", - " vad:\n", - " model_path: ???\n", - " external_vad_manifest: null\n", - " parameters:\n", - " window_length_in_sec: 0.15\n", - " shift_length_in_sec: 0.01\n", - " smoothing: median\n", - " overlap: 0.5\n", - " onset: 0.1\n", - " offset: 0.1\n", - " pad_onset: 0.1\n", - " pad_offset: 0\n", - " min_duration_on: 0\n", - " min_duration_off: 0.2\n", - " filter_speech_first: true\n", - " speaker_embeddings:\n", - " model_path: titanet_large\n", - " parameters:\n", - " window_length_in_sec:\n", - " - 1.5\n", - " - 1.25\n", - " - 1.0\n", - " - 0.75\n", - " - 0.5\n", - " shift_length_in_sec:\n", - " - 0.75\n", - " - 0.625\n", - " - 0.5\n", - " - 0.375\n", - " - 0.1\n", - " 
multiscale_weights:\n", - " - 1\n", - " - 1\n", - " - 1\n", - " - 1\n", - " - 1\n", - " save_embeddings: true\n", - " clustering:\n", - " parameters:\n", - " oracle_num_speakers: false\n", - " max_num_speakers: 8\n", - " enhanced_count_thres: 80\n", - " max_rp_threshold: 0.25\n", - " sparse_search_volume: 30\n", - " maj_vote_spk_count: false\n", - " msdd_model:\n", - " model_path: diar_msdd_telephonic\n", - " parameters:\n", - " use_speaker_model_from_ckpt: true\n", - " infer_batch_size: 25\n", - " sigmoid_threshold:\n", - " - 0.7\n", - " - 1.0\n", - " seq_eval_mode: false\n", - " split_infer: true\n", - " diar_window_length: 50\n", - " overlap_infer_spk_limit: 5\n", - " asr:\n", - " model_path: ???\n", - " parameters:\n", - " asr_based_vad: false\n", - " asr_based_vad_threshold: 0.05\n", - " asr_batch_size: null\n", - " lenient_overlap_WDER: true\n", - " decoder_delay_in_sec: null\n", - " word_ts_anchor_offset: null\n", - " word_ts_anchor_pos: start\n", - " fix_word_ts_with_VAD: false\n", - " colored_text: false\n", - " print_time: true\n", - " break_lines: false\n", - " ctc_decoder_parameters:\n", - " pretrained_language_model: null\n", - " beam_width: 32\n", - " alpha: 0.5\n", - " beta: 2.5\n", - " realigning_lm_parameters:\n", - " arpa_language_model: null\n", - " min_number_of_words: 3\n", - " max_number_of_words: 10\n", - " logprob_diff_threshold: 1.2\n", - "\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(OmegaConf.to_yaml(config))" ] @@ -1514,7 +656,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1537,7 +679,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -1574,162 +716,9 @@ }, { "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:42 
model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'asr_based_vad': False, 'asr_based_vad_threshold': 0.05, 'asr_batch_size': None, 'lenient_overlap_WDER': True, 'decoder_delay_in_sec': None, 'word_ts_anchor_offset': None, 'word_ts_anchor_pos': 'start', 'fix_word_ts_with_VAD': False, 'colored_text': False, 'print_time': True, 'break_lines': False}, 'ctc_decoder_parameters': {'pretrained_language_model': None, 'beam_width': 32, 'alpha': 0.5, 'beta': 2.5}, 'realigning_lm_parameters': {'arpa_language_model': None, 'min_number_of_words': 3, 'max_number_of_words': 10, 'logprob_diff_threshold': 1.2}}\n", - " Reason: Missing mandatory value: diarizer.asr.model_path\n", - " full_key: diarizer.asr.model_path\n", - " object_type=dict.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:42 clustering_diarizer:129] Loading pretrained vad_multilingual_marblenet model from NGC\n", - "[NeMo I 2022-11-10 16:14:42 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.\n", - "[NeMo I 2022-11-10 16:14:42 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo\n", - "[NeMo I 2022-11-10 16:14:42 common:911] Instantiating model from pre-trained checkpoint\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:42 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: 
/manifests/ami_train_0.63.json,/manifests/freesound_background_train.json,/manifests/freesound_laughter_train.json,/manifests/fisher_2004_background.json,/manifests/fisher_2004_speech_sampled.json,/manifests/google_train_manifest.json,/manifests/icsi_all_0.63.json,/manifests/musan_freesound_train.json,/manifests/musan_music_train.json,/manifests/musan_soundbible_train.json,/manifests/mandarin_train_sample.json,/manifests/german_train_sample.json,/manifests/spanish_train_sample.json,/manifests/french_train_sample.json,/manifests/russian_train_sample.json\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 256\n", - " shuffle: true\n", - " is_tarred: false\n", - " tarred_audio_filepaths: null\n", - " tarred_shard_strategy: scatter\n", - " augmentor:\n", - " shift:\n", - " prob: 0.5\n", - " min_shift_ms: -10.0\n", - " max_shift_ms: 10.0\n", - " white_noise:\n", - " prob: 0.5\n", - " min_level: -90\n", - " max_level: -46\n", - " norm: true\n", - " noise:\n", - " prob: 0.5\n", - " manifest_path: /manifests/noise_0_1_musan_fs.json\n", - " min_snr_db: 0\n", - " max_snr_db: 30\n", - " max_gain_db: 300.0\n", - " norm: true\n", - " gain:\n", - " prob: 0.5\n", - " min_gain_dbfs: -10.0\n", - " max_gain_dbfs: 10.0\n", - " norm: true\n", - " num_workers: 16\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:42 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 256\n", - " shuffle: false\n", - " val_loss_idx: 0\n", - " num_workers: 16\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:42 modelPT:155] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", - " Test config : \n", - " manifest_filepath: null\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 128\n", - " shuffle: false\n", - " test_loss_idx: 0\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:42 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:42 save_restore_connector:243] Model EncDecClassificationModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.\n", - "[NeMo I 2022-11-10 16:14:42 clustering_diarizer:156] Loading pretrained titanet_large model from NGC\n", - "[NeMo I 2022-11-10 16:14:42 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo.\n", - "[NeMo I 2022-11-10 16:14:42 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo\n", - "[NeMo I 2022-11-10 16:14:42 common:911] Instantiating 
model from pre-trained checkpoint\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:43 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/train.json\n", - " sample_rate: 16000\n", - " labels: null\n", - " batch_size: 64\n", - " shuffle: true\n", - " time_length: 3\n", - " is_tarred: false\n", - " tarred_audio_filepaths: null\n", - " tarred_shard_strategy: scatter\n", - " augmentor:\n", - " noise:\n", - " manifest_path: /manifests/noise/rir_noise_manifest.json\n", - " prob: 0.5\n", - " min_snr_db: 0\n", - " max_snr_db: 15\n", - " speed:\n", - " prob: 0.5\n", - " sr: 16000\n", - " resample_type: kaiser_fast\n", - " min_speed_rate: 0.95\n", - " max_speed_rate: 1.05\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:43 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/dev.json\n", - " sample_rate: 16000\n", - " labels: null\n", - " batch_size: 128\n", - " shuffle: false\n", - " time_length: 3\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:43 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:14:43 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:44 save_restore_connector:243] Model EncDecSpeakerLabelModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/titanet-l/492c0ab8416139171dc18c21879a9e45/titanet-l.nemo.\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from nemo.collections.asr.models import ClusteringDiarizer\n", "sd_model = ClusteringDiarizer(cfg=config)" @@ -1744,209 +733,9 @@ }, { "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:44 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:44 clustering_diarizer:303] Split long audio file to avoid CUDA memory issue\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:45 vad_utils:100] The prepared manifest file exists. 
Overwriting!\n", - "[NeMo I 2022-11-10 16:14:45 classification_models:247] Perform streaming frame-level VAD\n", - "[NeMo I 2022-11-10 16:14:45 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:45 collections:300] # 1 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:45 clustering_diarizer:246] Generating predictions with overlapping input segments\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:45 clustering_diarizer:258] Converting frame level prediction to speech/no-speech segment in start and end times format.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:45 clustering_diarizer:281] Subsegmentation for embedding extraction: scale0, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale0.json\n", - "[NeMo I 2022-11-10 16:14:45 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:45 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:45 collections:300] # 5 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:45 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:45 clustering_diarizer:281] Subsegmentation for embedding extraction: scale1, 
/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale1.json\n", - "[NeMo I 2022-11-10 16:14:45 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:45 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:45 collections:300] # 7 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:281] Subsegmentation for embedding extraction: scale2, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale2.json\n", - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:46 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:46 collections:300] # 8 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:281] Subsegmentation for embedding extraction: scale3, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale3.json\n", - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:46 collections:296] Filtered duration for loading collection 
is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:46 collections:300] # 11 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:281] Subsegmentation for embedding extraction: scale4, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale4.json\n", - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:46 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:46 collections:300] # 38 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:46 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:48 nemo_logging:349] /home/taejinp/anaconda3/lib/python3.9/site-packages/pyannote/metrics/utils.py:200: UserWarning: 'uem' was approximated by the union of 'reference' and 'hypothesis' extents.\n", - " warnings.warn(\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:48 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap True: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 
0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:48 clustering_diarizer:455] Outputs are saved in /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs directory\n" - ] - }, - { - "data": { - "text/plain": [ - "(,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0))" - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "sd_model.diarize()" ] @@ -1971,40 +760,9 @@ }, { "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "VAD params:window_length_in_sec: 0.15\n", - "shift_length_in_sec: 0.01\n", - "smoothing: median\n", - "overlap: 0.5\n", - "onset: 0.8\n", - "offset: 0.6\n", - "pad_onset: 0.1\n", - "pad_offset: -0.05\n", - "min_duration_on: 0\n", - "min_duration_off: 0.2\n", - "filter_speech_first: 1.0\n", - "\n" - ] - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# VAD predicted time stamps\n", "# you can also use single threshold(=onset=offset) for binarization and plot here\n", @@ -2034,18 +792,9 @@ }, { "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SPEAKER an4_diarize_test 1 0.300 2.540 speaker_1 \r\n", - "SPEAKER an4_diarize_test 1 3.180 1.970 speaker_0 \r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "!cat {output_dir}/pred_rttms/an4_diarize_test.rttm" ] @@ -2066,76 +815,9 @@ }, { "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:50 msdd_models:1081] Loading pretrained diar_msdd_telephonic model from NGC\n", - "[NeMo I 2022-11-10 16:14:50 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/diar_msdd_telephonic/9c319f27168dc4980b8ba9a4ddd711bc/diar_msdd_telephonic.nemo.\n", - "[NeMo I 2022-11-10 16:14:50 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/diar_msdd_telephonic/9c319f27168dc4980b8ba9a4ddd711bc/diar_msdd_telephonic.nemo\n", - "[NeMo I 2022-11-10 16:14:50 common:911] Instantiating model from pre-trained checkpoint\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:50 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: null\n", - " emb_dir: null\n", - " sample_rate: 16000\n", - " num_spks: 2\n", - " soft_label_thres: 0.5\n", - " labels: null\n", - " batch_size: 15\n", - " emb_batch_size: 0\n", - 
" shuffle: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:50 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). \n", - " Validation config : \n", - " manifest_filepath: null\n", - " emb_dir: null\n", - " sample_rate: 16000\n", - " num_spks: 2\n", - " soft_label_thres: 0.5\n", - " labels: null\n", - " batch_size: 15\n", - " emb_batch_size: 0\n", - " shuffle: false\n", - " \n", - "[NeMo W 2022-11-10 16:14:50 modelPT:155] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", - " Test config : \n", - " manifest_filepath: null\n", - " emb_dir: null\n", - " sample_rate: 16000\n", - " num_spks: 2\n", - " soft_label_thres: 0.5\n", - " labels: null\n", - " batch_size: 15\n", - " emb_batch_size: 0\n", - " shuffle: false\n", - " seq_eval_mode: false\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:50 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:50 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:14:50 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:51 save_restore_connector:243] Model EncDecDiarLabelModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/diar_msdd_telephonic/9c319f27168dc4980b8ba9a4ddd711bc/diar_msdd_telephonic.nemo.\n", - "[NeMo I 2022-11-10 16:14:51 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:14:51 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:52 speaker_utils:92] Number of files to diarize: 1\n" - ] - } - ], + 
"execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "config.diarizer.msdd_model.model_path = 'diar_msdd_telephonic' # Telephonic speaker diarization model \n", "config.diarizer.msdd_model.parameters.sigmoid_threshold = [0.7, 1.0] # Evaluate with T=0.7 and T=1.0\n", @@ -2144,362 +826,9 @@ }, { "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:52 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'asr_based_vad': False, 'asr_based_vad_threshold': 0.05, 'asr_batch_size': None, 'lenient_overlap_WDER': True, 'decoder_delay_in_sec': None, 'word_ts_anchor_offset': None, 'word_ts_anchor_pos': 'start', 'fix_word_ts_with_VAD': False, 'colored_text': False, 'print_time': True, 'break_lines': False}, 'ctc_decoder_parameters': {'pretrained_language_model': None, 'beam_width': 32, 'alpha': 0.5, 'beta': 2.5}, 'realigning_lm_parameters': {'arpa_language_model': None, 'min_number_of_words': 3, 'max_number_of_words': 10, 'logprob_diff_threshold': 1.2}}\n", - " Reason: Missing mandatory value: diarizer.asr.model_path\n", - " full_key: diarizer.asr.model_path\n", - " object_type=dict.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:52 clustering_diarizer:129] Loading pretrained vad_multilingual_marblenet model from NGC\n", - "[NeMo I 2022-11-10 16:14:52 cloud:56] Found existing object /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.\n", - "[NeMo I 2022-11-10 16:14:52 cloud:62] Re-using file from: /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo\n", - "[NeMo I 2022-11-10 16:14:52 common:911] Instantiating model from pre-trained checkpoint\n" - ] - }, - { - "name": 
"stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:52 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: /manifests/ami_train_0.63.json,/manifests/freesound_background_train.json,/manifests/freesound_laughter_train.json,/manifests/fisher_2004_background.json,/manifests/fisher_2004_speech_sampled.json,/manifests/google_train_manifest.json,/manifests/icsi_all_0.63.json,/manifests/musan_freesound_train.json,/manifests/musan_music_train.json,/manifests/musan_soundbible_train.json,/manifests/mandarin_train_sample.json,/manifests/german_train_sample.json,/manifests/spanish_train_sample.json,/manifests/french_train_sample.json,/manifests/russian_train_sample.json\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 256\n", - " shuffle: true\n", - " is_tarred: false\n", - " tarred_audio_filepaths: null\n", - " tarred_shard_strategy: scatter\n", - " augmentor:\n", - " shift:\n", - " prob: 0.5\n", - " min_shift_ms: -10.0\n", - " max_shift_ms: 10.0\n", - " white_noise:\n", - " prob: 0.5\n", - " min_level: -90\n", - " max_level: -46\n", - " norm: true\n", - " noise:\n", - " prob: 0.5\n", - " manifest_path: /manifests/noise_0_1_musan_fs.json\n", - " min_snr_db: 0\n", - " max_snr_db: 30\n", - " max_gain_db: 300.0\n", - " norm: true\n", - " gain:\n", - " prob: 0.5\n", - " min_gain_dbfs: -10.0\n", - " max_gain_dbfs: 10.0\n", - " norm: true\n", - " num_workers: 16\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:52 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 256\n", - " shuffle: false\n", - " val_loss_idx: 0\n", - " num_workers: 16\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:14:52 modelPT:155] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).\n", - " Test config : \n", - " manifest_filepath: null\n", - " sample_rate: 16000\n", - " labels:\n", - " - background\n", - " - speech\n", - " batch_size: 128\n", - " shuffle: false\n", - " test_loss_idx: 0\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:52 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:14:52 save_restore_connector:243] Model EncDecClassificationModel was successfully restored from /home/taejinp/.cache/torch/NeMo/NeMo_1.13.0rc0/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.\n", - "[NeMo I 2022-11-10 16:14:52 msdd_models:855] Multiscale Weights: [1, 1, 1, 1, 1]\n", - "[NeMo I 2022-11-10 16:14:52 msdd_models:856] Clustering Parameters: {\n", - " \"oracle_num_speakers\": false,\n", - " \"max_num_speakers\": 8,\n", - " \"enhanced_count_thres\": 80,\n", - " \"max_rp_threshold\": 0.25,\n", - " \"sparse_search_volume\": 30,\n", - " \"maj_vote_spk_count\": false\n", - " }\n", - "[NeMo I 2022-11-10 16:14:52 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 
16:14:52 clustering_diarizer:303] Split long audio file to avoid CUDA memory issue\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:53 vad_utils:100] The prepared manifest file exists. Overwriting!\n", - "[NeMo I 2022-11-10 16:14:53 classification_models:247] Perform streaming frame-level VAD\n", - "[NeMo I 2022-11-10 16:14:53 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:53 collections:300] # 1 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:53 clustering_diarizer:246] Generating predictions with overlapping input segments\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:53 clustering_diarizer:258] Converting frame level prediction to speech/no-speech segment in start and end times format.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:53 clustering_diarizer:281] Subsegmentation for embedding extraction: scale0, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale0.json\n", - "[NeMo I 2022-11-10 16:14:53 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:53 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:53 collections:300] # 5 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", 
- "text": [ - "[NeMo I 2022-11-10 16:14:53 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:53 clustering_diarizer:281] Subsegmentation for embedding extraction: scale1, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale1.json\n", - "[NeMo I 2022-11-10 16:14:53 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:53 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:53 collections:300] # 7 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " \r" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:281] Subsegmentation for embedding extraction: scale2, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale2.json\n", - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:54 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:54 collections:300] # 8 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:281] Subsegmentation for embedding extraction: scale3, 
/home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale3.json\n", - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:54 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:54 collections:300] # 11 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n", - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:281] Subsegmentation for embedding extraction: scale4, /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale4.json\n", - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:336] Extracting embeddings for Diarization\n", - "[NeMo I 2022-11-10 16:14:54 collections:296] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:54 collections:300] # 38 files loaded accounting to # 1 labels\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - " " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:54 clustering_diarizer:380] Saved embedding files to /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:55 nemo_logging:349] /home/taejinp/anaconda3/lib/python3.9/site-packages/pyannote/metrics/utils.py:200: UserWarning: 'uem' was approximated by the union of 'reference' and 'hypothesis' extents.\n", - " warnings.warn(\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"[NeMo I 2022-11-10 16:14:55 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap True: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:55 clustering_diarizer:455] Outputs are saved in /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs directory\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:951] Loading embedding pickle file of scale:0 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings/subsegments_scale0_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:951] Loading embedding pickle file of scale:1 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings/subsegments_scale1_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:951] Loading embedding pickle file of scale:2 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings/subsegments_scale2_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:951] Loading embedding pickle file of scale:3 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings/subsegments_scale3_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:951] Loading embedding pickle file of scale:4 at /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/embeddings/subsegments_scale4_embeddings.pkl\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:929] Loading cluster label file from /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/speaker_outputs/subsegments_scale4_cluster.label\n", - "[NeMo I 2022-11-10 16:14:55 collections:611] Filtered duration for loading collection is 0.000000.\n", - "[NeMo I 2022-11-10 16:14:55 collections:614] Total 1 session files loaded accounting to # 1 audio clips\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - 
"100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 131.58it/s]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:14:55 msdd_models:1393] [Threshold: 0.7000] [use_clus_as_main=False] [diar_window=50]\n", - "[NeMo I 2022-11-10 16:14:55 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:55 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap True: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:55 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:55 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap False: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:55 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:55 der:96] Cumulative Results for collar 0.0 sec and ignore_overlap False: \n", - " FA: 0.0164\t MISS 0.0038\t Diarization ER: 0.0202\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:1414] \n", - " \n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:1393] [Threshold: 1.0000] [use_clus_as_main=False] [diar_window=50]\n", - "[NeMo I 2022-11-10 16:14:55 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:55 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap True: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:55 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:55 der:96] Cumulative Results for collar 0.25 sec and ignore_overlap False: \n", - " FA: 0.0000\t MISS 0.0000\t Diarization ER: 0.0000\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:55 speaker_utils:92] Number of files to diarize: 1\n", - "[NeMo I 2022-11-10 16:14:55 der:96] Cumulative 
Results for collar 0.0 sec and ignore_overlap False: \n", - " FA: 0.0164\t MISS 0.0038\t Diarization ER: 0.0202\t, Confusion ER:0.0000\n", - "[NeMo I 2022-11-10 16:14:55 msdd_models:1414] \n", - " \n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\n" - ] - }, - { - "data": { - "text/plain": [ - "[[(,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.02020655590480466, 0.0, 0.016389762011674885, 0.003816793893129774))],\n", - " [(,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.0, 0.0, 0.0, 0.0)),\n", - " (,\n", - " {'an4_diarize_test': {'speaker_0': 'B', 'speaker_1': 'A'}},\n", - " (0.02020655590480466, 0.0, 0.016389762011674885, 0.003816793893129774))]]" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "system_vad_msdd_model.diarize()" ] @@ -2513,18 +842,9 @@ }, { "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SPEAKER an4_diarize_test 1 0.300 2.540 speaker_1 \r\n", - "SPEAKER an4_diarize_test 1 3.180 1.970 speaker_0 \r\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "!cat {output_dir}/pred_ovl_rttms/an4_diarize_test.rttm" ] @@ -2545,28 +865,9 @@ }, { "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Clustering Diarizer Result (RTTM format)\n" - ] - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABG0AAACtCAYAAAAKyYJgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAMHUlEQVR4nO3df+yudV3H8ddLoCwUSyFHUZ2VRplsFmdsRHPVkJmHCZhOK023tnSrZbnWj9Xq2Poh5lJnttnSRkOhOVKYLH7MMMNUPEdBRERbw02liFgznJbFpz/OxfjhOXAOfL9en5vv47Hdu+/vdV/3fd7399q1nT2/13XdHWMEAAAAgLk8Zu0BAAAAAPhaog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNou2L2v7Z1v8nk9se3Xbzyz337qV7/9otU3b4gVtb2p7d9vdW/neAAAAsB1Emy3Q9qhDPPWbSd47xnhqkvcuP7ONHmRbfCLJ85K8/+s4DgAAADxsGxVt2h7b9vK2N7T9RNsXtr217fltr1tuT1nWPaHtJW0/stzOWJaf1vaf2n5suT/5IP/OnrYfbHt827OWxx9t+862j1vWubXt77a9NskLDjHyOUkuWB5fkOTcrf6drGXTtsUY4+Yxxi3b+CsBAACALXX0I3nxW865cG+S39uaUZIkr375pS/e+yDPPzvJF8YYe5Kk7ROSnJ/ki2OM09r+XJI3JDk7yRuTvH6McW3b70pyZZIfSPKpJM8cY/xv2zOT/FGSn7rnH2h7XpJXJXlOkqOS/E6SM8cYX2r7G8tzv7+s/pUxxo8+yLxPHmPcliRjjNvaftsR/C4O23PfvWdvtng7XHbu5XsfYp1N2xYAAACwUR5RtFnBjUle1/b8JO8ZY/xj2yS5aHn+oiSvXx6fmeRpy/NJclzbxyd5QpIL2j41yUhyzH3e/8eT7E5y1hjji23PTvK0JB9Y3ucbknzwPuv/zRZ/vk1iWwAAAMA22qhoM8b4dNtTc+DIiz9ue9U9T913teX+MUlOH2N8+b7v0fZNSa4ZY5zXdleS993n6X9J8j1Jvi/JviRNcvUY46cPMdKXHmLkf2t74nKUzYlJbn+I9TfGBm4LAAAA2CiPKNospzLt3ZJJDkPbb09y5xjjwrZ3JXnZ8tQLk7xmub/n6IurkvxSkj9ZXvuMMcb1OXB0x+eXde55/T0+m+TXkryr7QuSfCjJm9s+ZYzxz22/OclJY4xPH+bIlyV56TLbS5Ncevif9vAtpzLt3Y73PpQN3BYAAACwUTbqQsRJTklyXdvrk/x2kj9Yln9j2w8neWWSX12W/XKS3W0/3vaTSV6xLH9tDhwZ8oEcuE7K/SwXq/3ZJO9MclwOxISL2n48B8LB9x/BvK9J8qy2n0nyrOXnR4uN2hZtz2v7uSSnJ7m87ZVH8FkBAADg665jjIdea2Jtb02ye4xxx9qz7HS2BQAAAGydTTvSBgAAAGBH2PgjbWbQ9s1JznjA4jeOMf5qjXl2MtsCAACARwvRBgAAAGBCTo8CAAAAmJBoAwAAADCho49k5eOPP37s2rVrm0YBAAAA2Hn2799/xxjjhAcuP6Jos2vXruzbt2/rpgIAAADY4dp+9mDLnR4FAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCH
RBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAEzoiKLNnV+5c7vmAADYWO+4+e1rj7CR9l10w9ojAMDUjjDa/Md2zQEAsLEuvuUda4+wkfZffOPaIwDA1JweBQAAADAh0QYAAABgQkcf6Que++492zEHAAA70FvOuXDtEQBgWo60AQAAAJiQaAMAAAAwoSM+Peqycy/fjjkAADaW08cfvpdf+uK1RwCA1b2iLznockfaAAAAAExItAEAAACYkGgDAAAAMKEjijZPfOyTtmsOAICN9aKTf2btETbSqS86Ze0RAGBqHWMc9sq7d+8e+/bt28ZxAAAAAHaWtvvHGLsfuNzpUQAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhDrGOPyV239P8tntGwce0vFJ7lh7COB+7JcwH/slzMm+CfOZZb/87jHGCQ9ceETRBtbWdt8YY/facwD3sl/CfOyXMCf7Jsxn9v3S6VEAAAAAExJtAAAAACYk2rBp/mLtAYCvYb+E+dgvYU72TZjP1Pula9oAAAAATMiRNgAAAAATEm3YCG3f1vb2tp9YexbggLbf2faatje3vantK9eeCXa6to9te13bG5b98tVrzwQc0Paoth9r+561ZwGStre2vbHt9W33rT3PoTg9io3Q9plJ7kry12OMp689D5C0PTHJiWOMj7Z9fJL9Sc4dY3xy5dFgx2rbJMeOMe5qe0ySa5O8cozxoZVHgx2v7auS7E5y3Bjj7LXngZ2u7a1Jdo8x7lh7lgfjSBs2whjj/UnuXHsO4F5jjNvGGB9dHv9XkpuTfMe6U8HONg64a/nxmOXmL3SwsrYnJdmT5C/XngXYLKINAI9Y211JfijJh1ceBXa85RSM65PcnuTqMYb9Etb3hiS/nuTulecA7jWSXNV2f9tfWHuYQxFtAHhE2j4uySVJfmWM8cW154Gdbozxf2OMZyQ5KclpbZ1WDCtqe3aS28cY+9eeBbifM8YYP5zkJ5P84nJJjumINgA8bMs1My5J8vYxxt+uPQ9wrzHGfyZ5X5JnrzsJ7HhnJHnucv2Mi5P8RNsL1x0JGGN8Ybm/Pcm7kpy27kQHJ9oA8LAsFzx9a5Kbxxh/uvY8QNL2hLbfsjz+piRnJvnUqkPBDjfG+K0xxkljjF1JXpTk78cYL155LNjR2h67fJFG2h6b5KwkU35TsWjDRmh7UZIPJjm57efa/vzaMwE5I8lLcuAvhtcvt+esPRTscCcmuabtx5N8JAeuaePrhQHg/p6c5Nq2NyS5LsnlY4wrVp7poHzlNwAAAMCEHGkDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AgOm1fdJ9vlr+X9t+fnl8V9s/X3s+AIDt4Cu/AYCN0nZvkrvGGK9bexYAgO3kSBsAYGO1/bG271ke7217Qdur2t7a9nltX9v2xrZXtD1mWe/Utv/Qdn/bK9ueuO6nAAA4ONEGAHg0+d4ke5Kck+T
CJNeMMU5J8uUke5Zw86Ykzx9jnJrkbUn+cK1hAQAezNFrDwAAsIX+bozx1bY3JjkqyRXL8huT7EpycpKnJ7m6bZZ1blthTgCAhyTaAACPJv+dJGOMu9t+ddx78b67c+D/PU1y0xjj9LUGBAA4XE6PAgB2kluSnND29CRpe0zbH1x5JgCAgxJtAIAdY4zxP0men+T8tjckuT7Jj6w6FADAIfjKbwAAAIAJOdIGAAAAYEKiDQAAAMCERBsAAACACYk2AAAAABMSbQAAAAAmJNoAAAAATEi0AQAAAJiQaAMAAAAwof8He9y//XaU+U0AAAAASUVORK5CYII=\n", - "text/plain": [ - "" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(\"Clustering Diarizer Result (RTTM format)\")\n", "pred_labels_clus = rttm_to_labels(f'{output_dir}/pred_rttms/an4_diarize_test.rttm')\n", @@ -2576,28 +877,9 @@ }, { "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Neural Diarizer Result (RTTM format)\n" - ] - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAABG0AAACtCAYAAAAKyYJgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAMHUlEQVR4nO3df+yudV3H8ddLoCwUSyFHUZ2VRplsFmdsRHPVkJmHCZhOK023tnSrZbnWj9Xq2Poh5lJnttnSRkOhOVKYLH7MMMNUPEdBRERbw02liFgznJbFpz/OxfjhOXAOfL9en5vv47Hdu+/vdV/3fd7399q1nT2/13XdHWMEAAAAgLk8Zu0BAAAAAPhaog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNou2L2v7Z1v8nk9se3Xbzyz337qV7/9otU3b4gVtb2p7d9vdW/neAAAAsB1Emy3Q9qhDPPWbSd47xnhqkvcuP7ONHmRbfCLJ85K8/+s4DgAAADxsGxVt2h7b9vK2N7T9RNsXtr217fltr1tuT1nWPaHtJW0/stzOWJaf1vaf2n5suT/5IP/OnrYfbHt827OWxx9t+862j1vWubXt77a9NskLDjHyOUkuWB5fkOTcrf6drGXTtsUY4+Yxxi3b+CsBAACALXX0I3nxW865cG+S39uaUZIkr375pS/e+yDPPzvJF8YYe5Kk7ROSnJ/ki2OM09r+XJI3JDk7yRuTvH6McW3b70pyZZIfSPKpJM8cY/xv2zOT/FGSn7rnH2h7XpJXJXlOkqOS/E6SM8cYX2r7G8tzv7+s/pUxxo8+yLxPHmPcliRjjNvaftsR/C4O23PfvWdvtng7XHbu5XsfYp1N2xYAAACwUR5RtFnBjUle1/b8JO8ZY/xj2yS5aHn+oiSvXx6fmeRpy/NJclzbxyd5QpIL2j41yUhyzH3e/8eT7E5y1hjji23PTvK0JB9Y3ucbknzwPuv/zRZ/vk1iWwAAAMA22qhoM8b4dNtTc+D
Iiz9ue9U9T913teX+MUlOH2N8+b7v0fZNSa4ZY5zXdleS993n6X9J8j1Jvi/JviRNcvUY46cPMdKXHmLkf2t74nKUzYlJbn+I9TfGBm4LAAAA2CiPKNospzLt3ZJJDkPbb09y5xjjwrZ3JXnZ8tQLk7xmub/n6IurkvxSkj9ZXvuMMcb1OXB0x+eXde55/T0+m+TXkryr7QuSfCjJm9s+ZYzxz22/OclJY4xPH+bIlyV56TLbS5Ncevif9vAtpzLt3Y73PpQN3BYAAACwUTbqQsRJTklyXdvrk/x2kj9Yln9j2w8neWWSX12W/XKS3W0/3vaTSV6xLH9tDhwZ8oEcuE7K/SwXq/3ZJO9MclwOxISL2n48B8LB9x/BvK9J8qy2n0nyrOXnR4uN2hZtz2v7uSSnJ7m87ZVH8FkBAADg665jjIdea2Jtb02ye4xxx9qz7HS2BQAAAGydTTvSBgAAAGBH2PgjbWbQ9s1JznjA4jeOMf5qjXl2MtsCAACARwvRBgAAAGBCTo8CAAAAmJBoAwAAADCho49k5eOPP37s2rVrm0YBAAAA2Hn2799/xxjjhAcuP6Jos2vXruzbt2/rpgIAAADY4dp+9mDLnR4FAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAEzoiKLNnV+5c7vmAADYWO+4+e1rj7CR9l10w9ojAMDUjjDa/Md2zQEAsLEuvuUda4+wkfZffOPaIwDA1JweBQAAADAh0QYAAABgQkcf6Que++492zEHAAA70FvOuXDtEQBgWo60AQAAAJiQaAMAAAAwoSM+Peqycy/fjjkAADaW08cfvpdf+uK1RwCA1b2iLznockfaAAAAAExItAEAAACYkGgDAAAAMKEjijZPfOyTtmsOAICN9aKTf2btETbSqS86Ze0RAGBqHWMc9sq7d+8e+/bt28ZxAAAAAHaWtvvHGLsfuNzpUQAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AAAAAJiTaAAAAAExItAEAAACYkGgDAAAAMCHRBgAAAGBCog0AAADAhDrGOPyV239P8tntGwce0vFJ7lh7COB+7JcwH/slzMm+CfOZZb/87jHGCQ9ceETRBtbWdt8YY/facwD3sl/CfOyXMCf7Jsxn9v3S6VEAAAAAExJtAAAAACYk2rBp/mLtAYCvYb+E+dgvYU72TZjP1Pula9oAAAAATMiRNgAAAAATEm3YCG3f1vb2tp9Yexb
ggLbf2faatje3vantK9eeCXa6to9te13bG5b98tVrzwQc0Paoth9r+561ZwGStre2vbHt9W33rT3PoTg9io3Q9plJ7kry12OMp689D5C0PTHJiWOMj7Z9fJL9Sc4dY3xy5dFgx2rbJMeOMe5qe0ySa5O8cozxoZVHgx2v7auS7E5y3Bjj7LXngZ2u7a1Jdo8x7lh7lgfjSBs2whjj/UnuXHsO4F5jjNvGGB9dHv9XkpuTfMe6U8HONg64a/nxmOXmL3SwsrYnJdmT5C/XngXYLKINAI9Y211JfijJh1ceBXa85RSM65PcnuTqMYb9Etb3hiS/nuTulecA7jWSXNV2f9tfWHuYQxFtAHhE2j4uySVJfmWM8cW154Gdbozxf2OMZyQ5KclpbZ1WDCtqe3aS28cY+9eeBbifM8YYP5zkJ5P84nJJjumINgA8bMs1My5J8vYxxt+uPQ9wrzHGfyZ5X5JnrzsJ7HhnJHnucv2Mi5P8RNsL1x0JGGN8Ybm/Pcm7kpy27kQHJ9oA8LAsFzx9a5Kbxxh/uvY8QNL2hLbfsjz+piRnJvnUqkPBDjfG+K0xxkljjF1JXpTk78cYL155LNjR2h67fJFG2h6b5KwkU35TsWjDRmh7UZIPJjm57efa/vzaMwE5I8lLcuAvhtcvt+esPRTscCcmuabtx5N8JAeuaePrhQHg/p6c5Nq2NyS5LsnlY4wrVp7poHzlNwAAAMCEHGkDAAAAMCHRBgAAAGBCog0AAADAhEQbAAAAgAmJNgAAAAATEm0AgOm1fdJ9vlr+X9t+fnl8V9s/X3s+AIDt4Cu/AYCN0nZvkrvGGK9bexYAgO3kSBsAYGO1/bG271ke7217Qdur2t7a9nltX9v2xrZXtD1mWe/Utv/Qdn/bK9ueuO6nAAA4ONEGAHg0+d4ke5Kck+TCJNeMMU5J8uUke5Zw86Ykzx9jnJrkbUn+cK1hAQAezNFrDwAAsIX+bozx1bY3JjkqyRXL8huT7EpycpKnJ7m6bZZ1blthTgCAhyTaAACPJv+dJGOMu9t+ddx78b67c+D/PU1y0xjj9LUGBAA4XE6PAgB2kluSnND29CRpe0zbH1x5JgCAgxJtAIAdY4zxP0men+T8tjckuT7Jj6w6FADAIfjKbwAAAIAJOdIGAAAAYEKiDQAAAMCERBsAAACACYk2AAAAABMSbQAAAAAmJNoAAAAATEi0AQAAAJiQaAMAAAAwof8He9y//XaU+U0AAAAASUVORK5CYII=\n", - "text/plain": [ - "" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(\"Neural Diarizer Result (RTTM format)\")\n", "pred_labels_neural = rttm_to_labels(f'{output_dir}/pred_ovl_rttms/an4_diarize_test.rttm')\n", @@ -2607,28 +889,9 @@ }, { "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Ground-truth Speaker Label (RTTM format)\n" - ] - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABG0AAACsCAYAAADBlVHFAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAKIElEQVR4nO3dT6jl91nH8c/TJNCStgidIQyT0AtdVDFibIfSNCKliK2maJUsWqirgFAEq1kE6iYTQfyDxoLgQm3B0GI31U2LkxZMlEBjnUknHduotJhSx9oxVmkHgpbkcTFHJkknTSa5N9/nzHm94DLnHs6Fz138YHjf35/q7gAAAAAwyytWDwAAAADge4k2AAAAAAOJNgAAAAADiTYAAAAAA4k2AAAAAAOJNgAAAAADiTYAAAAAA4k2AAAAAAOJNgAAAAADiTYAAAAAA21ltKmqn6+qrqofXL1lP1XVk1V1uqoeqaqHq+ptqzcBAAAAa2xltEnyviQPJnnv6iH77Inuvqm7fzTJh5L81upBAAAAwBpbF22q6tVJbklye668aPN0r03yX6tHAAAAAGtcvXrAi/CeJCe6+5+r6ltV9abufnj1qH3yqqo6neSVSY4kecfaOQAAAMAqLynanD16w/Ekd+3PlCTJ3UfPfv3483zmfUk+vHn9ic33+x5t3nrXfcezz7/bQ3e/8/jzfOaJ7r4pSarq5iT3VtWN3d37uAMAAADYAlt1pk1VvS4Xzj65sao6yVVJuqruvNLCRnd/rqoOJTmc5NzqPQAAAMDLa9vuaXNbknu7+/XdvdfdNyT5lyQ/vnjXvts8GeuqJP+5egsAAADw8qttOkGlqh5I8tvdfeJp7/1Kkh/q7g8sG7ZPqurJJGf+/9skv97dn144CQAAAFhkq6INAAAAwK7YtsujAAAAAHaCaAMAAAAwkGgDAAAAMJBoAwAAADCQaAMAAAAw0NWX8+FDhw713t7eAU0BAAAA2D2nTp16vLsPP/v9y4o2e3t7OXny5P6tAgAAANhxVfW1S73v8igAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIEuK9o8+c1vHtQOAIBlvv3796yesBP+5P6vrJ4AAFvlsqLNU6INAHAF+s49f7B6wk74yANfXT0BALaKy6MAAAAABhJtAAAAAAa6+nJ/4OzRGw5iBwAAO+Ctd923egIAbA1n2gAAAAAMJNoAAAAADHTZl0cdPfv1g9gBALCMy79fPg/d/c7VEwBgnPqNS7/vTBsAAACAgUQbAAAAgIFEGwAAAICBLivavOK66w5qBwDAMq+549dWT9gJt7/9DasnAMBWqe5+wR8+duxYnzx58gDnAAAAAOyWqjrV3cee/b7LowAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAA
GEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABhJtAAAAAAYSbQAAAAAGEm0AAAAABqrufuEfrvqPJF87uDnwvA4leXz1COAZHJcwj+MSZnJswjxTjsvXd/fhZ795WdEGVquqk919bPUO4CLHJczjuISZHJswz/Tj0uVRAAAAAAOJNgAAAAADiTZsmz9ePQD4Ho5LmMdxCTM5NmGe0cele9oAAAAADORMGwAAAICBRBu2QlV9tKrOVdU/rN4CXFBVN1TV/VX1aFV9qao+uHoT7LqqemVVfb6qHtkcl3ev3gRcUFVXVdUXqupTq7cASVU9VlVnqup0VZ1cvee5uDyKrVBVP5HkfJJ7u/vG1XuApKqOJDnS3Q9X1WuSnErynu7+8uJpsLOqqpJc293nq+qaJA8m+WB3P7R4Guy8qrojybEkr+3ud6/eA7uuqh5Lcqy7H1+95ftxpg1bobv/Nsm3Vu8ALurub3T3w5vX30nyaJKja1fBbusLzm++vWbz5S90sFhVXZ/k1iR/unoLsF1EGwBesqraS/JjSf5u8RTYeZtLME4nOZfks93tuIT1PpzkziRPLd4BXNRJPlNVp6rql1aPeS6iDQAvSVW9Osknk/xqd3979R7Ydd39ZHfflOT6JG+pKpcVw0JV9e4k57r71OotwDPc0t1vSvLTSX55c0uOcUQbAF60zT0zPpnk4939F6v3ABd1938neSDJu9YugZ13S5Kf3dw/4xNJ3lFVH1s7Cejuf9v8ey7JXyZ5y9pFlybaAPCibG54+pEkj3b3Pav3AElVHa6qH9i8flWSn0zyj0tHwY7r7g919/XdvZfkvUn+urvfv3gW7LSqunbzII1U1bVJfirJyCcVizZshar68ySfS/LGqvrXqrp99SYgtyT5xVz4i+HpzdfPrB4FO+5Ikvur6otJ/j4X7mnj8cIA8EzXJXmwqh5J8vkkn+7uE4s3XZJHfgMAAAAM5EwbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAYLyqet3THi3/71V1dvP6fFX90ep9AAAHwSO/AYCtUlXHk5zv7t9bvQUA4CA50wYA2FpV9faq+tTm9fGq+rOq+kxVPVZVv1BVv1tVZ6rqRFVds/ncm6vqb6rqVFXdV1VH1v4WAACXJtoAAFeSNyS5NcnPJflYkvu7+0eSPJHk1k24+cMkt3X3m5N8NMlvrhoLAPD9XL16AADAPvqr7v5uVZ1JclWSE5v3zyTZS/LGJDcm+WxVZfOZbyzYCQDwvEQbAOBK8j9J0t1PVdV3++LN+57Khf/3VJIvdffNqwYCALxQLo8CAHbJPyU5XFU3J0lVXVNVP7x4EwDAJYk2AMDO6O7/TXJbkt+pqkeSnE7ytqWjAACeg0d+AwAAAAzkTBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIFEGwAAAICBRBsAAACAgUQbAAAAgIH+D9ZmsbQRn7DhAAAAAElFTkSuQmCC\n", - "text/plain": [ - "" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(\"Ground-truth Speaker Label (RTTM format)\")\n", "reference" @@ -2650,31 +913,9 @@ }, { "cell_type": "code", - "execution_count": 33, - 
"metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:14:55 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'external_vad_manifest': None, 'parameters': {'window_length_in_sec': 0.15, 'shift_length_in_sec': 0.01, 'smoothing': 'median', 'overlap': 0.5, 'onset': 0.1, 'offset': 0.1, 'pad_onset': 0.1, 'pad_offset': 0, 'min_duration_on': 0, 'min_duration_off': 0.2, 'filter_speech_first': True}}\n", - " Reason: Missing mandatory value: diarizer.vad.model_path\n", - " full_key: diarizer.vad.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:14:55 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'use_speaker_model_from_ckpt': True, 'infer_batch_size': 25, 'sigmoid_threshold': [0.7], 'seq_eval_mode': False, 'split_infer': True, 'diar_window_length': 50, 'overlap_infer_spk_limit': 5}}\n", - " Reason: Missing mandatory value: diarizer.msdd_model.model_path\n", - " full_key: diarizer.msdd_model.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:14:55 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'asr_based_vad': False, 'asr_based_vad_threshold': 0.05, 'asr_batch_size': None, 'lenient_overlap_WDER': True, 'decoder_delay_in_sec': None, 'word_ts_anchor_offset': None, 'word_ts_anchor_pos': 'start', 'fix_word_ts_with_VAD': False, 'colored_text': False, 'print_time': True, 'break_lines': False}, 'ctc_decoder_parameters': {'pretrained_language_model': None, 'beam_width': 32, 'alpha': 0.5, 'beta': 2.5}, 'realigning_lm_parameters': {'arpa_language_model': None, 'min_number_of_words': 3, 'max_number_of_words': 10, 'logprob_diff_threshold': 1.2}}\n", - " Reason: Missing mandatory value: diarizer.asr.model_path\n", - " full_key: diarizer.asr.model_path\n", - " object_type=dict.\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + 
"outputs": [], "source": [ "oracle_vad_clusdiar_model.save_to(os.path.join(output_dir,'clustering_diarizer.nemo'))" ] @@ -2688,98 +929,14 @@ }, { "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:15:02 clustering_diarizer:523] Model ClusteringDiarizer does not contain a VAD model. A VAD model or manifest file withspeech segments need for diarization with this model\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[NeMo W 2022-11-10 16:15:02 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'external_vad_manifest': None, 'parameters': {'window_length_in_sec': 0.15, 'shift_length_in_sec': 0.01, 'smoothing': 'median', 'overlap': 0.5, 'onset': 0.1, 'offset': 0.1, 'pad_onset': 0.1, 'pad_offset': 0, 'min_duration_on': 0, 'min_duration_off': 0.2, 'filter_speech_first': True}}\n", - " Reason: Missing mandatory value: diarizer.vad.model_path\n", - " full_key: diarizer.vad.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:15:02 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'use_speaker_model_from_ckpt': True, 'infer_batch_size': 25, 'sigmoid_threshold': [0.7], 'seq_eval_mode': False, 'split_infer': True, 'diar_window_length': 50, 'overlap_infer_spk_limit': 5}}\n", - " Reason: Missing mandatory value: diarizer.msdd_model.model_path\n", - " full_key: diarizer.msdd_model.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:15:02 model_utils:422] Skipped conversion for config/subconfig:\n", - " {'model_path': '???', 'parameters': {'asr_based_vad': False, 'asr_based_vad_threshold': 0.05, 'asr_batch_size': None, 'lenient_overlap_WDER': True, 'decoder_delay_in_sec': None, 'word_ts_anchor_offset': None, 'word_ts_anchor_pos': 'start', 'fix_word_ts_with_VAD': False, 'colored_text': False, 'print_time': True, 
'break_lines': False}, 'ctc_decoder_parameters': {'pretrained_language_model': None, 'beam_width': 32, 'alpha': 0.5, 'beta': 2.5}, 'realigning_lm_parameters': {'arpa_language_model': None, 'min_number_of_words': 3, 'max_number_of_words': 10, 'logprob_diff_threshold': 1.2}}\n", - " Reason: Missing mandatory value: diarizer.asr.model_path\n", - " full_key: diarizer.asr.model_path\n", - " object_type=dict.\n", - "[NeMo W 2022-11-10 16:15:02 modelPT:142] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.\n", - " Train config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/train.json\n", - " sample_rate: 16000\n", - " labels: null\n", - " batch_size: 64\n", - " shuffle: true\n", - " time_length: 3\n", - " is_tarred: false\n", - " tarred_audio_filepaths: null\n", - " tarred_shard_strategy: scatter\n", - " augmentor:\n", - " noise:\n", - " manifest_path: /manifests/noise/rir_noise_manifest.json\n", - " prob: 0.5\n", - " min_snr_db: 0\n", - " max_snr_db: 15\n", - " speed:\n", - " prob: 0.5\n", - " sr: 16000\n", - " resample_type: kaiser_fast\n", - " min_speed_rate: 0.95\n", - " max_speed_rate: 1.05\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n", - "[NeMo W 2022-11-10 16:15:02 modelPT:149] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
\n", - " Validation config : \n", - " manifest_filepath: /manifests/combined_fisher_swbd_voxceleb12_librispeech/dev.json\n", - " sample_rate: 16000\n", - " labels: null\n", - " batch_size: 128\n", - " shuffle: false\n", - " time_length: 3\n", - " num_workers: 15\n", - " pin_memory: true\n", - " \n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[NeMo I 2022-11-10 16:15:02 label_models:126] Setting angular: true/false in decoder is deprecated and will be removed in 1.13 version, use specific loss with _target_\n", - "[NeMo I 2022-11-10 16:15:02 features:225] PADDING: 16\n", - "[NeMo I 2022-11-10 16:15:03 save_restore_connector:243] Model EncDecSpeakerLabelModel was successfully restored from /tmp/tmpbimcm66m/speaker_model.nemo.\n", - "[NeMo I 2022-11-10 16:15:03 clustering_diarizer:146] Speaker Model restored locally from /tmp/tmpbimcm66m/speaker_model.nemo\n", - "[NeMo I 2022-11-10 16:15:03 clustering_diarizer:533] Model ClusteringDiarizer was successfully restored from /home/taejinp/projects/add_cpwer/NeMo/tutorials/speaker_tasks/outputs/clustering_diarizer.nemo.\n" - ] - } - ], + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "del oracle_vad_clusdiar_model\n", "import nemo.collections.asr as nemo_asr\n", "restored_model = nemo_asr.models.ClusteringDiarizer.restore_from(os.path.join(output_dir,'clustering_diarizer.nemo'))" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": {