Commit 1a397b0 (parent cf876c2)
community: add tutorial for offline use of pyannote/speaker-diarization-3.1
2 changed files with 174 additions and 0 deletions.
tutorials/community/offline_usage_speaker_diarization.ipynb (172 additions, 0 deletions)
@@ -0,0 +1,172 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Offline Speaker Diarization (speaker-diarization-3.1)\n", | ||
"\n", | ||
"This notebooks gives a short introduction how to use the [speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) pipeline with local models.\n", | ||
"\n", | ||
"In order to use local models, you first need to download them from huggingface and place them in a local folder. \n", | ||
"Then you need to create a local config file, similar to the one in HF, but with local model paths.\n", | ||
"\n", | ||
"❗ **Naming of the model files is REALLY important! See end of notebook for details.** ❗\n", | ||
"\n", | ||
"## Get the models\n", | ||
"\n", | ||
"1. Install the `pyannote-audio` package: `!pip install pyannote.audio`\n", | ||
"2. Create a huggingface account https://huggingface.co/join\n", | ||
"3. Accept [pyannote/segmentation-3.0](https://hf.co/pyannote/segmentation-3.0) user conditions\n", | ||
"4. Create a local folder `models`, place all downloaded files there\n", | ||
" 1. [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/blob/main/pytorch_model.bin), to be placed in `models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin`\n", | ||
" 2. [segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0/blob/main/pytorch_model.bin), to be placed in `models/pyannote_model_segmentation-3.0.bin`\n", | ||
"\n", | ||
"Running `ls models` should show the following files:\n", | ||
"```\n", | ||
"pyannote_model_segmentation-3.0.bin (5.7M)\n", | ||
"pyannote_model_wespeaker-voxceleb-resnet34-LM.bin (26MB)\n", | ||
"```\n", | ||
"\n", | ||
"❗ **make sure the 'wespeaker-voxceleb-resnet34-LM' model is named 'pyannote_model_wespeaker-voxceleb-resnet34-LM.bin'** ❗" | ||
] | ||
}, | ||
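{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is a minimal download sketch using `huggingface_hub` (a dependency of `pyannote.audio`). It assumes you have accepted the user conditions and that `YOUR_HF_TOKEN` is replaced with a valid access token; the target file names follow the naming convention required later in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"from pathlib import Path\n",
"\n",
"from huggingface_hub import hf_hub_download\n",
"\n",
"MODELS_DIR = Path(\"models\")\n",
"MODELS_DIR.mkdir(exist_ok=True)\n",
"\n",
"# repo id -> required local file name ('pyannote' in the name matters, see end of notebook)\n",
"MODELS = {\n",
"    \"pyannote/segmentation-3.0\": \"pyannote_model_segmentation-3.0.bin\",\n",
"    \"pyannote/wespeaker-voxceleb-resnet34-LM\": \"pyannote_model_wespeaker-voxceleb-resnet34-LM.bin\",\n",
"}\n",
"\n",
"for repo_id, local_name in MODELS.items():\n",
"    # hf_hub_download fetches into the local HF cache and returns the cached path\n",
"    cached_path = hf_hub_download(repo_id=repo_id, filename=\"pytorch_model.bin\", token=\"YOUR_HF_TOKEN\")\n",
"    shutil.copy(cached_path, MODELS_DIR / local_name)\n",
"    print(f\"{local_name}: ok\")"
]
},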
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Config for local models\n",
"\n",
"Create a local config similar to the one on HF ([speaker-diarization-3.1/blob/main/config.yaml](https://huggingface.co/pyannote/speaker-diarization-3.1/blob/main/config.yaml)), but with local model paths.\n",
"\n",
"Contents of `models/pyannote_diarization_config.yaml`:\n",
"\n",
"```yaml\n",
"version: 3.1.0\n",
"\n",
"pipeline:\n",
"  name: pyannote.audio.pipelines.SpeakerDiarization\n",
"  params:\n",
"    clustering: AgglomerativeClustering\n",
"    # embedding: pyannote/wespeaker-voxceleb-resnet34-LM  # if you want to use the HF model\n",
"    embedding: models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin  # if you want to use the local model\n",
"    embedding_batch_size: 32\n",
"    embedding_exclude_overlap: true\n",
"    # segmentation: pyannote/segmentation-3.0  # if you want to use the HF model\n",
"    segmentation: models/pyannote_model_segmentation-3.0.bin  # if you want to use the local model\n",
"    segmentation_batch_size: 32\n",
"\n",
"params:\n",
"  clustering:\n",
"    method: centroid\n",
"    min_cluster_size: 12\n",
"    threshold: 0.7045654963945799\n",
"  segmentation:\n",
"    min_duration_off: 0.0\n",
"```"
]
},
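{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, the following sketch parses the config with PyYAML and verifies that both local model files exist. The config path below is an assumption; adjust it to your layout, and remember that the model paths are resolved relative to the current working directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"import yaml  # PyYAML, pulled in by pyannote.audio\n",
"\n",
"config_path = Path(\"models/pyannote_diarization_config.yaml\")  # adjust to your layout\n",
"with config_path.open() as f:\n",
"    config = yaml.safe_load(f)\n",
"\n",
"# both entries should point at existing local files (resolved relative to the CWD)\n",
"for key in (\"embedding\", \"segmentation\"):\n",
"    model_path = Path(config[\"pipeline\"][\"params\"][key])\n",
"    print(f\"{key}: {model_path} -> exists: {model_path.exists()}\")"
]
},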
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading the local pipeline\n",
"\n",
"**Hint**: The paths in the config are relative to the current working directory, not to the config file.\n",
"If you want to run your notebook/script from a different directory, you can temporarily use `os.chdir` to 'emulate' config-relative paths.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pathlib import Path\n",
"\n",
"from pyannote.audio import Pipeline\n",
"\n",
"def load_pipeline_from_pretrained(path_to_config: str | Path) -> Pipeline:\n",
"    # resolve before changing directories so a relative config path still works\n",
"    path_to_config = Path(path_to_config).resolve()\n",
"\n",
"    print(f\"Loading pyannote pipeline from {path_to_config}...\")\n",
"    # the paths in the config are relative to the current working directory,\n",
"    # so we need to change the working directory to the model path\n",
"    # and then change it back\n",
"\n",
"    cwd = Path.cwd().resolve()  # store current working directory\n",
"\n",
"    # first .parent is the folder of the config, second .parent is the folder containing the 'models' folder\n",
"    cd_to = path_to_config.parent.parent.resolve()\n",
"\n",
"    print(f\"Changing working directory to {cd_to}\")\n",
"    os.chdir(cd_to)\n",
"\n",
"    pipeline = Pipeline.from_pretrained(path_to_config)\n",
"\n",
"    print(f\"Changing working directory back to {cwd}\")\n",
"    os.chdir(cwd)\n",
"\n",
"    return pipeline\n",
"\n",
"PATH_TO_CONFIG = \"path/to/your/pyannote_diarization_config.yaml\"\n",
"pipeline = load_pipeline_from_pretrained(PATH_TO_CONFIG)"
]
},
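{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the pipeline loaded, inference works the same as with the hosted pipeline. A short usage sketch follows; `audio.wav` is a placeholder for your own file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# run diarization on a local audio file ('audio.wav' is a placeholder)\n",
"diarization = pipeline(\"audio.wav\")\n",
"\n",
"# print one line per detected speaker turn\n",
"for turn, _, speaker in diarization.itertracks(yield_label=True):\n",
"    print(f\"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}\")"
]
},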
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notes on file naming (pyannote-audio 3.1.1)\n",
"\n",
"Pyannote uses some internal logic to determine the model type.\n",
"\n",
"The function `def PretrainedSpeakerEmbedding(...` in [speaker_verification.py](https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speaker_verification.py#L712) uses the file path of the model to infer the model type.\n",
"\n",
"```python\n",
"def PretrainedSpeakerEmbedding(\n",
"    embedding: PipelineModel,\n",
"    device: torch.device = None,\n",
"    use_auth_token: Union[Text, None] = None,\n",
"):\n",
"    # ...\n",
"    if isinstance(embedding, str) and \"pyannote\" in embedding:\n",
"        return PyannoteAudioPretrainedSpeakerEmbedding(\n",
"            embedding, device=device, use_auth_token=use_auth_token\n",
"        )\n",
"\n",
"    elif isinstance(embedding, str) and \"speechbrain\" in embedding:\n",
"        return SpeechBrainPretrainedSpeakerEmbedding(\n",
"            embedding, device=device, use_auth_token=use_auth_token\n",
"        )\n",
"\n",
"    elif isinstance(embedding, str) and \"nvidia\" in embedding:\n",
"        return NeMoPretrainedSpeakerEmbedding(embedding, device=device)\n",
"\n",
"    elif isinstance(embedding, str) and \"wespeaker\" in embedding:\n",
"        # this branch is taken, but wespeaker-voxceleb-resnet34-LM is not an ONNX model\n",
"        return ONNXWeSpeakerPretrainedSpeakerEmbedding(embedding, device=device)\n",
"\n",
"    else:\n",
"        # fallback to pyannote in case we are loading a local model\n",
"        return PyannoteAudioPretrainedSpeakerEmbedding(\n",
"            embedding, device=device, use_auth_token=use_auth_token\n",
"        )\n",
"```\n",
"\n",
"The [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/blob/main/pytorch_model.bin) model is not an ONNX model but a `PyannoteAudioPretrainedSpeakerEmbedding`. So if `wespeaker` is in the file name (and `pyannote` is not), the code will infer the model type incorrectly. If `pyannote` is somewhere in the file name, the model type is inferred correctly, because the first `if` statement matches first."
]
},
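{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given that inference logic, a cheap guard is to check the embedding file name before building the pipeline. The helper below is a defensive sketch for this notebook, not part of the pyannote API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"def check_embedding_name(path: str | Path) -> None:\n",
"    # hypothetical helper: mirrors the name checks in PretrainedSpeakerEmbedding\n",
"    name = str(path)\n",
"    if \"pyannote\" not in name and \"wespeaker\" in name:\n",
"        # would be routed to the ONNX backend although the checkpoint is a PyTorch model\n",
"        raise ValueError(f\"rename {name!r} so that it contains 'pyannote'\")\n",
"\n",
"check_embedding_name(\"models/pyannote_model_wespeaker-voxceleb-resnet34-LM.bin\")  # passes"
]
}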
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}