Speaker diarization errors #33

Open
WilliamVenner opened this issue Oct 21, 2024 · 11 comments · Fixed by k2-fsa/sherpa-onnx#1461

Comments

@WilliamVenner
Contributor

WilliamVenner commented Oct 21, 2024

I am having a lot of trouble with speaker diarization across lots of different platforms and models.

[target.'cfg(any(windows, target_os = "linux"))'.dependencies]
sherpa-rs = { version = "0.5.1-beta.0", default-features = false, features = [
    "static",
    "cuda",
] }

[target.'cfg(target_os = "macos")'.dependencies]
sherpa-rs = { version = "0.5.1-beta.0", default-features = false, features = [
    "static",
    "directml",
] }

Embeddings model: nemo_en_titanet_large.onnx

Segmentation model: sherpa-onnx-pyannote_segmentation-3.0.onnx and models/sherpa-onnx-reverb-diarization-v1.onnx

GPU: RTX 3080

OS: Ubuntu 24.04 x86-64

2024-10-21 17:06:25.244693632 [E:onnxruntime:, cuda_call.cc:116 CudaCall] CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=VENNERPC ; file=/onnxruntime_src/onnxruntime/contrib_ops/cuda/fused_conv.cc ; line=86 ; expr=cudnnAddTensor(cudnnHandle, &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data);
2024-10-21 17:06:25.244736163 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'/encoder/encoder/encoder.1/res.0.0/conv/Conv' Status Message: CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=VENNERPC ; file=/onnxruntime_src/onnxruntime/contrib_ops/cuda/fused_conv.cc ; line=86 ; expr=cudnnAddTensor(cudnnHandle, &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data);
fatal runtime error: Rust cannot catch foreign exceptions

Embeddings model: nemo_en_titanet_large.onnx

Segmentation model: sherpa-onnx-reverb-diarization-v1.onnx

GPU: RTX 3080

OS: Windows 11 x86-64

C:\BillyEnterprises\GitHub\patientprism-speaker-recognition\target\debug\build\sherpa-rs-sys-6375a45fd70e41d7\out\sherpa-onnx\sherpa-onnx/csrc/offline-speaker-diarization-pyannote-impl.h:ComputeEmbeddings:456 This segment is too short, which should not happen since we have already filtered short segments

Embeddings model: nemo_en_titanet_large.onnx

Segmentation model: sherpa-onnx-pyannote_segmentation-3.0.onnx

OS: macOS M2 Max Sequoia 15.0.1 Arm64

This was the best information I could get regarding this foreign exception, which Rust wouldn't catch or display:

(lldb) bt
* thread #2, name = 'models::sherpa::test_diarization', stop reason = breakpoint 1.1
  * frame #0: 0x000000018f643c28 libc++abi.dylib`__cxa_throw
    frame #1: 0x0000000246d13304 MLAssetIO`MPL::detail::ModelPackageImpl::ModelPackageImpl(std::__1::__fs::filesystem::path const&, bool, bool) + 1564
    frame #2: 0x0000000246d163a0 MLAssetIO`MPL::detail::ModelPackageImpl::isValid(std::__1::__fs::filesystem::path const&) + 36
    frame #3: 0x0000000246d164e8 MLAssetIO`MPL::ModelPackage::isValid(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 44
    frame #4: 0x0000000246d374d8 MLAssetIO`-[MIOModel initWithContentsOfURL:error:] + 180
    frame #5: 0x000000019936d138 CoreML`+[MLCompiler compileModelAtURL:toURL:options:error:] + 536
    frame #6: 0x000000019940e114 CoreML`+[MLModel(MLModelCompilation) compileModelWithoutAutoreleaseAtURL:options:error:] + 212
    frame #7: 0x000000019940df40 CoreML`+[MLModel(MLModelCompilation) _compileModelAtURL:options:error:] + 192
    frame #8: 0x000000019940de40 CoreML`+[MLModel(MLModelCompilation) compileModelAtURL:error:] + 88
    frame #9: 0x00000001047557ec libonnxruntime.1.17.1.dylib`-[CoreMLExecution loadModel] + 108
    frame #10: 0x0000000104758c50 libonnxruntime.1.17.1.dylib`onnxruntime::coreml::Model::LoadModel() + 80
    frame #11: 0x00000001047268ac libonnxruntime.1.17.1.dylib`onnxruntime::CoreMLExecutionProvider::Compile(std::__1::vector<onnxruntime::IExecutionProvider::FusedNodeAndGraph, std::__1::allocator<onnxruntime::IExecutionProvider::FusedNodeAndGraph>> const&, std::__1::vector<onnxruntime::NodeComputeInfo, std::__1::allocator<onnxruntime::NodeComputeInfo>>&) + 184
    frame #12: 0x0000000104e6970c libonnxruntime.1.17.1.dylib`onnxruntime::PartitionOnnxFormatModelImpl(onnxruntime::Graph&, onnxruntime::FuncManager&, onnxruntime::KernelRegistryManager&, onnxruntime::KernelRegistry&, onnxruntime::IExecutionProvider&, onnxruntime::GraphPartitioner::Mode, int&, std::__1::function<onnxruntime::common::Status (onnxruntime::Graph&, bool&, onnxruntime::IExecutionProvider&, std::__1::function<void (onnxruntime::Graph const&)> const&)> const&, std::__1::function<void (onnxruntime::Graph const&)> const&) + 2776
    frame #13: 0x0000000104e67018 libonnxruntime.1.17.1.dylib`onnxruntime::GraphPartitioner::Partition(onnxruntime::Graph&, onnxruntime::FuncManager&, std::__1::function<onnxruntime::common::Status (onnxruntime::Graph&, bool&, onnxruntime::IExecutionProvider&, std::__1::function<void (onnxruntime::Graph const&)> const&)> const&, onnxruntime::ConfigOptions const&, onnxruntime::logging::Logger const&, onnxruntime::GraphPartitioner::Mode, std::__1::function<void (onnxruntime::Graph const&)> const&) const + 244
    frame #14: 0x00000001046be560 libonnxruntime.1.17.1.dylib`onnxruntime::InferenceSession::TransformGraph(onnxruntime::Graph&, bool) + 1308
    frame #15: 0x00000001046c1180 libonnxruntime.1.17.1.dylib`onnxruntime::InferenceSession::Initialize() + 4348
    frame #16: 0x00000001046fb278 libonnxruntime.1.17.1.dylib`(anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::__1::unique_ptr<onnxruntime::InferenceSession, std::__1::default_delete<onnxruntime::InferenceSession>>&, OrtPrepackedWeightsContainer*) + 392
    frame #17: 0x00000001046fb438 libonnxruntime.1.17.1.dylib`OrtApis::CreateSessionFromArray(OrtEnv const*, void const*, unsigned long, OrtSessionOptions const*, OrtSession**) + 104
    frame #18: 0x00000001033786d4 libsherpa-onnx-c-api.dylib`sherpa_onnx::OfflineSpeakerSegmentationPyannoteModel::Impl::Impl(sherpa_onnx::OfflineSpeakerSegmentationModelConfig const&) + 524
    frame #19: 0x000000010336f570 libsherpa-onnx-c-api.dylib`sherpa_onnx::OfflineSpeakerDiarizationPyannoteImpl::OfflineSpeakerDiarizationPyannoteImpl(sherpa_onnx::OfflineSpeakerDiarizationConfig const&) + 72
    frame #20: 0x000000010336f488 libsherpa-onnx-c-api.dylib`sherpa_onnx::OfflineSpeakerDiarizationImpl::Create(sherpa_onnx::OfflineSpeakerDiarizationConfig const&) + 72
    frame #21: 0x000000010329ce64 libsherpa-onnx-c-api.dylib`SherpaOnnxCreateOfflineSpeakerDiarization + 480
[....my frames]

(lldb) break set -E c++
Breakpoint 1: 2 locations.
(lldb) continue
Process 50699 resuming
Process 50699 stopped
* thread #2, name = 'models::sherpa::test_diarization', stop reason = breakpoint 1.1
    frame #0: 0x000000018f643c28 libc++abi.dylib`__cxa_throw
libc++abi.dylib`__cxa_throw:
->  0x18f643c28 <+0>:  pacibsp 
    0x18f643c2c <+4>:  stp    x22, x21, [sp, #-0x30]!
    0x18f643c30 <+8>:  stp    x20, x19, [sp, #0x10]
    0x18f643c34 <+12>: stp    x29, x30, [sp, #0x20]
Target 0: (patientprism_speaker_recognition-1e82013835e05ea3) stopped.
(lldb) register read
General Purpose Registers:
        x0 = 0x00000001255a2ba0
        x1 = 0x00000001f4c776d8  libc++abi.dylib`typeinfo for std::runtime_error
        x2 = 0x000000018f641090  libc++abi.dylib`std::runtime_error::~runtime_error()
        x3 = 0x00000001255a2d2c
        x4 = 0xfffffffffffffee8
        x5 = 0x0000000000000018
        x6 = 0x0000000000000002
        x7 = 0x0000000246e43f3a  "A valid manifest does not exist at path: "
        x8 = 0x0000000000000000
        x9 = 0x0000000000000b03
       x10 = 0x0000000000018008
       x11 = 0x0000000000008008
       x12 = 0x0000000000008008
       x13 = 0x0000000000018008
       x14 = 0x00000000ffffffff
       x15 = 0x0000000000000019
       x16 = 0x000000018f643c28  libc++abi.dylib`__cxa_throw
       x17 = 0x00000002842a0d70
       x18 = 0x0000000000000000
       x19 = 0x000000016f6c8768
       x20 = 0x000000016f6c8780
       x21 = 0x000000016f6c8798
       x22 = 0x000000016f6c87b0
       x23 = 0x0000000000000000
       x24 = 0x00000001255a2ba0
       x25 = 0x0000000000000000
       x26 = 0x00000001255a2690
       x27 = 0x0000000103f0a7e0
       x28 = 0x000000016f6c8d00
        fp = 0x000000016f6c8750
        lr = 0x0000000246d13304  MLAssetIO`MPL::detail::ModelPackageImpl::ModelPackageImpl(std::__1::__fs::filesystem::path const&, bool, bool) + 1564
        sp = 0x000000016f6c8470
        pc = 0x000000018f643c28  libc++abi.dylib`__cxa_throw
      cpsr = 0x80001400

Sometimes the example does work, so I think the problem is related to the specific audio I'm trying to diarize. I have attached it below (the sine tones replace PII that I removed).

a_pii_removed.wav.zip

@thewh1teagle
Owner

@WilliamVenner

A few things you can try:

Disable the static feature, at least for now; it can cause issues.

Run the tests exactly as in the diarize example in the repository, with the same commands mentioned there.

Disable the cuda / directml features and check whether it works on the CPU. If it works, it's an issue in sherpa-onnx.

@WilliamVenner
Contributor Author

WilliamVenner commented Oct 21, 2024

Thank you for your response.

Disabling static and directml had no apparent effect, and disabling static and cuda had no effect on Windows. However, I did not inspect the foreign exception being thrown to check whether it was in fact different. I suspect it would just be this anyway:

Disabling cuda and using CPU yielded this:

(gdb) print *((std::type_info*)$rsi)
$2 = {_vptr.type_info = 0x7fc1284e3d28 <vtable for __cxxabiv1::__class_type_info+16>, __name = 0x7fc129902ee0 <typeinfo name for fastclustercpp::nan_error> "N14fastclustercpp9nan_errorE"}

I did notice this on my Mac earlier but was ultimately unable to reproduce it. This also happened on Ubuntu.

I'm unsure of where said NaNs could be coming from. I am using the default config options.
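One way to narrow down where the NaNs first appear is to scan the intermediate values before they reach the clustering step. The following is a minimal sketch in plain Rust, assuming the speaker embeddings are available as a slice of `Vec<f32>`. The helper `first_non_finite` is hypothetical and is not part of sherpa-rs or sherpa-onnx.

```rust
/// Hypothetical debugging helper (not a sherpa-rs API).
/// Returns `(embedding index, dimension index)` of the first NaN or
/// infinite value, or `None` if every value is finite.
fn first_non_finite(embeddings: &[Vec<f32>]) -> Option<(usize, usize)> {
    embeddings.iter().enumerate().find_map(|(i, embedding)| {
        embedding
            .iter()
            .position(|v| !v.is_finite())
            .map(|j| (i, j))
    })
}
```

Running a check like this after each processing stage (features, normalized features, embeddings) would show which stage introduces the first non-finite value.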

@csukuangfj

Have you tried other speaker embedding models?

I notice that you only posted info about NeMo models.

@WilliamVenner
Contributor Author

@csukuangfj

I tried, on Ubuntu x86-64:

3dspeaker_speech_eres2net_sv_en_voxceleb_16k.onnx - worked

3dspeaker_speech_campplus_sv_en_voxceleb_16k.onnx - NaN errors

nemo_en_speakerverification_speakernet.onnx - NaN errors

On Arm64 macOS, I managed to track down the NaN errors for NeMo to the SpeakerEmbeddingExtractorNeMoImpl::NormalizePerFeature function. It's something with the Eigen map. NaNs showed up simply when calculating the mean of m. Haven't looked much further into the Linux NaN errors, but they're NaN errors nonetheless. Are you able to reproduce?

@csukuangfj

Could you describe how to reproduce it with sherpa-onnx?

Please also attach the test wave.

@csukuangfj

https://github.com/k2-fsa/sherpa-onnx/blob/ceb69ebd946a6f022d8349ba793d173fd0f5d204/sherpa-onnx/csrc/speaker-embedding-extractor-nemo-impl.h#L125

We need to add an epsilon to the denominator to prevent division by 0

Could you try that?
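To illustrate the idea, here is a minimal sketch of per-feature normalization with an epsilon in the denominator. It is written in plain Rust for illustration and is not the sherpa-onnx implementation, which is C++ and uses Eigen. The function name, the row-major layout, and the `eps` parameter are assumptions. The point it shows: when a feature column is constant, its standard deviation is 0, and dividing by it gives 0/0 = NaN unless a small epsilon is added.

```rust
/// Illustrative sketch only (assumed names and layout, not sherpa-onnx code).
/// Normalizes each feature column of a row-major `num_frames x feat_dim`
/// matrix to zero mean and unit variance, in place.
/// `eps` is added to the standard deviation so that a constant column
/// (stddev == 0) produces zeros instead of 0/0 = NaN.
fn normalize_per_feature(data: &mut [f32], num_frames: usize, feat_dim: usize, eps: f32) {
    assert_eq!(data.len(), num_frames * feat_dim);
    if num_frames == 0 {
        return;
    }
    for col in 0..feat_dim {
        // Mean of this feature over all frames.
        let mut sum = 0.0f32;
        for row in 0..num_frames {
            sum += data[row * feat_dim + col];
        }
        let mean = sum / num_frames as f32;

        // Population variance from squared deviations (never negative).
        let mut sq_dev = 0.0f32;
        for row in 0..num_frames {
            let d = data[row * feat_dim + col] - mean;
            sq_dev += d * d;
        }
        let stddev = (sq_dev / num_frames as f32).sqrt();

        // The epsilon keeps the denominator strictly positive.
        let denom = stddev + eps;
        for row in 0..num_frames {
            let i = row * feat_dim + col;
            data[i] = (data[i] - mean) / denom;
        }
    }
}
```

Calling it with `eps = 0.0` on a constant column reproduces the NaN, while a small value such as `1e-5` (an arbitrary choice here) yields zeros for that column.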

@WilliamVenner
Contributor Author

> Could you describe how to reproduce it with sherpa-onnx?
>
> Please also attach the test wave.

I can write some reproduction code. Is C OK?

> https://github.com/k2-fsa/sherpa-onnx/blob/ceb69ebd946a6f022d8349ba793d173fd0f5d204/sherpa-onnx/csrc/speaker-embedding-extractor-nemo-impl.h#L125
>
> We need to add an epsilon to the denominator to prevent division by 0
>
> Could you try that?

I actually found that NaNs were showing up far earlier than that line. Have you been able to reproduce (without steps) yet?

@csukuangfj

Sorry, I need to sleep now. I have not tried to reproduce it yet.

Yes, C code is fine for reproducing.

@WilliamVenner
Contributor Author

WilliamVenner commented Oct 23, 2024

No problem, thank you for your help.

I was able to reproduce the problem on Ubuntu 24.04 x86-64 using the offline-speaker-diarization-c-api example in sherpa-onnx, but with the nemo_en_titanet_large.onnx model as the embedding_extractor_model instead. I built the example using the same command found here.

The test wave file: a_pii_removed.wav.zip

[...]
progress 100.00%
terminate called after throwing an instance of 'fastclustercpp::nan_error'
Aborted (core dumped)

@csukuangfj

Please see
k2-fsa/sherpa-onnx#1461

It should fix this issue.

csukuangfj added a commit to k2-fsa/sherpa-onnx that referenced this issue Oct 24, 2024
@WilliamVenner
Contributor Author

Awesome, thank you so much. Gave it a try today and it works 😁

I will leave this issue open until the fix has been merged into sherpa-rs as well.
