
IndexError in Whisper model: Index out of bounds during token timestamp extraction #12

Open
GrahLnn opened this issue Nov 21, 2024 · 1 comment


GrahLnn commented Nov 21, 2024

I tried to transcribe an hour-long audio file, but got this error. A two-minute test worked well, so I wanted to try the longer audio. Is there any way to fix this? Thank you.

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


def transcribe_audio(file_path):
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    print(f"{device=}")
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "nyrahealth/CrisperWhisper"

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)

    processor = AutoProcessor.from_pretrained(model_id)

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        chunk_length_s=30,
        stride_length_s=4,
        batch_size=1,
        return_timestamps="word",
        torch_dtype=torch_dtype,
        device=device,
    )

    result = pipe(file_path)
    return result

And the error:

Traceback (most recent call last):
  File "C:\Users\grahlnn\test\CrisperWhisper.py", line 71, in <module>
    res = transcribe_audio(
          ^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\CrisperWhisper.py", line 66, in transcribe_audio
    result = pipe(file_path)
             ^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\base.py", line 1294, in __call__
    return next(
           ^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 684, in generate
    ) = self.generate_with_fallback(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 862, in generate_with_fallback
    seek_sequences, seek_outputs = self._postprocess_outputs(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 963, in _postprocess_outputs
    seek_outputs["token_timestamps"] = self._extract_token_timestamps(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 221, in _extract_token_timestamps
    [
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 222, in <listcomp>
    torch.index_select(weights[:, :, i, :], dim=0, index=beam_indices[:, i])
                       ~~~~~~~^^^^^^^^^^^^
IndexError: index 447 is out of bounds for dimension 2 with size 447
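The message means the token-timestamp extraction asked for index 447 along a dimension of length 447, whose valid indices run 0 through 446, i.e. the generated sequence has grown one step past the cross-attention weights collected for it. A minimal plain-Python illustration of the same failure mode (not the actual Whisper code):

```python
# A sequence of length 447 has valid indices 0..446, so index 447 is out of range.
attention_steps = list(range(447))  # stands in for dimension 2 of the weights tensor

assert attention_steps[446] == 446  # last valid index

raised = False
try:
    attention_steps[447]  # one past the end, like the traceback's index 447
except IndexError:
    raised = True
assert raised
```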
LaurinmyReha (Contributor) commented Nov 21, 2024

Hey,

the long-form logic is something we will work on next, since the transformers implementation is not ideal for our model.

In the meantime, as a quick fix you can try installing our custom fork and see whether it resolves your problem:
pip install git+https://github.com/nyrahealth/transformers.git@crisper_whisper

If that does not do it, let me know and we will look into it further together.

Best,

Laurin
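Until the long-form logic is reworked, another workaround (an editor's sketch, not from the maintainers) is to transcribe an hour-long file in shorter segments and offset each segment's word timestamps by its start time. The helper below only computes overlapping segment boundaries; the name `segment_bounds` and the parameter defaults are made up for illustration:

```python
def segment_bounds(total_s, segment_s=60.0, overlap_s=2.0):
    """Return (start, end) times in seconds covering [0, total_s].

    Consecutive segments overlap by overlap_s so words that fall on a
    cut point are not lost; duplicates in the overlap can be dropped
    when merging results.
    """
    bounds = []
    start = 0.0
    while start < total_s:
        end = min(start + segment_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break
        start = end - overlap_s  # step back to create the overlap
    return bounds
```

One would then slice the waveform (e.g. loaded with an audio library at the feature extractor's sampling rate) at these boundaries, run `pipe` on each slice, and add the segment's start time to every returned word timestamp before concatenating.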
