
IndexError in Whisper model: Index out of bounds during token timestamp extraction #12

Open
GrahLnn opened this issue Nov 21, 2024 · 1 comment


GrahLnn commented Nov 21, 2024

I tried to transcribe an hour-long audio file, but got this error. A two-minute test worked well, so I wanted to try the longer audio. Is there any way to fix this? Thank you.

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


def transcribe_audio(file_path):
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    print(f"{device=}")
    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    model_id = "nyrahealth/CrisperWhisper"

    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)

    processor = AutoProcessor.from_pretrained(model_id)

    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        chunk_length_s=30,
        stride_length_s=4,
        batch_size=1,
        return_timestamps="word",
        torch_dtype=torch_dtype,
        device=device,
    )

    result = pipe(file_path)
    return result

And the error:

Traceback (most recent call last):
  File "C:\Users\grahlnn\test\CrisperWhisper.py", line 71, in <module>
    res = transcribe_audio(
          ^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\CrisperWhisper.py", line 66, in transcribe_audio
    result = pipe(file_path)
             ^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 283, in __call__
    return super().__call__(inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\base.py", line 1294, in __call__
    return next(
           ^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\pt_utils.py", line 269, in __next__
    processed = self.infer(next(self.iterator), **self.params)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\base.py", line 1209, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\pipelines\automatic_speech_recognition.py", line 515, in _forward
    tokens = self.model.generate(
             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 684, in generate
    ) = self.generate_with_fallback(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 862, in generate_with_fallback
    seek_sequences, seek_outputs = self._postprocess_outputs(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 963, in _postprocess_outputs
    seek_outputs["token_timestamps"] = self._extract_token_timestamps(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 221, in _extract_token_timestamps
    [
  File "C:\Users\grahlnn\test\.venv\Lib\site-packages\transformers\models\whisper\generation_whisper.py", line 222, in <listcomp>
    torch.index_select(weights[:, :, i, :], dim=0, index=beam_indices[:, i])
                       ~~~~~~~^^^^^^^^^^^^
IndexError: index 447 is out of bounds for dimension 2 with size 447
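The message means the token-timestamp extraction asked for index 447 along a dimension of length 447, whose valid indices run 0 through 446, i.e. the generated sequence has grown one step past the cross-attention weights collected for it. A minimal plain-Python illustration of the same failure mode (not the actual Whisper code):

```python
# A sequence of length 447 has valid indices 0..446, so index 447 is out of range.
attention_steps = list(range(447))  # stands in for dimension 2 of the weights tensor

assert attention_steps[446] == 446  # last valid index

raised = False
try:
    attention_steps[447]  # one past the end, like the traceback's index 447
except IndexError:
    raised = True
assert raised
```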
LaurinmyReha (Contributor) commented Nov 21, 2024

Hey,

the long-form logic is something we will work on next, since the transformers implementation is not ideal for our model.

In the meantime, as a quick fix you can try installing our custom fork and see whether it resolves your problem:
pip install git+https://github.com/nyrahealth/transformers.git@crisper_whisper

If that does not do it, let me know and we will look into it further together.

Best,

Laurin
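Until the long-form logic is reworked, another workaround (an editor's sketch, not from the maintainers) is to transcribe an hour-long file in shorter segments and offset each segment's word timestamps by its start time. The helper below only computes overlapping segment boundaries; the name `segment_bounds` and the parameter defaults are made up for illustration:

```python
def segment_bounds(total_s, segment_s=60.0, overlap_s=2.0):
    """Return (start, end) times in seconds covering [0, total_s].

    Consecutive segments overlap by overlap_s so words that fall on a
    cut point are not lost; duplicates in the overlap can be dropped
    when merging results.
    """
    bounds = []
    start = 0.0
    while start < total_s:
        end = min(start + segment_s, total_s)
        bounds.append((start, end))
        if end >= total_s:
            break
        start = end - overlap_s  # step back to create the overlap
    return bounds
```

One would then slice the waveform (e.g. loaded with an audio library at the feature extractor's sampling rate) at these boundaries, run `pipe` on each slice, and add the segment's start time to every returned word timestamp before concatenating.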
