whisper_timestamped None error when streaming to whisper_online_server #129

lynaghk opened this issue Oct 11, 2024 · 8 comments
lynaghk commented Oct 11, 2024

I'm on a Mac M1 and trying to use the whisper_timestamped backend, but when I run the server via

uv run whisper_online_server.py --model base.en --backend whisper_timestamped

and then stream audio to it via

ffmpeg -hide_banner -f avfoundation -i ":1" -ac 1 -ar 16000 -f s16le -loglevel error - | nc localhost 43007

the server crashes with

AttributeError: 'NoneType' object has no attribute 'shape'

after a few seconds, before any transcription output is returned.
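For reference, the stream piped in above is raw signed 16-bit little-endian PCM at 16 kHz mono, which the server has to turn into the float samples in [-1.0, 1.0) that Whisper works with. A minimal stdlib-only sketch of that conversion (the helper name is hypothetical; the server uses its own code for this):

```python
import struct

def pcm_s16le_to_float(raw: bytes) -> list:
    """Convert raw s16le PCM bytes (as produced by the ffmpeg command above)
    to float samples in [-1.0, 1.0)."""
    count = len(raw) // 2
    return [s / 32768.0 for s in struct.unpack("<%dh" % count, raw)]

# two samples: digital silence (0x0000) and full-scale positive (0x7FFF)
samples = pcm_s16le_to_float(b"\x00\x00\xff\x7f")
```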
I'm using uv to manage the Python dependencies, so the exact versions of everything in this setup are specified in my uv.lock lockfile.

Perhaps one of the dependencies has made a breaking change and I should be on an older version of something?
Please let me know if there's any additional information I can provide to help debug.

Full stacktrace from the server process:

$ uv run whisper_online_server.py --model base.en --backend whisper_timestamped
INFO	Loading Whisper base.en model for auto...
/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/__init__.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
INFO	done. It took 2.35 seconds.
WARNING	Whisper is not warmed up. The first chunk processing may take longer.
INFO	Listening on('localhost', 43007)
INFO	Connected to client on ('127.0.0.1', 64243)
DEBUG	PROMPT: 
DEBUG	CONTEXT: 
DEBUG	transcribing 1.00 seconds from 0.00
Traceback (most recent call last):
  File "/Users/dev/software/whisper_streaming/whisper_online_server.py", line 181, in <module>
    proc.process()
  File "/Users/dev/software/whisper_streaming/whisper_online_server.py", line 158, in process
    o = online.process_iter()
  File "/Users/dev/software/whisper_streaming/whisper_online.py", line 376, in process_iter
    res = self.asr.transcribe(self.audio_buffer, init_prompt=prompt)
  File "/Users/dev/software/whisper_streaming/whisper_online.py", line 73, in transcribe
    result = self.transcribe_timestamped(self.model,
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 296, in transcribe_timestamped
    (transcription, words) = _transcribe_timestamped_efficient(model, audio,
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 888, in _transcribe_timestamped_efficient
    transcription = model.transcribe(audio, **whisper_options)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/transcribe.py", line 279, in transcribe
    result: DecodingResult = decode_with_fallback(mel_segment)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/transcribe.py", line 195, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 824, in decode
    result = DecodingTask(model, options).run(mel)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 737, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 687, in _main_loop
    logits = self.inference.logits(tokens, audio_features)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 163, in logits
    return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/model.py", line 242, in forward
    x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/model.py", line 169, in forward
    x = x + self.cross_attn(self.cross_attn_ln(x), xa, kv_cache=kv_cache)[0]
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1616, in _call_impl
    hook_result = hook(self, args, result)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 882, in <lambda>
    lambda layer, ins, outs, index=j: hook_attention_weights(layer, ins, outs, index))
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 777, in hook_attention_weights
    if w.shape[-2] > 1:
AttributeError: 'NoneType' object has no attribute 'shape'
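For context on where this blows up: `hook_attention_weights` is a forward hook that whisper_timestamped attaches to the decoder's cross-attention blocks, and it reads `w.shape` from the attention-weight value the module returns. The traceback suggests newer openai-whisper builds can return `None` for those weights (they no longer always materialize them), so the hook dereferences `None`. A torch-free sketch of that failure mode, with stand-in objects (`FakeTensor` and the simplified hook are illustrative, not the library's actual code):

```python
class FakeTensor:
    """Stand-in for a torch attention-weight tensor: only carries a shape."""
    def __init__(self, shape):
        self.shape = shape

def hook_attention_weights(layer, ins, outs):
    # outs mimics (attn_output, attn_weights) from Whisper's attention block
    w = outs[-1]
    if w.shape[-2] > 1:  # raises AttributeError when w is None
        return "cached"
    return "skipped"

# Older behavior: weights are materialized, the hook works
result = hook_attention_weights(None, None, (object(), FakeTensor((1, 8, 5, 5))))

# Newer behavior: weights come back as None -> the crash in the traceback
try:
    hook_attention_weights(None, None, (object(), None))
except AttributeError as exc:
    error_text = str(exc)
```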
@Gldkslfmsd
Collaborator

Hi,
it looks like a bug in whisper_timestamped. Can you check whether the same audio works with plain offline whisper_timestamped?

@lynaghk
Author

lynaghk commented Oct 13, 2024

Yeah, looks like the same error when I try to transcribe the jfk.wav audio sample.

Do you have a specific version of whisper_timestamped that's known to work? I can pin to that version and give it another shot.

21:43:32 $ uv run whisper_online.py --model base.en --backend whisper_timestamped ~/software/whisper.cpp/samples/jfk.wav
INFO	Audio duration is: 11.00 seconds
INFO	Loading Whisper base.en model for auto...
/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/__init__.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
INFO	done. It took 1.25 seconds.
Traceback (most recent call last):
  File "/Users/dev/software/whisper_streaming/whisper_online.py", line 763, in <module>
    asr.transcribe(a)
  File "/Users/dev/software/whisper_streaming/whisper_online.py", line 73, in transcribe
    result = self.transcribe_timestamped(self.model,
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 296, in transcribe_timestamped
    (transcription, words) = _transcribe_timestamped_efficient(model, audio,
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 888, in _transcribe_timestamped_efficient
    transcription = model.transcribe(audio, **whisper_options)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/transcribe.py", line 279, in transcribe
    result: DecodingResult = decode_with_fallback(mel_segment)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/transcribe.py", line 195, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 824, in decode
    result = DecodingTask(model, options).run(mel)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 737, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 687, in _main_loop
    logits = self.inference.logits(tokens, audio_features)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 163, in logits
    return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/model.py", line 242, in forward
    x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/model.py", line 169, in forward
    x = x + self.cross_attn(self.cross_attn_ln(x), xa, kv_cache=kv_cache)[0]
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1616, in _call_impl
    hook_result = hook(self, args, result)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 882, in <lambda>
    lambda layer, ins, outs, index=j: hook_attention_weights(layer, ins, outs, index))
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 777, in hook_attention_weights
    if w.shape[-2] > 1:
AttributeError: 'NoneType' object has no attribute 'shape'

@lynaghk
Author

lynaghk commented Oct 13, 2024

I tried a few earlier versions of whisper-timestamped, but they all give the same error.
Here are the deps for the earliest published version of whisper-timestamped (1.12.20).

I'm not super familiar with Python, but my understanding is that most packages don't lock their dependency versions, so I believe the issue may be related to a transitive dependency. I'll keep digging.

whisper-streaming v0.1.0
├── librosa v0.10.2.post1
│   ├── audioread v3.0.1
│   ├── decorator v5.1.1
│   ├── joblib v1.4.2
│   ├── lazy-loader v0.4
│   │   └── packaging v24.1
│   ├── msgpack v1.1.0
│   ├── numba v0.60.0
│   │   ├── llvmlite v0.43.0
│   │   └── numpy v2.0.2
│   ├── numpy v2.0.2
│   ├── pooch v1.8.2
│   │   ├── packaging v24.1
│   │   ├── platformdirs v4.3.6
│   │   └── requests v2.32.3
│   │       ├── certifi v2024.8.30
│   │       ├── charset-normalizer v3.4.0
│   │       ├── idna v3.10
│   │       └── urllib3 v2.2.3
│   ├── scikit-learn v1.5.2
│   │   ├── joblib v1.4.2
│   │   ├── numpy v2.0.2
│   │   ├── scipy v1.13.1
│   │   │   └── numpy v2.0.2
│   │   └── threadpoolctl v3.5.0
│   ├── scipy v1.13.1 (*)
│   ├── soundfile v0.12.1
│   │   └── cffi v1.17.1
│   │       └── pycparser v2.22
│   ├── soxr v0.5.0.post1
│   │   └── numpy v2.0.2
│   └── typing-extensions v4.12.2
├── soundfile v0.12.1 (*)
└── whisper-timestamped v1.12.20
    ├── cython v3.0.11
    ├── dtw-python v1.5.3
    │   ├── numpy v2.0.2
    │   └── scipy v1.13.1 (*)
    └── openai-whisper v20240930
        ├── more-itertools v10.5.0
        ├── numba v0.60.0 (*)
        ├── numpy v2.0.2
        ├── tiktoken v0.8.0
        │   ├── regex v2024.9.11
        │   └── requests v2.32.3 (*)
        ├── torch v2.4.1
        │   ├── filelock v3.16.1
        │   ├── fsspec v2024.9.0
        │   ├── jinja2 v3.1.4
        │   │   └── markupsafe v3.0.1
        │   ├── networkx v3.2.1
        │   ├── sympy v1.13.3
        │   │   └── mpmath v1.3.0
        │   └── typing-extensions v4.12.2
        └── tqdm v4.66.5
(*) Package tree already displayed

@lynaghk
Author

lynaghk commented Oct 13, 2024

Just realized that by "plain offline whisper_timestamped" you may have meant whisper_online.py --offline. I ran that below; unfortunately, it's the same error.

22:14:26 $ uv run whisper_online.py --offline --model base.en --backend whisper_timestamped ~/software/whisper.cpp/samples/jfk.wav
INFO	Audio duration is: 11.00 seconds
INFO	Loading Whisper base.en model for auto...
/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/__init__.py:150: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(fp, map_location=device)
INFO	done. It took 1.63 seconds.
Traceback (most recent call last):
  File "/Users/dev/software/whisper_streaming/whisper_online.py", line 763, in <module>
    asr.transcribe(a)
  File "/Users/dev/software/whisper_streaming/whisper_online.py", line 73, in transcribe
    result = self.transcribe_timestamped(self.model,
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 296, in transcribe_timestamped
    (transcription, words) = _transcribe_timestamped_efficient(model, audio,
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 888, in _transcribe_timestamped_efficient
    transcription = model.transcribe(audio, **whisper_options)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/transcribe.py", line 279, in transcribe
    result: DecodingResult = decode_with_fallback(mel_segment)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/transcribe.py", line 195, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 824, in decode
    result = DecodingTask(model, options).run(mel)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 737, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 687, in _main_loop
    logits = self.inference.logits(tokens, audio_features)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/decoding.py", line 163, in logits
    return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/model.py", line 242, in forward
    x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper/model.py", line 169, in forward
    x = x + self.cross_attn(self.cross_attn_ln(x), xa, kv_cache=kv_cache)[0]
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1616, in _call_impl
    hook_result = hook(self, args, result)
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 882, in <lambda>
    lambda layer, ins, outs, index=j: hook_attention_weights(layer, ins, outs, index))
  File "/Users/dev/software/whisper_streaming/.venv/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 777, in hook_attention_weights
    if w.shape[-2] > 1:
AttributeError: 'NoneType' object has no attribute 'shape'

@Gldkslfmsd
Collaborator

See linto-ai/whisper-timestamped#212: they pinned the openai-whisper version.

@Gldkslfmsd
Collaborator

pip install openai-whisper==20231117 -- this worked for me.
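Since the reporter manages dependencies with uv, the equivalent pin could be expressed in pyproject.toml. A sketch (the surrounding dependency list is assumed; the pinned version is the one from the comment above and may need revisiting once whisper-timestamped updates):

```toml
# pyproject.toml -- pin openai-whisper so whisper-timestamped's hooks keep working
[project]
dependencies = [
    "whisper-timestamped",
    "openai-whisper==20231117",
]
```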

@lynaghk
Author

lynaghk commented Oct 15, 2024

Did it work for you on an ARM Mac? I gave that a shot but it failed with

error: distribution triton==2.3.1 @ registry+https://pypi.org/simple can't be installed because it doesn't have a source distribution or wheel for the current platform

@Gldkslfmsd
Collaborator

I haven't tried, I have Linux. It's whisper-timestamped's issue, so please follow up there: linto-ai/whisper-timestamped#212
