Inconsistent number of segments error #64
Comments
This is a duplicate of #59. I fixed this issue recently, and the fix landed in master a few minutes ago. |
I'm closing assuming it is fixed. |
Still happening for me with both Whisper and Whisper Timestamped updated: |
Thanks! If it's a blocker for you, you can try |
However I'm not seeing anything particular in the last release that would explain the failure... |
Sure. Audio: Command and error: |
Thanks a lot @darnn Now I can reproduce :) I will work on this soon |
This should finally be fixed in version 1.12.5. (Sorry about the inconvenience in previous versions; it took me some time to find a good solution to a corner case, and now I think I got it right.) Thanks again for reporting this issue so well, @darnn. |
Still work in progress actually. I encountered another corner case that fails |
ty! |
Is this error still considered a work-in-progress? If it is, my thanks for your work and please disregard the info below (unless it's useful to you). If not, I'm still encountering it using the medium model (I'm currently trying the other model sizes to see if they fail):
Another bit of information - the raw version of this audio stream does not crash the transcription script. However, the file is noisy and the transcription quality isn't great (lots of repeated text) so I ran the logmmse version of the Kalman filter on it. This substantially improved the audio quality, but transcribing now fails.
Versions:
|
Oh dear, I was not aware this could fail again. This kind of error really depends on what is transcribed by the inner Whisper model, with a "butterfly effect" that makes the issue hard to reproduce. |
Hello, this did the trick for me. Just adding the options
I hope this helps. |
Sorry for the delay, I've been busy with other stuff. Unfortunately I can't share the file (it's a HIPAA-protected recording of a healthcare conversation). I've updated to version 1.12.8 and am still encountering this error, although with a different file now: the other one started working when I switched condition_on_previous_text from True to False (this helped with hallucination problems). The call to whisper is as follows:
|
Thank you @jeremymatt for your feedback. If it still fails, could you please use the
Finally, if it's really a blocker for you, a workaround is to disable efficient decoding, as spotted by @stungkuling. This can be done in Python by using one of these options with whisper-timestamped's
The only cost is a higher decoding time, and transcription results can even be better (especially with |
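A minimal sketch of how such a workaround could be applied only when needed: try the normal call first, and retry with extra options if the segment-count assertion fires. The helper name `transcribe_with_fallback` and its `fallback_kwargs` parameter are hypothetical (not part of whisper-timestamped); the caller supplies whichever options disable efficient decoding.

```python
from typing import Any, Callable, Dict


def transcribe_with_fallback(
    transcribe_fn: Callable[..., Dict[str, Any]],
    *args: Any,
    fallback_kwargs: Dict[str, Any],
    **kwargs: Any,
) -> Dict[str, Any]:
    """Call transcribe_fn; on the segment-count AssertionError, retry with extra options.

    Hypothetical helper: transcribe_fn would typically be whisper_timestamped.transcribe,
    and fallback_kwargs the options that disable efficient decoding.
    """
    try:
        return transcribe_fn(*args, **kwargs)
    except AssertionError as err:
        # Only handle the specific failure discussed in this issue.
        if "Inconsistent number of segments" not in str(err):
            raise
        # On the second attempt, fallback_kwargs override the original options.
        return transcribe_fn(*args, **{**kwargs, **fallback_kwargs})
```

It would be called as `transcribe_with_fallback(whisper.transcribe, model, audio, language="de", fallback_kwargs={...})`, so the slower decoding path is only paid for on files that actually trigger the error.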
I finally identified something that could cause this error. I am crossing my fingers very hard that this bug is finally solved in the new version 1.12.11. |
Thanks for your hard work on this! It's a super useful tool. It's helping me out a ton, and I'll be using it for at least one paper. I'll re-try the problematic file in a bit and will let you know how it goes. Another solution (sort of) is to transcribe in parts and then just join the transcripts. This is similar to how I'm dealing with the hallucinations. Hallucinations are easy to detect as they consist of repeated phrases: at least for my transcripts, if there is no phrase repetition, the quality of transcription is acceptable. There's some funkiness, such as when a word shows up twice in a phrase. For example, "I think that I should I think that I should I think that I should" is a period 5 repetition, but "I" has a 3/2/3 pattern. Anyway, I just find repetition, clip that out of the transcript, and then re-transcribe only that section of audio. |
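The repetition check described above can be sketched roughly as follows. This is not the commenter's actual script: the function name `find_repetitions` and the defaults `max_period=10` and `min_repeats=3` are assumptions chosen for illustration. It compares whole phrases instead of single words, which sidesteps the "I" 3/2/3 problem.

```python
from typing import List, Tuple


def find_repetitions(
    words: List[str], max_period: int = 10, min_repeats: int = 3
) -> List[Tuple[int, int, int]]:
    """Return (start_index, period, repeat_count) for each run of a repeated phrase."""
    runs = []
    i = 0
    while i < len(words):
        found = None
        for period in range(1, max_period + 1):
            phrase = words[i:i + period]
            if len(phrase) < period:
                break  # not enough words left for this period
            repeats = 1
            # Count back-to-back copies of the phrase starting at i.
            while words[i + repeats * period:i + (repeats + 1) * period] == phrase:
                repeats += 1
            if repeats >= min_repeats:
                found = (i, period, repeats)
                break  # the smallest matching period wins
        if found:
            runs.append(found)
            i += found[1] * found[2]  # skip past the whole repeated run
        else:
            i += 1
    return runs


text = "I think that I should I think that I should I think that I should"
print(find_repetitions(text.split()))  # [(0, 5, 3)]
```

With word-level timestamps, the returned word indices could then be mapped back to the start and end times of the audio section to clip and re-transcribe.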
Any updates on this? |
We had no feedback on whether it was fixed for @jeremymatt. Do you get such an exception, @eloukas? If yes, can you give more details and maybe a way to reproduce it? |
versions:
-system
Got the same error:
File "/home/tbot/miniconda3/envs/tbot/lib/python3.11/site-packages/whisper_timestamped/transcribe.py", line 285, in transcribe_timestamped
This is my code:
import json
import whisper_timestamped as whisper
audio = whisper.load_audio("/media/raid/twitch/papaplatte/papaplatte-stream-2024-01-30/temp_1.5_15.22.mp4")
model = whisper.load_model("tiny", device="cuda")
result = whisper.transcribe(model, audio, language='de')
print(json.dumps(result, indent=2, ensure_ascii=False))
When the AssertionError was commented out, the code was able to print the results as JSON, but I'm not sure whether they are reliable. Blob of the data: https://pastes.io/embed/bsmewxtuyd |
Thanks @iampickle. I am reopening this issue, which is also being discussed in #79. Knowing your openai-whisper version would also help to understand. And I think this bug is problematic for the result (which is probably wrong). |
Sure,
So I tested this module to see if I get anywhere with my finetuned whisper-v2 model. Unfortunately, the timestamps are often bad, especially if I am using
As you often ask about concrete audio files: these are audios generated via Microsoft TTS from an Icelandic voice. The text itself is not specific. My guess is that you can use the same approach (use a TTS system) to generate enough test data yourself. The problem does not lie in the TTS audio files. These are very clear, have consistent timing and pauses, no background noise at all, etc. |
@lumpidu concerning 1 : Do you mean you need overlapping segments/words? Anyway, this description is not clear enough to me to understand the suggestion.
|
Yes maybe it's a different bug, but maybe it's also related. You need to decide. I see e.g. the following problems when looking at the segments:
"segments": [
  {
    "id": 0,
    "seek": 0,
    "start": 0.0,
    "end": 30.0,
    "text": "afbrot og refsjábyrgð eitt efnisyfirlit ...",
    "tokens": [...],
    "temperature": 0.0,
    "avg_logprob": -0.024239512714179786,
    "compression_ratio": 1.8644067796610169,
    "no_speech_prob": 8.609986252849922e-05,
    "confidence": 0.988,
    "words": [ ... ]
    ....
  },
  {
    "id": 1,
    "seek": 3000,
    "start": 30.0,
    "end": 31.58,
    "text": "ilög á grundvelli þjóðréttarsamninga tuttugu og tvö þrjú íslensk refsilög og áhrif mannréttindareglna...",
    "tokens": [ ... ],
    "confidence": 0.031,
    ...
  },
  {
    "id": 2,
    "seek": 6000,
    "start": 59.74,
    "end": 60.8,
    "text": "fsiréttar í fræðikerfi lögfræðinnar tuttugu og sjö fjögur grundvallarhugtökin afbrot og refsing tuttugu og sjö...",
    "tokens": [ ... ],
    "confidence": 0.011,
    ...
  },
  ...
]
Take a look at the
There is no warning on |
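A minimal sketch of a check that would flag segments like ids 1 and 2 above, assuming the symptoms of interest are a very low `confidence` and a large jump between one segment's `end` and the next segment's `start`. The function name `flag_suspicious_segments` and both thresholds are hypothetical and would need tuning; the sample values are copied from the segments quoted above.

```python
from typing import Any, Dict, List


def flag_suspicious_segments(
    segments: List[Dict[str, Any]],
    min_confidence: float = 0.5,  # assumed threshold, tune per model and language
    max_gap: float = 5.0,         # assumed longest plausible pause, in seconds
) -> List[str]:
    """Return human-readable warnings for low-confidence segments and large time gaps."""
    problems = []
    previous_end = None
    for seg in segments:
        # Segments without a confidence field are treated as fine.
        if seg.get("confidence", 1.0) < min_confidence:
            problems.append(f"segment {seg['id']}: low confidence {seg['confidence']}")
        if previous_end is not None and seg["start"] - previous_end > max_gap:
            gap = seg["start"] - previous_end
            problems.append(f"segment {seg['id']}: {gap:.2f}s gap after previous segment")
        previous_end = seg["end"]
    return problems


# Timing and confidence values taken from the segments quoted in this comment.
segments = [
    {"id": 0, "start": 0.0, "end": 30.0, "confidence": 0.988},
    {"id": 1, "start": 30.0, "end": 31.58, "confidence": 0.031},
    {"id": 2, "start": 59.74, "end": 60.8, "confidence": 0.011},
]
for warning in flag_suspicious_segments(segments):
    print(warning)
```

Running such a check over the JSON result makes it easier to spot suspicious output even when the library itself emits no warning.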
OK @lumpidu so it's another issue. |
@iampickle The failure should not happen anymore (in the new version 1.15.0 of whisper-timestamped). Thank you for providing everything needed to reproduce and investigate it properly. Note that the transcription results are rather poor on your audio with music (it transcribes only "Musik"). |
Hi!
Recently I launched a transcription and received the following error:
File "/usr/local/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 259, in transcribe_timestamped
(transcription, words) = _transcribe_timestamped_efficient(model, audio,
File "/usr/local/lib/python3.9/site-packages/whisper_timestamped/transcribe.py", line 851, in _transcribe_timestamped_efficient
assert l1 == l2 or l1 == 0, f"Inconsistent number of segments: whisper_segments ({l1}) != timestamped_word_segments ({l2})"
AssertionError: Inconsistent number of segments: whisper_segments (57) != timestamped_word_segments (56)
Do you know the reason behind it?
If you need more details, please let me know.