Dealing with constant hallucinations #121
Comments
Hi, you can check whether the same happens with the offline Whisper model with VAD on. Alternatively, just remove that phrase from all transcripts before searching for the longest common prefix. But beware: the phrase then won't be output when you actually need it, and this won't help whenever Whisper hallucinates anything else.
This is not an actual Greek phrase. My best guess is that the model was partially trained on Greek using community-generated subtitles for TV shows and the like, which carried the creator's name as an advertisement during moments of silence where no captioning was needed. This is using the large-v3 model, and I cannot find any model that handles Greek better. Note that this also shows up when transcribing videos with the base Whisper. For now I am attempting to remove it like this:
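The suggestion above (drop the phrase from every hypothesis, then look for the longest common prefix) could be sketched roughly as follows. This is an illustration only, not the project's code: it assumes each hypothesis is a list of `(begin, end, text)` tuples, and `PHRASES`, `strip_phrases` and `longest_common_prefix` are hypothetical names introduced here.

```python
# Assumption: known hallucinated phrases, split into words, longest first.
PHRASES = [("Υπότιτλοι", "AUTHORWAVE"), ("AUTHORWAVE",)]


def strip_phrases(words, phrases=PHRASES):
    """Return `words` without any contiguous occurrence of a known phrase.

    `words` is assumed to be a list of (begin_sec, end_sec, text) tuples.
    """
    out, i = [], 0
    while i < len(words):
        for phrase in phrases:
            window = tuple(w[2].strip() for w in words[i:i + len(phrase)])
            if window == phrase:
                i += len(phrase)  # skip the whole phrase
                break
        else:
            out.append(words[i])  # no phrase starts here, keep the word
            i += 1
    return out


def longest_common_prefix(prev, curr):
    """Words of `curr` that agree, position by position, with `prev`."""
    agreed = []
    for a, b in zip(prev, curr):
        if a[2].strip() != b[2].strip():
            break
        agreed.append(b)
    return agreed


prev = [(0.0, 0.5, " Καλημέρα"), (0.5, 1.0, " Υπότιτλοι"),
        (1.0, 1.5, " AUTHORWAVE"), (1.5, 2.0, " σας")]
curr = [(0.0, 0.5, " Καλημέρα"), (0.5, 1.0, " σας"), (1.0, 1.5, " AUTHORWAVE")]
confirmed = longest_common_prefix(strip_phrases(prev), strip_phrases(curr))
```

As the comment warns, this also removes the phrase when someone really says it, and it does nothing for hallucinations that are not in the list.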
Where I check the words retrieved from self.asr.ts_words(res) during process_iter and return early if this is found.
Edit:
def process_iter(self):
    """Runs on the current audio buffer.
    Returns: a tuple (beg_timestamp, end_timestamp, "text"), or (None, None, "").
    The non-empty text is confirmed (committed) partial transcript.
    """
    # Note: this method uses time.time(), so the module needs `import time`.
    prompt, non_prompt = self.prompt()
    logger.debug(f"PROMPT: {prompt}")
    logger.debug(f"CONTEXT: {non_prompt}")
    logger.debug(f"transcribing {len(self.audio_buffer)/self.SAMPLING_RATE:2.2f} seconds from {self.buffer_time_offset:2.2f}")
    res = self.asr.transcribe(self.audio_buffer, init_prompt=prompt)
    tsw = self.asr.ts_words(res)

    # Check if 'AUTHORWAVE' is in the transcription result
    if self.contains_unwanted_word(tsw, "AUTHORWAVE"):
        logger.debug("Discarding transcription result due to unwanted word 'AUTHORWAVE'")
        return None, None, ""

    self.transcript_buffer.insert(tsw, self.buffer_time_offset)
    o = self.transcript_buffer.flush()
    if o:
        self.commited.extend(o)
        self.last_confirmed_time = time.time()
        completed = self.to_flush(o)
        logger.debug(f">>>>COMPLETE NOW: {completed}")
    else:
        completed = None
        current_time = time.time()
        if current_time - self.last_confirmed_time > self.confirmation_timeout:
            logger.debug("Timeout exceeded. Forcing confirmation of available text.")
            self.force_confirm_text()
    the_rest = self.to_flush(self.transcript_buffer.complete())
    logger.debug(f"INCOMPLETE: {the_rest}")

    # there is a newly confirmed text
    if o and self.buffer_trimming_way == "sentence":  # trim the completed sentences
        if len(self.audio_buffer)/self.SAMPLING_RATE > self.buffer_trimming_sec:  # longer than this
            self.chunk_completed_sentence()

    if self.buffer_trimming_way == "segment":
        s = self.buffer_trimming_sec  # trim the completed segments longer than s,
    else:
        s = 30  # if the audio buffer is longer than 30s, trim it

    if len(self.audio_buffer)/self.SAMPLING_RATE > s:
        self.chunk_completed_segment(res)

    logger.debug(f"len of buffer now: {len(self.audio_buffer)/self.SAMPLING_RATE:2.2f}")
    return self.to_flush(o)
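The `contains_unwanted_word` helper called above is not shown in the thread. A minimal sketch of what it might look like, written as a plain function and assuming `tsw` is a list of `(begin, end, text)` tuples (the body and the case-insensitive matching are assumptions, not the poster's code):

```python
def contains_unwanted_word(words, unwanted):
    """True if any word's text contains `unwanted`, ignoring case.

    Assumes `words` is a list of (begin_sec, end_sec, text) tuples;
    inside the class this would take `self` as its first argument.
    """
    target = unwanted.casefold()
    return any(target in text.casefold() for _beg, _end, text in words)


sample = [(0.0, 1.0, " Υπότιτλοι"), (1.0, 2.0, " AUTHORWAVE")]
found = contains_unwanted_word(sample, "AUTHORWAVE")
```

A substring match also catches variants with attached punctuation such as "AUTHORWAVE.", at the cost of discarding a whole result that merely mentions the word.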
Any help on this matter would be greatly appreciated.
Hi, I'd like to help but I'm busy now. Small advice: Btw., a latency measure should be applied as well, but it can be neglected for a start.
Hi @J-Korn, if I were you, I would remove the unwanted word from
@Gldkslfmsd I have found that when this specific hallucination occurs, it always outputs either "AUTHORWAVE" or "Υπότιτλοι AUTHORWAVE" on its own, never alongside any actual relevant transcriptions that I would want to keep.
Will this not work?
Yes, if your observations are true, then it makes sense.
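If that observation holds, the early return could be limited to results made up of nothing but the hallucinated words, so that a real transcript is never dropped just because it mentions one of them. A rough sketch, where `HALLUCINATION_WORDS` and `is_only_hallucination` are hypothetical names and the `(begin, end, text)` word format is assumed:

```python
import string

# Assumption: the words observed in the hallucination, case-folded.
HALLUCINATION_WORDS = {"υπότιτλοι", "authorwave"}


def is_only_hallucination(words, vocab=HALLUCINATION_WORDS):
    """True if the result is non-empty and every word belongs to `vocab`."""
    texts = [t.strip().strip(string.punctuation).casefold() for _b, _e, t in words]
    texts = [t for t in texts if t]  # ignore tokens that were only punctuation
    return bool(texts) and all(t in vocab for t in texts)


only_noise = is_only_hallucination([(0.0, 1.0, " Υπότιτλοι"), (1.0, 2.0, " AUTHORWAVE")])
mixed = is_only_hallucination([(0.0, 1.0, " Καλημέρα"), (1.0, 2.0, " AUTHORWAVE")])
```

In `process_iter` this check would take the place of `contains_unwanted_word` in the early return; it still will not catch hallucinations other than the listed words.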
Using the large-v3 model to transcribe Greek audio from a live stream, I often get continuous results reading "Υπότιτλοι AUTHORWAVE".
It seems the model is bugged in a way that makes it output that phrase when it does not understand the input.
Setting vac and vad to True does not seem to reduce how often this occurs.
Is there some way I can discard this specific phrase or similar ones so they do not get confirmed and sent to the client?