How to obtain word-level segmentation timestamps? #1855
Hi, I am currently facing a challenge with the transcription output from Whisper. The current timestamps include the pauses between words, but I need precise start and end times for each individual word, excluding any pauses. I think what I am after is a per-word segmentation timestamp.
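For context, recent versions of the openai-whisper package return per-word timings when `transcribe()` is called with `word_timestamps=True`: each entry in `result["segments"]` then carries a `"words"` list of dicts with `"word"`, `"start"`, `"end"` and `"probability"` keys. The sketch below flattens those into `(text, start, end)` tuples and measures the silence between consecutive words. The helper names `word_spans` and `gaps_between` and the `sample` data are hypothetical, hand-written only to mimic the shape of that output.

```python
from typing import Any, Dict, List, Tuple

Word = Tuple[str, float, float]  # (text, start seconds, end seconds)


def word_spans(result: Dict[str, Any]) -> List[Word]:
    """Flatten result["segments"][i]["words"] into (text, start, end) tuples."""
    spans: List[Word] = []
    for segment in result.get("segments", []):
        # "words" is only present when transcribe() ran with word_timestamps=True
        for w in segment.get("words", []):
            # Whisper word strings usually carry a leading space, so strip it
            spans.append((w["word"].strip(), float(w["start"]), float(w["end"])))
    return spans


def gaps_between(spans: List[Word]) -> List[float]:
    """Silence (seconds) between each word's end and the next word's start, clamped at 0."""
    return [max(0.0, nxt[1] - cur[2]) for cur, nxt in zip(spans, spans[1:])]


# Hypothetical result, hand-written to mimic the structure of transcribe()'s output.
sample = {
    "segments": [
        {"words": [
            {"word": " Hello", "start": 0.0, "end": 0.42, "probability": 0.98},
            {"word": " world", "start": 0.9, "end": 1.3, "probability": 0.95},
        ]}
    ]
}

for text, start, end in word_spans(sample):
    print(f"{text}: {start:.2f}-{end:.2f}s")
print(gaps_between(word_spans(sample)))
```

With real audio, `sample` would be replaced by something like `whisper.load_model("base").transcribe("audio.wav", word_timestamps=True)`; the model size and file name there are placeholders. If the word spans absorb the pauses, the gaps computed this way come out close to zero.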
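Try WhisperX, which refines Whisper's output by forced alignment against a phoneme-recognition model to produce word-level timestamps.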
Your observation is correct: Whisper is not explicitly trained to predict word-level timestamps. The current word timings are produced by an inference-time trick (aligning the decoder's cross-attention weights to audio frames with dynamic time warping), so they are not perfectly accurate, especially around pauses.
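To make that concrete, here is a toy sketch of that kind of alignment, loosely modelled on (not copied from) the approach in openai-whisper's timing code: dynamic time warping over a token-by-frame cross-attention matrix, with each token starting where the path first reaches it and ending where the next token starts. The attention values are invented, and `FRAME_SECONDS = 0.02` assumes Whisper's 20 ms per encoder output frame. Because a monotonic DTW path has to visit every frame, the low-attention frames that stand in for a pause still end up inside a word's span in this simplified version.

```python
import numpy as np

FRAME_SECONDS = 0.02  # assumption: 20 ms per encoder frame, as in Whisper


def dtw_path(cost: np.ndarray) -> list[tuple[int, int]]:
    """Minimum-cost monotonic alignment of tokens (rows) to frames (columns)."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # previous token, previous frame
                acc[i - 1, j],      # previous token, same frame
                acc[i, j - 1],      # same token, previous frame
            )
    # Backtrack from the bottom-right corner to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def token_spans(attention: np.ndarray) -> list[tuple[float, float]]:
    """(start, end) seconds per token; each token ends where the next one begins."""
    path = dtw_path(-attention)  # high attention means low alignment cost
    n_tokens, n_frames = attention.shape
    starts = [min(f for t, f in path if t == k) for k in range(n_tokens)]
    bounds = starts + [n_frames]
    return [(bounds[k] * FRAME_SECONDS, bounds[k + 1] * FRAME_SECONDS)
            for k in range(n_tokens)]


# Invented attention: frames 2-3 are weak for both tokens, like a short pause.
attention = np.array([
    [0.9, 0.8, 0.3, 0.1, 0.0, 0.0],   # token "hello"
    [0.0, 0.0, 0.05, 0.2, 0.9, 0.8],  # token "world"
])
print(token_spans(attention))
```

Here the two spans come out back to back, roughly 0.00 to 0.06 s and 0.06 to 0.12 s, with no gap between them even though frames 2 and 3 carry little attention. Forced-alignment tools take a different route to the same problem, which is one reason they can give tighter word boundaries.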