Option to suppress tokens #1697
Comments
Have you tried the grammar functionality (#1229)? It could be useful here.
No, I didn't know it was a thing. I will look into it. But if I understood it correctly from a brief reading, this is awesome; given a context-specific corpus, we can use it to "boost the weight" of certain words. There should be an example or something in the README to give more attention to this feature because it's a game-changer 👀
Oh! I just found exactly what I needed in the source code (the grammar functionality will still be handy for me, though, once I get it running for my own needs). Lines 4473 to 4478 in 2623640
And Lines 4567 to 4575 in 2623640
Very handy. I just need to add the tokens I won't use. I'm closing this issue now.
Would it be possible to load such lists from an external file instead of having to recompile?
Seconded. I have the same use case (suppressing |
Hmmm. It's more than what @pprobst found, because there are lots of all-numeric tokens, such as "500" (which is not just "5", "0", "0"). Grammars might indeed be a better choice here...
Yeah. I noticed that afterward. Then I got the list of token IDs that whisperx returned when I used the suppress_numerals option and hardcoded them into whispercpp. Ugly, but it worked for me. It would be cool if there was a similar option in whispercpp.
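For anyone who wants the hardcoded-ID approach without recompiling for every list change, here is a minimal sketch that reads token IDs from a text stream (one ID per line) and masks them in a logits vector. The helper names `read_suppress_ids` and `suppress_ids` and the file format are assumptions made up for illustration; none of this is an existing whisper.cpp API, and wiring it into the decoder is left out.

```cpp
#include <cassert>
#include <cmath>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of whisper.cpp): parse token IDs, one per line.
// Blank lines, lines starting with '#', and lines that do not begin with an
// integer are skipped.
std::vector<int> read_suppress_ids(std::istream & in) {
    std::vector<int> ids;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') {
            continue;
        }
        std::istringstream fields(line);
        int id;
        if (fields >> id) {
            ids.push_back(id);
        }
    }
    return ids;
}

// Hypothetical helper: set the logit of every listed token to -infinity so it
// can never be sampled. IDs outside the logits range are ignored.
void suppress_ids(std::vector<float> & logits, const std::vector<int> & ids) {
    for (int id : ids) {
        if (id >= 0 && id < (int) logits.size()) {
            logits[id] = -INFINITY;
        }
    }
}
```

In practice the stream would be a `std::ifstream` opened once at startup, with the resulting ID list reused for every decoding step.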
While I work on grammars, here's a quick patch folks can apply as desired:

    static const std::string numbery = "0123456789%$£";
    for (int i = 0; i < vocab.token_beg; i++) {
        const std::string & token = vocab.id_to_token.at(i);
        if (token.find_first_of(numbery) != std::string::npos) {
            logits[i] = -INFINITY;
        }
    }

(This goes just after "suppress non-speech tokens".) It appears to cost about 5% runtime, not inconsiderable. If we wanted to add proper support for a simple "omit these ascii characters", without grammars, this could be made much cheaper by doing the
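One way to avoid repeating the string search on every decoding step is to scan the vocabulary once and cache the matching token IDs. The sketch below is an assumption about how that could look, not the code from any actual commit: `tokens` stands in for `vocab.id_to_token`, `n_text_tokens` stands in for `vocab.token_beg`, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical helper: scan the vocabulary once and collect the IDs of all
// text tokens that contain at least one of the bytes in `chars`.
std::vector<int> find_tokens_containing(const std::vector<std::string> & tokens,
                                        int n_text_tokens,
                                        const std::string & chars) {
    std::vector<int> ids;
    for (int i = 0; i < n_text_tokens && i < (int) tokens.size(); i++) {
        if (tokens[i].find_first_of(chars) != std::string::npos) {
            ids.push_back(i);
        }
    }
    return ids;
}

// Hypothetical helper: per decoding step, only the cached IDs are touched,
// so no string search happens in the hot loop.
void mask_tokens(std::vector<float> & logits, const std::vector<int> & ids) {
    for (int id : ids) {
        assert(id >= 0 && id < (int) logits.size());
        logits[id] = -INFINITY;
    }
}
```

The ID list would be built once after the model is loaded and then passed to `mask_tokens` at each step. One caveat: `find_first_of` compares bytes, so a multi-byte UTF-8 character such as "£" (bytes 0xC2 0xA3) in the character set also matches tokens that contain either of those bytes as part of a different character.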
@josharian I just want to say thank you for this simple and goated patch. I was having a nightmare of a time dealing with word-level timestamps for dollar amounts (e.g., "$45" being three different words but being treated as one by whisper), and after hours and hours of banging my head against unhelpful searches and repos, this works perfectly.
Thanks. :) If you want to get rid of the performance penalty for that patch, you can try pulling in this still-pretty-small commit, which I just rebased onto master: josharian@c664398. (It worked months ago when I wrote it, but I haven't tested it since.)
Hello. As far as I know, whispercpp does not support the option to suppress particular tokens / numerals during inference, like WhisperX does. This is particularly useful, for example, if we want to transcribe numbers literally, e.g., "one" instead of "1".
Is there any interest in adding support for this feature?