Added a bool fold_lowercase to whisper_context_params #2005
base: master
If true, it folds language-model tokens to lowercase. By default, it's false. This is intended to make grammar matching more predictable, e.g. no need to account for case in the grammar.
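As a rough illustration of what folding a token to lowercase could look like, here is a minimal sketch. The helper name `fold_token_lowercase` is hypothetical and is not taken from this PR; it assumes only ASCII letters should be folded and leaves bytes of multibyte UTF-8 sequences untouched. How whisper.cpp actually applies the fold is not shown in this conversation.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not the PR's code): return a copy of `token`
// with ASCII letters lowercased. Bytes >= 0x80, i.e. parts of
// multibyte UTF-8 sequences, are passed through unchanged.
std::string fold_token_lowercase(std::string token) {
    std::transform(token.begin(), token.end(), token.begin(),
                   [](unsigned char c) {
                       return c < 0x80 ? static_cast<char>(std::tolower(c))
                                       : static_cast<char>(c);
                   });
    return token;
}
```

With tokens folded this way, a grammar written entirely in lowercase would only need one rule for a word such as `hello` instead of separate alternatives for `Hello` and `HELLO`.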
@@ -44,6 +44,7 @@ struct whisper_params {
bool print_energy = false;
bool no_timestamps = true;
bool use_gpu = true;
bool model_fold_lc = false;
Change name to vocab_lc
Co-authored-by: Georgi Gerganov <[email protected]>
fprintf(stderr, " -nt, --no-timestamps [%-7s] do not print timestamps\n", params.no_timestamps ? "true" : "false");
fprintf(stderr, " --model-fold-lc [%-7s] fold all model tokens to lowercase\n", params.model_fold_lc ? "true" : "false");
fprintf(stderr, " -nt, --no-timestamps [%-7s] do not print timestamps\n", params.no_timestamps ? "true" : "false");
fprintf(stderr, " --model-fold-lc [%-7s] fold all model tokens to lowercase\n", params.model_fold_lc ? "true" : "false");
fprintf(stderr, " -nt, --no-timestamps [%-7s] do not print timestamps\n", params.no_timestamps ? "true" : "false");
fprintf(stderr, " --vocab-lc [%-7s] fold all vocab tokens to lowercase\n", params.vocab_lc ? "true" : "false");
I have no idea what's wrong with the Java bindings. I loaded them all into Visual Studio Code and fixed all the errors it reported (which didn't seem related to my changes), but still the Java-related tests fail. FYI, I haven't programmed in Java in over 10 years.
I'm also not good with Java, but I think we are probably observing an issue similar to this one: ggerganov/llama.cpp#1902 (comment)
In short, even though the two structs (whisper.cpp/bindings/java/src/main/java/io/github/ggerganov/whispercpp/WhisperCpp.java, lines 70 to 81 in 8f253ef) [...]
The proper solution is to order the members in decreasing size (i.e. keep the bools at the end of the struct). Or maybe avoid [...]
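To make the member-ordering advice concrete, the sketch below compares two hypothetical structs with the same members in different orders. Neither is the real `whisper_context_params`, and the member names are made up for illustration. Exact sizes and offsets are implementation-defined, so the comments describe what a typical 64-bit ABI does rather than a guarantee.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout with bools interleaved between wider members.
// The compiler inserts padding after each bool so that the following
// member starts at an address matching its alignment.
struct params_mixed {
    bool    flag_a;   // 1 byte, typically followed by padding
    int64_t big;      // typically 8-byte aligned
    bool    flag_b;   // 1 byte, typically followed by padding
    int32_t medium;   // typically 4-byte aligned
};

// Same members ordered by decreasing size, bools last.
// Padding is then usually confined to the tail of the struct.
struct params_sorted {
    int64_t big;
    int32_t medium;
    bool    flag_a;
    bool    flag_b;
};

// Ordering by decreasing size never needs more padding than the mixed order here.
static_assert(sizeof(params_sorted) <= sizeof(params_mixed),
              "sorted layout should not be larger than the mixed one");
```

A binding that mirrors the struct field by field has to agree with the C compiler on every one of these offsets, so a layout with no interior padding leaves fewer places for the two sides to disagree.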