Improve timestamp accuracy #95

Open
achimmihca opened this issue Jul 28, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@achimmihca
Contributor

achimmihca commented Jul 28, 2024

Thank you for porting Whisper to Unity!

One issue I face is that the timestamps are not very accurate.
I found that there are related projects that try to improve this.

Is it possible somehow to make use of these to improve timestamp accuracy?
What would be necessary to achieve this?

Note that there are also related issues on whisper.cpp.

@Macoron
Owner

Macoron commented Jul 29, 2024

Have you looked into DTW? ggerganov/whisper.cpp#1485

I believe there are new settings for this here, but I haven't tested them yet:

// [EXPERIMENTAL] Token-level timestamps with DTW
[MarshalAs(UnmanagedType.U1)] bool dtw_token_timestamps;
WhisperAlignmentHeadsPreset dtw_aheads_preset;
int dtw_n_top;
WhisperNativeAheads dtw_aheads;
UIntPtr dtw_mem_size; // TODO: remove

@Macoron Macoron added the enhancement New feature or request label Jul 29, 2024
@achimmihca achimmihca changed the title WhisperX for improved timestamp accuracy Improve timestamp accuracy Jul 29, 2024
@achimmihca
Contributor Author

Have you looked into DTW?

Thanks for pointing this out. But I don't see a C# API yet to try this out.

From the comment in WhisperNativeParams.cs, I think it is only a single point in time ("moment in audio"), but I would need a start and an end for each word:

        // [EXPERIMENTAL] Token-level timestamps with DTW
        // do not use if you haven't computed token-level timestamps with dtw
        // Roughly corresponds to the moment in audio in which the token was output
        ulong t_dtw;
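
One way to approximate a start and end from such point-in-time values is to treat each word's first token time as its start and the next word's first token time as its end. The following is only a minimal sketch under stated assumptions: `TokenStamp`, `WordSpan` and `DtwWordSpans` are hypothetical helper types, not part of whisper.unity or whisper.cpp; it assumes a token whose text begins with a space starts a new word; and it treats the timestamps as opaque ticks in whatever unit `t_dtw` uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a decoded token; NOT a whisper.unity type.
public readonly struct TokenStamp
{
    public readonly string Text;
    public readonly ulong TDtw; // point-in-time value, same unit as t_dtw
    public TokenStamp(string text, ulong tDtw) { Text = text; TDtw = tDtw; }
}

// Hypothetical result type: one word with an approximate start and end.
public readonly struct WordSpan
{
    public readonly string Word;
    public readonly ulong Start;
    public readonly ulong End;
    public WordSpan(string word, ulong start, ulong end)
    {
        Word = word; Start = start; End = end;
    }
}

public static class DtwWordSpans
{
    // Groups tokens into words and derives [Start, End] per word.
    // Assumption: a token whose text begins with ' ' starts a new word.
    // The last word ends at audioEnd, which the caller must supply.
    public static List<WordSpan> Build(IReadOnlyList<TokenStamp> tokens, ulong audioEnd)
    {
        var spans = new List<WordSpan>();
        string word = null;
        ulong start = 0;

        foreach (var t in tokens)
        {
            bool startsWord = word == null || (t.Text.Length > 0 && t.Text[0] == ' ');
            if (startsWord)
            {
                if (word != null)
                {
                    // Clamp so End never precedes Start if times are not monotonic.
                    spans.Add(new WordSpan(word.Trim(), start, Math.Max(start, t.TDtw)));
                }
                word = t.Text;
                start = t.TDtw;
            }
            else
            {
                word += t.Text; // continuation piece of the current word
            }
        }

        if (word != null)
        {
            spans.Add(new WordSpan(word.Trim(), start, Math.Max(start, audioEnd)));
        }
        return spans;
    }
}
```

For the tokens `" Hello"` at 10, `" wor"` at 50 and `"ld"` at 70 with `audioEnd` 120, this gives `Hello` spanning 10 to 50 and `world` spanning 50 to 120. Because each end is simply the next word's onset, any pause between words is folded into the preceding word, so this is a rough estimate rather than a true word boundary.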

Anyway, will give it a try when it is ready.
