ruby: New segment callback #2506

Merged on Oct 28, 2024 (45 commits)
Commits
246793c
Add Params#new_segment_callback= method
KitaitiMakoto Oct 17, 2024
2ccddb9
Add tests for Params#new_segment_callback=
KitaitiMakoto Oct 17, 2024
0c1ff90
Group tests for #transcribe
KitaitiMakoto Oct 17, 2024
1d2d772
Don't use static for thread-safety
KitaitiMakoto Oct 17, 2024
37aa3c6
Set new_segment_callback only when necessary
KitaitiMakoto Oct 17, 2024
3fec5d3
Remove redundant check
KitaitiMakoto Oct 17, 2024
be67fa2
[skip ci] Add Ruby version README
KitaitiMakoto Oct 18, 2024
58d3fb6
Revert "Group tests for #transcribe"
KitaitiMakoto Oct 19, 2024
cee8c6a
Revert "Add tests for Params#new_segment_callback="
KitaitiMakoto Oct 19, 2024
b589895
Add test for Context#full_n_segments
KitaitiMakoto Oct 19, 2024
8990107
Add Context#full_n_segments
KitaitiMakoto Oct 19, 2024
a175c5c
Add tests for lang API
KitaitiMakoto Oct 19, 2024
1c74fdc
Add lang API
KitaitiMakoto Oct 19, 2024
588aa1c
Add tests for Context#full_lang_id API
KitaitiMakoto Oct 19, 2024
6e62076
Add Context#full_lang_id
KitaitiMakoto Oct 19, 2024
ad55836
Add abnormal test cases for lang
KitaitiMakoto Oct 19, 2024
e0255a5
Raise appropriate errors from lang APIs
KitaitiMakoto Oct 19, 2024
09eb66d
Add tests for Context#full_get_segment_t{0,1} API
KitaitiMakoto Oct 19, 2024
4f261f6
Add Context#full_get_segment_t{0,1}
KitaitiMakoto Oct 19, 2024
d69e0be
Add tests for Context#full_get_segment_speaker_turn_next API
KitaitiMakoto Oct 19, 2024
9902dcc
Add Context#full_get_segment_speaker_turn_next
KitaitiMakoto Oct 19, 2024
beba539
Add tests for Context#full_get_segment_text
KitaitiMakoto Oct 19, 2024
63830b6
Add Context#full_get_setgment_text
KitaitiMakoto Oct 19, 2024
0d1ec5f
Add tests for Params#new_segment_callback=
KitaitiMakoto Oct 20, 2024
084f450
Run new segment callback
KitaitiMakoto Oct 20, 2024
6128e05
Split tests to multiple files
KitaitiMakoto Oct 20, 2024
c2de24a
Use container struct for new segment callback
KitaitiMakoto Oct 20, 2024
3f28013
Add tests for Params#new_segment_callback_user_data=
KitaitiMakoto Oct 20, 2024
bb4e81c
Add Whisper::Params#new_user_callback_user_data=
KitaitiMakoto Oct 20, 2024
f411507
Add GC-related test for new segment callback
KitaitiMakoto Oct 20, 2024
79617d7
Protect new segment callback related structs from GC
KitaitiMakoto Oct 20, 2024
eae174e
Add meaningful test for build
KitaitiMakoto Oct 20, 2024
30c00c1
Rename: new_segment_callback_user_data -> new_segment_callback_container
KitaitiMakoto Oct 20, 2024
e7f75f1
Add tests for Whisper::Segment
KitaitiMakoto Oct 21, 2024
326055a
Add Whisper::Segment and Whisper::Context#each_segment
KitaitiMakoto Oct 21, 2024
1132c9e
Extract c_ruby_whisper_callback_container_allocate()
KitaitiMakoto Oct 21, 2024
ba0fbec
Add test for Whisper::Params#on_new_segment
KitaitiMakoto Oct 22, 2024
56c2dfd
Add Whisper::Params#on_new_egment
KitaitiMakoto Oct 22, 2024
87797cc
Assign symbol IDs to variables
KitaitiMakoto Oct 23, 2024
d94710c
Make extsources.yaml simpler
KitaitiMakoto Oct 23, 2024
a0cfc22
Update README
KitaitiMakoto Oct 23, 2024
2f925ca
Add document comments
KitaitiMakoto Oct 24, 2024
c94c8d4
Add test for calling Whisper::Params#on_new_segment multiple times
KitaitiMakoto Oct 24, 2024
c5564fc
Add file dependencies to GitHub actions config and .gitignore
KitaitiMakoto Oct 28, 2024
7a6640a
Add more files to ext/.gitignore
KitaitiMakoto Oct 28, 2024
10 changes: 10 additions & 0 deletions .github/workflows/bindings-ruby.yml
Expand Up @@ -16,6 +16,9 @@ on:
- ggml/src/ggml-quants.h
- ggml/src/ggml-quants.c
- ggml/src/ggml-cpu-impl.h
- ggml/src/ggml-metal.m
- ggml/src/ggml-metal.metal
- ggml/src/ggml-blas.cpp
- ggml/include/ggml.h
- ggml/include/ggml-alloc.h
- ggml/include/ggml-backend.h
Expand All @@ -24,6 +27,8 @@ on:
- ggml/include/ggml-metal.h
- ggml/include/ggml-sycl.h
- ggml/include/ggml-vulkan.h
- ggml/include/ggml-blas.h
- scripts/get-flags.mk
- examples/dr_wav.h
pull_request:
paths:
Expand All @@ -41,6 +46,9 @@ on:
- ggml/src/ggml-quants.h
- ggml/src/ggml-quants.c
- ggml/src/ggml-cpu-impl.h
- ggml/src/ggml-metal.m
- ggml/src/ggml-metal.metal
- ggml/src/ggml-blas.cpp
- ggml/include/ggml.h
- ggml/include/ggml-alloc.h
- ggml/include/ggml-backend.h
Expand All @@ -49,6 +57,8 @@ on:
- ggml/include/ggml-metal.h
- ggml/include/ggml-sycl.h
- ggml/include/ggml-vulkan.h
- ggml/include/ggml-blas.h
- scripts/get-flags.mk
- examples/dr_wav.h

jobs:
Expand Down
1 change: 0 additions & 1 deletion bindings/ruby/.gitignore
@@ -1,4 +1,3 @@
README.md
LICENSE
pkg/
lib/whisper.*
110 changes: 110 additions & 0 deletions bindings/ruby/README.md
@@ -0,0 +1,110 @@
whispercpp
==========

![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)

Ruby bindings for [whisper.cpp][], an interface to an automatic speech recognition model.

Installation
------------

Install the gem and add it to the application's Gemfile by executing:

    $ bundle add whispercpp

If Bundler is not being used to manage dependencies, install the gem by executing:

    $ gem install whispercpp

Usage
-----

```ruby
require "whisper"

whisper = Whisper::Context.new("path/to/model.bin")

params = Whisper::Params.new
params.language = "en"
params.offset = 10_000
params.duration = 60_000
params.max_text_tokens = 300
params.translate = true
params.print_timestamps = false

whisper.transcribe("path/to/audio.wav", params) do |whole_text|
  puts whole_text
end

```

### Preparing model ###

Use the script to download model file(s):

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
sh ./models/download-ggml-model.sh base.en
```

Several types of models are available. See the [models][] page for details.

### Preparing audio file ###

Currently, whisper.cpp accepts only 16-bit WAV files.
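
A file can be checked against the 16-bit requirement before it is passed to `#transcribe`. The following is a minimal sketch, not part of the gem: `wav_bits_per_sample` is a hypothetical helper that reads the bits-per-sample field from the standard RIFF/WAVE `fmt ` chunk, and the demo header is built in memory for illustration only.

```ruby
require "stringio"

# Hypothetical helper (not part of whispercpp): returns the bits-per-sample
# value from a WAV stream's "fmt " chunk, or nil if the stream is not a
# RIFF/WAVE file or has no usable "fmt " chunk.
def wav_bits_per_sample(io)
  riff, _size, wave = io.read(12)&.unpack("a4Va4")
  return nil unless riff == "RIFF" && wave == "WAVE"

  # Each chunk is a 4-byte ID, a 4-byte little-endian size, then the payload.
  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      fmt = io.read(size)
      return nil if fmt.nil? || fmt.bytesize < 16
      # Payload layout: format(2) channels(2) sample_rate(4) byte_rate(4)
      # block_align(2) bits_per_sample(2)
      return fmt.unpack("vvVVvv").last
    end
    # Skip other chunks; payloads are padded to an even length.
    io.seek(size + (size.odd? ? 1 : 0), IO::SEEK_CUR)
  end
  nil
end

# In-memory demo: a header describing mono 16-bit PCM (values are illustrative).
fmt = [1, 1, 16_000, 32_000, 2, 16].pack("vvVVvv")
header = ["RIFF", 4 + 8 + fmt.bytesize, "WAVE", "fmt ", fmt.bytesize].pack("a4Va4a4V") + fmt

bits = wav_bits_per_sample(StringIO.new(header))
puts bits == 16 ? "16-bit WAV" : "not 16-bit (got #{bits.inspect})"
```

For a file on disk, the same helper could be called as `File.open("path/to/audio.wav", "rb") { |f| wav_bits_per_sample(f) }`.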

### API ###

Once `Whisper::Context#transcribe` has been called, you can retrieve segments with `#each_segment`:

```ruby
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

whisper.transcribe("path/to/audio.wav", params)

whisper.each_segment.with_index do |segment, index|
  line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
    nth: index + 1,
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

```
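
The same iteration can feed other output formats. As a sketch only, the code below builds SubRip (SRT) text from any enumerable of objects that respond to `start_time`, `end_time` and `text`. `DemoSegment`, `srt_time` and `to_srt` are hypothetical names introduced here, and the times are assumed to be in milliseconds, as in `format_time` above.

```ruby
# Hypothetical stand-in for Whisper::Segment so the sketch runs without the gem.
DemoSegment = Struct.new(:start_time, :end_time, :text)

# SRT timestamps use a comma before the milliseconds: HH:MM:SS,mmm
def srt_time(time_ms)
  sec, ms = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d,%03d" % [hour, min, sec, ms]
end

# Builds one numbered SRT cue per segment, separated by blank lines.
def to_srt(segments)
  segments.each_with_index.map { |segment, index|
    "#{index + 1}\n" \
    "#{srt_time(segment.start_time)} --> #{srt_time(segment.end_time)}\n" \
    "#{segment.text.strip}\n"
  }.join("\n")
end

demo = [
  DemoSegment.new(0, 1200, " Hello."),
  DemoSegment.new(1200, 2500, " World.")
]
puts to_srt(demo)
```

With the gem loaded and `#transcribe` already called, `to_srt(whisper.each_segment)` should work the same way, since `#each_segment` without a block is used as an enumerator in the example above.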

You can also add a hook to params that is called on each new segment:

```ruby
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

# Add hook before calling #transcribe
params.on_new_segment do |segment|
  line = "[%{st} --> %{ed}] %{text}" % {
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

whisper.transcribe("path/to/audio.wav", params)

```
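
Because the hook is an ordinary Ruby block, its logic can be written as a lambda and exercised without running a transcription. The sketch below makes several assumptions: `DemoSegment` is a hypothetical stand-in for `Whisper::Segment`, the lambda copies plain values instead of keeping segment objects, and the registration with `on_new_segment` is shown only as a comment because it needs the gem and a model.

```ruby
# Hypothetical stand-in for Whisper::Segment so the sketch runs without the gem.
DemoSegment = Struct.new(:start_time, :end_time, :text)

collected = []

# Copy plain values out of each segment as it arrives.
collect = lambda do |segment|
  collected << {
    start_ms: segment.start_time,
    end_ms: segment.end_time,
    text: segment.text
  }
end

# With the real gem, the lambda would be registered before #transcribe:
#   params.on_new_segment(&collect)
#   whisper.transcribe("path/to/audio.wav", params)

# Simulate two hook invocations.
collect.call(DemoSegment.new(0, 1200, "Hello."))
collect.call(DemoSegment.new(1200, 2500, "World."))

puts collected.map { |entry| entry[:text] }.join(" ")
```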

[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models
17 changes: 8 additions & 9 deletions bindings/ruby/Rakefile
Expand Up @@ -5,17 +5,16 @@ require "yaml"
 require "rake/testtask"
 
 extsources = YAML.load_file("extsources.yaml")
-extsources.each_pair do |src_dir, dests|
-  dests.each do |dest|
-    src = Pathname(src_dir)/File.basename(dest)
-
-    file src
-    file dest => src do |t|
-      cp t.source, t.name
-    end
+SOURCES = FileList[]
+extsources.each do |src|
+  basename = src.pathmap("%f")
+  dest = basename == "LICENSE" ? basename : basename.pathmap("ext/%f")
+  file src
+  file dest => src do |t|
+    cp t.source, t.name
   end
+  SOURCES.include dest
 end
-SOURCES = extsources.values.flatten
 CLEAN.include SOURCES
 CLEAN.include FileList[
   "ext/*.o",
Expand Down
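
The source-to-destination mapping in the new Rakefile loop can be illustrated in plain Ruby. This is a sketch under assumptions: `ext_dest` is a hypothetical helper, `File.basename` stands in for Rake's `pathmap("%f")` (both yield the file name without directories), and the example paths are invented rather than taken from `extsources.yaml`.

```ruby
# Hypothetical helper mirroring the Rakefile rule: LICENSE is copied to the
# current directory, every other source file is copied into ext/.
def ext_dest(src)
  basename = File.basename(src)
  basename == "LICENSE" ? basename : File.join("ext", basename)
end

# Example paths are assumptions for illustration only.
["../../LICENSE", "../../src/whisper.cpp"].each do |src|
  puts "#{src} -> #{ext_dest(src)}"
end
```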
7 changes: 7 additions & 0 deletions bindings/ruby/ext/.gitignore
Expand Up @@ -11,6 +11,10 @@ ggml-backend.c
ggml-backend.h
ggml-common.h
ggml-cpu-impl.h
ggml-metal.m
ggml-metal.metal
ggml-metal-embed.metal
ggml-blas.cpp
ggml-cuda.h
ggml-impl.h
ggml-kompute.h
Expand All @@ -20,9 +24,12 @@ ggml-quants.c
ggml-quants.h
ggml-sycl.h
ggml-vulkan.h
ggml-blas.h
get-flags.mk
whisper.cpp
whisper.h
dr_wav.h
depend
whisper.bundle
whisper.so
whisper.dll