ruby : support new-segment callback (ggerganov#2506)

* Add Params#new_segment_callback= method
* Add tests for Params#new_segment_callback=
* Group tests for #transcribe
* Don't use static for thread-safety
* Set new_segment_callback only when necessary
* Remove redundant check
* [skip ci] Add Ruby version README
* Revert "Group tests for #transcribe"
  This reverts commit 71b65b0.
* Revert "Add tests for Params#new_segment_callback="
  This reverts commit 81e6df3.
* Add test for Context#full_n_segments
* Add Context#full_n_segments
* Add tests for lang API
* Add lang API
* Add tests for Context#full_lang_id API
* Add Context#full_lang_id
* Add abnormal test cases for lang
* Raise appropriate errors from lang APIs
* Add tests for Context#full_get_segment_t{0,1} API
* Add Context#full_get_segment_t{0,1}
* Add tests for Context#full_get_segment_speaker_turn_next API
* Add Context#full_get_segment_speaker_turn_next
* Add tests for Context#full_get_segment_text
* Add Context#full_get_segment_text
* Add tests for Params#new_segment_callback=
* Run new segment callback
* Split tests to multiple files
* Use container struct for new segment callback
* Add tests for Params#new_segment_callback_user_data=
* Add Whisper::Params#new_user_callback_user_data=
* Add GC-related test for new segment callback
* Protect new segment callback related structs from GC
* Add meaningful test for build
* Rename: new_segment_callback_user_data -> new_segment_callback_container
* Add tests for Whisper::Segment
* Add Whisper::Segment and Whisper::Context#each_segment
* Extract c_ruby_whisper_callback_container_allocate()
* Add test for Whisper::Params#on_new_segment
* Add Whisper::Params#on_new_segment
* Assign symbol IDs to variables
* Make extsources.yaml simpler
* Update README
* Add document comments
* Add test for calling Whisper::Params#on_new_segment multiple times
* Add file dependencies to GitHub actions config and .gitignore
* Add more files to ext/.gitignore
adutilleul · Nov 16, 2024 · 7011725
1 parent a9d704b
commit 7011725

Showing 14 changed files with 1,112 additions and 170 deletions.
diff --git a/.github/workflows/bindings-ruby.yml b/.github/workflows/bindings-ruby.yml
@@ -16,6 +16,9 @@ on:
       - ggml/src/ggml-quants.h
       - ggml/src/ggml-quants.c
       - ggml/src/ggml-cpu-impl.h
+      - ggml/src/ggml-metal.m
+      - ggml/src/ggml-metal.metal
+      - ggml/src/ggml-blas.cpp
       - ggml/include/ggml.h
       - ggml/include/ggml-alloc.h
       - ggml/include/ggml-backend.h
@@ -24,6 +27,8 @@ on:
       - ggml/include/ggml-metal.h
       - ggml/include/ggml-sycl.h
       - ggml/include/ggml-vulkan.h
+      - ggml/include/ggml-blas.h
+      - scripts/get-flags.mk
       - examples/dr_wav.h
   pull_request:
     paths:
@@ -41,6 +46,9 @@ on:
       - ggml/src/ggml-quants.h
       - ggml/src/ggml-quants.c
       - ggml/src/ggml-cpu-impl.h
+      - ggml/src/ggml-metal.m
+      - ggml/src/ggml-metal.metal
+      - ggml/src/ggml-blas.cpp
       - ggml/include/ggml.h
       - ggml/include/ggml-alloc.h
       - ggml/include/ggml-backend.h
@@ -49,6 +57,8 @@ on:
       - ggml/include/ggml-metal.h
       - ggml/include/ggml-sycl.h
       - ggml/include/ggml-vulkan.h
+      - ggml/include/ggml-blas.h
+      - scripts/get-flags.mk
       - examples/dr_wav.h
 
 jobs:

diff --git a/bindings/ruby/.gitignore b/bindings/ruby/.gitignore
@@ -1,4 +1,3 @@
-README.md
 LICENSE
 pkg/
 lib/whisper.*
diff --git a/bindings/ruby/README.md b/bindings/ruby/README.md
@@ -0,0 +1,110 @@
+whispercpp
+==========
+
+![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)
+
+Ruby bindings for [whisper.cpp][], an interface to an automatic speech recognition model.
+
+Installation
+------------
+
+Install the gem and add it to the application's Gemfile by executing:
+
+    $ bundle add whispercpp
+
+If bundler is not being used to manage dependencies, install the gem by executing:
+
+    $ gem install whispercpp
+
+Usage
+-----
+
+```ruby
+require "whisper"
+
+whisper = Whisper::Context.new("path/to/model.bin")
+
+params = Whisper::Params.new
+params.language = "en"
+params.offset = 10_000
+params.duration = 60_000
+params.max_text_tokens = 300
+params.translate = true
+params.print_timestamps = false
+
+whisper.transcribe("path/to/audio.wav", params) do |whole_text|
+  puts whole_text
+end
+
+```
+
+### Preparing model ###
+
+Use the download script to fetch the model file(s):
+
+```bash
+git clone https://github.com/ggerganov/whisper.cpp.git
+cd whisper.cpp
+sh ./models/download-ggml-model.sh base.en
+```
+
+Several model types are available. See the [models][] page for details.
+
+### Preparing audio file ###
+
+Currently, whisper.cpp accepts only 16-bit WAV files.
+
+### API ###
+
+Once `Whisper::Context#transcribe` has been called, you can retrieve segments with `Whisper::Context#each_segment`:
+
+```ruby
+def format_time(time_ms)
+  sec, decimal_part = time_ms.divmod(1000)
+  min, sec = sec.divmod(60)
+  hour, min = min.divmod(60)
+  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
+end
+
+whisper.transcribe("path/to/audio.wav", params)
+
+whisper.each_segment.with_index do |segment, index|
+  line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
+    nth: index + 1,
+    st: format_time(segment.start_time),
+    ed: format_time(segment.end_time),
+    text: segment.text
+  }
+  line << " (speaker turned)" if segment.speaker_next_turn?
+  puts line
+end
+
+```
+
+You can also add a hook to params that is called on each new segment:
+
+```ruby
+def format_time(time_ms)
+  sec, decimal_part = time_ms.divmod(1000)
+  min, sec = sec.divmod(60)
+  hour, min = min.divmod(60)
+  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
+end
+
+# Add hook before calling #transcribe
+params.on_new_segment do |segment|
+  line = "[%{st} --> %{ed}] %{text}" % {
+    st: format_time(segment.start_time),
+    ed: format_time(segment.end_time),
+    text: segment.text
+  }
+  line << " (speaker turned)" if segment.speaker_next_turn?
+  puts line
+end
+
+whisper.transcribe("path/to/audio.wav", params)
+
+```
+
+[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
+[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models
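
The line formatting in the README examples above can be tried without a model or audio file. The following is a minimal sketch, not part of the commit: `FakeSegment` and `format_segment` are hypothetical names introduced here, and the sketch assumes (as the README's `format_time` usage implies) that `start_time` and `end_time` are in milliseconds.

```ruby
# Same helper as in the README: converts milliseconds to HH:MM:SS.mmm.
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

# Hypothetical stand-in for Whisper::Segment, exposing only the readers
# used in the README examples. Not part of the gem.
FakeSegment = Struct.new(:start_time, :end_time, :text, :speaker_next_turn) do
  def speaker_next_turn?
    speaker_next_turn
  end
end

# Builds the same line the README's on_new_segment block prints.
def format_segment(segment)
  line = "[%{st} --> %{ed}] %{text}" % {
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  line
end

segment = FakeSegment.new(1_500, 3_723_004, "Hello.", true)
puts format_time(0)           # 00:00:00.000
puts format_segment(segment)  # [00:00:01.500 --> 01:02:03.004] Hello. (speaker turned)
```

With the real bindings, the body of `format_segment` would sit inside the block passed to `params.on_new_segment`, as in the README.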
diff --git a/bindings/ruby/Rakefile b/bindings/ruby/Rakefile
@@ -5,17 +5,16 @@ require "yaml"
 require "rake/testtask"
 
 extsources = YAML.load_file("extsources.yaml")
-extsources.each_pair do |src_dir, dests|
-  dests.each do |dest|
-    src = Pathname(src_dir)/File.basename(dest)
-
-    file src
-    file dest => src do |t|
-      cp t.source, t.name
-    end
+SOURCES = FileList[]
+extsources.each do |src|
+  basename = src.pathmap("%f")
+  dest = basename == "LICENSE" ? basename : basename.pathmap("ext/%f")
+  file src
+  file dest => src do |t|
+    cp t.source, t.name
   end
+  SOURCES.include dest
 end
-SOURCES = extsources.values.flatten
 CLEAN.include SOURCES
 CLEAN.include FileList[
                 "ext/*.o",
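
The destination rule in the new Rakefile loop can be illustrated on its own. This is a rough sketch under stated assumptions: the entries in `EXAMPLE_SOURCES` are hypothetical (the real list lives in `extsources.yaml`), `dest_for` is a name introduced here, and `File.basename` stands in for rake's `String#pathmap("%f")` so the snippet runs without rake loaded.

```ruby
# Hypothetical source entries; the actual paths come from extsources.yaml.
EXAMPLE_SOURCES = [
  "../../LICENSE",
  "../../src/whisper.cpp",
  "../../include/whisper.h"
]

# Mirrors the Rakefile rule: LICENSE is copied to the top level of the
# binding directory, every other file is copied into ext/.
def dest_for(src)
  basename = File.basename(src)
  basename == "LICENSE" ? basename : File.join("ext", basename)
end

EXAMPLE_SOURCES.each do |src|
  puts "#{src} -> #{dest_for(src)}"
end
```

Because each destination is derived from the source's base name, the YAML file only needs a flat list of source paths rather than a source-directory-to-destinations mapping.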

diff --git a/bindings/ruby/ext/.gitignore b/bindings/ruby/ext/.gitignore
@@ -11,6 +11,10 @@ ggml-backend.c
 ggml-backend.h
 ggml-common.h
 ggml-cpu-impl.h
+ggml-metal.m
+ggml-metal.metal
+ggml-metal-embed.metal
+ggml-blas.cpp
 ggml-cuda.h
 ggml-impl.h
 ggml-kompute.h
@@ -20,9 +24,12 @@ ggml-quants.c
 ggml-quants.h
 ggml-sycl.h
 ggml-vulkan.h
+ggml-blas.h
+get-flags.mk
 whisper.cpp
 whisper.h
 dr_wav.h
+depend
 whisper.bundle
 whisper.so
 whisper.dll