🎤 The easiest way to transcribe audio in Swift


RyosukeFukatani/SwiftWhisper

 
 


SwiftWhisper

The easiest way to use Whisper in Swift

Easily add transcription to your app or package. Powered by whisper.cpp.

Install

Swift Package Manager

Add SwiftWhisper as a dependency in your Package.swift file:

let package = Package(
  ...
  dependencies: [
    // Add the package to your dependencies
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", branch: "master"),
  ],
  ...
  targets: [
    // Add SwiftWhisper as a dependency on any target you want to use it in
    .target(name: "MyTarget",
            dependencies: [.byName(name: "SwiftWhisper")])
  ]
  ...
)

Xcode

Add https://github.com/exPHAT/SwiftWhisper.git in the "Swift Package Manager" tab.

Usage

API Documentation.

import SwiftWhisper

let whisper = Whisper(fromFileURL: /* Model file URL */)
let segments = try await whisper.transcribe(audioFrames: /* 16kHz PCM audio frames */)

print("Transcribed audio:", segments.map(\.text).joined())

Delegate methods

You can subscribe to segments, transcription progress, and errors by implementing WhisperDelegate and setting whisper.delegate = ...

protocol WhisperDelegate {
  // Progress updates as a fraction from 0 to 1
  func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double)

  // Called any time new segments of text have been transcribed
  func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int)
  
  // Finished transcribing, includes all transcribed segments of text
  func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment])

  // Error with transcription
  func whisper(_ aWhisper: Whisper, didErrorWith error: Error)
}
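
As a rough sketch of how the incremental callback might be consumed, the helper below collects segment texts as they arrive. `TranscriptCollector` is a hypothetical type, not part of SwiftWhisper, and it assumes that `index` is the position of the first new segment in the overall list.

```swift
import Foundation

/// Hypothetical helper (not part of SwiftWhisper): collects segment texts
/// as they are reported and exposes the transcript so far.
struct TranscriptCollector {
    private(set) var texts: [String] = []

    /// Stores `newTexts` starting at `index`, replacing anything previously
    /// stored from that position onward. Assumes `index` is the position of
    /// the first new segment; out-of-range values are clamped.
    mutating func insert(_ newTexts: [String], at index: Int) {
        let start = min(max(index, 0), texts.count)
        texts.replaceSubrange(start..<texts.count, with: newTexts)
    }

    /// All collected text joined in order.
    var transcript: String { texts.joined() }
}
```

From a delegate, `whisper(_:didProcessNewSegments:atIndex:)` could then call `collector.insert(segments.map(\.text), at: index)` and refresh the UI with `collector.transcript`.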

Misc

Downloading Models 📥

You can find the pre-trained models here for download.

CoreML Support 🧠

To use CoreML, include a CoreML model file with the suffix -encoder.mlmodelc beside the whisper model, under the same base name (for example, tiny.bin would sit beside a tiny-encoder.mlmodelc file). In addition to the additional model file, you also need to use the Whisper(fromFileURL:) initializer. You can verify that CoreML is active by checking the console output during transcription.
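
The naming rule above can be checked before loading a model. The following is a minimal sketch using only Foundation; both function names are hypothetical helpers and are not part of SwiftWhisper.

```swift
import Foundation

/// Hypothetical helper (not part of SwiftWhisper): derives where the CoreML
/// encoder is expected to sit for a given whisper model file,
/// e.g. tiny.bin -> tiny-encoder.mlmodelc in the same directory.
func expectedCoreMLEncoderURL(forModelAt modelURL: URL) -> URL {
    let baseName = modelURL.deletingPathExtension().lastPathComponent
    return modelURL
        .deletingLastPathComponent()
        .appendingPathComponent("\(baseName)-encoder.mlmodelc")
}

/// Returns true when something exists at the expected encoder location.
func hasCoreMLEncoder(forModelAt modelURL: URL) -> Bool {
    let encoderURL = expectedCoreMLEncoderURL(forModelAt: modelURL)
    return FileManager.default.fileExists(atPath: encoderURL.path)
}
```

A check like this only confirms that the file is in place; the console output during transcription remains the way to confirm that CoreML is actually in use.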

Converting audio to 16kHz PCM 🔧

The easiest way to get audio frames into SwiftWhisper is to use AudioKit. The following example takes an input audio file, converts and resamples it, and returns an array of 16kHz PCM floats.

import AudioKit

func convertAudioFileToPCMArray(fileURL: URL, completionHandler: @escaping (Result<[Float], Error>) -> Void) {
    var options = FormatConverter.Options()
    options.format = .wav
    options.sampleRate = 16000
    options.bitDepth = 16
    options.channels = 1
    options.isInterleaved = false

    let tempURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent(UUID().uuidString)
    let converter = FormatConverter(inputURL: fileURL, outputURL: tempURL, options: options)
    converter.start { error in
        if let error {
            completionHandler(.failure(error))
            return
        }

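        // Always clean up the temporary file once we are done with it
        defer { try? FileManager.default.removeItem(at: tempURL) }

        let data: Data
        do {
            data = try Data(contentsOf: tempURL)
        } catch {
            completionHandler(.failure(error))
            return
        }

        // Skip the WAV header (assumed to be the canonical 44 bytes), then
        // read 16-bit little-endian samples and scale them to -1...1
        let floats = stride(from: 44, to: data.count - 1, by: 2).map {
            return data[$0..<$0 + 2].withUnsafeBytes {
                let short = Int16(littleEndian: $0.load(as: Int16.self))
                return max(-1.0, min(Float(short) / 32767.0, 1.0))
            }
        }

        completionHandler(.success(floats))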
    }
}
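
The sample-decoding step in the example above can also be isolated into a pure function, which makes it easier to test without AudioKit. This is a sketch under the same assumptions as that example (16-bit little-endian samples after a 44-byte header); `decodePCM16` is a hypothetical name.

```swift
import Foundation

/// Hypothetical helper: decodes 16-bit little-endian PCM samples into floats
/// clamped to -1...1. `headerSize` defaults to 44, the WAV header length
/// assumed by the conversion example above.
func decodePCM16(_ data: Data, headerSize: Int = 44) -> [Float] {
    let bytes = [UInt8](data)
    // Need at least one complete 2-byte sample after the header
    guard headerSize >= 0, bytes.count >= headerSize + 2 else { return [] }

    return stride(from: headerSize, to: bytes.count - 1, by: 2).map { offset -> Float in
        // Combine the low and high bytes by hand to avoid alignment concerns
        let raw = UInt16(bytes[offset]) | (UInt16(bytes[offset + 1]) << 8)
        let sample = Int16(bitPattern: raw)
        return max(-1.0, min(Float(sample) / 32767.0, 1.0))
    }
}
```

Stopping the stride at `bytes.count - 1` means a trailing odd byte is ignored rather than read out of bounds.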

Development speed boost 🚀

Transcription may be slow when your app is compiled with the Debug build configuration. This is because the compiler doesn't fully optimize SwiftWhisper unless the build configuration is set to Release.

You can get around this by installing a version of SwiftWhisper that uses .unsafeFlags(["-O3"]) to force maximum optimization. The easiest way to do this is to use the latest commit on the fast branch. Alternatively, you can configure your scheme to build in the Release configuration.

  ...
  dependencies: [
    // Using latest commit hash for `fast` branch:
    .package(url: "https://github.com/exPHAT/SwiftWhisper.git", revision: "deb1cb6a27256c7b01f5d3d2e7dc1dcc330b5d01"),
  ],
  ...
