swift-embeddings

Run embedding models locally in Swift using MLTensor. Inspired by mlx-embeddings.

Supported Model Architectures

BERT (Bidirectional Encoder Representations from Transformers)

Some of the supported models on Hugging Face:

NOTE: google-bert/bert-base-uncased is supported but weightKeyTransform must be provided:

let modelBundle = try await Bert.loadModelBundle(from: modelId, weightKeyTransform: Bert.googleWeightsKeyTransform)
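To give a feel for what a weight key transform does, here is a minimal sketch of a key-renaming function. It assumes the `weightKeyTransform` parameter accepts a function from `String` to `String`. The function name `exampleWeightKeyTransform`, the `bert.` prefix handling, and the direction of the mapping are all hypothetical and chosen for illustration; they are not the rules implemented by `Bert.googleWeightsKeyTransform`.

```swift
import Foundation

// Hypothetical key-renaming rule, for illustration only.
// The prefix and the LayerNorm mapping below are assumptions,
// not the library's actual transform.
func exampleWeightKeyTransform(_ key: String) -> String {
    var renamed = key
    // Drop an assumed "bert." prefix if it is present.
    if renamed.hasPrefix("bert.") {
        renamed.removeFirst("bert.".count)
    }
    // Map older LayerNorm parameter names (gamma/beta) to weight/bias.
    renamed = renamed.replacingOccurrences(of: "LayerNorm.gamma", with: "LayerNorm.weight")
    renamed = renamed.replacingOccurrences(of: "LayerNorm.beta", with: "LayerNorm.bias")
    return renamed
}

print(exampleWeightKeyTransform("bert.embeddings.LayerNorm.gamma"))
```

The point of such a transform is that checkpoints can name the same tensor differently, so the loader needs a way to translate between naming schemes before it looks up each weight.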

XLM-RoBERTa (Cross-lingual Language Model - Robustly Optimized BERT Approach)

Some of the supported models on Hugging Face:

CLIP (Contrastive Language–Image Pre-training)

NOTE: only text encoding is supported for now.

Some of the supported models on Hugging Face:

Word2Vec

NOTE: this is a word embedding model. It loads and keeps the whole model in memory. For a more memory-efficient solution, you might want to use SQLiteVec.

Some of the supported models on Hugging Face:

Installation

Add the following to your Package.swift file. In the package dependencies add:

dependencies: [
    .package(url: "https://github.com/jkrukowski/swift-embeddings", from: "0.0.7")
]

In the target dependencies add:

dependencies: [
    .product(name: "Embeddings", package: "swift-embeddings")
]
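Put together, the two fragments above might sit in a manifest like the following sketch. The package and target name `MyApp` is hypothetical, and the `platforms` entry is an assumption based on MLTensor being available from macOS 15 and iOS 18; check the package's own manifest for the exact requirements.

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "MyApp",  // hypothetical package name
    // Assumption: minimum OS versions that ship MLTensor.
    platforms: [.macOS(.v15), .iOS(.v18)],
    dependencies: [
        .package(url: "https://github.com/jkrukowski/swift-embeddings", from: "0.0.7")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",  // hypothetical target name
            dependencies: [
                .product(name: "Embeddings", package: "swift-embeddings")
            ]
        )
    ]
)
```

The Batch Encoding example also imports `MLTensorUtils`. If that module is exposed as a separate product, it would need its own `.product` entry in the target dependencies.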

Usage

Encoding

import Embeddings

// load model and tokenizer from Hugging Face
let modelBundle = try await Bert.loadModelBundle(
    from: "sentence-transformers/all-MiniLM-L6-v2"
)

// encode text
let encoded = modelBundle.encode("The cat is black")
let result = await encoded.cast(to: Float.self).shapedArray(of: Float.self).scalars

// print result
print(result)

Batch Encoding

import Embeddings
import MLTensorUtils

let texts = [
    "The cat is black",
    "The dog is black",
    "The cat sleeps well"
]
let modelBundle = try await Bert.loadModelBundle(
    from: "sentence-transformers/all-MiniLM-L6-v2"
)
let encoded = modelBundle.batchEncode(texts)
let distance = cosineDistance(encoded, encoded)
let result = await distance.cast(to: Float.self).shapedArray(of: Float.self).scalars
print(result)
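To make the printed values easier to interpret, the following sketch works on plain `[Float]` arrays, independent of the library. It assumes that `scalars` is a flat, row-major list and that the distance tensor holds one value per pair of input texts (3 x 3 for the example above); both are assumptions about the output shape, not documented behavior. The helper names `cosineSimilarity` and `rows(of:columns:)` are hypothetical. The sketch also uses the common convention that cosine distance is one minus cosine similarity, which may differ from how the library's `cosineDistance` is defined.

```swift
// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length (norm).
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

// Split a flat, row-major list of scalars into rows of `columns` values.
func rows(of scalars: [Float], columns: Int) -> [[Float]] {
    precondition(columns > 0 && scalars.count % columns == 0,
                 "scalar count must be a multiple of the column count")
    return stride(from: 0, to: scalars.count, by: columns).map { start in
        Array(scalars[start..<(start + columns)])
    }
}

// Toy vectors standing in for real embeddings.
let a: [Float] = [1, 0]
let b: [Float] = [0, 1]
print(1 - cosineSimilarity(a, a))  // distance of a vector to itself
print(1 - cosineSimilarity(a, b))  // distance between orthogonal vectors
print(rows(of: [1, 2, 3, 4, 5, 6], columns: 3))
```

With the batch example, `rows(of: result, columns: texts.count)` would give one row per input text under the shape assumption above.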

Command Line Demo

To run the command line demo, use the following command:

swift run embeddings-cli <subcommand> [--model-id <model-id>] [--model-file <model-file>] [--text <text>] [--max-length <max-length>]

Subcommands:

bert                    Encode text using BERT model
clip                    Encode text using CLIP model
xlm-roberta             Encode text using XLMRoberta model
word2vec                Encode word using Word2Vec model

Command line options:

--model-id <model-id>                       Id of the model to use
--model-file <model-file>                   Path to the model file (only for `Word2Vec`)
--text <text>                               Text to encode
--max-length <max-length>                   Maximum length of the input (not for `Word2Vec`)
-h, --help                                  Show help information.

Code Formatting

This project uses swift-format. To format the code, run:

swift format . -i -r --configuration .swift-format

Acknowledgements

This project is based on and uses some of the code from: