Suppose you are implementing RAG for your AI app. You have used web APIs or an inference engine to generate a large number of embeddings, and now you need to find the results that best match a query embedding. What do you do?
Use a vector database? No, you only need a few lines of JavaScript.
MLX is a full-featured machine learning framework with easy-to-understand source code and a small binary size, and node-mlx is its JavaScript binding.
MLX only has GPU support on macOS, but its CPU support, implemented with vectorized instructions, is still fast on Linux.
import {core as mx, nn} from '@frost-beta/mlx';
Suppose you want to find the results with the highest similarities to query among the embeddings.
const embeddings = [
[ 0.037035, 0.0760545, ... ],
[ 0.034029, -0.0227216, ... ],
...
[ -0.028612, 0.0052857, ... ],
];
const query = [ -0.019773, 0.006021, ... ];
With node-mlx, you can use the built-in nn.losses.cosineSimilarityLoss API to do the search.
const embeddingsTensor = mx.array(embeddings);
const queryTensor = mx.array([ query ]);
const scoresTensor = nn.losses.cosineSimilarityLoss(queryTensor, embeddingsTensor);
const scores: Float32Array = scoresTensor.toTypedArray();
The scores array stores the cosine similarities between the query and each of the embeddings.
(If you are wondering how we can compute cosine similarities between a 1xD tensor and an NxD tensor, where D is the embedding size and N is the number of embeddings, it is called broadcasting.)
console.log(scores);
// [ 0.233244, 0.012492, ..., 0.43298 ]
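To make the computation concrete, here is a minimal plain TypeScript sketch of what such a call works out: one query vector is compared against every row of the embeddings. This is an illustration only, not the MLX implementation; the helper names, the epsilon guard and the toy 2-dimensional vectors are all made up for the example.

```typescript
// Cosine similarity of two equally sized vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // The small epsilon (an assumption) avoids dividing by zero for all-zero vectors.
  return dot / Math.max(Math.sqrt(normA) * Math.sqrt(normB), 1e-8);
}

// "Broadcasting" done by hand: the single query is paired with every row.
function cosineSimilarities(query: number[], embeddings: number[][]): number[] {
  return embeddings.map(row => cosineSimilarity(query, row));
}

// Toy vectors, chosen so the results are easy to check by hand.
const toyQuery = [1, 0];
const toyEmbeddings = [[1, 0], [0, 1], [-1, 0]];
const toyScores = cosineSimilarities(toyQuery, toyEmbeddings);
console.log(toyScores);
// [ 1, 0, -1 ]
```

A loop like this is fine for a few thousand embeddings; the point of using MLX is that the same arithmetic runs as vectorized tensor operations instead.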
Once you get the scores array, you can use the usual JavaScript code to filter and sort the results. But you can also use MLX if the number of results is large enough to make the JavaScript engine struggle.
// Get the top 10 scores.
const topTen = mx.topk(scoresTensor, 10);
console.log(topTen.toTypedArray());
// [ 0.894323, 0.594729, ... ]
// Sort the scores.
let sortedScores = mx.sort(scoresTensor);
console.log(sortedScores.toTypedArray());
// [ 0.01287, 0.1502876, ... ]
sortedScores = sortedScores.index(mx.Slice(null, null, -1));
console.log(sortedScores.toTypedArray());
// [ 0.894323, 0.594729, ... ]
// Get the indices of the scores ordered by their values in the array.
const indices = mx.argsort(scoresTensor)
.index(mx.Slice(null, null, -1))
.toTypedArray();
console.log(indices);
// [ 8, 9, ... ]
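// indices is an integer typed array, and calling map() on it would coerce the
// scores to integers, so build a plain Array instead.
console.log(Array.from(indices, i => scores[i]));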
// [ 0.894323, 0.594729, ... ]
The array.index(mx.Slice(null, null, -1)) code looks alien, but it is actually the JavaScript version of Python's array[::-1], which reverses the array. You can of course convert the result to a JavaScript Array first and then call reverse(), but it would be slower if the array is very large.
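For comparison, the plain JavaScript route can be sketched as follows. The helper names argsortDescending and topK are hypothetical and not part of node-mlx; the sketch assumes the scores have already been copied out into a typed array or a normal array.

```typescript
// Return the indices of `scores`, ordered from the highest score to the lowest.
function argsortDescending(scores: ArrayLike<number>): number[] {
  const indices = Array.from({length: scores.length}, (_, i) => i);
  return indices.sort((a, b) => scores[b] - scores[a]);
}

// Keep the k best entries as {index, score} pairs.
function topK(scores: ArrayLike<number>, k: number): {index: number; score: number}[] {
  return argsortDescending(scores)
    .slice(0, k)
    .map(index => ({index, score: scores[index]}));
}

// Sample values picked to be exactly representable as 32-bit floats.
const sampleScores = new Float32Array([0.25, 0.75, -0.5, 0.5]);
console.log(argsortDescending(sampleScores));
// [ 1, 3, 0, 2 ]
console.log(topK(sampleScores, 2));
// [ { index: 1, score: 0.75 }, { index: 3, score: 0.5 } ]
```

Sorting the indices in descending order directly avoids the separate reverse step, at the cost of doing all the work in the JavaScript engine.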
If you still find MLX cumbersome to use after reading the introduction above (which is normal if you have no experience with NumPy or PyTorch), I have wrapped the code into a very simple Node.js module, which you can use to replace vector databases in many cases.
Install:
npm install not-a-vector-database
APIs:
export type EmbeddingInput = number[] | TypedArray;
export interface SearchOptions {
/**
* Results with scores larger than this value will be returned. Should be
* between -1 and 1. Default is 0.
*/
minimumScore?: number;
/**
* Restrict the number of results to return, default is 16.
*/
maximumResults?: number;
}
export interface SearchResult {
score: number;
data: unknown;
}
/**
* In-memory storage of embeddings and associated data.
*/
export class Storage {
embeddings?: mx.array;
data: unknown[];
/**
* Initialize from the buffer.
*/
loadFromBuffer(buffer: Buffer): void;
/**
* Dump the data to a buffer.
*/
dumpToBuffer(): Buffer;
/**
* Add data to the storage.
*/
push(...items: {embedding: EmbeddingInput; data: unknown;}[]): void;
/**
* Return the data which are most relevant to the embedding.
*/
search(embedding: EmbeddingInput, options?: SearchOptions): SearchResult[];
}
Example:
import fs from 'node:fs';
import {Storage} from 'not-a-vector-database';
const storage = new Storage();
storage.push({embedding, data: 'some data'});
const results = storage.search(embedding);
fs.writeFileSync('storage.bser', storage.dumpToBuffer());
storage.loadFromBuffer(fs.readFileSync('storage.bser'));
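The effect of the two search options can be sketched in plain TypeScript. This is only my reading of the doc comments above (scores strictly larger than minimumScore, at most maximumResults entries, best first), not the module's actual implementation; the selectResults helper and the ScoredItem type are hypothetical.

```typescript
interface ScoredItem {
  score: number;
  data: unknown;
}

// Filter by minimumScore, order by score descending, cap at maximumResults.
function selectResults(
  scores: ArrayLike<number>,
  data: unknown[],
  options: {minimumScore?: number; maximumResults?: number} = {},
): ScoredItem[] {
  // Defaults follow the documented values: 0 and 16.
  const {minimumScore = 0, maximumResults = 16} = options;
  const results: ScoredItem[] = [];
  for (let i = 0; i < scores.length; i++) {
    if (scores[i] > minimumScore)
      results.push({score: scores[i], data: data[i]});
  }
  results.sort((a, b) => b.score - a.score);
  return results.slice(0, maximumResults);
}

const picked = selectResults(
  [0.9, -0.2, 0.4, 0.1],
  ['a', 'b', 'c', 'd'],
  {minimumScore: 0.3},
);
console.log(picked);
// [ { score: 0.9, data: 'a' }, { score: 0.4, data: 'c' } ]
```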
There is also a benchmark.ts script that you can use to test the performance.
On my 2018 Intel MacBook Pro, which has no GPU support, searching 1 million embeddings of size 128 takes about 900ms.