From f734fbde922fe6b2d4c1574f496543c5bb9ebb98 Mon Sep 17 00:00:00 2001
From: Chase McDougall
Date: Fri, 1 Dec 2023 22:37:21 -0500
Subject: [PATCH] initial docs

---
 .github/contributing/INTEGRATIONS.md        |  2 +-
 .../contributing/integrations/EMBEDDINGS.md | 21 +++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 .github/contributing/integrations/EMBEDDINGS.md

diff --git a/.github/contributing/INTEGRATIONS.md b/.github/contributing/INTEGRATIONS.md
index 69dfcaa3d401..2c97f813b45c 100644
--- a/.github/contributing/INTEGRATIONS.md
+++ b/.github/contributing/INTEGRATIONS.md
@@ -152,7 +152,7 @@ Below are links to guides with advice and tips for specific types of integration
 - [Vector stores](https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/integrations/VECTOR_STORES.md) (e.g. Pinecone)
 - [Persistent message stores](https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/integrations/MESSAGE_STORES.md) (used to persistently store and load raw chat histories, e.g. Redis)
 - [Document loaders](https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/integrations/DOCUMENT_LOADERS.md) (used to load documents for later storage into vector stores, e.g. Apify)
-- Embeddings (TODO) (e.g. Cohere)
+- [Embeddings](https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/integrations/EMBEDDINGS.md) (used to create embeddings of text documents or strings, e.g. Cohere)
 - [Tools](https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/integrations/TOOLS.md) (used for agents, e.g. the SERP API tool)
 
 This is a living document, so please make a pull request if we're missing anything useful!
diff --git a/.github/contributing/integrations/EMBEDDINGS.md b/.github/contributing/integrations/EMBEDDINGS.md
new file mode 100644
index 000000000000..ba4d87a63b9a
--- /dev/null
+++ b/.github/contributing/integrations/EMBEDDINGS.md
@@ -0,0 +1,21 @@
+# Contributing third-party Text Embeddings
+
+This page contains some specific guidelines and examples for contributing integrations with third-party Text Embedding providers.
+
+**Make sure you read the [general guidelines page](https://github.com/langchain-ai/langchainjs/blob/main/.github/contributing/INTEGRATIONS.md) first!**
+
+## Example PR
+
+We'll be referencing this PR adding Gradient Embeddings as an example: https://github.com/langchain-ai/langchainjs/pull/3475
+
+## General ideas
+
+The general idea for adding new third-party Text Embeddings is to subclass the `Embeddings` class and implement the `embedDocuments` and `embedQuery` methods.
+
+The `embedDocuments` method should take a list of documents and return a list of embeddings, one for each document. The `embedQuery` method should take a single query string and return the embedding for that query.
+
+`embedQuery` can typically be implemented by calling `embedDocuments` with a list containing only the query and returning the first (and only) result.
+
+## Wrap Text Embeddings requests in `this.caller`
+
+The base `Embeddings` class contains an instance property called `caller` that will automatically handle retries, errors, timeouts, and more. You should wrap calls to the provider's API in `this.caller.call`, [as shown here](https://github.com/langchain-ai/langchainjs/blob/f469ec00d945a3f8421b32f4be78bce3f66a74bb/langchain/src/embeddings/gradient_ai.ts#L72).
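The pattern the new EMBEDDINGS.md guide describes is: subclass `Embeddings`, implement `embedDocuments` and `embedQuery`, and route every provider request through `this.caller.call`. The sketch below shows that pattern under stated assumptions. It is self-contained so it runs without LangChain installed, so `EmbeddingsSketch`, `SimpleRetryCaller`, `Caller`, `EmbedFn` and `ExampleProviderEmbeddings` are all hypothetical stand-ins. In a real integration you would extend the library's own `Embeddings` class, which provides `this.caller` for you. The thunk-style `call` used here is a simplification of that caller, not its full API.

```typescript
// Hypothetical, simplified stand-in for the caller the real base class provides:
// it runs a request thunk and retries it on failure.
interface Caller {
  call<T>(fn: () => Promise<T>): Promise<T>;
}

// Hypothetical caller for this sketch: retries a failed request up to maxRetries times.
class SimpleRetryCaller implements Caller {
  constructor(private readonly maxRetries = 2) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
      try {
        return await fn();
      } catch (err) {
        lastError = err; // remember the error and try again
      }
    }
    throw lastError; // all attempts failed
  }
}

// Hypothetical stand-in mirroring the two methods an integration must implement.
abstract class EmbeddingsSketch {
  caller: Caller = new SimpleRetryCaller();

  abstract embedDocuments(documents: string[]): Promise<number[][]>;
  abstract embedQuery(document: string): Promise<number[]>;
}

// Hypothetical provider client: takes texts and returns one vector per text.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

class ExampleProviderEmbeddings extends EmbeddingsSketch {
  constructor(private readonly embedFn: EmbedFn) {
    super();
  }

  async embedDocuments(documents: string[]): Promise<number[][]> {
    // Route the provider request through the caller so failures are retried.
    return this.caller.call(() => this.embedFn(documents));
  }

  async embedQuery(document: string): Promise<number[]> {
    // Reuse embedDocuments with a single-element list and return the only vector.
    const [vector] = await this.embedDocuments([document]);
    if (vector === undefined) {
      throw new Error("Provider returned no embedding for the query");
    }
    return vector;
  }
}
```

Delegating `embedQuery` to `embedDocuments` keeps a single code path to the provider. As a result, both methods get the same retry behavior from the caller.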