
example error #35

Open
abdoelsayed2016 opened this issue Jun 6, 2023 · 1 comment

Comments

@abdoelsayed2016

I get an error while trying to use LLaMA for embeddings in your example:

embedding_function=self._embedding_function.embed_documents
AttributeError: 'function' object has no attribute 'embed_documents'


from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

import torch
import transformers

base_model = "yahma/llama-7b-hf"


model = transformers.AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    base_model, model_max_length=512, padding_side="right", use_fast=False
)
def embeddings(prompt_request: EmbeddingRequest):
    params = {"prompt": prompt_request.prompt}
    print("Received prompt: ", params["prompt"])
    output = get_embeddings(model, tokenizer, params["prompt"])
    return {"response": [float(x) for x in output]}

def get_embeddings(model, tokenizer, prompt):
    input_ids = tokenizer(prompt).input_ids
    input_embeddings = model.get_input_embeddings()
    embeddings = input_embeddings(torch.LongTensor([input_ids]))
    mean = torch.mean(embeddings[0], 0).cpu().detach()
    return mean

with open("german.txt") as f:
    book = f.read()
    
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(book)
docsearch = Chroma.from_texts(
    texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))]
)  # passing the bare `embeddings` function here is what raises the AttributeError


while True:
    query = input("Type your search: ")
    docs = docsearch.similarity_search_with_score(query, k=1)
    for doc in docs:
        print(doc)
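The traceback happens because Chroma stores whatever it is given as `embedding_function` and later calls `.embed_documents(...)` on it, so a bare function fails the attribute lookup. A minimal sketch of the difference, using a hypothetical `FakeChroma` stand-in (not the real LangChain class) so it runs without any dependencies:

```python
from typing import List


class FakeChroma:
    """Hypothetical stand-in for Chroma: it calls .embed_documents on
    whatever it is given, which is why a plain function fails."""

    def __init__(self, embedding_function):
        self._embedding_function = embedding_function

    def add_texts(self, texts: List[str]) -> List[List[float]]:
        # This attribute access is what raises the reported AttributeError
        return self._embedding_function.embed_documents(texts)


class DummyEmbeddings:
    """Minimal object satisfying the interface Chroma expects."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Toy embedding: one-dimensional vector per text, for illustration only
        return [[float(len(t))] for t in texts]


def dummy_embed(text: str) -> List[float]:
    return [float(len(text))]


# A bare function has no .embed_documents attribute:
try:
    FakeChroma(dummy_embed).add_texts(["hello"])
except AttributeError as e:
    print(e)  # 'function' object has no attribute 'embed_documents'

# An object implementing embed_documents works:
print(FakeChroma(DummyEmbeddings()).add_texts(["hello", "hi"]))  # [[5.0], [2.0]]
```

So the fix is to wrap the embedding logic in a class that implements `embed_documents` and `embed_query`, as the maintainer suggests below.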

@paolorechia
Owner

Hi, @abdoelsayed2016.

Did you try to start from this example instead (using Hugging Face embeddings)?
https://github.com/paolorechia/learn-langchain/blob/main/langchain_app/agents/answer_about_germany.py

These input embeddings from Vicuna perform really poorly (the same goes for the output embeddings), so I don't recommend using either.

If you really want to use these embeddings, you need to implement a custom embeddings class for LangChain. Try modifying this class from my Medium article:

from typing import List, Optional

import requests
from pydantic import BaseModel

from langchain.embeddings.base import Embeddings


class VicunaEmbeddings(BaseModel, Embeddings):
    def _call(self, prompt: str) -> List[float]:
        p = prompt.strip()
        print("Sending prompt ", p)
        response = requests.post(
            "http://127.0.0.1:8000/embedding",
            json={"prompt": p},
        )
        response.raise_for_status()
        return response.json()["response"]

    def embed_documents(
        self, texts: List[str], chunk_size: Optional[int] = 0
    ) -> List[List[float]]:
        """Call out to Vicuna's server embedding endpoint for embedding search docs.

        Args:
            texts: The list of texts to embed.
            chunk_size: The chunk size of embeddings. If None, will use the chunk size
                specified by the class.

        Returns:
            List of embeddings, one for each text.
        """
        results = []
        for text in texts:
            response = self.embed_query(text)
            results.append(response)
        return results

    def embed_query(self, text: str) -> List[float]:
        """Call out to Vicuna's server embedding endpoint for embedding query text.

        Args:
            text: The text to embed.

        Returns:
            Embedding for the text.
        """
        embedding = self._call(text)
        return embedding
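To sanity-check the interface without a running Vicuna server, the HTTP call can be stubbed out. This sketch mirrors the method layout of `VicunaEmbeddings` above with a deterministic fake `_call` (the embedding values are made up for the demo), showing that `embed_documents` simply fans out to `embed_query` per text:

```python
from typing import List, Optional


class StubVicunaEmbeddings:
    """Same method layout as VicunaEmbeddings, but _call is a deterministic
    stub instead of an HTTP request to the local embedding endpoint."""

    def _call(self, prompt: str) -> List[float]:
        p = prompt.strip()
        # Fake 2-d embedding: mean character code and text length, demo only
        return [sum(map(ord, p)) / max(len(p), 1), float(len(p))]

    def embed_documents(
        self, texts: List[str], chunk_size: Optional[int] = 0
    ) -> List[List[float]]:
        # One vector per input text, delegating to embed_query
        return [self.embed_query(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._call(text)


emb = StubVicunaEmbeddings()
vectors = emb.embed_documents(["ab", "abc"])
print(vectors)  # [[97.5, 2.0], [98.0, 3.0]]
```

An instance of such a class (the real one, not the stub) is what should be passed as the second argument to `Chroma.from_texts`, instead of a bare function.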
