Add encodings folder to azure-ai-resources (Azure#34228)
* add encodings

* include encodings folder

* use local embeddings

* use local cache dirs

* ignore unknown words in encodings files
jingyizhu99 authored Feb 12, 2024
1 parent 84f4f2e commit 5e4e2ed
Showing 8 changed files with 150,268 additions and 4 deletions.
@@ -235,9 +235,11 @@ def _embed(self, texts: List[str]) -> List[List[float]]:
         """Embed the given texts."""
         import numpy as np
         import tiktoken
+        from azure.ai.generative.index._utils.tokens import tiktoken_cache_dir

         try:
-            encoding = tiktoken.encoding_for_model(self.model)
+            with tiktoken_cache_dir():
+                encoding = tiktoken.encoding_for_model(self.model)
         except KeyError:
             logger.warning("Warning: model not found. Using cl100k_base encoding.")
             model = "cl100k_base"
1 change: 1 addition & 0 deletions sdk/ai/azure-ai-resources/MANIFEST.in
@@ -4,4 +4,5 @@ include *.md
 include azure/__init__.py
 include azure/ai/__init__.py
 include azure/ai/resources/py.typed
+include azure/ai/resources/_index/_utils/encodings/*
 include azure/ai/common/operations/component-configs/*
@@ -222,9 +222,11 @@ def _embed(self, texts: List[str]) -> List[List[float]]:
         """Embed the given texts."""
         import numpy as np
         import tiktoken
+        from azure.ai.resources._index._utils.tokens import tiktoken_cache_dir

         try:
-            encoding = tiktoken.encoding_for_model(self.model)
+            with tiktoken_cache_dir():
+                encoding = tiktoken.encoding_for_model(self.model)
         except KeyError:
             logger.warning("Warning: model not found. Using cl100k_base encoding.")
             model = "cl100k_base"
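
The body of `tiktoken_cache_dir` is not part of the rendered diff. For readers unfamiliar with the pattern, here is a minimal sketch of how such a context manager could work, assuming it relies on the `TIKTOKEN_CACHE_DIR` environment variable that tiktoken reads when loading encoding files. The name `local_tiktoken_cache` and its `cache_dir` parameter are hypothetical and are not taken from this commit.

```python
import os
from contextlib import contextmanager
from typing import Iterator

# Environment variable that tiktoken consults for its cache location.
_ENV_VAR = "TIKTOKEN_CACHE_DIR"


@contextmanager
def local_tiktoken_cache(cache_dir: str) -> Iterator[str]:
    """Temporarily point tiktoken's cache at cache_dir.

    Hypothetical helper for illustration; the commit's real
    tiktoken_cache_dir may be implemented differently.
    """
    previous = os.environ.get(_ENV_VAR)
    os.environ[_ENV_VAR] = cache_dir
    try:
        yield cache_dir
    finally:
        # Restore the caller's setting, even if the body raised
        # (for example the KeyError handled in _embed).
        if previous is None:
            os.environ.pop(_ENV_VAR, None)
        else:
            os.environ[_ENV_VAR] = previous
```

Wrapping only the `tiktoken.encoding_for_model(...)` call, as the hunks above do, keeps the override scoped to the lookup, so any cache directory the caller configured is back in place afterwards.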