Implement cache embeddings (resolves #200) #208
Conversation
Looks good! 👍 You should fix the "bulk computing" problem.
We also need to add documentation for this and potentially enable the caching behavior to be controlled from the config. Let me know if you want me to make a first proposal for that.
Thanks for implementing this @Pouyanpi! 👍
Force-pushed from ebe1653 to 3f11484
Thank you for your feedback, @drazvan. I've made the changes. Please let me know if you need further modifications. If you'd like, I can also update or add documentation to reflect these updates. Could you please advise on which file I should update and the level of detail you'd prefer?
@Pouyanpi: do you have availability to wrap up this PR this week? We'd like to include it in the upcoming release.
Sure @drazvan, I will have it done later today.
nemoguardrails/embeddings/cache.py (outdated)

```python
@classmethod
def from_dict(cls, d):
    key_generator = d.get("key_generator", MD5KeyGenerator())
```
The `d` dictionary here can contain strings rather than actual instances. This will be the case when called with the parameters data from `esp_provider`. I think there should be logic that translates from `str` to actual instances, e.g.:

```python
if key_generator == "md5":
    key_generator = MD5KeyGenerator()
```
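Extending that idea, a fuller `from_dict` could look roughly like this. This is only a sketch: the registry dicts, the default values, and the `cache_store` keyword are assumptions, not the final implementation.

```python
@classmethod
def from_dict(cls, d):
    # Hypothetical name-to-class registries; the actual lookup in the
    # final implementation may differ.
    key_generators = {"hash": HashKeyGenerator, "md5": MD5KeyGenerator}
    stores = {
        "in_memory": InMemoryCacheStore,
        "filesystem": FilesystemCacheStore,
        "redis": RedisCacheStore,
    }

    key_generator = d.get("key_generator", "md5")
    if isinstance(key_generator, str):
        key_generator = key_generators[key_generator]()

    store = d.get("store", "filesystem")
    if isinstance(store, str):
        # Pass any store-specific options (e.g., path, host/port) through.
        store = stores[store](**d.get("store_config", {}))

    return cls(key_generator=key_generator, cache_store=store)
```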
@Pouyanpi: Also, let's make the configuration of the embeddings cache a first-class citizen, like this:

```python
class EmbeddingsCache(BaseModel):
    """Configuration for the caching embeddings."""

    enabled: bool = Field(
        default=False,
        description="Whether caching of the embeddings should be enabled or not.",
    )
    key_generator: str = Field(
        default="md5",
        description="The method to use for generating the cache keys.",
    )
    store: str = Field(
        default="filesystem",
        description="What type of store to use for the cached embeddings.",
    )
    store_config: Dict[str, Any] = Field(
        default_factory=dict,
        description="Any additional configuration options required for the store. "
        "For example, path for `filesystem` or `host`/`port`/`db` for redis.",
    )


class EmbeddingSearchProvider(BaseModel):
    """Configuration of an embedding search provider."""

    name: str = Field(
        default="default",
        description="The name of the embedding search provider. If not specified, default is used.",
    )
    parameters: Dict[str, Any] = Field(default_factory=dict)
    cache: EmbeddingsCache = Field(
        default_factory=EmbeddingsCache,
        description="Configuration option for whether to use a cache for embeddings or not.",
    )
```

And the configuration will be done as per:
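A plausible shape for that config file, inferred from the fields defined above (a sketch, not the verbatim example):

```yaml
core:
  embedding_search_provider:
    name: default
    cache:
      enabled: true
      key_generator: md5
      store: filesystem
      store_config: {}
```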
And last but not least, have a look at this as well: #267. This makes the `_get_embeddings` method asynchronous.
Add parameters to the constructor to allow for optional caching of embeddings. This can improve performance by avoiding recomputation of embeddings.
Implement selective caching in the `CacheEmbeddings` class to maintain the bulk computing behavior of embedding providers. The updated logic checks the cache before making a call and only processes uncached items, reducing unnecessary computations and API calls; see the sketch below.
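A rough sketch of that flow, written as a standalone coroutine. The `make_key`, `cache_get`, `cache_put`, and `compute` interfaces here are assumptions for illustration, not the PR's exact API:

```python
from typing import Awaitable, Callable, List, Optional


async def embed_with_cache(
    texts: List[str],
    make_key: Callable[[str], str],
    cache_get: Callable[[str], Optional[List[float]]],
    cache_put: Callable[[str, List[float]], None],
    compute: Callable[[List[str]], Awaitable[List[List[float]]]],
) -> List[List[float]]:
    """Check the cache first; bulk-compute only the misses."""
    keys = [make_key(text) for text in texts]
    results = {key: cache_get(key) for key in keys}

    # Send only the uncached texts to the provider, in a single bulk call,
    # so the provider's batching behavior is preserved.
    misses = [text for text, key in zip(texts, keys) if results[key] is None]
    if misses:
        for text, embedding in zip(misses, await compute(misses)):
            key = make_key(text)
            cache_put(key, embedding)
            results[key] = embedding

    # Return the embeddings in the original input order.
    return [results[key] for key in keys]
```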
Add the required method in the base class to allow custom embeddings to benefit from the `@cache_embeddings` decorator.
Update the initialization to use the configuration, including several optional parameters.
Added a new `EmbeddingsCacheConfig` class to support customizable embedding caching strategies, featuring options for enabling cache, cache key generation methods, storage types, and additional store-specific configurations. This update facilitates performance improvements by allowing the reuse of embeddings across tasks. Integrated the cache configuration into the `EmbeddingSearchProvider` class to streamline usage.
Introduced type annotations and 'name' attributes to key and cache store classes for consistent object creation. Added factory methods to KeyGenerator and CacheStore classes, simplifying instantiation through names. Refactored CacheEmbeddings to EmbeddingsCache, now able to convert its configuration to a dictionary and create instances from a dictionary or EmbeddingsCacheConfig. Updated `cache_embeddings` decorator to utilize a class's cache_config attribute, enabling asynchronous retrieval and caching of embeddings. Enhanced logging to display longer text snippets.
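As a rough illustration of that factory behavior (the `from_name` method and the `name`-attribute lookup are assumptions about the approach, not the PR's exact code):

```python
import hashlib
from abc import ABC, abstractmethod


class KeyGenerator(ABC):
    """Base class that also acts as a factory over its subclasses."""

    name: str  # each concrete subclass sets e.g. "hash" or "md5"

    @abstractmethod
    def generate_key(self, text: str) -> str:
        ...

    @classmethod
    def from_name(cls, name: str) -> "KeyGenerator":
        # Resolve a concrete subclass by its `name` attribute.
        for subclass in cls.__subclasses__():
            if getattr(subclass, "name", None) == name:
                return subclass()
        raise ValueError(f"Unknown key generator: {name!r}")


class MD5KeyGenerator(KeyGenerator):
    name = "md5"

    def generate_key(self, text: str) -> str:
        return hashlib.md5(text.encode("utf-8")).hexdigest()
```

With this in place, `KeyGenerator.from_name("md5")` returns an `MD5KeyGenerator` instance, which is what lets config strings stand in for objects.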
Updated the BasicEmbeddingsIndex to use a more flexible cache configuration setup. We replaced the bool flag and the CacheEmbeddings instance with a single unified `cache_config` parameter that can accept either a dictionary or an EmbeddingsCacheConfig instance for enhanced configurability. Additionally, the `_get_embeddings` method is now an asynchronous function to allow for non-blocking I/O operations per NVIDIA#267.
Enhanced the EmbeddingsIndex class by adding a 'cache_config' attribute for customizable cache management. Also, updated the '_get_embeddings' method to be asynchronous per NVIDIA#267.
Removed a direct dependency on `CacheEmbeddings` in the `LLMRails` class, streamlining the embedding cache setup. Configuration now relies on a generic `cache_config` pulled directly from the `esp_config`.
Updated the documentation to reflect the new caching feature for embedding searches.
Force-pushed from 34172ec to f30876b
@drazvan: I have implemented the changes based on your suggestions and have pushed the updates for review. Currently, the caching feature is disabled by default; please let me know if this aligns with your expectations. Regarding the factory-related behavior of `CacheStore` and `KeyGenerator`, I acknowledge it might not be the "clean" approach. I have alternative solutions in mind and can share preliminary drafts for future releases if you're interested. However, I believe the current implementation is satisfactory for the moment. Feel free to review the modifications at your convenience.
Signed-off-by: Razvan Dinu <[email protected]>
Summary

This PR introduces a caching mechanism for embeddings in the `BasicEmbeddingsIndex` class. It uses a decorator, `cache_embeddings`, to cache the results of the `_get_embeddings` method. The caching mechanism includes key generation and cache storage, which are implemented using abstract base classes to allow for extensibility.

Changes
- Introduced the `cache_embeddings` decorator to cache the results of the `_get_embeddings` method.
- Made `_get_embeddings` asynchronous.
- Introduced the `KeyGenerator` and `CacheStore` abstract base classes for key generation and cache storage. (The base class also plays the role of a factory.)
- Added `HashKeyGenerator`, `MD5KeyGenerator`, `InMemoryCacheStore`, `FilesystemCacheStore`, and `RedisCacheStore` as concrete implementations of the abstract base classes.
- Updated `BasicEmbeddingsIndex` to use the caching mechanism.

Limitations
The `OpenAIEmbedding` model does not return `List[List[float]]` but rather a different type.

Future Work
Remarks
An object that needs to use embedding caching must instantiate these two objects (otherwise default values are used). The `cache_embeddings` decorator could be used on any method that accepts a list of texts and returns a list of embeddings, so one could have applied it to the `EmbeddingModel` as well, but it seems more relevant to have it in `BasicEmbeddingsIndex`, as it is instantiated from the configs within the project. Furthermore, the `CacheStore` and `KeyGenerator` could be defined within the embedding config, and the constructor could also accept a boolean to define whether to use caching at all.

Issue
This PR resolves issue #200.
Example Usage
Using the `openai` embedding engine
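A minimal sketch, assuming `BasicEmbeddingsIndex` accepts `embedding_engine`, `embedding_model`, and `cache_config` parameters as described above (the import path and model name are assumptions):

```python
from nemoguardrails.embeddings.basic import BasicEmbeddingsIndex

index = BasicEmbeddingsIndex(
    embedding_engine="openai",
    embedding_model="text-embedding-ada-002",  # assumed model name
    cache_config={"enabled": True},
)
```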
Using the FastEmbed embedding engine
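Similarly, a sketch with FastEmbed (again, the model name is an assumption):

```python
index = BasicEmbeddingsIndex(
    embedding_engine="FastEmbed",
    embedding_model="all-MiniLM-L6-v2",  # assumed model name
    cache_config={"enabled": True},
)
```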
Using the in-memory cache store
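And a sketch selecting the in-memory store via the cache config (the exact store identifier, `in_memory` here, is an assumption):

```python
index = BasicEmbeddingsIndex(
    embedding_engine="FastEmbed",
    embedding_model="all-MiniLM-L6-v2",
    cache_config={"enabled": True, "store": "in_memory"},  # assumed store name
)
```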