diff --git a/docs/data_sources.md b/docs/data_sources.md
index 63208310c7..8c9648d046 100644
--- a/docs/data_sources.md
+++ b/docs/data_sources.md
@@ -10,9 +10,18 @@ MemGPT supports pre-loading data into archival memory. In order to made data acc
 
 You can view available data sources with:
 
-```sh
+```sh CLI
 memgpt list sources
 ```
+```python Python
+from memgpt import create_client
+
+# Connect to the server as a user
+client = create_client()
+
+# List data source names that belong to user
+client.list_sources()
+```
 
 ```sh
 +----------------+----------+----------+
@@ -28,23 +37,30 @@ The `Agents` column indicates which agents have access to the data, while `Locat
 
 ### Attaching data to agents
 
-Attaching a data source to your agent loads the data into your agent's archival memory to access. You can attach data to your agent in two ways:
+Attaching a data source to your agent loads the data into your agent's archival memory for it to access.
 
-*[Option 1]* From the CLI, run:
-```sh
-memgpt attach --agent <agent-name> --data-source <data-source-name>
-```
-
-*[Option 2]* While chatting with the agent, enter the `/attach` command and select the data source
-
-```sh
+```sh CLI
+memgpt run
+...
 > Enter your message: /attach
 ? Select data source (Use arrow keys)
  » short-stories
    arxiv
    memgpt-docs
 ```
+```python Python
+from memgpt import create_client
+
+# Connect to the server as a user
+client = create_client()
+
+# Create an agent
+agent = client.create_agent()
+
+# Attach a source to an agent
+client.attach_source_to_agent(source_name="short-stories", agent_id=agent.id)
+```
 
 > 👍 Hint
 > To encourage your agent to reference its archival memory, we recommend adding phrases like "_search your archival memory..._" for the best results.
@@ -57,47 +73,36 @@ You can load a file, list of files, or directly into MemGPT with the following c
 memgpt load directory --name <name> \
     [--input-dir <dir>] [--input-files <filenames> ...] \
     [--recursive]
 ```
+```python Python
+from memgpt import create_client
 
-### Loading a database dump
+# Connect to the server as a user
+client = create_client()
 
-You can load database into MemGPT, either from a database dump or a database connection, with the following command:
+# Create a data source
+source = client.create_source(name="example_source")
 
-```sh
-memgpt load database --name <name> \
-    --query <query> \  # Query to run on database to get data
-    --dump-path <path> \  # Path to dump file
-    --scheme <scheme> \  # Database scheme
-    --host <host> \  # Database host
-    --port <port> \  # Database port
-    --user <user> \  # Database user
-    --password <password> \  # Database password
-    --dbname <dbname>  # Database name
+# Add file data into a source
+client.load_file_into_source(filename=filename, source_id=source.id)
 ```
 
-### Loading a vector database
+### Loading with custom connectors
+You can implement your own data connectors in MemGPT, and use them to load data into data sources:
 
-If you already have a vector database containing passages and embeddings, you can load them into MemGPT by specifying the table name, database URI, and the columns containing the passage text and embeddings.
-
-```sh
-memgpt load vector-database --name <name> \
-    --uri <uri> \  # Database URI
-    --table_name <table-name> \  # Name of table containing data
-    --text_column <text-column> \  # Name of column containing text
-    --embedding_column <embedding-column>  # Name of column containing embedding
-```
+```python Python
+from typing import Dict, Iterator, List, Tuple
+
+from memgpt.data_sources.connectors import DataConnector
 
-Since embeddings are already provided, MemGPT will not re-compute the embeddings.
+class DummyDataConnector(DataConnector):
+    """Fake data connector for testing which yields document/passage texts from a provided list"""
 
-### Loading a LlamaIndex dump
+    def __init__(self, texts: List[str]):
+        self.texts = texts
 
-If you have a Llama Index `VectorIndex` which was saved to disk, you can load it into MemGPT by specifying the directory the index was saved to:
+    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
+        for text in self.texts:
+            yield text, {"metadata": "dummy"}
 
-```sh
-memgpt load index --name <name> --dir <dir>
+    def generate_passages(self, documents: List[Document], chunk_size: int = 1024) -> Iterator[Tuple[str, Dict]]:
+        for doc in documents:
+            yield doc.text, doc.metadata
 ```
-
-Since Llama Index will have already computing embeddings, MemGPT will not re-compute embeddings.
-
-### Loading other types of data
-
-We highly encourage contributions for new data sources, which can be added as a new [CLI data load command](https://github.com/cpacker/MemGPT/blob/main/memgpt/cli/cli_load.py). We recommend checking for [Llama Index connectors](https://gpt-index.readthedocs.io/en/v0.6.3/how_to/data_connectors.html) that may support ingesting the data you're interested in loading.
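The custom-connector interface this diff introduces can be exercised end to end. Below is a minimal, self-contained sketch: the `DataConnector` base class and `Document` type here are simplified stand-ins written for illustration only, not MemGPT's real classes (the real base class lives in `memgpt.data_sources.connectors`), while `DummyDataConnector` mirrors the example added above.

```python
# Sketch of how a custom data connector is consumed. NOTE: `DataConnector`
# and `Document` below are simplified stand-ins for illustration, NOT
# MemGPT's real implementations.
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


class DataConnector:
    """Stand-in base class: a connector yields (text, metadata) pairs."""

    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        raise NotImplementedError


@dataclass
class Document:
    """Stand-in for a loaded document."""

    text: str
    metadata: Dict = field(default_factory=dict)


class DummyDataConnector(DataConnector):
    """Fake data connector for testing which yields document/passage texts from a provided list"""

    def __init__(self, texts: List[str]):
        self.texts = texts

    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        for text in self.texts:
            yield text, {"metadata": "dummy"}

    def generate_passages(self, documents: List[Document], chunk_size: int = 1024) -> Iterator[Tuple[str, Dict]]:
        for doc in documents:
            yield doc.text, doc.metadata


# Drive the connector the way a loader would: documents first, then passages.
connector = DummyDataConnector(["hello", "world"])
documents = [Document(text=t, metadata=m) for t, m in connector.generate_documents()]
passages = list(connector.generate_passages(documents))
```

The two-step split is the point of the interface: `generate_documents` pulls raw texts from the backing store, and `generate_passages` chunks documents into the passages that actually get embedded into archival memory.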