docs: update connector documentation (#1136)

letta-ai · Mar 12, 2024 · 994c015
1 parent 225564d
Showing 1 changed file with 48 additions and 43 deletions.
diff --git a/docs/data_sources.md b/docs/data_sources.md
@@ -10,9 +10,18 @@ MemGPT supports pre-loading data into archival memory. In order to made data acc
 
 You can view available data sources with:
 
-```sh
+```sh CLI
 memgpt list sources
 ```
+```python Python
+from memgpt import create_client
+
+# Connect to the server as a user
+client = create_client()
+
+# List data source names that belong to user
+client.list_sources()
+```
 
 ```sh
 +----------------+----------+----------+
@@ -28,23 +37,30 @@ The `Agents` column indicates which agents have access to the data, while `Locat
 
 ### Attaching data to agents
 
-Attaching a data source to your agent loads the data into your agent's archival memory to access. You can attach data to your agent in two ways:
+Attaching a data source to your agent loads its data into the agent's archival memory, where the agent can access it.
 
-*[Option 1]* From the CLI, run:
 
-```sh
-memgpt attach --agent <AGENT-NAME> --data-source <DATA-SOURCE-NAME>
-```
-
-*[Option 2]*  While chatting with the agent, enter the `/attach` command and select the data source
-
-```sh
+```sh CLI
+memgpt run 
+...
 > Enter your message: /attach
 ? Select data source (Use arrow keys)
  » short-stories
    arxiv
    memgpt-docs
 ```
+```python Python
+from memgpt import create_client
+
+# Connect to the server as a user
+client = create_client()
+
+# Create an agent 
+agent = client.create_agent()
+
+# Attach a source to an agent 
+client.attach_source_to_agent(source_name="short-stories", agent_id=agent.id)
+```
 
 > 👍 Hint
 > To encourage your agent to reference its archival memory, we recommend adding phrases like "_search your archival memory..._" for the best results.
@@ -57,47 +73,36 @@ You can load a file, list of files, or directly into MemGPT with the following c
 memgpt load directory --name <NAME> \
     [--input-dir <DIRECTORY>] [--input-files <FILE1> <FILE2>...] [--recursive]
 ```
+```python Python
+from memgpt import create_client
 
-### Loading a database dump
+# Connect to the server as a user
+client = create_client()
 
-You can load database into MemGPT, either from a database dump or a database connection, with the following command:
+# Create a data source 
+source = client.create_source(name="example_source")
 
-```sh
-memgpt load database --name <NAME>  \
-    --query <QUERY> \ # Query to run on database to get data
-    --dump-path <PATH> \ # Path to dump file
-    --scheme <SCHEME> \ # Database scheme
-    --host <HOST> \ # Database host
-    --port <PORT> \ # Database port
-    --user <USER> \ # Database user
-    --password <PASSWORD> \ # Database password
-    --dbname <DB_NAME> # Database name
+# Add file data into a source (`filename` is the path to the file you want to load)
+client.load_file_into_source(filename=filename, source_id=source.id)
 ```
 
-### Loading a vector database
+### Loading with custom connectors 
+You can implement your own data connectors in MemGPT, and use them to load data into data sources: 
 
-If you already have a vector database containing passages and embeddings, you can load them into MemGPT by specifying the table name, database URI, and the columns containing the passage text and embeddings.
-
-```sh
-memgpt load vector-database --name <NAME> \
-    --uri <URI> \ # Database URI
-    --table_name <TABLE-NAME> \ # Name of table containing data
-    --text_column <TEXT-COL> \ # Name of column containing text
-    --embedding_column <EMBEDDING-COL> # Name of column containing embedding
-```
+```python Python
+from memgpt.data_sources.connectors import DataConnector
 
-Since embeddings are already provided, MemGPT will not re-compute the embeddings.
+class DummyDataConnector(DataConnector):
+    """Fake data connector for testing which yields document/passage texts from a provided list"""
 
-### Loading a LlamaIndex dump
+    def __init__(self, texts: List[str]):
+        self.texts = texts
 
-If you have a Llama Index `VectorIndex` which was saved to disk, you can load it into MemGPT by specifying the directory the index was saved to:
+    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
+        for text in self.texts:
+            yield text, {"metadata": "dummy"}
 
-```sh
-memgpt load index --name <NAME> --dir <INDEX-DIR>
+    def generate_passages(self, documents: List[Document], chunk_size: int = 1024) -> Iterator[Tuple[str, Dict]]:
+        for doc in documents:
+            yield doc.text, doc.metadata
 ```
-
-Since Llama Index will have already computing embeddings, MemGPT will not re-compute embeddings.
-
-### Loading other types of data
-
-We highly encourage contributions for new data sources, which can be added as a new [CLI data load command](https://github.com/cpacker/MemGPT/blob/main/memgpt/cli/cli_load.py). We recommend checking for [Llama Index connectors](https://gpt-index.readthedocs.io/en/v0.6.3/how_to/data_connectors.html) that may support ingesting the data you're interested in loading.
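
The `DummyDataConnector` example added in this commit uses `List`, `Iterator`, `Tuple`, `Dict`, and `Document` without showing where they come from. Below is a minimal self-contained sketch of the same pattern. It does not import `memgpt`: the `DataConnector` and `Document` classes here are local stand-ins written for illustration, and the real MemGPT classes may differ in fields and behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Document:
    """Stand-in document type (assumption: the real one exposes `text` and `metadata`)."""

    text: str
    metadata: Dict = field(default_factory=dict)


class DataConnector:
    """Stand-in for the connector base class; not the real MemGPT implementation."""

    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        raise NotImplementedError

    def generate_passages(
        self, documents: List[Document], chunk_size: int = 1024
    ) -> Iterator[Tuple[str, Dict]]:
        raise NotImplementedError


class DummyDataConnector(DataConnector):
    """Fake connector for testing: yields texts from a provided list."""

    def __init__(self, texts: List[str]):
        self.texts = texts

    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        # One (text, metadata) pair per input string.
        for text in self.texts:
            yield text, {"metadata": "dummy"}

    def generate_passages(
        self, documents: List[Document], chunk_size: int = 1024
    ) -> Iterator[Tuple[str, Dict]]:
        # No chunking here: each document becomes exactly one passage.
        for doc in documents:
            yield doc.text, doc.metadata


connector = DummyDataConnector(["first text", "second text"])
documents = [Document(text=t, metadata=m) for t, m in connector.generate_documents()]
passages = list(connector.generate_passages(documents))
```

In this sketch `chunk_size` is accepted but ignored, as in the committed example. A real connector would presumably split long documents into passages of roughly that size.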