docs: update connector documentation (#1136)
sarahwooders authored Mar 12, 2024
1 parent 225564d commit 994c015
Showing 1 changed file with 48 additions and 43 deletions.
91 changes: 48 additions & 43 deletions docs/data_sources.md
@@ -10,9 +10,18 @@ MemGPT supports pre-loading data into archival memory. In order to made data acc

You can view available data sources with:

```sh
```sh CLI
memgpt list sources
```
```python Python
from memgpt import create_client

# Connect to the server as a user
client = create_client()

# List data source names that belong to user
client.list_sources()
```

```sh
+----------------+----------+----------+
@@ -28,23 +37,30 @@ The `Agents` column indicates which agents have access to the data, while `Locat

### Attaching data to agents

Attaching a data source to your agent loads the data into the agent's archival memory, where the agent can access it. You can attach data to your agent in two ways:
Attaching a data source to your agent loads the data into the agent's archival memory, where the agent can access it.

*[Option 1]* From the CLI, run:

```sh
memgpt attach --agent <AGENT-NAME> --data-source <DATA-SOURCE-NAME>
```

*[Option 2]* While chatting with the agent, enter the `/attach` command and select the data source

```sh
```sh CLI
memgpt run
...
> Enter your message: /attach
? Select data source (Use arrow keys)
» short-stories
arxiv
memgpt-docs
```
```python Python
from memgpt import create_client

# Connect to the server as a user
client = create_client()

# Create an agent
agent = client.create_agent()

# Attach a source to an agent
client.attach_source_to_agent(source_name="short-stories", agent_id=agent.id)
```

> 👍 Hint
> To encourage your agent to reference its archival memory, we recommend adding phrases like "_search your archival memory..._" for the best results.
@@ -57,47 +73,36 @@ You can load a file, list of files, or directly into MemGPT with the following c
memgpt load directory --name <NAME> \
[--input-dir <DIRECTORY>] [--input-files <FILE1> <FILE2>...] [--recursive]
```
```python Python
from memgpt import create_client

### Loading a database dump
# Connect to the server as a user
client = create_client()

You can load a database into MemGPT, either from a database dump or a database connection, with the following command:
# Create a data source
source = client.create_source(name="example_source")

```sh
memgpt load database --name <NAME> \
--query <QUERY> \ # Query to run on database to get data
--dump-path <PATH> \ # Path to dump file
--scheme <SCHEME> \ # Database scheme
--host <HOST> \ # Database host
--port <PORT> \ # Database port
--user <USER> \ # Database user
--password <PASSWORD> \ # Database password
--dbname <DB_NAME> # Database name
# Add file data into a source (`filename` is the path to a local file)
client.load_file_into_source(filename=filename, source_id=source.id)
```
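
The `--input-dir` and `--recursive` options of `memgpt load directory` have a natural counterpart when loading through the Python client: gather the file paths first, then load them one at a time. The sketch below is only an illustration under stated assumptions. `collect_files` is a hypothetical helper (it is not part of MemGPT), and the commented-out loop assumes the `client` and `source` objects created in the Python example above.

```python
from pathlib import Path
from typing import List


def collect_files(input_dir: str, recursive: bool = False) -> List[str]:
    """Return sorted paths of regular files under input_dir (hypothetical helper)."""
    root = Path(input_dir)
    # "**/*" descends into subdirectories; "*" only looks at the top level
    pattern = "**/*" if recursive else "*"
    return sorted(str(p) for p in root.glob(pattern) if p.is_file())


# Each collected path could then be passed to the client call shown above:
# for path in collect_files("docs", recursive=True):
#     client.load_file_into_source(filename=path, source_id=source.id)
```

Collecting paths up front keeps the loading loop simple and makes it easy to inspect or filter the file list before anything is written to a data source.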

### Loading a vector database
### Loading with custom connectors
You can implement your own data connectors in MemGPT, and use them to load data into data sources:

If you already have a vector database containing passages and embeddings, you can load them into MemGPT by specifying the table name, database URI, and the columns containing the passage text and embeddings.

```sh
memgpt load vector-database --name <NAME> \
--uri <URI> \ # Database URI
--table_name <TABLE-NAME> \ # Name of table containing data
--text_column <TEXT-COL> \ # Name of column containing text
--embedding_column <EMBEDDING-COL> # Name of column containing embedding
```
```python Python
from typing import Dict, Iterator, List, Tuple

from memgpt.data_sources.connectors import DataConnector

Since embeddings are already provided, MemGPT will not re-compute the embeddings.
class DummyDataConnector(DataConnector):
    """Fake data connector for testing which yields document/passage texts from a provided list"""

### Loading a LlamaIndex dump
    def __init__(self, texts: List[str]):
        self.texts = texts

If you have a Llama Index `VectorIndex` which was saved to disk, you can load it into MemGPT by specifying the directory the index was saved to:
    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        for text in self.texts:
            yield text, {"metadata": "dummy"}

```sh
memgpt load index --name <NAME> --dir <INDEX-DIR>
    def generate_passages(self, documents: List[Document], chunk_size: int = 1024) -> Iterator[Tuple[str, Dict]]:
        for doc in documents:
            yield doc.text, doc.metadata
```
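
The two-method connector shape above can be tried out without a running MemGPT server. The following is a minimal, self-contained sketch: `BaseConnector`, `SimpleDocument`, and `ListConnector` are hypothetical stand-ins defined here for illustration (they are not MemGPT classes), and the word-based chunking is an assumption about what `chunk_size` might control, not a description of MemGPT's behavior.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document record (not a MemGPT type)."""
    text: str
    metadata: Dict = field(default_factory=dict)


class BaseConnector(ABC):
    """Hypothetical base class mirroring the two-method connector shape."""

    @abstractmethod
    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        ...

    @abstractmethod
    def generate_passages(
        self, documents: List[SimpleDocument], chunk_size: int = 1024
    ) -> Iterator[Tuple[str, Dict]]:
        ...


class ListConnector(BaseConnector):
    """Yields documents from an in-memory list and splits them into passages."""

    def __init__(self, texts: List[str]):
        self.texts = texts

    def generate_documents(self) -> Iterator[Tuple[str, Dict]]:
        for text in self.texts:
            yield text, {"metadata": "dummy"}

    def generate_passages(
        self, documents: List[SimpleDocument], chunk_size: int = 1024
    ) -> Iterator[Tuple[str, Dict]]:
        # Assumption: chunk_size counts whitespace-separated words per passage
        for doc in documents:
            words = doc.text.split()
            for start in range(0, len(words), chunk_size):
                yield " ".join(words[start:start + chunk_size]), doc.metadata


connector = ListConnector(["one two three four five", "six seven"])
documents = [SimpleDocument(text=t, metadata=m) for t, m in connector.generate_documents()]
passages = list(connector.generate_passages(documents, chunk_size=2))
```

Separating document generation from passage generation means the source-specific reading logic and the chunking policy can be changed independently of each other.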

Since Llama Index will have already computed the embeddings, MemGPT will not re-compute them.

### Loading other types of data

We highly encourage contributions for new data sources, which can be added as a new [CLI data load command](https://github.com/cpacker/MemGPT/blob/main/memgpt/cli/cli_load.py). We recommend checking for [Llama Index connectors](https://gpt-index.readthedocs.io/en/v0.6.3/how_to/data_connectors.html) that may support ingesting the data you're interested in loading.
