CrateDB: Documentation about Vector Store, Document Loader, and Memory
Showing 6 changed files with 1,430 additions and 1 deletion.
@@ -4,4 +4,5 @@ node_modules/
 
 .docusaurus
 .cache-loader
-docs/api
+docs/api
+example.sqlite
@@ -0,0 +1,276 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# CrateDB Document Loader\n", | ||
"\n", | ||
"> [CrateDB] is capable of performing both vector and lexical search.\n", | ||
"> It is built on top of the Apache Lucene library, talks SQL,\n", | ||
"> is PostgreSQL-compatible, and scales like Elasticsearch.\n", | ||
"\n", | ||
"This notebook covers how to get started with the CrateDB document loader.\n", | ||
"\n", | ||
"The CrateDB document loader is based on [SQLAlchemy], and uses LangChain's\n", | ||
"`SQLDatabaseLoader`. It loads the result of a database query, creating one\n", | ||
"document per row.\n", | ||
"\n", | ||
"[CrateDB]: https://github.com/crate/crate\n", | ||
"[SQLAlchemy]: https://www.sqlalchemy.org/\n", | ||
"\n", | ||
"## Overview\n", | ||
"\n", | ||
"The `CrateDBLoader` class helps you get your unstructured content from CrateDB\n", | ||
"into LangChain's `Document` format.\n", | ||
"\n", | ||
"You must provide an SQLAlchemy-compatible connection string and a query\n", | ||
"expression in SQL.\n", | ||
"\n", | ||
"### Integration details\n", | ||
"\n", | ||
"| Class | Package | Local | Serializable | JS support |\n", | ||
"|:---|:---|:---:|:---:|:---:|\n", | ||
"| [CrateDBLoader](https://python.langchain.com/api_reference/cratedb/document_loaders/langchain_cratedb.document_loaders.cratedb.CrateDBLoader.html) | [langchain_cratedb](https://python.langchain.com/api_reference/cratedb/index.html) | ✅ | ❌ | ❌ |\n", | ||
"\n", | ||
"### Loader features\n", | ||
"\n", | ||
"| Source | Document Lazy Loading | Async Support |\n", | ||
"| :---: | :---: | :---: |\n", | ||
"| CrateDBLoader | ✅ | ❌ |\n", | ||
"\n", | ||
"## Setup\n", | ||
"\n", | ||
"You can run CrateDB Community Edition on your premises, or you can use CrateDB Cloud.\n", | ||
"\n", | ||
"### Credentials\n", | ||
"\n", | ||
"You will supply credentials through a regular SQLAlchemy connection string, like\n", | ||
"`crate://username:password@<hostname>/`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Installation\n", | ||
"\n", | ||
"Install the **langchain-community** and **sqlalchemy-cratedb** packages." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install -qU langchain-community sqlalchemy-cratedb" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Initialization\n", | ||
"\n", | ||
"Now, initialize the loader and start loading documents. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_community.document_loaders import CrateDBLoader\n", | ||
"\n", | ||
"loader = CrateDBLoader(\"SELECT * FROM sys.summits\", url=\"crate://crate@localhost/\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": "## Load" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"documents = loader.load()\n", | ||
"print(documents)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "## Lazy Load\n" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"page = []\n", | ||
"for doc in loader.lazy_load():\n", | ||
"    page.append(doc)\n", | ||
"    if len(page) >= 10:\n", | ||
"        # Do some paged operation, e.g.\n", | ||
"        # index.upsert(page)\n", | ||
"        page = []" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## API reference\n", | ||
"\n", | ||
"For detailed documentation of all CrateDBLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/cratedb/document_loaders/langchain_cratedb.document_loaders.cratedb.CrateDBLoader.html" | ||
] | ||
}, | ||
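{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": [ | ||
"## Tutorial\n", | ||
"\n", | ||
"### Populate the database\n", | ||
"\n", | ||
"The next cell uses the `crash` command-line shell to import the sample data\n", | ||
"and refresh the table." | ||
] | ||
}, | ||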
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!crash < ./example_data/mlb_teams_2012.sql\n", | ||
"!crash --command \"REFRESH TABLE mlb_teams_2012;\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": "### Usage" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from pprint import pprint\n", | ||
"\n", | ||
"from langchain_community.document_loaders import CrateDBLoader\n", | ||
"\n", | ||
"CONNECTION_STRING = \"crate://crate@localhost/\"\n", | ||
"\n", | ||
"loader = CrateDBLoader(\n", | ||
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n", | ||
"    url=CONNECTION_STRING,\n", | ||
")\n", | ||
"documents = loader.load()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"pprint(documents)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "### Specifying Which Columns are Content vs Metadata" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"loader = CrateDBLoader(\n", | ||
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n", | ||
"    url=CONNECTION_STRING,\n", | ||
"    page_content_columns=[\"Team\"],\n", | ||
"    metadata_columns=[\"Payroll (millions)\"],\n", | ||
")\n", | ||
"documents = loader.load()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pprint(documents)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": "### Adding Source to Metadata" | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"loader = CrateDBLoader(\n", | ||
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n", | ||
"    url=CONNECTION_STRING,\n", | ||
"    source_columns=[\"Team\"],\n", | ||
")\n", | ||
"documents = loader.load()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pprint(documents)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.6" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
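The "Lazy Load" cell in the notebook above collects documents into pages of ten; any documents left over after the last full page need separate handling. A minimal sketch of one way to page an arbitrary iterable, assuming a hypothetical helper named `paginate` (it is not part of LangChain or CrateDB) and using plain strings as a stand-in for the output of `loader.lazy_load()`:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(items: Iterable[T], page_size: int = 10) -> Iterator[List[T]]:
    """Yield lists of at most `page_size` items, including a final partial page."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice consumes up to `page_size` items without loading the rest.
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


# Stand-in for `loader.lazy_load()` (assumption): any iterable of documents works.
fake_documents = (f"doc-{i}" for i in range(25))
page_sizes = [len(page) for page in paginate(fake_documents, page_size=10)]
print(page_sizes)
```

With a real loader, the generator would be replaced by `loader.lazy_load()`; that call is left out here because it needs a running CrateDB instance.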
docs/docs/integrations/document_loaders/example_data/mlb_teams_2012.sql: 1 change (1 addition, 0 deletions)