feat: Weaviate destination connector #1963

Merged: 47 commits, Dec 1, 2023
Changes from 43 commits
94c12ff
initial commit for weaviate destination connector
rvztz Oct 31, 2023
96c465b
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Oct 31, 2023
1887ebe
added test scripts
rvztz Nov 1, 2023
5ab41a8
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 1, 2023
2f30293
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 2, 2023
20a7c58
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 3, 2023
8dc3f45
Adds check to validate the number of written pdf elements
rvztz Nov 3, 2023
6d56877
Fixes notes on cli interface. Adds batch_size as a cli command.
rvztz Nov 3, 2023
0a29da4
linting
rvztz Nov 3, 2023
9887907
Removes changes on cli interfaces Dict() type
rvztz Nov 3, 2023
ad7130e
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 4, 2023
008b732
Adds `DestinationConnectionError` to the weaviate dest connector
rvztz Nov 4, 2023
d87629c
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 7, 2023
8a0cc4b
Adds formatting logic to the `conform_dict` method on the weaviate con…
rvztz Nov 8, 2023
6c045be
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 8, 2023
062d4e4
pins weaviate python client version
rvztz Nov 8, 2023
bfe95ad
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 13, 2023
5d1a3f4
version pinning on weaviate-client
rvztz Nov 13, 2023
68b60aa
bumps weaviate dependency version on constraints
rvztz Nov 13, 2023
d2ad002
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 13, 2023
7968d3d
Modifies `test_unstructured.staging.test_weaviate.py::test_weaviate_s…
rvztz Nov 14, 2023
412aadf
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 14, 2023
63b9413
version bump
rvztz Nov 14, 2023
4d20445
removes s3 source
rvztz Nov 16, 2023
0c4955e
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 16, 2023
9d97829
adds chunking and embeddings flags to local source connector on `weav…
rvztz Nov 16, 2023
3039d7c
Fixes timestamp parsing
rvztz Nov 21, 2023
fe301ad
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 21, 2023
44b5b34
moves test-ingest-weaviate-output.py
rvztz Nov 21, 2023
96f10ab
Removes duplicated changelog entry, removes weaviate-client version p…
rvztz Nov 22, 2023
25fa86c
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 22, 2023
13477df
Adds docs for weaviate connector
rvztz Nov 22, 2023
3fb6037
removes unused var on ingest.sh example for weaviate
rvztz Nov 22, 2023
7d84b7d
removes unused vars from example script
rvztz Nov 22, 2023
ab9199d
Modifies weaviate connector docs. Removes re-declaration of batch_si…
rvztz Nov 22, 2023
453ecbf
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 22, 2023
dce9302
Copies local input connector parameters from mongodb.sh
rvztz Nov 22, 2023
afe85f9
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 23, 2023
bbe5c3d
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 28, 2023
54cbf9d
version bump
rvztz Nov 28, 2023
a1fe16d
Sets correct expected number of documents created
rvztz Nov 28, 2023
99ac633
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 28, 2023
6daa70d
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 29, 2023
1add053
Sets single client instance at WeaviateDestinationConnector level
rvztz Nov 30, 2023
372c997
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 30, 2023
ae8b98a
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 30, 2023
8816718
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Dec 1, 2023
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,11 +1,12 @@
-## 0.11.1-dev4
+## 0.11.1-dev5

### Enhancements

* **Batch Source Connector support** For instances where it is more optimal to read content from a source connector in batches, a new batch ingest doc is added which creates multiple ingest docs after reading them in batches per process.

### Features

* **Weaviate destination connector** Weaviate connector added to ingest CLI. Users may now use `unstructured-ingest` to write partitioned data from over 20 data sources (so far) to a Weaviate object collection.
* **Adds HubSpot connector** Adds connector to retrieve call, communications, emails, notes, products and tickets from HubSpot

### Fixes
4 changes: 4 additions & 0 deletions Makefile
@@ -191,6 +191,10 @@ install-ingest-airtable:
install-ingest-sharepoint:
	python3 -m pip install -r requirements/ingest/sharepoint.txt

.PHONY: install-ingest-weaviate
install-ingest-weaviate:
	python3 -m pip install -r requirements/ingest/weaviate.txt

.PHONY: install-ingest-local
install-ingest-local:
	echo "no unique dependencies for local connector"
1 change: 1 addition & 0 deletions docs/source/ingest/destination_connectors.rst
@@ -11,3 +11,4 @@ in our community `Slack. <https://short.unstructured.io/pzw05l7>`_
destination_connectors/azure_cognitive_search
destination_connectors/delta_table
destination_connectors/mongodb
destination_connectors/weaviate
67 changes: 67 additions & 0 deletions docs/source/ingest/destination_connectors/weaviate.rst
@@ -0,0 +1,67 @@
Weaviate
===========

Batch process all your records using ``unstructured-ingest`` to store structured outputs locally on your filesystem and upload those local files to a Weaviate collection.

First, install the Weaviate dependencies as shown here.

.. code:: shell

  pip install "unstructured[weaviate]"

Run Locally
-----------
The upstream connector can be any of the supported ones. For convenience, the sample command here uses the
upstream local connector. It pushes elements into a collection schema of your choice in a Weaviate instance
running locally.

.. tabs::

   .. tab:: Shell

      .. code:: shell

         unstructured-ingest \
           local \
           --input-path example-docs/fake-memo.pdf \
           --anonymous \
           --output-dir local-output-to-weaviate \
           --num-processes 2 \
           --verbose \
           --strategy fast \
           weaviate \
           --host-url http://localhost:8080 \
           --class-name elements

   .. tab:: Python

      .. code:: python

         import os

         from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
         from unstructured.ingest.runner import LocalRunner

         if __name__ == "__main__":
             runner = LocalRunner(
                 processor_config=ProcessorConfig(
                     verbose=True,
                     output_dir="local-output-to-weaviate",
                     num_processes=2,
                 ),
                 read_config=ReadConfig(),
                 partition_config=PartitionConfig(),
                 writer_type="weaviate",
                 # Host URL and class name are read from the environment.
                 writer_kwargs={
                     "host_url": os.getenv("WEAVIATE_HOST_URL"),
                     "class_name": os.getenv("WEAVIATE_CLASS_NAME"),
                 },
             )
             runner.run(
                 input_path="example-docs/fake-memo.pdf",
             )


For a full list of the options the CLI accepts, run ``unstructured-ingest <upstream connector> weaviate --help``.
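
The Python example above reads ``WEAVIATE_HOST_URL`` and ``WEAVIATE_CLASS_NAME`` with ``os.getenv``, which returns ``None`` when a variable is unset, so a missing value may only surface once the writer tries to connect. The following is a minimal sketch of a pre-flight check, not part of ``unstructured``: the helper name ``weaviate_writer_kwargs`` is hypothetical, and it assumes that both variables are required.

```python
import os


def weaviate_writer_kwargs(env=None):
    """Build the writer_kwargs dict from environment variables.

    Hypothetical helper for illustration only; it is not provided by
    the unstructured library.
    """
    env = os.environ if env is None else env
    # Map each writer_kwargs key to the environment variable used in the docs example.
    required = {
        "host_url": "WEAVIATE_HOST_URL",
        "class_name": "WEAVIATE_CLASS_NAME",
    }
    # Treat unset and empty values the same way: both count as missing.
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in required.items()}
```

The returned dict could then be passed as ``writer_kwargs`` in place of the inline ``os.getenv`` calls, so that a misconfigured environment fails before any documents are partitioned.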

NOTE: Keep in mind that you will need to have all the appropriate extras and dependencies for the file types of the documents contained in your data storage platform if you're running this locally. You can find more information about this in the `installation guide <https://unstructured-io.github.io/unstructured/installing.html>`_.