feat: Weaviate destination connector #1963

Merged: 47 commits, Dec 1, 2023
Changes from 23 commits
Commits
94c12ff
initial commit for weaviate destination connector
rvztz Oct 31, 2023
96c465b
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Oct 31, 2023
1887ebe
added test scripts
rvztz Nov 1, 2023
5ab41a8
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 1, 2023
2f30293
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 2, 2023
20a7c58
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 3, 2023
8dc3f45
Adds check to validate the number of written pdf elements
rvztz Nov 3, 2023
6d56877
Fixes notes on cli interface. Adds batch_size as a cli command.
rvztz Nov 3, 2023
0a29da4
linting
rvztz Nov 3, 2023
9887907
Removes changes on cli interfaces Dict() type
rvztz Nov 3, 2023
ad7130e
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 4, 2023
008b732
Adds `DestinationConnectionError` to the weaviate dest connector
rvztz Nov 4, 2023
d87629c
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 7, 2023
8a0cc4b
Adds formatting logic to the `conform_dict` method on the weaviate con…
rvztz Nov 8, 2023
6c045be
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 8, 2023
062d4e4
pins weaviate python client version
rvztz Nov 8, 2023
bfe95ad
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 13, 2023
5d1a3f4
version pinning on weaviate-client
rvztz Nov 13, 2023
68b60aa
bumps weaviate dependency version on constraints
rvztz Nov 13, 2023
d2ad002
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 13, 2023
7968d3d
Modifies `test_unstructured.staging.test_weaviate.py::test_weaviate_s…
rvztz Nov 14, 2023
412aadf
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 14, 2023
63b9413
version bump
rvztz Nov 14, 2023
4d20445
removes s3 source
rvztz Nov 16, 2023
0c4955e
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 16, 2023
9d97829
adds chunking and embeddings flags to local source connector on `weav…
rvztz Nov 16, 2023
3039d7c
Fixes timestamp parsing
rvztz Nov 21, 2023
fe301ad
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 21, 2023
44b5b34
moves test-ingest-weaviate-output.py
rvztz Nov 21, 2023
96f10ab
Removes duplicated changelog entry, removes weaviate-client version p…
rvztz Nov 22, 2023
25fa86c
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 22, 2023
13477df
Adds docs for weaviate connector
rvztz Nov 22, 2023
3fb6037
removes unused var on ingest.sh example for weaviate
rvztz Nov 22, 2023
7d84b7d
removes unused vars from example script
rvztz Nov 22, 2023
ab9199d
Modifies weaviate connector docs. Removes re-declaration of batch_si…
rvztz Nov 22, 2023
453ecbf
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 22, 2023
dce9302
Copies local input connector parameters from mongodb.sh
rvztz Nov 22, 2023
afe85f9
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 23, 2023
bbe5c3d
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 28, 2023
54cbf9d
version bump
rvztz Nov 28, 2023
a1fe16d
Sets correct expected number of documents created
rvztz Nov 28, 2023
99ac633
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 28, 2023
6daa70d
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 29, 2023
1add053
Sets single client instance at WeaviateDestinationConnector level
rvztz Nov 30, 2023
372c997
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 30, 2023
ae8b98a
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Nov 30, 2023
8816718
Merge branch 'main' into rvztz/weaviate-destination-connector
rvztz Dec 1, 2023
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -1,11 +1,14 @@
## 0.10.31-dev2
## 0.10.31-dev3

### Enhancements
* **Temporary Support for paddle language parameter** Users can specify the default language code for paddle with the ENV `DEFAULT_PADDLE_LANG` until a language mapping for paddle is available.

### Features

* **Weaviate destination connector** Weaviate connector added to ingest CLI. Users may now use `unstructured-ingest` to write partitioned data from over 20 data sources (so far) to a Weaviate object collection.

### Fixes

* **Remove default user ./ssh folder** The default notebook user during image build would create the known_hosts file with incorrect ownership. This is legacy and no longer needed, so it was removed.
* **Include `languages` in metadata when partitioning strategy='hi_res' or 'fast'** User defined `languages` was previously used for text detection, but not included in the resulting element metadata for some strategies. `languages` will now be included in the metadata regardless of partition strategy for pdfs and images.

@@ -20,6 +23,7 @@

* **Add functionality to do a second OCR on cropped table images.** Changes to the values for scaling ENVs affect entire page OCR output (OCR regression), so we now do a second OCR for tables.
* **Adds ability to pass timeout for a request when partitioning via a `url`.** `partition` now accepts a new optional parameter `request_timeout` which if set will prevent any `requests.get` from hanging indefinitely and instead will raise a timeout error. This is useful when partitioning a url that may be slow to respond or may not respond at all.
* **Weaviate destination connector** Weaviate connector added to ingest CLI. Users may now use `unstructured-ingest` to write partitioned data from over 20 data sources (so far) to a Weaviate object collection.

rvztz marked this conversation as resolved.
### Fixes

4 changes: 4 additions & 0 deletions Makefile
@@ -191,6 +191,10 @@ install-ingest-airtable:
install-ingest-sharepoint:
	python3 -m pip install -r requirements/ingest/sharepoint.txt

.PHONY: install-ingest-weaviate
install-ingest-weaviate:
	python3 -m pip install -r requirements/ingest/weaviate.txt

.PHONY: install-ingest-local
install-ingest-local:
	echo "no unique dependencies for local connector"
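
Commit 6d56877 in the list above adds `batch_size` as a CLI option for the connector. As a rough illustration of what batching means for a destination writer, the sketch below groups partitioned elements into fixed-size batches and hands each batch to a writer callback. This is not the connector's code: `chunked`, `write_all`, and `write_batch` are hypothetical names chosen for the example, and a plain list stands in for the Weaviate client.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(elements: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Yield successive lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    batch: List[Dict] = []
    for element in elements:
        batch.append(element)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def write_all(
    elements: Iterable[Dict],
    batch_size: int,
    write_batch: Callable[[List[Dict]], None],
) -> int:
    """Send elements to `write_batch` one batch at a time; return the count written."""
    written = 0
    for batch in chunked(elements, batch_size):
        write_batch(batch)  # hypothetical stand-in for a real destination write
        written += len(batch)
    return written


if __name__ == "__main__":
    sent: List[List[Dict]] = []
    total = write_all(({"text": str(i)} for i in range(5)), 2, sent.append)
    print(total, [len(b) for b in sent])
```

Returning the number of written elements makes it easy to compare against an expected count, which is the kind of check commit 8dc3f45 describes for the test script.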