Merge branch 'main' into roman/expand-fsspec-downstream-connectors-2
tabossert authored Oct 25, 2023
2 parents cbd0fb9 + 135aa65 commit 9be95d3
Showing 67 changed files with 1,231 additions and 1,211 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,18 @@

### Enhancements

* **Leverage dict to share content across ingest pipeline** To share ingest doc content across the steps of the ingest pipeline, the pipeline now uses a multiprocessing-safe dictionary, so changes persist and each step can modify the ingest docs in place.

### Features

### Fixes

* **Caching fixes in ingest pipeline** Previously, steps such as the source node ignored parameters like `re_download`, which determine whether files are forced to re-download instead of reusing copies that may already exist locally.

## 0.10.26

### Enhancements

* **Add CI evaluation workflow** Adds evaluation metrics to the current ingest workflow to measure the performance of each extracted file as well as aggregate performance.
* **Fsspec downstream connectors** New destination connectors added to the ingest CLI; users may now use `unstructured-ingest` to write to any of the following:
* Azure
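The **Leverage dict to share content across ingest pipeline** entry can be illustrated with the standard library's `multiprocessing.Manager().dict()`. This is a minimal sketch under assumptions, not the pipeline's actual code: the step names `download_step` and `partition_step`, the `process_doc` helper, and the record fields are all hypothetical. One behavior worth knowing is that reading a value from a Manager-backed dict returns a copy, so a step has to write the modified record back for the change to persist.

```python
import multiprocessing
from typing import Any, Dict, MutableMapping

# Hypothetical record type: one plain dict of fields per ingest doc.
Record = Dict[str, Any]


def download_step(shared: MutableMapping[str, Record], key: str) -> None:
    """Hypothetical step: mark the doc as downloaded."""
    record = dict(shared[key])  # a Manager dict hands back a copy of the value
    record["downloaded"] = True
    shared[key] = record  # write back so other processes see the change


def partition_step(shared: MutableMapping[str, Record], key: str) -> None:
    """Hypothetical step: attach partitioned elements to the doc."""
    record = dict(shared[key])
    record["elements"] = [f"element from {key}"]
    shared[key] = record


def process_doc(shared: MutableMapping[str, Record], key: str) -> None:
    """Run both hypothetical steps; the second sees the first one's changes."""
    download_step(shared, key)
    partition_step(shared, key)


if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        # The proxy returned by manager.dict() can be passed to worker processes.
        shared = manager.dict({"doc-1": {}, "doc-2": {}})
        with multiprocessing.Pool(processes=2) as pool:
            pool.starmap(process_doc, [(shared, key) for key in shared.keys()])
        for key, record in sorted(shared.items()):
            print(key, record)
```

Because the step functions only rely on the mapping interface, the same code also runs against a plain `dict` in a single process, which keeps the steps easy to test.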
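The **Caching fixes in ingest pipeline** entry is about whether a step reuses a file that already exists locally. A minimal sketch of that decision follows; `fetch_if_needed` and the caller-supplied `download` callable are hypothetical names chosen for illustration, not the codebase's API.

```python
from pathlib import Path
from typing import Callable


def fetch_if_needed(
    local_path: Path,
    download: Callable[[Path], None],
    re_download: bool = False,
) -> bool:
    """Hypothetical helper: download unless a cached copy exists and re_download is False.

    Returns True if a download happened, False if the cached file was reused.
    """
    if local_path.exists() and not re_download:
        return False  # reuse what is already on disk
    local_path.parent.mkdir(parents=True, exist_ok=True)
    download(local_path)
    return True
```

With `re_download=True` the helper always calls `download`, which is the behavior the entry says steps like the source node were not honoring.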
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@
"metadata": {
"data_source": {
"url": "",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/fake-text.txt",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -115,7 +115,7 @@
"metadata": {
"data_source": {
"url": "",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/fake-text.txt",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -225,7 +225,7 @@
"metadata": {
"data_source": {
"url": "",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/fake-text.txt",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -335,7 +335,7 @@
"metadata": {
"data_source": {
"url": "",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/fake-text.txt",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -445,7 +445,7 @@
"metadata": {
"data_source": {
"url": "",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/fake-text.txt",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -555,7 +555,7 @@
"metadata": {
"data_source": {
"url": "",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/fake-text.txt",
"site_url": "https://unstructuredio.sharepoint.com"
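These fixture changes switch `metadata.data_source.version` from a JSON number to a JSON string. A check along the following lines could flag output files that still use the old type. It is a sketch that assumes each output file is a JSON array of elements shaped like the snippets shown here; `non_string_versions` is a hypothetical helper name.

```python
import json
from typing import Any, List


def non_string_versions(elements: List[Any]) -> List[int]:
    """Hypothetical check: indices of elements whose data_source.version is not a string."""
    bad = []
    for i, element in enumerate(elements):
        data_source = element.get("metadata", {}).get("data_source", {})
        if "version" in data_source and not isinstance(data_source["version"], str):
            bad.append(i)
    return bad


sample = json.loads(
    '[{"metadata": {"data_source": {"url": "", "version": "1"}}},'
    ' {"metadata": {"data_source": {"url": "", "version": 1}}}]'
)
print(non_string_versions(sample))  # [1]
```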
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@
"metadata": {
"data_source": {
"url": "",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/ideas-page.html",
"site_url": "https://unstructuredio.sharepoint.com"
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@
"metadata": {
"data_source": {
"url": "https://unstructuredio.sharepoint.com/Shared Documents/stanley-cups.xlsx?d=wb9956a338079432191ea609def07394d",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/stanley-cups.xlsx",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -117,7 +117,7 @@
"metadata": {
"data_source": {
"url": "https://unstructuredio.sharepoint.com/Shared Documents/stanley-cups.xlsx?d=wb9956a338079432191ea609def07394d",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/stanley-cups.xlsx",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -229,7 +229,7 @@
"metadata": {
"data_source": {
"url": "https://unstructuredio.sharepoint.com/Shared Documents/stanley-cups.xlsx?d=wb9956a338079432191ea609def07394d",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/stanley-cups.xlsx",
"site_url": "https://unstructuredio.sharepoint.com"
@@ -341,7 +341,7 @@
"metadata": {
"data_source": {
"url": "https://unstructuredio.sharepoint.com/Shared Documents/stanley-cups.xlsx?d=wb9956a338079432191ea609def07394d",
- "version": 1,
+ "version": "1",
"record_locator": {
"server_path": "/Shared Documents/stanley-cups.xlsx",
"site_url": "https://unstructuredio.sharepoint.com"
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -24,7 +24,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -43,7 +43,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -62,7 +62,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -81,7 +81,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -100,7 +100,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -119,7 +119,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -138,7 +138,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -157,7 +157,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -176,7 +176,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -195,7 +195,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -214,7 +214,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -233,7 +233,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -252,7 +252,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -271,7 +271,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -290,7 +290,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -309,7 +309,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -328,7 +328,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -347,7 +347,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -366,7 +366,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -385,7 +385,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -404,7 +404,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -423,7 +423,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -442,7 +442,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
@@ -461,7 +461,7 @@
"metadata": {
"data_source": {
"url": "abfs://container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf",
- "version": 167189396509615428390709838081557906335,
+ "version": "167189396509615428390709838081557906335",
"record_locator": {
"protocol": "abfs",
"remote_file_path": "container1/Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf"
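The reason for storing `version` as a string is not stated here, but the Azure fixture shows one practical benefit: `167189396509615428390709838081557906335` is a 39-digit integer, far beyond the 53-bit integer precision of an IEEE 754 double. Python's `json` module keeps it exact, but a consumer that parses JSON numbers as doubles (JavaScript's `JSON.parse`, for example) would silently round it. The sketch below simulates such a consumer with `json.loads(..., parse_int=float)`; the `round_trips` helper is hypothetical.

```python
import json

VERSION = "167189396509615428390709838081557906335"  # value from the abfs fixture


def round_trips(doc: str, as_double: bool) -> bool:
    """Hypothetical helper: parse doc and report whether version keeps its exact digits."""
    # parse_int=float mimics a parser that stores every JSON number as a double.
    parsed = json.loads(doc, parse_int=float) if as_double else json.loads(doc)
    version = parsed["version"]
    text = version if isinstance(version, str) else str(int(version))
    return text == VERSION


number_doc = '{"version": %s}' % VERSION    # old fixture style: JSON number
string_doc = '{"version": "%s"}' % VERSION  # new fixture style: JSON string

print(round_trips(number_doc, as_double=False))  # True: Python ints are arbitrary precision
print(round_trips(number_doc, as_double=True))   # False: the double rounds the low digits
print(round_trips(string_doc, as_double=True))   # True: strings are never reinterpreted
```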