Merge remote-tracking branch 'origin/main' into potter/astradb-updates-2
potter-potter committed May 15, 2024
2 parents b52b76b + 12b30d2 commit 5dddef4
Showing 40 changed files with 1,216 additions and 839 deletions.
10 changes: 7 additions & 3 deletions CHANGELOG.md
@@ -1,17 +1,21 @@
## 0.13.8-dev3
## 0.13.8-dev9

### Enhancements

**Faster evaluation** Support for concurrent processing of documents during evaluation
* **Faster evaluation** Support for concurrent processing of documents during evaluation
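
A minimal sketch of what concurrent per-document processing can look like, assuming each document can be scored independently. `score_document` and `evaluate` are hypothetical names used for illustration; they are not taken from the library's evaluation code, which may be structured differently.

```python
from concurrent.futures import ThreadPoolExecutor


def score_document(doc: str) -> int:
    # Hypothetical stand-in metric: count whitespace-separated tokens.
    return len(doc.split())


def evaluate(docs: list[str], max_workers: int = 4) -> list[int]:
    # Score documents concurrently; Executor.map() returns results
    # in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_document, docs))
```

Threads suit I/O-bound scoring; for CPU-bound metrics a process pool is the usual alternative.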

### Features

### Fixes

* **Add missing starting_page_num param to partition_image**
* **Make the filename and file params for partition_image and partition_pdf match the other partitioners**
* **Fix include_slide_notes and include_page_breaks params in partition_ppt**
* **Re-apply: skip accuracy calculation feature** Overwritten by mistake
* **AstraDB: opton to prevent indexing metadata**
* **Fix type hint for paragraph_grouper param** `paragraph_grouper` can be set to `False`, but the type hint did not reflect this previously.
* **Remove links param from partition_pdf** `links` is extracted during partitioning and is not needed as a parameter in partition_pdf.
* **Improve CSV delimiter detection.** `partition_csv()` would raise on CSV files with very long lines.
* **AstraDB: option to prevent indexing metadata**
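
For the CSV delimiter entry above, one way such a failure can be avoided is to sniff only whole lines and fall back to a default when sniffing fails. The sketch below illustrates that idea with the standard-library `csv.Sniffer`; `detect_delimiter`, its parameters, and the candidate delimiter set are assumptions for illustration, not the actual `partition_csv()` implementation.

```python
import csv
import io


def detect_delimiter(text: str, max_lines: int = 10, default: str = ",") -> str:
    """Guess the delimiter from the first few complete lines of `text`."""
    # Collect whole lines so a very long row is never cut mid-field.
    lines = []
    for i, line in enumerate(io.StringIO(text)):
        if i >= max_lines:
            break
        lines.append(line)
    sample = "".join(lines)
    try:
        return csv.Sniffer().sniff(sample, delimiters=",;|\t").delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot decide; use the default.
        return default
```

Sampling by line count instead of by byte count keeps each sampled row intact, so the per-line delimiter frequencies the sniffer compares stay consistent.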

## 0.13.7

2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -48,7 +48,7 @@ markdown-it-py==3.0.0
# myst-parser
markupsafe==2.1.5
# via jinja2
mdit-py-plugins==0.4.0
mdit-py-plugins==0.4.1
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
11 changes: 11 additions & 0 deletions example-docs/csv-with-long-lines.csv

Large diffs are not rendered by default.

Binary file modified example-docs/language-docs/eng_spa_mult.ppt
Binary file not shown.
6 changes: 3 additions & 3 deletions requirements/base.txt
@@ -21,7 +21,7 @@ charset-normalizer==3.3.2
# unstructured-client
click==8.1.7
# via nltk
dataclasses-json==0.6.5
dataclasses-json==0.6.6
# via -r ./base.in
dataclasses-json-speakeasy==0.5.11
# via unstructured-client
@@ -39,7 +39,7 @@ jsonpath-python==1.0.6
# via unstructured-client
langdetect==1.0.9
# via -r ./base.in
lxml==5.2.1
lxml==5.2.2
# via -r ./base.in
marshmallow==3.21.2
# via
@@ -67,7 +67,7 @@ python-magic==0.4.27
# via -r ./base.in
rapidfuzz==3.9.0
# via -r ./base.in
regex==2024.4.28
regex==2024.5.10
# via nltk
requests==2.31.0
# via
2 changes: 1 addition & 1 deletion requirements/build.txt
@@ -48,7 +48,7 @@ markdown-it-py==3.0.0
# myst-parser
markupsafe==2.1.5
# via jinja2
mdit-py-plugins==0.4.0
mdit-py-plugins==0.4.1
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
2 changes: 1 addition & 1 deletion requirements/dev.txt
@@ -253,7 +253,7 @@ platformdirs==3.10.0
# -c ./test.txt
# jupyter-core
# virtualenv
pre-commit==3.7.0
pre-commit==3.7.1
# via -r ./dev.in
prometheus-client==0.20.0
# via jupyter-server
2 changes: 1 addition & 1 deletion requirements/extra-docx.txt
@@ -4,7 +4,7 @@
#
# pip-compile ./extra-docx.in
#
lxml==5.2.1
lxml==5.2.2
# via
# -c ./base.txt
# python-docx
2 changes: 1 addition & 1 deletion requirements/extra-odt.txt
@@ -4,7 +4,7 @@
#
# pip-compile ./extra-odt.in
#
lxml==5.2.1
lxml==5.2.2
# via
# -c ./base.txt
# python-docx
6 changes: 3 additions & 3 deletions requirements/extra-paddleocr.txt
@@ -8,7 +8,7 @@ attrdict==2.0.1
# via unstructured-paddleocr
babel==2.15.0
# via flask-babel
bce-python-sdk==0.9.7
bce-python-sdk==0.9.9
# via visualdl
blinker==1.8.2
# via flask
@@ -77,7 +77,7 @@ lazy-loader==0.4
# via scikit-image
lmdb==1.4.1
# via unstructured-paddleocr
lxml==5.2.1
lxml==5.2.2
# via
# -c ./base.txt
# premailer
@@ -199,7 +199,7 @@ six==1.16.0
# imgaug
# python-dateutil
# visualdl
tifffile==2024.5.3
tifffile==2024.5.10
# via scikit-image
tqdm==4.66.4
# via
6 changes: 3 additions & 3 deletions requirements/extra-pdf-image.txt
@@ -85,7 +85,7 @@ kiwisolver==1.4.5
# via matplotlib
layoutparser[layoutmodels,tesseract]==0.3.4
# via unstructured-inference
lxml==5.2.1
lxml==5.2.2
# via
# -c ./base.txt
# pikepdf
@@ -198,7 +198,7 @@ pyparsing==3.0.9
# matplotlib
pypdf==4.2.0
# via -r ./extra-pdf-image.in
pypdfium2==4.29.0
pypdfium2==4.30.0
# via pdfplumber
pytesseract==0.3.10
# via layoutparser
@@ -222,7 +222,7 @@ rapidfuzz==3.9.0
# via
# -c ./base.txt
# unstructured-inference
regex==2024.4.28
regex==2024.5.10
# via
# -c ./base.txt
# transformers
2 changes: 1 addition & 1 deletion requirements/extra-pptx.txt
@@ -4,7 +4,7 @@
#
# pip-compile ./extra-pptx.in
#
lxml==5.2.1
lxml==5.2.2
# via python-pptx
pillow==10.3.0
# via python-pptx
2 changes: 1 addition & 1 deletion requirements/huggingface.txt
@@ -64,7 +64,7 @@ pyyaml==6.0.1
# via
# huggingface-hub
# transformers
regex==2024.4.28
regex==2024.5.10
# via
# -c ./base.txt
# sacremoses
2 changes: 1 addition & 1 deletion requirements/ingest/azure.txt
@@ -23,7 +23,7 @@ azure-datalake-store==0.0.53
# via adlfs
azure-identity==1.16.0
# via adlfs
azure-storage-blob==12.19.1
azure-storage-blob==12.20.0
# via adlfs
certifi==2024.2.2
# via
2 changes: 1 addition & 1 deletion requirements/ingest/delta-table.txt
@@ -4,7 +4,7 @@
#
# pip-compile ./ingest/delta-table.in
#
deltalake==0.17.3
deltalake==0.17.4
# via -r ./ingest/delta-table.in
fsspec==2024.3.1
# via -r ./ingest/delta-table.in
6 changes: 3 additions & 3 deletions requirements/ingest/embed-aws-bedrock.txt
@@ -30,7 +30,7 @@ charset-normalizer==3.3.2
# via
# -c ./ingest/../base.txt
# requests
dataclasses-json==0.6.5
dataclasses-json==0.6.6
# via
# -c ./ingest/../base.txt
# langchain-community
@@ -51,11 +51,11 @@ jsonpatch==1.33
# via langchain-core
jsonpointer==2.4
# via jsonpatch
langchain-community==0.0.37
langchain-community==0.0.38
# via -r ./ingest/embed-aws-bedrock.in
langchain-core==0.1.52
# via langchain-community
langsmith==0.1.54
langsmith==0.1.57
# via
# langchain-community
# langchain-core
8 changes: 4 additions & 4 deletions requirements/ingest/embed-huggingface.txt
@@ -23,7 +23,7 @@ charset-normalizer==3.3.2
# via
# -c ./ingest/../base.txt
# requests
dataclasses-json==0.6.5
dataclasses-json==0.6.6
# via
# -c ./ingest/../base.txt
# langchain-community
@@ -62,11 +62,11 @@ jsonpatch==1.33
# via langchain-core
jsonpointer==2.4
# via jsonpatch
langchain-community==0.0.37
langchain-community==0.0.38
# via -r ./ingest/embed-huggingface.in
langchain-core==0.1.52
# via langchain-community
langsmith==0.1.54
langsmith==0.1.57
# via
# langchain-community
# langchain-core
@@ -120,7 +120,7 @@ pyyaml==6.0.1
# langchain-community
# langchain-core
# transformers
regex==2024.4.28
regex==2024.5.10
# via
# -c ./ingest/../base.txt
# transformers
4 changes: 2 additions & 2 deletions requirements/ingest/embed-octoai.txt
@@ -38,13 +38,13 @@ idna==3.7
# anyio
# httpx
# requests
openai==1.26.0
openai==1.28.1
# via -r ./ingest/embed-octoai.in
pydantic==2.7.1
# via openai
pydantic-core==2.18.2
# via pydantic
regex==2024.4.28
regex==2024.5.10
# via
# -c ./ingest/../base.txt
# tiktoken
10 changes: 5 additions & 5 deletions requirements/ingest/embed-openai.txt
@@ -30,7 +30,7 @@ charset-normalizer==3.3.2
# via
# -c ./ingest/../base.txt
# requests
dataclasses-json==0.6.5
dataclasses-json==0.6.6
# via
# -c ./ingest/../base.txt
# langchain-community
@@ -59,11 +59,11 @@ jsonpatch==1.33
# via langchain-core
jsonpointer==2.4
# via jsonpatch
langchain-community==0.0.37
langchain-community==0.0.38
# via -r ./ingest/embed-openai.in
langchain-core==0.1.52
# via langchain-community
langsmith==0.1.54
langsmith==0.1.57
# via
# langchain-community
# langchain-core
@@ -83,7 +83,7 @@ numpy==1.26.4
# via
# -c ./ingest/../base.txt
# langchain-community
openai==1.26.0
openai==1.28.1
# via -r ./ingest/embed-openai.in
orjson==3.10.3
# via langsmith
@@ -104,7 +104,7 @@ pyyaml==6.0.1
# via
# langchain-community
# langchain-core
regex==2024.4.28
regex==2024.5.10
# via
# -c ./ingest/../base.txt
# tiktoken
14 changes: 6 additions & 8 deletions requirements/ingest/embed-vertexai.txt
@@ -29,7 +29,7 @@ charset-normalizer==3.3.2
# via
# -c ./ingest/../base.txt
# requests
dataclasses-json==0.6.5
dataclasses-json==0.6.6
# via
# -c ./ingest/../base.txt
# langchain
@@ -55,7 +55,7 @@ google-auth==2.29.0
# google-cloud-core
# google-cloud-resource-manager
# google-cloud-storage
google-cloud-aiplatform==1.50.0
google-cloud-aiplatform==1.51.0
# via langchain-google-vertexai
google-cloud-bigquery==3.22.0
# via google-cloud-aiplatform
@@ -98,14 +98,12 @@ idna==3.7
# requests
# yarl
jsonpatch==1.33
# via
# langchain
# langchain-core
# via langchain-core
jsonpointer==2.4
# via jsonpatch
langchain==0.1.17
langchain==0.1.20
# via -r ./ingest/embed-vertexai.in
langchain-community==0.0.37
langchain-community==0.0.38
# via
# -r ./ingest/embed-vertexai.in
# langchain
@@ -119,7 +117,7 @@ langchain-google-vertexai==1.0.3
# via -r ./ingest/embed-vertexai.in
langchain-text-splitters==0.0.1
# via langchain
langsmith==0.1.54
langsmith==0.1.57
# via
# langchain
# langchain-community
2 changes: 1 addition & 1 deletion requirements/ingest/google-drive.txt
@@ -17,7 +17,7 @@ charset-normalizer==3.3.2
# requests
google-api-core==2.19.0
# via google-api-python-client
google-api-python-client==2.128.0
google-api-python-client==2.129.0
# via -r ./ingest/google-drive.in
google-auth==2.29.0
# via
2 changes: 1 addition & 1 deletion requirements/ingest/salesforce.txt
@@ -25,7 +25,7 @@ idna==3.7
# requests
isodate==0.6.1
# via zeep
lxml==5.2.1
lxml==5.2.2
# via
# -c ./ingest/../base.txt
# zeep
2 changes: 1 addition & 1 deletion requirements/ingest/weaviate.txt
@@ -81,7 +81,7 @@ urllib3==1.26.18
# requests
validators==0.28.1
# via weaviate-client
weaviate-client==4.5.7
weaviate-client==4.6.0
# via
# -c ./ingest/../deps/constraints.txt
# -r ./ingest/weaviate.in
6 changes: 3 additions & 3 deletions requirements/test.txt
@@ -37,7 +37,7 @@ flake8==7.0.0
# flake8-print
flake8-print==5.0.0
# via -r ./test.in
freezegun==1.5.0
freezegun==1.5.1
# via -r ./test.in
grpcio==1.63.0
# via -r ./test.in
@@ -52,7 +52,7 @@ label-studio-sdk==0.0.32
# via -r ./test.in
label-studio-tools==0.0.4
# via label-studio-sdk
lxml==5.2.1
lxml==5.2.2
# via
# -c ./base.txt
# label-studio-sdk
@@ -114,7 +114,7 @@ requests==2.31.0
# via
# -c ./base.txt
# label-studio-sdk
ruff==0.4.3
ruff==0.4.4
# via -r ./test.in
six==1.16.0
# via