Update local fork #9

Merged 26 commits on Nov 10, 2024
df10231
[DS 111] MongoDB v2 source (#154)
mateuszkuprowski Oct 17, 2024
d029303
feat/refactor sql connectors (#181)
rbiseck3 Oct 17, 2024
fe85640
bugfix/support optional access configs on connection configs (#183)
rbiseck3 Oct 18, 2024
6fc428f
feat/split databricks into each auth type supported (#182)
rbiseck3 Oct 18, 2024
7306ac1
feat/update kdbai to latest version (#187)
rbiseck3 Oct 21, 2024
977fc0a
feat/sql source (#185)
rbiseck3 Oct 21, 2024
3ee677c
feat: add sampling functionality to fsspec indexers (#189)
ahmetmeleq Oct 22, 2024
a30124b
Feat: V2 Slack Source Connector (#180)
ds-filipknefel Oct 23, 2024
71309ee
feat/add Delta Tables Destination Connector v2 (#163)
mackurzawa Oct 23, 2024
43020b6
feat/snowflake connector (#191)
rbiseck3 Oct 23, 2024
6891b87
bugfix/support async indexing (#192)
rbiseck3 Oct 23, 2024
a8d6364
fix/Databricks example. (#193)
hubert-rutkowski85 Oct 23, 2024
bf81552
set changelog to new minor version (#194)
rbiseck3 Oct 24, 2024
5d679fa
databricks volumes add .json (#198)
potter-potter Oct 25, 2024
8445479
feat/add singlestore source connector (#197)
rbiseck3 Oct 25, 2024
5773481
feat/astradb source connector (#143)
potter-potter Oct 25, 2024
cd6aaa0
File system based indexers return record display name (#200)
six5532one Oct 28, 2024
28cf5a0
feat/Migration - GitLab Source to Connector V2 Structure (#DS-91) (#168)
unstructured-theron Oct 31, 2024
d42d3c6
Remove `overwrite` settings for fsspec and databricks connectors (#213)
vangheem Nov 2, 2024
60bfc83
feat/bump unstructured-client version and leverage new async support …
rbiseck3 Nov 5, 2024
e2a08e2
feat/created confluence source v2 connector (#217)
mateuszkuprowski Nov 7, 2024
5bff394
feat/onedrive-destination (#190)
mateuszkuprowski Nov 7, 2024
18ead57
fix SQL Precheck bug (#205)
potter-potter Nov 8, 2024
608eeb3
feat: Migrate qdrant destination to V2 (#178)
guilherme-uns Nov 8, 2024
e38aecf
feat/Kafka V2 Connector Source (#161)
Beppeth Nov 8, 2024
373568c
feat/release 0.2.2 (#221)
rbiseck3 Nov 8, 2024
45 changes: 31 additions & 14 deletions .github/workflows/e2e.yml
@@ -69,7 +69,7 @@ jobs:
make integration-test-embedders

source_connectors_integration_test:
- runs-on: ubuntu-latest
+ runs-on: ubuntu-latest-m
needs: [ setup ]
steps:
- uses: 'actions/checkout@v4'
@@ -84,16 +84,25 @@ jobs:
uses: ./.github/actions/base-cache
with:
python-version: "3.10"
- name: Test (end-to-end)
run : |
source .venv/bin/activate
- name: Set up docker
run: |
sudo make install-docker-compose
docker compose version
- name: Run Integration Tests
env:
DATABRICKS_HOST: ${{secrets.DATABRICKS_HOST}}
DATABRICKS_CATALOG: ${{secrets.DATABRICKS_CATALOG}}
DATABRICKS_CLIENT_ID: ${{secrets.DATABRICKS_CLIENT_ID}}
DATABRICKS_CLIENT_SECRET: ${{secrets.DATABRICKS_CLIENT_SECRET}}
CONFLUENCE_USER_EMAIL: ${{secrets.CONFLUENCE_USER_EMAIL}}
CONFLUENCE_API_TOKEN: ${{secrets.CONFLUENCE_API_TOKEN}}
run : |
source .venv/bin/activate
make install-test
make integration-test-connectors-src

destination_connectors_integration_test:
- runs-on: ubuntu-latest
+ runs-on: ubuntu-latest-m
needs: [ setup ]
steps:
- uses: 'actions/checkout@v4'
@@ -108,11 +117,25 @@ jobs:
uses: ./.github/actions/base-cache
with:
python-version: "3.10"
- name: Test (end-to-end)
run : |
source .venv/bin/activate
- name: Set up docker
run: |
sudo make install-docker-compose
docker compose version
- name: Run Integration Tests
env:
DATABRICKS_HOST: ${{secrets.DATABRICKS_HOST}}
DATABRICKS_CATALOG: ${{secrets.DATABRICKS_CATALOG}}
DATABRICKS_CLIENT_ID: ${{secrets.DATABRICKS_CLIENT_ID}}
DATABRICKS_CLIENT_SECRET: ${{secrets.DATABRICKS_CLIENT_SECRET}}
S3_INGEST_TEST_ACCESS_KEY: ${{ secrets.S3_INGEST_TEST_ACCESS_KEY }}
S3_INGEST_TEST_SECRET_KEY: ${{ secrets.S3_INGEST_TEST_SECRET_KEY }}
MS_CLIENT_CRED: ${{ secrets.MS_CLIENT_CRED }}
MS_CLIENT_ID: ${{ secrets.MS_CLIENT_ID }}
MS_TENANT_ID: ${{ secrets.MS_TENANT_ID }}
MS_USER_EMAIL: ${{ secrets.MS_USER_EMAIL }}
MS_USER_PNAME: ${{ secrets.MS_USER_PNAME }}
run : |
source .venv/bin/activate
make install-test
make integration-test-connectors-dest

@@ -140,8 +163,6 @@ jobs:
env:
AIRTABLE_PERSONAL_ACCESS_TOKEN: ${{ secrets.AIRTABLE_PERSONAL_ACCESS_TOKEN }}
BOX_APP_CONFIG: ${{ secrets.BOX_APP_CONFIG }}
CONFLUENCE_API_TOKEN: ${{ secrets.CONFLUENCE_API_TOKEN }}
CONFLUENCE_USER_EMAIL: ${{ secrets.CONFLUENCE_USER_EMAIL }}
DATABRICKS_HOST: ${{secrets.DATABRICKS_HOST}}
DATABRICKS_CATALOG: ${{secrets.DATABRICKS_CATALOG}}
DATABRICKS_CLIENT_ID: ${{secrets.DATABRICKS_CLIENT_ID}}
@@ -261,10 +282,6 @@ jobs:
ASTRA_DB_APPLICATION_TOKEN: ${{secrets.ASTRA_DB_APPLICATION_TOKEN}}
ASTRA_DB_API_ENDPOINT: ${{secrets.ASTRA_DB_ENDPOINT}}
CLARIFAI_API_KEY: ${{secrets.CLARIFAI_API_KEY}}
DATABRICKS_HOST: ${{secrets.DATABRICKS_HOST}}
DATABRICKS_CATALOG: ${{secrets.DATABRICKS_CATALOG}}
DATABRICKS_CLIENT_ID: ${{secrets.DATABRICKS_CLIENT_ID}}
DATABRICKS_CLIENT_SECRET: ${{secrets.DATABRICKS_CLIENT_SECRET}}
SHAREPOINT_CLIENT_ID: ${{secrets.SHAREPOINT_CLIENT_ID}}
SHAREPOINT_CRED: ${{secrets.SHAREPOINT_CRED}}
KDBAI_BEARER_TOKEN: ${{ secrets.KDBAI_BEARER_TOKEN }}
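The workflow above passes connector credentials to the integration tests through environment variables. A minimal sketch of a pre-flight check that reports which of those variables are unset before the tests start; the helper name `missing_env_vars` and the list constant are hypothetical and not part of this PR, and only the variable names are taken from the source-connector job's `env` block:

```python
import os

# Variable names copied from the source connector job's env block in the diff.
SOURCE_TEST_ENV = [
    "DATABRICKS_HOST",
    "DATABRICKS_CATALOG",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
    "CONFLUENCE_USER_EMAIL",
    "CONFLUENCE_API_TOKEN",
]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(SOURCE_TEST_ENV)
    if missing:
        print("Missing secrets for source connector tests:", ", ".join(missing))
    else:
        print("All source connector secrets are set.")
```

Running this locally before `make integration-test-connectors-src` would show which secrets still need to be exported; whether the real tests skip or fail on a missing secret is not shown in this diff.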
2 changes: 2 additions & 0 deletions .gitignore
@@ -196,6 +196,7 @@ tags
.ruff_cache/

.ppm
.vs

example-docs/*_images
examples/**/output/
@@ -207,3 +208,4 @@ metricsdiff.txt
annotated/

tmp_ingest/
.vs
54 changes: 53 additions & 1 deletion CHANGELOG.md
@@ -1,10 +1,62 @@
## 0.0.26-dev4
## 0.2.2

### Enhancements

* **Remove `overwrite` field** from fsspec and databricks connectors
* **Added migration for GitLab Source V2**
* **Added V2 confluence source connector**
* **Added OneDrive destination connector**
* **Qdrant destination to v2**
* **Migrate Kafka Source Connector to V2**

## 0.2.1

### Enhancements

* **File system based indexers return a record display name**
* **Add singlestore source connector**
* **Astra DB V2 Source Connector** Create a v2 version of the Astra DB Source Connector.
* **Support native async requests from unstructured-client**
* **Support filtering element types in partitioner step**

### Fixes

* **Fix Databricks Volumes file naming** Add .json to end of upload file.
* **Fix SQL Type destination precheck** Change to context manager "with".

## 0.2.0

### Enhancements

* **Add snowflake source and destination connectors**
* **Migrate Slack Source Connector to V2**
* **Add Delta Table destination to v2**

## 0.1.1

### Enhancements

* **Update KDB.AI vectorstore integration to 1.4**
* **Add sqlite and postgres source connectors**
* **Add sampling functionality for indexers in fsspec connectors**

### Fixes

* **Fix Databricks Volumes destination** Fix for filenames to not be hashes.

## 0.1.0

### Enhancements

* **Move default API URL parameter value to serverless API**
* **Add check that access config always wrapped in Secret**
* **Add togetherai embedder support**
* **Refactor sqlite and postgres to be distinct connectors to support better input validation**
* **Added MongoDB source V2 connector**
* **Support optional access configs on connection configs**
* **Refactor databricks into distinct connectors based on auth type**

### Fixes

1 change: 1 addition & 0 deletions Makefile
@@ -141,6 +141,7 @@ integration-test-embedders:
integration-test-connectors-src:
PYTHONPATH=. pytest --tags source -sv test/integration/connectors


.PHONY: integration-test-connectors-dest
integration-test-connectors-dest:
PYTHONPATH=. pytest --tags destination -sv test/integration/connectors
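The two Makefile targets above differ only in the value passed to `--tags` (`source` or `destination`). The diff does not show how the test suite interprets that option, so the following is only a sketch of one plausible selection rule, in which a test runs when it shares at least one tag with the requested set. The function names `parse_tags` and `should_run` are hypothetical:

```python
def parse_tags(raw):
    """Split a comma-separated --tags value into a set of non-empty tag names."""
    return {tag.strip() for tag in raw.split(",") if tag.strip()}


def should_run(test_tags, requested_tags):
    """Assumed rule: run everything when no tags are requested, otherwise
    run a test only if it carries at least one requested tag."""
    if not requested_tags:
        return True
    return bool(set(test_tags) & set(requested_tags))


if __name__ == "__main__":
    requested = parse_tags("source")
    print(should_run({"source", "sql"}, requested))
    print(should_run({"destination", "sql"}, requested))
```

Under this assumed rule, a test tagged as both a source test and a SQL test would be picked up by `integration-test-connectors-src` and ignored by `integration-test-connectors-dest`.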
2 changes: 1 addition & 1 deletion requirements/connectors/gitlab.txt
@@ -6,7 +6,7 @@ charset-normalizer==3.3.2
# via requests
idna==3.10
# via requests
- python-gitlab==4.12.2
+ python-gitlab==4.13.0
# via -r ./connectors/gitlab.in
requests==2.32.3
# via
2 changes: 1 addition & 1 deletion requirements/connectors/kdbai.in
@@ -1,3 +1,3 @@
-c ../common/constraints.txt

- kdbai-client
+ kdbai-client>=1.4.0
14 changes: 8 additions & 6 deletions requirements/connectors/kdbai.txt
@@ -1,18 +1,20 @@
# This file was autogenerated by uv via the following command:
- # uv pip compile ./connectors/kdbai.in --output-file ./connectors/kdbai.txt --no-strip-extras --python-version 3.9
+ # uv pip compile kdbai.in --output-file kdbai.txt --no-strip-extras --python-version 3.9
certifi==2024.8.30
# via requests
- charset-normalizer==3.3.2
+ charset-normalizer==3.4.0
# via requests
idna==3.10
# via requests
- kdbai-client==1.3.0
- # via -r ./connectors/kdbai.in
+ kdbai-client==1.4.0
+ # via -r kdbai.in
numpy==1.26.4
# via
- # -c ./connectors/../common/constraints.txt
+ # -c ../common/constraints.txt
# pandas
# pykx
+ packaging==24.1
+ # via kdbai-client
pandas==2.2.3
# via
# kdbai-client
@@ -35,5 +37,5 @@ tzdata==2024.2
# via pandas
urllib3==1.26.20
# via
- # -c ./connectors/../common/constraints.txt
+ # -c ../common/constraints.txt
# requests
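The kdbai changes above raise the lower bound in `kdbai.in` to `>=1.4.0` and move the compiled pin in `kdbai.txt` from 1.3.0 to 1.4.0. A minimal sketch of checking that a compiled pin still satisfies such a lower bound; the helper names are hypothetical, and the comparison assumes plain dotted numeric versions rather than full PEP 440 syntax (real tooling would more likely rely on the `packaging` library):

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_minimum(pinned_line, minimum):
    """Check that a 'name==version' pin is at least `minimum`.

    Assumes simple numeric versions only (no pre-release or local suffixes).
    """
    _name, separator, version = pinned_line.partition("==")
    if not separator:
        raise ValueError(f"not a pinned requirement: {pinned_line!r}")
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    # Old and new pins from kdbai.txt, checked against the new bound in kdbai.in.
    print(satisfies_minimum("kdbai-client==1.3.0", "1.4.0"))
    print(satisfies_minimum("kdbai-client==1.4.0", "1.4.0"))
```

The old pin fails the new bound and the new pin meets it, which is why the compiled file had to be regenerated alongside the `.in` change.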
2 changes: 1 addition & 1 deletion requirements/connectors/slack.in
@@ -1,3 +1,3 @@
-c ../common/constraints.txt

- slack_sdk
+ slack_sdk[optional]
67 changes: 66 additions & 1 deletion requirements/connectors/slack.txt
@@ -1,4 +1,69 @@
# This file was autogenerated by uv via the following command:
# uv pip compile ./connectors/slack.in --output-file ./connectors/slack.txt --no-strip-extras --python-version 3.9
slack-sdk==3.33.1
aiodns==3.2.0
# via slack-sdk
aiohappyeyeballs==2.4.3
# via aiohttp
aiohttp==3.10.10
# via slack-sdk
aiosignal==1.3.1
# via aiohttp
async-timeout==4.0.3
# via aiohttp
attrs==24.2.0
# via aiohttp
boto3==1.34.131
# via slack-sdk
botocore==1.34.131
# via
# -c ./connectors/../common/constraints.txt
# boto3
# s3transfer
cffi==1.17.1
# via pycares
frozenlist==1.4.1
# via
# aiohttp
# aiosignal
greenlet==3.1.1
# via sqlalchemy
idna==3.10
# via yarl
jmespath==1.0.1
# via
# boto3
# botocore
multidict==6.1.0
# via
# aiohttp
# yarl
propcache==0.2.0
# via yarl
pycares==4.4.0
# via aiodns
pycparser==2.22
# via cffi
python-dateutil==2.9.0.post0
# via botocore
s3transfer==0.10.3
# via boto3
six==1.16.0
# via python-dateutil
slack-sdk[optional]==3.33.1
# via -r ./connectors/slack.in
sqlalchemy==2.0.36
# via slack-sdk
typing-extensions==4.12.2
# via
# multidict
# sqlalchemy
urllib3==1.26.20
# via
# -c ./connectors/../common/constraints.txt
# botocore
websocket-client==1.8.0
# via slack-sdk
websockets==13.1
# via slack-sdk
yarl==1.15.5
# via aiohttp
3 changes: 3 additions & 0 deletions requirements/connectors/snowflake.in
@@ -0,0 +1,3 @@
-c ../common/constraints.txt

snowflake-connector-python
53 changes: 53 additions & 0 deletions requirements/connectors/snowflake.txt
@@ -0,0 +1,53 @@
# This file was autogenerated by uv via the following command:
# uv pip compile snowflake.in --output-file snowflake.txt --no-strip-extras --python-version 3.9
asn1crypto==1.5.1
# via snowflake-connector-python
certifi==2024.8.30
# via
# requests
# snowflake-connector-python
cffi==1.17.1
# via
# cryptography
# snowflake-connector-python
charset-normalizer==3.4.0
# via
# requests
# snowflake-connector-python
cryptography==43.0.3
# via
# pyopenssl
# snowflake-connector-python
filelock==3.16.1
# via snowflake-connector-python
idna==3.10
# via
# requests
# snowflake-connector-python
packaging==24.1
# via snowflake-connector-python
platformdirs==4.3.6
# via snowflake-connector-python
pycparser==2.22
# via cffi
pyjwt==2.9.0
# via snowflake-connector-python
pyopenssl==24.2.1
# via snowflake-connector-python
pytz==2024.2
# via snowflake-connector-python
requests==2.32.3
# via snowflake-connector-python
snowflake-connector-python==3.12.2
# via -r snowflake.in
sortedcontainers==2.4.0
# via snowflake-connector-python
tomlkit==0.13.2
# via snowflake-connector-python
typing-extensions==4.12.2
# via snowflake-connector-python
urllib3==1.26.20
# via
# -c ../common/constraints.txt
# requests
# snowflake-connector-python
2 changes: 1 addition & 1 deletion requirements/remote/client.in
@@ -1,3 +1,3 @@
-c ../common/constraints.txt

- unstructured-client >= 0.25.8
+ unstructured-client >= 0.26.1