feat/determistic ids for uploads #286

Merged 2 commits on Dec 4, 2024

Conversation

rbiseck3
Collaborator

@rbiseck3 rbiseck3 commented Dec 4, 2024

Description

Update the ID used for uploads to a UUID derived from the existing element ID combined with the file data identifier, so that the ID carries more record-specific context.
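A minimal sketch of how such a deterministic ID could be derived, assuming a name-based UUID (`uuid5`) is used. The function name `deterministic_upload_id`, the choice of `NAMESPACE_DNS`, and the `:` separator are illustrative assumptions, not taken from this PR's diff.

```python
from uuid import NAMESPACE_DNS, uuid5


def deterministic_upload_id(element_id: str, file_identifier: str) -> str:
    """Return a UUID string that is stable for a given (file, element) pair.

    Hypothetical helper: the namespace and the separator are assumptions.
    """
    # uuid5 hashes the name with SHA-1, so the same inputs always yield
    # the same UUID, while a different file identifier yields a different one.
    return str(uuid5(NAMESPACE_DNS, f"{file_identifier}:{element_id}"))


first = deterministic_upload_id("element-1", "file-a")
second = deterministic_upload_id("element-1", "file-a")
other_file = deterministic_upload_id("element-1", "file-b")
print(first == second)      # True: re-running an upload reuses the same ID
print(first == other_file)  # False: same element ID, different file
```

Because the ID depends only on its inputs, re-uploading the same record can overwrite the earlier row instead of creating a duplicate, provided the destination treats the ID as a key.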

@rbiseck3 rbiseck3 merged commit d841bef into release/0.3.7 Dec 4, 2024
26 checks passed
@rbiseck3 rbiseck3 deleted the roman/determistic-ids branch December 4, 2024 14:53
rbiseck3 added a commit that referenced this pull request Dec 9, 2024
* fix/Kafka cloud source couldn't connect, add test (#257)

* feat/add release branch to PR triggers (#284)

* add release branch to PR triggers

* omit Vectara dest e2e test

* fix/Azure AI search - reuse client and close connections (#282)

* support personal access token for confluence auth (#275)

* feat/determistic ids for uploads (#286)

* create deterministic id for upload use

* fix id in sql connector

* fix: FSSPEC invalid metadata date field types (#279)

For several fsspec connectors (SFTP, Azure, Box, and GCS), the date_modified and date_created fields of the FileDataSourceMetadata class were of type float | None instead of str | None. Modified the code that creates the metadata to cast float timestamps to strings.

---------

Co-authored-by: Filip Knefel <[email protected]>

* [DS-303] SQL Connectors prevent syntax errors and SQL injection (#273)

* Snowflake query data binding fix

* Enable SingleStore source connector entry

* Fix Snowflake nan issue

* Make SingleStore connector more robust against SQL injection

* Clean sql upload and add query debug log

* Make SQLite connector more robust against SQL injection

* SQL injection fixes; Changelog and version update

* Optimize memory usage of Snowflake uploader

* Changelog update: Optimize memory usage of Snowflake uploader

* feat/Qdrant dest - add cloud test (#248)

* feat/ duckdb destination connector (#285)

* 🔀 fix: DS-328 Snowflake Downloader error (#287)

* Fix Snowflake downloader

* Changelog and version update: Fix Snowflake downloader

* Replace Snowflake source connector inheritance with SQL classes

* Comment on snowflake dependency name

* Get rid of snowflake postgres inheritance. Replaced with SQL.

* Fix lint

* Version update: Fix Snowflake downloader

* feat: Refined box connector to actually use config JSON directly (#258)

Refined box connector to actually use config JSON directly
---------

Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>

* fix: update fsspec upload paths to work independent of OS (#291)

When run on Windows, Path(<path-object>) converts forward slashes to backslashes, which are not interpreted correctly when passed to a (non-local) fsspec filesystem.
Instead of str(<path-object>), use <path-object>.as_posix() to mitigate this effect in fsspec code.
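The difference can be reproduced on any OS with `PureWindowsPath`, which applies Windows path semantics without touching the filesystem. This is a small illustration only; the helper name `to_remote_path` is hypothetical, and the pathlib method that produces forward slashes is `as_posix()`.

```python
from pathlib import PureWindowsPath


def to_remote_path(path: PureWindowsPath) -> str:
    """Render a path with forward slashes, as a remote filesystem expects."""
    return path.as_posix()


# Build the path the way connector code typically would, segment by segment.
windows_path = PureWindowsPath("bucket") / "nested" / "file.json"

print(str(windows_path))             # bucket\nested\file.json
print(to_remote_path(windows_path))  # bucket/nested/file.json
```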

---------

Co-authored-by: Filip Knefel <[email protected]>

* fix: properly log elasticsearch upload errors (#289)

The original error logging was never called because, by default, parallel_bulk re-raises exceptions and raises errors for non-2XX responses, and these were not caught.

We change the logic to catch, log, and re-raise errors on our side. The error log is sanitized to remove the uploaded object contents.
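The catch, log, and re-raise pattern can be sketched without the Elasticsearch client. Everything here is a stand-in: `BulkUploadError`, `sanitize_errors`, `upload_with_logging`, and the assumption that each error entry stores the uploaded document under a `data` key are hypothetical and not taken from the connector code.

```python
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


class BulkUploadError(Exception):
    """Hypothetical stand-in for the exception a bulk helper raises."""

    def __init__(self, message: str, errors: list[dict[str, Any]]):
        super().__init__(message)
        self.errors = errors


def sanitize_errors(
    errors: list[dict[str, Any]], content_key: str = "data"
) -> list[dict[str, Any]]:
    """Drop the uploaded document body from each error entry before logging."""
    return [{k: v for k, v in err.items() if k != content_key} for err in errors]


def upload_with_logging(
    bulk_upload: Callable[[Iterable[dict[str, Any]]], None],
    docs: Iterable[dict[str, Any]],
) -> None:
    """Run the upload; on failure log a sanitized error, then re-raise."""
    try:
        bulk_upload(docs)
    except BulkUploadError as e:
        logger.error("bulk upload failed: %s", sanitize_errors(e.errors))
        raise  # the caller still sees the original exception
```

Re-raising after logging keeps the failure visible to the pipeline, while the log line records the status and error details without the document contents.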


---------

Co-authored-by: Filip Knefel <[email protected]>

* chore: update weaviate example (#272)

Update Weaviate connector example

---------

Co-authored-by: Filip Knefel <[email protected]>

* update changelog

---------

Co-authored-by: Hubert Rutkowski <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: mpolomdeepsense <[email protected]>
Co-authored-by: David Potter <[email protected]>
Co-authored-by: mateuszkuprowski <[email protected]>
Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>
2 participants