feat/determistic ids for uploads #286
Merged
vangheem approved these changes on Dec 4, 2024
rbiseck3 added a commit that referenced this pull request on Dec 9, 2024:
* fix/Kafka cloud source couldn't connect, add test (#257)
* feat/add release branch to PR triggers (#284)
* add release branch to PR triggers
* omit vectar dest e2e test
* fix/Azure AI search - reuse client and close connections (#282)
* support personal access token for confluence auth (#275)
* feat/determistic ids for uploads (#286)
* create deterministic id for upload use
* fix id in sql connector
* fix: FSSPEC invalid metadata date field types (#279)
  Few fsspec connectors: SFTP, Azure, Box and GCS have date_modified and date_created fields of FileDataSourceMetadata class were of type float | None instead of str | None, modified code creating the metadata to cast float timestamps to strings.
  ---------
  Co-authored-by: Filip Knefel <[email protected]>
* [DS-303] SQL Connectors prevent syntax errors and SQL injection (#273)
* Snowflake query data binding fix
* Enable SingleSource source connector entry
* Fix Snowflake nan issue
* Make Singlestore connector more robust against SQL injection
* Clean sql upload and add query debug log
* Make SQLite connector more robust against sql injection
* SQL injection fixes; Changelog and version update
* Optimize memory usage of Snowflake uploader
* Changelog update: Optimize memory usage of Snowflake uploader
* feat/Qdrant dest - add cloud test (#248)
* feat/ duckdb destination connector (#285)
* 🔀 fix: DS-328 Snowflake Downloader error (#287)
* Fix Snowflake downloader
* Changelog and version update: Fix Snowflake downloader
* Replace Snowflake source connector inheritance with SQL classes
* Comment on snowflake dependency name
* Get rid of snowflake postgres inheritance. Replaced with SQL.
* Fix lint
* Version update: Fix Snowflake downloader
* feat: Refined box connector to actually use config JSON directly (#258)
  Refined box connector to actually use config JSON directly
  ---------
  Co-authored-by: Mateusz Kuprowski <[email protected]>
  Co-authored-by: Michal Martyniak <[email protected]>
* fix: update fsspec upload paths to work independent of OS (#291)
  When run on windows Path(<path-object>) converts slashes to backward slashes which are not correctly interpreted when passed to (non-local) fsspec filesystem. Instead of using str(<path-object>) use <path-object>.to_posix() to mitigate this effect in fsspec code.
  ---------
  Co-authored-by: Filip Knefel <[email protected]>
* fix: properly log elasticsearch upload errors (#289)
  Original error logging was never called because by default parallel_bulk re-raises exceptions and raises errors for non 2XX responses and these were not caught. We change the logic to catch, log and re-raise errors on our side. Error log is sanitized to remove the uploaded object contents from it.
  ---------
  Co-authored-by: Filip Knefel <[email protected]>
* chore: update weaviate example (#272)
  Update Weaviate connector example
  ---------
  Co-authored-by: Filip Knefel <[email protected]>
* update changelog
---------
Co-authored-by: Hubert Rutkowski <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: mpolomdeepsense <[email protected]>
Co-authored-by: David Potter <[email protected]>
Co-authored-by: mateuszkuprowski <[email protected]>
Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>
Description
Update the ID used for uploads to be a UUID derived from the existing element ID combined with the file data identifier, which adds record-specific context to the ID.
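For illustration only, here is a minimal sketch of how such an ID could be derived. It is not taken from the diff in this PR: the function name `deterministic_upload_id`, the choice of `uuid5` with `NAMESPACE_DNS`, and the `:` separator are all assumptions made for the example.

```python
import uuid

# Assumption: a fixed namespace UUID. Any constant works, as long as it
# never changes between runs, otherwise the generated IDs would change too.
UPLOAD_NAMESPACE = uuid.NAMESPACE_DNS


def deterministic_upload_id(element_id: str, file_identifier: str) -> str:
    """Hypothetical helper: build a stable UUID from an element ID and the
    identifier of the file data record the element came from."""
    # Combine both values so the same element ID under two different
    # files yields two different upload IDs.
    seed = f"{element_id}:{file_identifier}"
    # uuid5 hashes (namespace, name) with SHA-1, so equal inputs always
    # produce the same UUID.
    return str(uuid.uuid5(UPLOAD_NAMESPACE, seed))


if __name__ == "__main__":
    first = deterministic_upload_id("elem-1", "file-a")
    again = deterministic_upload_id("elem-1", "file-a")
    other = deterministic_upload_id("elem-1", "file-b")
    print(first == again)  # same inputs give the same ID
    print(first != other)  # a different file gives a different ID
```

Because the ID depends only on its inputs, re-running an upload for the same file produces the same record IDs, which lets a destination overwrite existing records instead of duplicating them.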