fix: FSSPEC invalid metadata date field types #279

Merged
merged 7 commits into from
Dec 5, 2024


ds-filipknefel
Contributor

@ds-filipknefel ds-filipknefel commented Nov 28, 2024

In several fsspec connectors (SFTP, Azure, Box and GCS), the date_modified and date_created fields of the FileDataSourceMetadata class were populated with values of type float | None instead of str | None. This PR modifies the code that creates the metadata so that float timestamps are cast to strings.
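A minimal sketch of the cast described above, not the actual connector code. The dataclass below is a simplified stand-in for FileDataSourceMetadata, and `timestamp_to_str`, `build_metadata` and the `created`/`mtime` keys are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FileDataSourceMetadataSketch:
    """Simplified stand-in: both date fields are declared as str | None."""

    date_created: Optional[str] = None
    date_modified: Optional[str] = None


def timestamp_to_str(value: Optional[float]) -> Optional[str]:
    """Cast a float timestamp to a string, keeping None as None."""
    if value is None:
        return None
    return str(value)


def build_metadata(file_info: Mapping[str, Any]) -> FileDataSourceMetadataSketch:
    """Build metadata from a filesystem info mapping (key names are assumptions)."""
    return FileDataSourceMetadataSketch(
        date_created=timestamp_to_str(file_info.get("created")),
        date_modified=timestamp_to_str(file_info.get("mtime")),
    )


metadata = build_metadata({"mtime": 1732790000.5})
```

Keeping None untouched matters here: calling str() directly on a missing value would store the literal string "None" instead of leaving the field empty.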

@potter-potter
Contributor

Can you add a comment into the description why they need to be strings? I noticed in the fsspec connectors that the need for string/float varies.

Cast date_created/modified metadata field values to string in fsspec
connectors where it occurred.
@ds-filipknefel ds-filipknefel changed the title fix: SFTP invalid metadata date field types fix: FSSPEC invalid metadata date field types Dec 3, 2024
Contributor

@potter-potter potter-potter left a comment


looks good

@ds-filipknefel ds-filipknefel merged commit 4dd4d30 into release/0.3.7 Dec 5, 2024
26 checks passed
@ds-filipknefel ds-filipknefel deleted the fix/sftp-metadata-invalid-types branch December 5, 2024 11:32
rbiseck3 added a commit that referenced this pull request Dec 9, 2024
* fix/Kafka cloud source couldn't connect, add test (#257)

* feat/add release branch to PR triggers (#284)

* add release branch to PR triggers

* omit vectar dest e2e test

* fix/Azure AI search - reuse client and close connections (#282)

* support personal access token for confluence auth (#275)

* feat/deterministic ids for uploads (#286)

* create deterministic id for upload use

* fix id in sql connector

* fix: FSSPEC invalid metadata date field types (#279)

In several fsspec connectors (SFTP, Azure, Box and GCS), the date_modified and date_created fields of the FileDataSourceMetadata class were populated with values of type float | None instead of str | None. The code that creates the metadata now casts float timestamps to strings.

---------

Co-authored-by: Filip Knefel <[email protected]>

* [DS-303] SQL Connectors prevent syntax errors and SQL injection (#273)

* Snowflake query data binding fix

* Enable SingleSource source connector entry

* Fix Snowflake nan issue

* Make Singlestore connector more robust against SQL injection

* Clean sql upload and add query debug log

* Make SQLite connector more robust against sql injection

* SQL injection fixes; Changelog and version update

* Optimize memory usage of Snowflake uploader

* Changelog update: Optimize memory usage of Snowflake uploader

* feat/Qdrant dest - add cloud test (#248)

* feat/ duckdb destination connector (#285)

* 🔀 fix: DS-328 Snowflake Downloader error (#287)

* Fix Snowflake downloader

* Changelog and version update: Fix Snowflake downloader

* Replace Snowflake source connector inheritance with SQL classes

* Comment on snowflake dependency name

* Get rid of snowflake postgres inheritance. Replaced with SQL.

* Fix lint

* Version update: Fix Snowflake downloader

* feat: Refined box connector to actually use config JSON directly (#258)

Refined box connector to actually use config JSON directly
---------

Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>

* fix: update fsspec upload paths to work independent of OS (#291)

On Windows, Path(<path-object>) is rendered with backslashes when converted to a string, and these are not interpreted correctly when passed to a (non-local) fsspec filesystem.
Instead of using str(<path-object>), use <path-object>.as_posix() to mitigate this effect in fsspec code.
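The difference can be reproduced on any OS with pathlib's pure path classes; the pathlib method that yields forward slashes is `as_posix()`. This is a small illustration, with a made-up path and a hypothetical helper name, not the connector code itself:

```python
from pathlib import PurePath, PureWindowsPath


def remote_upload_path(path: PurePath) -> str:
    """Return a forward-slash path string suitable for a remote filesystem."""
    return path.as_posix()


# PureWindowsPath mimics Windows path semantics even when run on Linux or macOS.
windows_path = PureWindowsPath("bucket") / "folder" / "file.json"

as_native_string = str(windows_path)             # separators become backslashes
as_posix_string = remote_upload_path(windows_path)  # separators stay forward slashes
```

Here `as_native_string` contains backslashes, while `as_posix_string` is `bucket/folder/file.json`.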

---------

Co-authored-by: Filip Knefel <[email protected]>

* fix: properly log elasticsearch upload errors (#289)

The original error logging was never called: by default parallel_bulk re-raises exceptions and raises errors for non-2XX responses, and these were not caught.

We change the logic to catch, log and re-raise errors on our side. The error log is sanitized to remove the uploaded object contents from it.
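A rough sketch of the catch, log and re-raise pattern described above. It does not call the Elasticsearch client: `BulkUploadError`, `sanitize_errors` and `upload_with_logging` are hypothetical names, and the assumption that each error entry carries the uploaded document under a `data` key is made only for illustration:

```python
import logging
from typing import Any, Callable, Dict, Iterable, List

logger = logging.getLogger("upload_sketch")


class BulkUploadError(Exception):
    """Hypothetical stand-in for the exception raised by a bulk upload helper."""

    def __init__(self, message: str, errors: List[Dict[str, Any]]) -> None:
        super().__init__(message)
        self.errors = errors


def sanitize_errors(errors: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy the error entries, dropping the uploaded document ("data" key, assumed)."""
    sanitized = []
    for error in errors:
        clean = {}
        for action, details in error.items():
            if isinstance(details, dict):
                details = {k: v for k, v in details.items() if k != "data"}
            clean[action] = details
        sanitized.append(clean)
    return sanitized


def upload_with_logging(
    bulk_upload: Callable[[Iterable[Dict[str, Any]]], None],
    documents: Iterable[Dict[str, Any]],
) -> None:
    """Run the upload; on failure log a sanitized error and re-raise it."""
    try:
        bulk_upload(documents)
    except BulkUploadError as exc:
        logger.error("bulk upload failed: %s", sanitize_errors(exc.errors))
        raise
```

The bare `raise` keeps the original exception and traceback, so callers still see the failure while the log line omits document contents.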


---------

Co-authored-by: Filip Knefel <[email protected]>

* chore: update weaviate example (#272)

Update Weaviate connector example

---------

Co-authored-by: Filip Knefel <[email protected]>

* update changelog

---------

Co-authored-by: Hubert Rutkowski <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: mpolomdeepsense <[email protected]>
Co-authored-by: David Potter <[email protected]>
Co-authored-by: mateuszkuprowski <[email protected]>
Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>