Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/Kafka cloud source couldn't connect, add test #257

Merged
merged 18 commits into from
Dec 2, 2024

Conversation

hubert-rutkowski85
Copy link
Contributor

@hubert-rutkowski85 hubert-rutkowski85 commented Nov 21, 2024

Couple of things here:

  1. There were 2 config errors in the Kafka Cloud source connector.

  2. Also added integration test for it. To have it work, admin will have to set secrets KAFKA_API_KEY , KAFKA_SECRET and KAFKA_BOOTSTRAP_SERVER in repo. EDIT: done, it works:

test/integration/connectors/test_kafka.py::test_kafka_source_cloud Topic fake-topic removed
Attempt 1: Waiting for topic fake-topic to not exist. Current topics: [test1, fake-topic]
Running validation
PASSED
  1. Kafka source connector has new field: group_id

  2. Make the auth fields be mandatory.

# Conflicts:
#	CHANGELOG.md
#	unstructured_ingest/__version__.py
# Conflicts:
#	CHANGELOG.md
#	unstructured_ingest/__version__.py
topic: str,
retries: int = 10,
interval: int = 1,
exists: bool = True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What purpose does adding this as an input help with? Seems odd to pass this in compared to the previous approach.

Copy link
Contributor Author

@hubert-rutkowski85 hubert-rutkowski85 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes it possible to use this function to wait for both topic existance, and non-existance. I used it in the new test.

@rbiseck3 rbiseck3 changed the base branch from main to release/0.3.7 December 2, 2024 16:22
@hubert-rutkowski85 hubert-rutkowski85 merged commit 0dca8ec into release/0.3.7 Dec 2, 2024
@hubert-rutkowski85 hubert-rutkowski85 deleted the DS-218/kafka-cloud-test branch December 2, 2024 16:49
rbiseck3 added a commit that referenced this pull request Dec 9, 2024
* fix/Kafka cloud source couldn't connect, add test (#257)

* feat/add release branch to PR triggers (#284)

* add release branch to PR triggers

* omit vectar dest e2e test

* fix/Azure AI search - reuse client and close connections (#282)

* support personal access token for confluence auth (#275)

* feat/determistic ids for uploads (#286)

* create deterministic id for upload use

* fix id in sql connector

* fix: FSSPEC invalid metadata date field types (#279)

Few fsspec connectors: SFTP, Azure, Box and GCS havedate_modified and date_created fields of FileDataSourceMetadata class were of type float | None instead of str | None, modified code creating the metadata to cast float timestamps to strings.

---------

Co-authored-by: Filip Knefel <[email protected]>

* [DS-303] SQL Connectors prevent syntax errors and SQL injection (#273)

* Snowflake query data binding fix

* Enable SingleSource source connector entry

* Fix Snowflake nan issue

* Make Singlestore connector more robust against SQL injection

* Clean sql upload and add query debug log

* Make SQLite connector more robust against sql injection

* SQL injection fixes; Changelog and version update

* Optimize memory usage of Snowflake uploader

* Changelog update: Optimize memory usage of Snowflake uploader

* feat/Qdrant dest - add cloud test (#248)

* feat/ duckdb destination connector (#285)

* 🔀 fix: DS-328 Snowflake Downloader error (#287)

* Fix Snowflake downloader

* Changelog and version update: Fix Snowflake downloader

* Replace Snowflake source connector inheritance with SQL classes

* Comment on snowflake dependency name

* Get rid of snowflake postgres inheritance. Replaced with SQL.

* Fix lint

* Version update: Fix Snowflake downloader

* feat: Refined box connector to actually use config JSON directly (#258)

Refined box connector to actually use config JSON directly
---------

Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>

* fix: update fsspec upload paths to work independent of OS (#291)

When run on windows Path(<path-object>) converts slashes to backward slashes which are not correctly interpreted when passed to (non-local) fsspec filesystem.
Instead of using str(<path-object>) use <path-object>.to_posix() to mitigate this effect in fsspec code.

---------

Co-authored-by: Filip Knefel <[email protected]>

* fix: properly log elasticsearch upload errors (#289)

Original error logging was never called because by default parallel_bulk re-raises exceptions and raises errors for non 2XX responses and these were not caught.

We change the logic to catch, log and re-raise errors on our side. Error log is sanitized to remove the uploaded object contents from it.


---------

Co-authored-by: Filip Knefel <[email protected]>

* chore: update weaviate example (#272)

Update Weaviate connector example

---------

Co-authored-by: Filip Knefel <[email protected]>

* update changelog

---------

Co-authored-by: Hubert Rutkowski <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: mpolomdeepsense <[email protected]>
Co-authored-by: David Potter <[email protected]>
Co-authored-by: mateuszkuprowski <[email protected]>
Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>
micmarty-deepsense added a commit that referenced this pull request Jan 10, 2025
* initial implementation

* going up to indexes

* Working with partition now

* little trash from development left in the test_e2e

* uncomment a left over

* Remove Pre check

* Addressing some issues from Roman's review

* solving async issues with Python 3.10

* solving async issues with Python 3.10 again

* revert last change

* Implementation similar to ElasticSearch

* more changes to the async

* run_async implementation

* some fixes

* black fix

* removed unecessary stuf, metadata

* Flattening messages into a list

* another approach to flattern the messages list

* filedata path

* Async issues causing problems again

* black and ruff

* more detailed structured_output

* export OVERWRITE_FIXTURES=true

* filename change

* filename change

* filename change

* solved issues

* solve issues

* solve filename issues

* file changes

* version again

* adjustments

* filename added

* expectation was to have a filename

* solved discord pr issues

* Fixes

* Revert working message fetching code.

* Lint.

* CI test only discord.

* fix discord connector

* Overwrite fixtures.

* Revert "CI test only discord."

This reverts commit 1885bf5.

* Remove unnecessary env in github test.

* Remove failing clarifai

* fix/Kafka cloud source couldn't connect, add test (#257)

* feat/add release branch to PR triggers (#284)

* add release branch to PR triggers

* omit vectar dest e2e test

* fix/Azure AI search - reuse client and close connections (#282)

* support personal access token for confluence auth (#275)

* update discord deps

* add discord example (cannot be named discord.py to avoid pythonpath collisions)

* make channels required attr, require at least 1 elem

* add missing connector_type attr

* update version and changelog

* revert changes in kafka local

* add test for no token

* add indexer precheck

* pass DISCORD_TOKEN to source e2e tests

* add test for no channels

* tidy ruff

* set DISCORD_CHANNELS secret and use it as an env var

* fix flake8 error

* quickfix expected num of indexed files in test

* refactor discord tests

* update fixtures

* use @requires_env

* split channels string to list

* bump version

---------

Co-authored-by: mr-unstructured <[email protected]>
Co-authored-by: hubert.rutkowski <[email protected]>
Co-authored-by: Roman Isecke <[email protected]>
Co-authored-by: Hubert Rutkowski <[email protected]>
Co-authored-by: Roman Isecke <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>
Co-authored-by: Michał Martyniak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants