fix/Kafka cloud source couldn't connect, add test #257
topic: str,
retries: int = 10,
interval: int = 1,
exists: bool = True,
What purpose does adding this as an input serve? It seems odd to pass it in compared to the previous approach.
It makes it possible to use this function to wait for both topic existence and non-existence. I used it in the new test.
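A sketch of what a polling helper with that signature typically looks like (this is a hypothetical illustration, not the actual connector code; `wait_for_topic` and the stubbed admin client are assumptions, though `list_topics()` is the real confluent-kafka `AdminClient` call):

```python
import time


def wait_for_topic(admin_client, topic: str, retries: int = 10,
                   interval: int = 1, exists: bool = True) -> bool:
    """Poll until `topic` exists (exists=True) or is gone (exists=False).

    Returns True if the desired state was reached within `retries` attempts.
    """
    for _ in range(retries):
        # list_topics() returns cluster metadata whose .topics is a dict
        # keyed by topic name (confluent-kafka AdminClient API).
        metadata = admin_client.list_topics(timeout=10)
        found = topic in metadata.topics
        if found == exists:
            return True
        time.sleep(interval)
    return False
```

With the `exists` flag, the same helper serves both the source precheck (wait until the topic appears) and test teardown (wait until deletion has propagated).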
* fix/Kafka cloud source couldn't connect, add test (#257)
* feat/add release branch to PR triggers (#284)
* add release branch to PR triggers
* omit vectar dest e2e test
* fix/Azure AI search - reuse client and close connections (#282)
* support personal access token for confluence auth (#275)
* feat/determistic ids for uploads (#286)
* create deterministic id for upload use
* fix id in sql connector
* fix: FSSPEC invalid metadata date field types (#279)
  A few fsspec connectors (SFTP, Azure, Box and GCS) had the date_modified and date_created fields of the FileDataSourceMetadata class typed as float | None instead of str | None; modified the code creating the metadata to cast float timestamps to strings.
  Co-authored-by: Filip Knefel <[email protected]>
* [DS-303] SQL Connectors prevent syntax errors and SQL injection (#273)
* Snowflake query data binding fix
* Enable SingleSource source connector entry
* Fix Snowflake nan issue
* Make Singlestore connector more robust against SQL injection
* Clean sql upload and add query debug log
* Make SQLite connector more robust against sql injection
* SQL injection fixes; Changelog and version update
* Optimize memory usage of Snowflake uploader
* Changelog update: Optimize memory usage of Snowflake uploader
* feat/Qdrant dest - add cloud test (#248)
* feat/duckdb destination connector (#285)
* 🔀 fix: DS-328 Snowflake Downloader error (#287)
* Fix Snowflake downloader
* Changelog and version update: Fix Snowflake downloader
* Replace Snowflake source connector inheritance with SQL classes
* Comment on snowflake dependency name
* Get rid of snowflake postgres inheritance. Replaced with SQL.
* Fix lint
* Version update: Fix Snowflake downloader
* feat: Refined box connector to actually use config JSON directly (#258)
  Co-authored-by: Mateusz Kuprowski <[email protected]>
  Co-authored-by: Michal Martyniak <[email protected]>
* fix: update fsspec upload paths to work independent of OS (#291)
  When run on Windows, str(<path-object>) converts slashes to backslashes, which are not correctly interpreted when passed to a (non-local) fsspec filesystem. Instead of using str(<path-object>), use <path-object>.as_posix() to mitigate this effect in fsspec code.
  Co-authored-by: Filip Knefel <[email protected]>
* fix: properly log elasticsearch upload errors (#289)
  The original error logging was never called: by default, parallel_bulk re-raises exceptions and raises errors for non-2XX responses, and these were not caught. We change the logic to catch, log and re-raise errors on our side. The error log is sanitized to remove the uploaded object's contents.
  Co-authored-by: Filip Knefel <[email protected]>
* chore: update weaviate example (#272)
  Update Weaviate connector example
  Co-authored-by: Filip Knefel <[email protected]>
* update changelog

Co-authored-by: Hubert Rutkowski <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: Filip Knefel <[email protected]>
Co-authored-by: mpolomdeepsense <[email protected]>
Co-authored-by: David Potter <[email protected]>
Co-authored-by: mateuszkuprowski <[email protected]>
Co-authored-by: Mateusz Kuprowski <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>
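The fsspec path fix (#291) hinges on how pathlib renders separators. A minimal illustration, using `PureWindowsPath` so the behavior is reproducible on any OS (the real pathlib method is `as_posix()`):

```python
from pathlib import PureWindowsPath

# On Windows, str() on a path object yields backslash separators, which
# remote fsspec filesystems misinterpret as part of the file name.
p = PureWindowsPath("bucket/dir/file.txt")
print(str(p))        # bucket\dir\file.txt

# as_posix() always yields forward slashes, safe to pass to fsspec.
print(p.as_posix())  # bucket/dir/file.txt
```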
* initial implementation
* going up to indexes
* Working with partition now
* little trash from development left in the test_e2e
* uncomment a left over
* Remove Pre check
* Addressing some issues from Roman's review
* solving async issues with Python 3.10
* solving async issues with Python 3.10 again
* revert last change
* Implementation similar to ElasticSearch
* more changes to the async
* run_async implementation
* some fixes
* black fix
* removed unecessary stuf, metadata
* Flattening messages into a list
* another approach to flattern the messages list
* filedata path
* Async issues causing problems again
* black and ruff
* more detailed structured_output
* export OVERWRITE_FIXTURES=true
* filename change
* filename change
* filename change
* solved issues
* solve issues
* solve filename issues
* file changes
* version again
* adjustments
* filename added
* expectation was to have a filename
* solved discord pr issues
* Fixes
* Revert working message fetching code.
* Lint.
* CI test only discord.
* fix discord connector
* Overwrite fixtures.
* Revert "CI test only discord." This reverts commit 1885bf5.
* Remove unnecessary env in github test.
* Remove failing clarifai
* fix/Kafka cloud source couldn't connect, add test (#257)
* feat/add release branch to PR triggers (#284)
* add release branch to PR triggers
* omit vectar dest e2e test
* fix/Azure AI search - reuse client and close connections (#282)
* support personal access token for confluence auth (#275)
* update discord deps
* add discord example (cannot be named discord.py to avoid pythonpath collisions)
* make channels required attr, require at least 1 elem
* add missing connector_type attr
* update version and changelog
* revert changes in kafka local
* add test for no token
* add indexer precheck
* pass DISCORD_TOKEN to source e2e tests
* add test for no channels
* tidy ruff
* set DISCORD_CHANNELS secret and use it as an env var
* fix flake8 error
* quickfix expected num of indexed files in test
* refactor discord tests
* update fixtures
* use @requires_env
* split channels string to list
* bump version

Co-authored-by: mr-unstructured <[email protected]>
Co-authored-by: hubert.rutkowski <[email protected]>
Co-authored-by: Roman Isecke <[email protected]>
Co-authored-by: Hubert Rutkowski <[email protected]>
Co-authored-by: Roman Isecke <[email protected]>
Co-authored-by: Michal Martyniak <[email protected]>
Co-authored-by: Michał Martyniak <[email protected]>
A couple of things here:
* There were two config errors in the Kafka Cloud source connector.
* Also added an integration test for it. For it to work, an admin will have to set the secrets KAFKA_API_KEY, KAFKA_SECRET and KAFKA_BOOTSTRAP_SERVER in the repo. EDIT: done, it works.
* The Kafka source connector has a new field: group_id.
* Made the auth fields mandatory.
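For context, a sketch of what a Confluent Cloud consumer configuration built from those secrets typically looks like. This is an assumption about shape, not the connector's actual code: `build_cloud_consumer_conf` is a hypothetical helper, while the config keys are the standard librdkafka/confluent-kafka names and the environment variable names are the secrets the description says must be set:

```python
import os


def build_cloud_consumer_conf(group_id: str) -> dict:
    """Assemble a SASL_SSL consumer config for Confluent Cloud.

    Confluent Cloud authenticates with an API key/secret pair over
    SASL/PLAIN; all three env vars below must be set (mandatory auth).
    """
    return {
        "bootstrap.servers": os.environ["KAFKA_BOOTSTRAP_SERVER"],
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": os.environ["KAFKA_API_KEY"],
        "sasl.password": os.environ["KAFKA_SECRET"],
        # group.id is the new mandatory field on the source connector
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    }
```

Reading the credentials with `os.environ[...]` rather than `.get(...)` makes a missing secret fail fast with a KeyError, which matches making the auth fields mandatory.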