Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sqs queue extractor #318

Merged
merged 17 commits into from
Jun 20, 2024
Merged

Add sqs queue extractor #318

merged 17 commits into from
Jun 20, 2024

Conversation

grantleehoffman
Copy link
Contributor

@grantleehoffman grantleehoffman commented Jun 17, 2024

Adds Queue extractor and SQS connector for processing queue messages from a SQS queue.
Has property delete_after_read which defaults to True. This will delete queue message batches after successfully reading to pipeline from extractor step.

Future Feature: Add in tracking of message processing through pipeline and delete once all writes are finished for each message.

Copy link

codecov bot commented Jun 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.74%. Comparing base (4b0b796) to head (df9eee5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
+ Coverage   97.70%   97.74%   +0.04%     
==========================================
  Files         139      142       +3     
  Lines        4662     4754      +92     
==========================================
+ Hits         4555     4647      +92     
  Misses        107      107              
Flag Coverage Δ
3.10-macos-latest 97.74% <100.00%> (+0.04%) ⬆️
3.10-ubuntu-latest 97.74% <100.00%> (+0.04%) ⬆️
3.10-windows-latest 97.70% <100.00%> (+0.04%) ⬆️
3.11-macos-latest 97.74% <100.00%> (+0.04%) ⬆️
3.11-ubuntu-latest 97.74% <100.00%> (+0.04%) ⬆️
3.11-windows-latest 97.70% <100.00%> (+0.04%) ⬆️
3.12-macos-latest 97.74% <100.00%> (+0.04%) ⬆️
3.12-ubuntu-latest 97.74% <100.00%> (+0.04%) ⬆️
3.12-windows-latest 97.70% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@grantleehoffman grantleehoffman marked this pull request as ready for review June 18, 2024 15:32
@yasonk
Copy link
Contributor

yasonk commented Jun 18, 2024

@grantleehoffman Should this have an update in the documentation? Maybe worth adding it to the docs here:
https://nodestream-proj.github.io/docs/docs/reference/extractors/

nodestream/pipeline/extractors/queues/sqs.py Outdated Show resolved Hide resolved
nodestream/pipeline/extractors/queues/extractor.py Outdated Show resolved Hide resolved
nodestream/pipeline/extractors/queues/extractor.py Outdated Show resolved Hide resolved
"""

@classmethod
def from_file_data(cls, connector: str, record_format: str, **connector_args):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to add a test case for this? It looks like it may be tricky and that may have been why you omitted it in which case I am okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

nodestream/pipeline/extractors/queues/sqs.py Show resolved Hide resolved
@@ -0,0 +1,88 @@
import json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe overkill, but could be beneficial to add some design diagrams to the developer reference: https://nodestream-proj.github.io/docs/docs/category/developer-reference/

@@ -0,0 +1,122 @@
# Customizing the Queue Extractor

The `QueueExtractor` is responsible for extracting data from a queue source and converting it into a stream of records. The `QueueExtractor` is a subclass of `nodestream.pipeline.extractors:Extractor` and is responsible for the following:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this here for now, but we'll need to move these docs to somewhere like here so they are on the public facing docs: https://github.com/nodestream-proj/docs/blob/main/docs/tutorials-advanced/extending-the-dsl.mdx

@@ -55,6 +55,54 @@ set the `record_format` to be `json` in the `StreamExtractor` configuration. For
record_format: json
```

## `QueueExtractor`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create a pr for the public docs after we finalize this pr.

@grantleehoffman
Copy link
Contributor Author

I will add an update to the public docs once these changes have gone through review and are finalized.

@grantleehoffman grantleehoffman merged commit dc87832 into main Jun 20, 2024
13 checks passed
@grantleehoffman grantleehoffman deleted the add_sqs_queue_extractor branch June 20, 2024 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants