feat(data_classes): add KinesisFirehoseEvent #1540

Merged

Conversation

ryandeivert
Contributor

@ryandeivert ryandeivert commented Sep 23, 2022

Issue number: #1539

Summary

Adding KinesisFirehoseEvent data class for handling Kinesis Firehose Lambda events (for Data Transformation).

Changes

Please provide a summary of what's being changed

  • Adding KinesisFirehoseEvent data class and respective sub-DictWrapper classes for handling Kinesis firehose event structure
  • Adding functional test for KinesisFirehoseEvent
    • This handles both text events and json events
  • Updating documentation for KinesisFirehoseEvent

User experience

Please share what the user experience looks like before and after this change

This is included in the documentation updates.

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.


View rendered docs/utilities/data_classes.md

@ryandeivert ryandeivert requested a review from a team as a code owner September 23, 2022 06:31
@ryandeivert ryandeivert requested review from rubenfonseca and removed request for a team September 23, 2022 06:31
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 23, 2022
@boring-cyborg boring-cyborg bot added documentation Improvements or additions to documentation tests labels Sep 23, 2022
@boring-cyborg

boring-cyborg bot commented Sep 23, 2022

Thanks a lot for your first contribution! Please check out our contributing guidelines and don't hesitate to ask whatever you need.
In the meantime, check out the #python channel on our AWS Lambda Powertools Discord: Invite link

@rubenfonseca rubenfonseca self-assigned this Sep 23, 2022
@rubenfonseca
Contributor

@ryandeivert it seems we have a couple of failing tests due to mypy errors. Can you take a look at them, please?

@ryandeivert
Contributor Author

hi @rubenfonseca I just pushed a commit that I think will resolve these. would you mind approving the CI job? thanks!

@ryandeivert
Contributor Author

sorry @rubenfonseca - I had some syntax errors (missing :) and just fixed it

@ryandeivert ryandeivert force-pushed the ryandeivert-firehose-event branch from 0189976 to 5654d98 Compare September 23, 2022 20:31
@ryandeivert
Contributor Author

@rubenfonseca okay I think the latest commit should do it now - ran make mypy locally to confirm, and pre commit hooks passing. sorry about that!

@ryandeivert ryandeivert force-pushed the ryandeivert-firehose-event branch from 5654d98 to e556593 Compare September 24, 2022 05:59
@ryandeivert ryandeivert changed the title kinesis firehose data event feat(data_classes): kinesis firehose data event Sep 24, 2022
@github-actions github-actions bot added the feature New feature or functionality label Sep 24, 2022
@ryandeivert
Contributor Author

I've tried to figure out why this is failing here: https://github.com/awslabs/aws-lambda-powertools-python/actions/runs/3117242603/jobs/5057874979

it looks unrelated to my changes, and related to this recently merged change: #1526

I have merged the develop branch into this feature branch in the hope that it resolves the failure(s).

@heitorlessa
Contributor

heitorlessa commented Sep 25, 2022 via email

@heitorlessa heitorlessa added the triage Pending triage from maintainers label Sep 25, 2022
@leandrodamascena
Contributor

Hi all! Just to help here, I created a Kinesis Firehose with a Lambda as the data transform, and the payload matches the documentation (https://docs.aws.amazon.com/lambda/latest/dg/services-kinesisfirehose.html). There are some optional fields, and it should be fine to create a test for them.

@rubenfonseca rubenfonseca force-pushed the ryandeivert-firehose-event branch from 55c22dc to 642734f Compare September 27, 2022 14:29
@codecov-commenter

codecov-commenter commented Sep 27, 2022

Codecov Report

Base: 99.77% // Head: 99.76% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (2ab66fc) compared to base (abb8043).
Patch coverage: 98.46% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1540      +/-   ##
===========================================
- Coverage    99.77%   99.76%   -0.02%     
===========================================
  Files          126      127       +1     
  Lines         5819     5884      +65     
  Branches       663      668       +5     
===========================================
+ Hits          5806     5870      +64     
  Misses           6        6              
- Partials         7        8       +1     
Impacted Files Coverage Δ
...s/utilities/data_classes/kinesis_firehose_event.py 98.43% <98.43%> (ø)
...mbda_powertools/utilities/data_classes/__init__.py 100.00% <100.00%> (ø)


Contributor

@rubenfonseca rubenfonseca left a comment

Thank you so much for this! Really looking forward to merging this :) A couple of things:

  1. I've rebased with develop so the mypy error is gone
  2. I've extracted the sample in the documentation to the examples folder, so we can cover it with mypy and linting tools
  3. I believe we should create a second test for the case where the Kinesis Firehose is configured with Direct PUT mode. In this case, there's no record metadata. I'm pasting a sample here that you can use
{
    "invocationId": "3af0fd17-6799-4368-a857-ae2c18a47560",
    "deliveryStreamArn": "arn:aws:firehose:eu-north-1:123456789:deliverystream/PUT-HTP-dNJEu",
    "region": "eu-north-1",
    "records": [ 
        {
            "recordId": "49633694907038825242852484918537387102326040674489597954000000", 
            "approximateArrivalTimestamp": 1664287467647,
            "data": "eyJDSEFOR0UiOjEuMjUsIlBSSUNFIjoxMjAuMDEsIlRJQ0tFUl9TWU1CT0wiOiJJT1AiLCJTRUNUT1IiOiJURUNITk9MT0dZIn0="
        }
    ]
}

Let me know if this makes sense :)
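
For reference, the sample above can be exercised with only the standard library. This is a minimal sketch, not the data class added in this PR: the helper name `decode_record` is hypothetical, and `kinesisRecordMetadata` is assumed to be the key that carries record metadata when the source is a Kinesis stream (it is absent in Direct PUT mode).

```python
import base64
import json

# Copy of the Direct PUT sample above: note there is no record metadata.
event = {
    "invocationId": "3af0fd17-6799-4368-a857-ae2c18a47560",
    "deliveryStreamArn": "arn:aws:firehose:eu-north-1:123456789:deliverystream/PUT-HTP-dNJEu",
    "region": "eu-north-1",
    "records": [
        {
            "recordId": "49633694907038825242852484918537387102326040674489597954000000",
            "approximateArrivalTimestamp": 1664287467647,
            "data": "eyJDSEFOR0UiOjEuMjUsIlBSSUNFIjoxMjAuMDEsIlRJQ0tFUl9TWU1CT0wiOiJJT1AiLCJTRUNUT1IiOiJURUNITk9MT0dZIn0=",
        }
    ],
}


def decode_record(record: dict) -> dict:
    """Hypothetical helper: base64-decode a record and parse it as JSON."""
    text = base64.b64decode(record["data"]).decode("utf-8")
    return {
        "payload": json.loads(text),
        # Assumed key name; missing in Direct PUT mode, so this yields None.
        "metadata": record.get("kinesisRecordMetadata"),
    }


decoded = [decode_record(r) for r in event["records"]]
```

A test for the Direct PUT case would mainly need to check that the missing metadata is tolerated and that the payload still decodes.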

@rubenfonseca
Contributor

@ryandeivert let me know if you want us to make the changes above, so we can merge your PR and include this in our next release!

@leandrodamascena
Contributor

Hi @ryandeivert and @rubenfonseca! I'll work to fix small things in this PR and include this feature in the next release, ok?

@ryandeivert
Contributor Author

hi @leandrodamascena / @rubenfonseca -- so sorry I fell off on this. you are welcome to make these changes or I can get to it later this week!

@leandrodamascena
Contributor

hi @leandrodamascena / @rubenfonseca -- so sorry I fell off on this. you are welcome to make these changes or I can get to it later this week!

No need to apologize! We know everyone is very busy. Be our guest and feel free to fix these small changes 😃

We expect to release a new version next week, so we have time to wait for your update. btw, i will add more comments on this PR.

Thanks for the prompt reply.

@leandrodamascena
Contributor

leandrodamascena commented Oct 12, 2022

I made two commits, since they only change the type and that reduces the work a little. There was an error in the pipeline, but it will be fixed when you make the final changes. Thanks

@ryandeivert
Contributor Author

hi @leandrodamascena taking a look at this now. will follow up

@ryandeivert
Contributor Author

hi @leandrodamascena -- I believe I have addressed all of the requests. I merged the develop branch into this and used the new test events provided in the other recently merged PR

@leandrodamascena
Contributor

leandrodamascena commented Oct 14, 2022

hi @leandrodamascena -- I believe I have addressed all of the requests. I merged the develop branch into this and used the new test events provided in the other recently merged PR

Hiiii @ryandeivert! That's amazing that you can handle all the changes quickly! I will check this tomorrow!

I see that the pipeline is failing some tests, but don't worry if you can't solve it now, I can fix that.

Thank you!!

@ryandeivert
Contributor Author

@leandrodamascena sorry about that! I think it should be fixed now..

Comment on lines +75 to +79
def data_as_json(self) -> dict:
    """Decoded base64-encoded data loaded to json"""
    if self._json_data is None:
        self._json_data = json.loads(self.data_as_text)
    return self._json_data
Contributor

I think we need to review this piece of code.
This is not your fault: we have this method/property by default in all classes, and it can be a problem for records that allow non-JSON data, such as Firehose Direct PUT and maybe others. Check this out:

image
image
image

We'll discuss this in our daily sync on Monday and I'll be back here with updates.

Contributor

EAFP

@leandrodamascena
Contributor

@leandrodamascena sorry about that! I think it should be fixed now..

Hi @ryandeivert pls don't apologize, you did a great job! Everything is working as expected (I tested it locally) and I'm just worried about this.

Assuming that all data is a JSON object can be a problem for those who use the Firehose data class events (and maybe other data class events), and can affect the user experience. Let me discuss this internally and I'll be back here with updates.

image

@ryandeivert
Contributor Author

I'm just worried about this.

Assuming that all data is a JSON object can be a problem for those who use the Firehose data class events (and maybe other data class events), and can affect the user experience. Let me discuss this internally and I'll be back here with updates.

@leandrodamascena thanks for all the feedback! When I borrowed that logic from another data class, I did have this same question. However, my conclusion at the time was that users implementing this event source will likely know whether or not their data is in fact JSON, and would use the appropriate function accordingly.

In the event (for whatever reason) that users do not know for certain, I would foresee them wrapping the call to data_as_json in a try/except block to handle the JSONDecodeError and simply fall back on the data_as_text value.

This method simply leverages Python's duck-typing concept and IMO isn't something you need to address in this convenience function (aside from adding documentation that it could raise a JSONDecodeError/ValueError). The other alternative would be dropping the function altogether and leaving all data decoding to be an exercise for the user (maybe the preference here in this new class, but would break backwards compatibility in other classes).

I'm fine with whatever you all land on for this - just let me know the outcome and if you want me to make any additional changes!
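
The try/except fallback described above could look roughly like the sketch below. It uses only the standard library; `FakeRecord` and `payload_of` are hypothetical names, and `FakeRecord` is a stand-in that mirrors the `data_as_text` / `data_as_json` accessors discussed here rather than the actual Powertools class.

```python
import base64
import json
from typing import Any, Optional


class FakeRecord:
    """Hypothetical stand-in for a Firehose record wrapper; not the Powertools class."""

    def __init__(self, data: str) -> None:
        self._data = data  # base64-encoded payload, as delivered by Firehose
        self._json_data: Optional[Any] = None

    @property
    def data_as_text(self) -> str:
        return base64.b64decode(self._data).decode("utf-8")

    @property
    def data_as_json(self) -> Any:
        # Cached after the first successful parse; raises json.JSONDecodeError
        # when the decoded text is not valid JSON.
        if self._json_data is None:
            self._json_data = json.loads(self.data_as_text)
        return self._json_data


def payload_of(record: FakeRecord) -> Any:
    """Try JSON first (EAFP) and fall back to the raw text."""
    try:
        return record.data_as_json
    except json.JSONDecodeError:  # a subclass of ValueError
        return record.data_as_text


json_record = FakeRecord(base64.b64encode(b'{"ok": true}').decode("ascii"))
text_record = FakeRecord(base64.b64encode(b"plain text line").decode("ascii"))
```

Callers that take this route get back either a parsed object or a string, so they still have to branch on the type afterwards.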

@leandrodamascena
Contributor

In the event (for whatever reason) that users do not know for certain, I would foresee them wrapping the call to data_as_json in a try/except block to handle the JSONDecodeError and simply fall back on the data_as_text value.

I think it might be a good solution.

This method simply leverages Python's duck-typing concept and IMO isn't something you need to address in this convenience function (aside from adding documentation that it could raise a JSONDecodeError/ValueError). The other alternative would be dropping the function altogether and leaving all data decoding to be an exercise for the user (maybe the preference here in this new class, but would break backwards compatibility in other classes).

We avoid breaking changes without proper communication with our clients, so removing this from other classes isn't a good idea.

I'm fine with whatever you all land on for this - just let me know the outcome and if you want me to make any additional changes!

Let me share your thoughts with others and I'll be back here with updates on Monday/Tuesday.

Thanks.

@leandrodamascena
Contributor

@leandrodamascena thanks for all the feedback! When I borrowed that logic from another data class, I did have this same question. However, my conclusion at the time was that users implementing this event source will likely know whether or not their data is in fact JSON, and would use the appropriate function accordingly.

Hi @ryandeivert! We think this is the best approach, and users can handle the exception on their side if the data is not JSON. The alternative of falling back to the data_as_text value when the data is not JSON would make it harder for users to deal with casting and data types.

So I think this PR is ready to merge. If you don't have any other considerations, we can merge it.

@ryandeivert
Contributor Author

@leandrodamascena sounds good to me! nothing else on my end :) feel free to merge when you see fit!

@leandrodamascena leandrodamascena changed the title feat(data_classes): kinesis firehose data event feat(data_classes): add KinesisFirehoseEvent Oct 17, 2022
@leandrodamascena leandrodamascena merged commit 416ab1b into aws-powertools:develop Oct 17, 2022
@boring-cyborg

boring-cyborg bot commented Oct 17, 2022

Awesome work, congrats on your first merged pull request and thank you for helping improve everyone's experience!

Labels
documentation Improvements or additions to documentation feature New feature or functionality size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tests triage Pending triage from maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Kinesis Data Firehose event data class
5 participants