Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(step-functions): add bucketNamePath in item reader #31619

Merged
merged 5 commits into from
Oct 14, 2024

Conversation

ChakshuGupta13
Copy link
Contributor

@ChakshuGupta13 ChakshuGupta13 commented Oct 2, 2024

Issue # (if applicable)

Closes #29409.

Reason for this change

  • For DistributedMap state of StepFunctions, IItemReader currently only allows S3 bucket as input source to be declared statically in CDK.
  • In other words, current CDK implementation only caters to static use-case where we know either bucket or bucketName (from which we can create IBucket) and pass it to IItemReader.
  • Whereas via AWS management console, if we create DistributedMap manually, then we can also convey S3 source dynamically using State Input / JsonPath.
  • In other words, for dynamic use-case, we will neither have bucket nor bucketName i.e. we only know state input variable which will convey bucketName e.g. $.bucketName.
  • So, if we want to use IItemReader for dynamic use case also, then, to avoid making breaking change (e.g. changing type of bucket from IBucket to string), we will:
    • (1) need to make bucket: IBucket an optional prop in ItemReaderProps (refer Making properties optional)
    • (2) and add another optional field bucketNamePath: string to convey state input variable name (e.g. $.bucketName)

Description of changes

  • Add bucketNamePath as optional prop in ItemReaderProps.
  • Make bucket an optional prop instead of required prop in ItemReaderProps.
  • Adapt implementing classes of IItemReader to handle bucket being optional (refer Making properties optional).
  • Add validateItemReader in IItemReader which implementing classes will implement to handle mutual exclusivity of bucket and bucketNamePath.
  • Modify DistributedMap:validateState to validate IItemReader if present.
  • Modify README to explain and add examples relevant to changes.
  • Add new unit-tests for DistributedMap.
  • Add new integration tests to validate deployment and expected result.

Description of how you validated changes

  • Build changes: cd ./packages/aws-cdk-lib/ && yarn build aws-stepfunctions && yarn watch
  • Add and run new unit-tests for DistributedMap: yarn test aws-stepfunctions/test
  • Run yarn rossetta:extract -d aws-stepfunctions -v to confirm validity of README changes
  • Add new integration test and run (with snapshot created):
    • Build once and watch: npx lerna run build --scope=@aws-cdk-testing/framework-integ && yarn watch
    • Test: yarn build --directory test/aws-stepfunctions/test && yarn integ test/aws-stepfunctions/test/integ.item-reader-path-s3-object.js --update-on-failed
    • Verified expected step function execution result during snapshot creation

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

**Reason for this change**
- For `DistributedMap` state of StepFunctions, `IItemReader` currently only allows S3 bucket as input source to be declared statically in CDK.
- In other words, current CDK implementation only caters to static use-case where we know either `bucket` or `bucketName` (from which we can create `IBucket`) and pass it to `IItemReader`.
- Whereas via AWS management console, if we create `DistributedMap` manually, then we can also convey S3 source dynamically using State Input / JsonPath.
- In other words, for dynamic use-case, we will neither have `bucket` nor `bucketName` i.e. we only know state input variable which will convey `bucketName` e.g. `$.bucketName`.
- So, if we want to use `IItemReader` for dynamic use case also, then, to avoid making breaking change (e.g. changing type of `bucket` from `IBucket` to `string`), we will:
  - (1) need to make `bucket: IBucket` an optional prop in `ItemReaderProps` (refer [Making properties optional](https://github.com/aws/jsii/blob/main/packages/jsii-diff/BREAKING_CHANGES.md#making-properties-optional))
  - (2) and add another optional field `bucketNamePath: string` to convey state input variable name (e.g. $.bucketName)

**What changes are being made?**
- Add `bucketNamePath` as optional prop in `ItemReaderProps`.
- Make `bucket` an optional prop instead of required prop in `ItemReaderProps`.
- Adapt implementing classes of `IItemReader` to handle `bucket` being optional (refer [Making properties optional](https://github.com/aws/jsii/blob/main/packages/jsii-diff/BREAKING_CHANGES.md#making-properties-optional)).
- Add `validateItemReader` in `IItemReader` which implementing classes will implement to handle mutual exclusivity of `bucket` and `bucketNamePath`.
- Modify `DistributedMap:validateState` to validate `IItemReader` if present.
- Modify README to explain and add examples relevant to changes.
- Add new unit-tests for `DistributedMap`.
- Add new integration tests to confirm vali

**How are these changes tested?**
- Build changes: `cd ./packages/aws-cdk-lib/ && yarn build aws-stepfunctions && yarn watch`
- Add and run new unit-tests for `DistributedMap`: `yarn test aws-stepfunctions/test`
- Run `yarn rossetta:extract -d aws-stepfunctions -v` to confirm validity of README changes
- Add new integration test and run (with snapshot created):
  - Build once and watch: `npx lerna run build --scope=@aws-cdk-testing/framework-integ && yarn watch`
  - Test: `yarn build --directory test/aws-stepfunctions/test && yarn integ test/aws-stepfunctions/test/integ.item-reader-path-s3-object.js --update-on-failed`
  - Verified expected step function execution result during snapshot creation
@github-actions github-actions bot added the beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK label Oct 2, 2024
@aws-cdk-automation aws-cdk-automation requested a review from a team October 2, 2024 13:23
@github-actions github-actions bot added effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2 labels Oct 2, 2024
@ChakshuGupta13 ChakshuGupta13 marked this pull request as ready for review October 2, 2024 13:59
@ChakshuGupta13
Copy link
Contributor Author

Note: Changes of this PR were suggested as part of discussion from #30836.
Reviewers: @GavinZZ @moelasmar

Copy link
Contributor

@GavinZZ GavinZZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I recalled when we first discussed about this issue, I mentioned that making a property from required to optional is a breaking change. As I discussed more with the team, loosening the restriction can be ok (ideally we want to avoid it if possible). In this scenario, I think this is a valid reason to make it optional.

I think this solution is fine with me. I'll leave it to Mohamed for a final review. Meanwhile I'll close the old PR since that approach is abandoned.

@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Oct 3, 2024
@moelasmar moelasmar self-assigned this Oct 8, 2024
Copy link
Contributor

@moelasmar moelasmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general the change looks good to me, I left some minor comments

@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Oct 11, 2024
@mergify mergify bot dismissed moelasmar’s stale review October 11, 2024 10:34

Pull request has been modified.

Chakshu Gupta added 2 commits October 11, 2024 16:21
- for IItemReader, if bucket is not given, then policy statement can allow step function to list any bucket.
@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Oct 11, 2024
Copy link
Contributor

mergify bot commented Oct 14, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Oct 14, 2024
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 6395bd0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit 97130d8 into aws:main Oct 14, 2024
12 checks passed
Copy link
Contributor

mergify bot commented Oct 14, 2024

Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

Copy link

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 14, 2024
@ChakshuGupta13 ChakshuGupta13 deleted the modify_item_reader branch October 14, 2024 09:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

state machine : Dynamic passing of bucket and key to distributed map
4 participants