fix: environment variables specified in Model or Estimator are not passed through to SageMaker ModelStep #160
Conversation
Thanks for putting this fix together. Had one small comment around test coverage.
src/stepfunctions/steps/sagemaker.py
    parameters = model_config(model=model, instance_type=instance_type, role=model.role, image_uri=model.image_uri)
    if model_name:
        parameters['ModelName'] = model_name
elif isinstance(model, Model):
Is this backwards compatible?
Yes, this generates the same parameters as the old way of doing it, but takes the env variables into account as well.
Not quite. If model is the Model base class, parameters will have new fields that weren't there before. It's not just PrimaryContainer.Environment, but VpcConfig and other fields in the container definition.
I don't have enough context to know what effect that has, but instantiating a ModelStep with the same model will not produce the same parameters as before.
You are right - it shouldn't produce the same parameters if vpc_config or an image_config (see here) were provided in the Model, as they would be included in the parameters as well.
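To make the compatibility concern concrete, a minimal sketch of the shape change for a base Model (all values are placeholders; the legacy dict mirrors the hand-built parameters visible further down in this diff):

```python
# Previous behaviour: parameters built by hand, with a fixed set of fields.
legacy_parameters = {
    'ExecutionRoleArn': 'arn:aws:iam::123456789012:role/SageMakerRole',
    'ModelName': 'MyModel',
    'PrimaryContainer': {
        'Image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/pca:1',
        'ModelDataUrl': 's3://my-bucket/model.tar.gz',
        'Environment': {},
    },
}

# With model_config(), fields such as PrimaryContainer.Environment, VpcConfig
# and PrimaryContainer.ImageConfig can additionally appear when they are set
# on the Model, so a ModelStep built from the same Model changes shape.
```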
It is a good point to consider why those params were initially omitted, but I think it makes sense to have consistency between the parameters generated from a FrameworkModel and the ones generated from a Model.
With a FrameworkModel, params would also include vpc_config, but not image_config (container_def() is called without the image_config arg here).
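For reference, a sketch of the FrameworkModel path being described, using the sagemaker SDK's container_def helper (the URI, data URL, and env are placeholders):

```python
from sagemaker.session import container_def

# env is forwarded, but image_config is never passed, so 'ImageConfig'
# does not appear in the container definition built for a FrameworkModel.
primary_container = container_def(
    '123456789012.dkr.ecr.us-east-1.amazonaws.com/pca:1',
    model_data_url='s3://my-bucket/model.tar.gz',
    env={'MY_VAR': 'my-value'},
)
```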
> it shouldn't produce the same parameters if vpc_config or an image_config (see here) were provided

Can we add the tests that would have caught this? Our coverage might be limited.

> It is a good point to consider why those params were initially omitted

Do the commits where they were introduced provide that context? In any case, we need to preserve existing behaviour. Customers who upgrade without changing any of their code should be able to do so without unexpected mutations.
> Can we add the tests that would have caught this? Our coverage might be limited.

Yes, I will add tests to ensure there are no regressions and/or breaking changes.

> Do the commits where they were introduced provide that context? In any case, we need to preserve existing behaviour.

It was included in the initial commit, so there is no insight into why they were omitted.
'ExecutionRoleArn': model.role,
'ModelName': model_name or model.name,
'PrimaryContainer': {
    'Environment': {},
The original change that this one superseded (#84) only modified this line. The rationale/need for this change isn't quite captured in the issue it is resolving. Although it's supported, we need to get into why we are making this change.
As it modifies your control flow, I'd also suggest adding tests for all model types.
edit: do we have tests that pass in a FrameworkModel and a Model?
I opted for this change since I saw it has been supported in the sagemaker SDK since the issue was opened, but I agree that we should not introduce breaking changes - even more so if the issue does not capture the need to.
I'll revert the changes, go for the solution that was proposed in #84, and add tests to validate that the behaviour stays the same.
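For context, a hedged sketch of what that single-line solution presumably looks like against the hand-built parameters shown above (the surrounding fields are inferred from this diff, not the exact #84 patch):

```python
parameters = {
    'ExecutionRoleArn': model.role,
    'ModelName': model_name or model.name,
    'PrimaryContainer': {
        'Image': model.image_uri,
        'ModelDataUrl': model.model_data,
        'Environment': model.env or {},  # was previously hardcoded to {}
    },
}
```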
> edit: do we have tests that pass in a FrameworkModel and a Model?

We have tests that use a FrameworkModel (e.g. test_training_step_creation_with_framework()) and others that use a Model (e.g. test_training_step_creation_with_model()), but none that pass both.
> but none that pass both.

I didn't mean in a single test (is that even possible?). I was just verifying that both parts of the control flow are being tested. I would assume that if such tests already existed, one of them would have broken with the attempt to introduce a breaking change.
If that didn't happen, I think it surfaces a gap in testing, and we should use this opportunity to plug it in.
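For illustration, a sketch of the kind of branch coverage being discussed (the fixture names and expected environment are hypothetical):

```python
def test_model_step_with_framework_model(sklearn_framework_model):
    # exercises the isinstance(model, FrameworkModel) branch
    step = ModelStep('Create model', model=sklearn_framework_model)
    assert step.parameters['PrimaryContainer']['Environment'] == {'MY_VAR': 'my-value'}


def test_model_step_with_base_model(pca_model):
    # exercises the plain Model branch
    step = ModelStep('Create model', model=pca_model)
    assert step.parameters['PrimaryContainer']['Environment'] == {'MY_VAR': 'my-value'}
```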
I added a test in the latest commit that confirms we are consistent.
The test will help catch future breaking changes.
@patch('botocore.client.BaseClient._make_api_call', new=mock_boto_api_call)
@patch.object(boto3.session.Session, 'region_name', 'us-east-1')
def test_training_step_creation_with_model_with_env(pca_estimator_with_env):
    training_step = TrainingStep('Training', estimator=pca_estimator_with_env, job_name='TrainingJob')
This test uses a new model with a defined env, vpc_config and image_config.
vpc_config and image_config are not passed to the ModelStep - this confirms that the behaviour is consistent with the current one and will allow us to catch a breaking change.
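Roughly, the consistency being pinned down looks like this (keys and values are illustrative, not the exact assertions in the commit):

```python
parameters = model_step.parameters

# env is passed through to the generated CreateModel parameters...
assert parameters['PrimaryContainer']['Environment'] == {'MY_VAR': 'my-value'}

# ...while vpc_config and image_config stay omitted, matching current behaviour.
assert 'VpcConfig' not in parameters
assert 'ImageConfig' not in parameters['PrimaryContainer']
```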
(Not blocking, but please follow this guidance for future contributions.)
You're following the existing conventions here, but let's try to structure the tests as documentation so they are easier to read in the future. A sketch following this guidance is shown after the list.
- Name the test cases in a way that describes the functionality you are verifying, e.g. test_training_step_get_expected_model_returns_model_with_environment
- Scope down the assertions to specific pieces if you're testing specific things, e.g. assert training_step.get_expected_model(...).env == # the environment. You can have multiple, smaller asserts in a single test.
- If something is already covered in other test cases, you don't need to repeat it every time.
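For instance, a test shaped by that guidance might look like this (the mocking decorators mirror the existing tests above; the expected environment dict is illustrative):

```python
@patch('botocore.client.BaseClient._make_api_call', new=mock_boto_api_call)
@patch.object(boto3.session.Session, 'region_name', 'us-east-1')
def test_training_step_get_expected_model_returns_model_with_environment(pca_estimator_with_env):
    training_step = TrainingStep('Training', estimator=pca_estimator_with_env, job_name='TrainingJob')
    model = training_step.get_expected_model()
    # one scoped assertion for the behaviour under test
    assert model.env == {'MY_VAR': 'my-value'}
```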
ACK
I'll take a note to add these conventions to the CONTRIBUTING guide as well, to ensure contributors have access to them.
if self.estimator.environment:
    model.env = self.estimator.environment
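For orientation, a hedged sketch of where this lands in TrainingStep.get_expected_model (the surrounding lines are paraphrased from context, not the exact source):

```python
def get_expected_model(self, model_name=None):
    model = self.estimator.create_model()
    if model_name:
        model.name = model_name
    # create_model() does not carry the estimator's env vars over to the
    # Model, so propagate them explicitly (the second bug fixed in this PR).
    if self.estimator.environment:
        model.env = self.estimator.environment
    return model
```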
The CR description should also capture why we're doing this. It's not strictly related to the bug reported, but it is solving an issue. Typically it's better to fix things separately, but when there are multiple fixes, they need to be captured in the commit summary, along with the rationale and any testing that was performed.
Given where we currently are, I'm not strongly opinionated, because it feels more practical to go with it. It feels like it might just be busy work splitting them up in the current state.
Generally, and going forward, I think we should keep changes contained and specific to the bug they address / feature they introduce, for a few reasons:
- reverts are simpler if there's a problem with one fix and not with another
- the discovered bug may not have been triaged with a process of repro steps, expected behaviour, and actual behaviour to converge on a candidate fix
- it's simpler for reviewers and contributors - it simplifies life for everyone involved
- it helps to clearly see what was fixed for inclusion in the changelog when we perform the next release
- this PR grows the scope of the change that it supersedes
Having said that, I would probably still split it up if I were the one doing it.
If it's already implemented and works, let's keep it in here, since it's a small enough change. Just update the PR description accordingly.
Done :)
Issue #, if available: #82
Description of changes:
1. fix: environment variables are overwritten and not passed through to SageMaker ModelStep
New bug uncovered:
2. fix: env variables defined in the Estimator are not translated to the Model when calling TrainingStep.get_expected_model()
- Added a new estimator fixture (pca_estimator_with_env) with defined env parameters to use in tests and confirm that the env variables are passed through to the Model returned by TrainingStep.get_expected_model()
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.