
Increase endpointInServiceTimeout from 10m to 60m #39090

Merged: 2 commits merged on Sep 3, 2024


Contributor

@bruceadowns bruceadowns commented Aug 29, 2024

Description

SageMaker endpoints are now taking a little over 10 minutes to become InService. This raises the hardcoded endpoint timeout from 10 minutes to 60 minutes.

This primarily affects SageMaker serverless endpoints. As an example, a test endpoint that takes more than 10 minutes to deploy uses an xgboost model and its associated container.

e.g.

...
module.tf_modules_sagemaker.module.endpoint.aws_sagemaker_endpoint.sagemaker_endpoint[0]: Still creating... [9m40s elapsed]
module.tf_modules_sagemaker.module.endpoint.aws_sagemaker_endpoint.sagemaker_endpoint[0]: Still creating... [9m50s elapsed]
module.tf_modules_sagemaker.module.endpoint.aws_sagemaker_endpoint.sagemaker_endpoint[0]: Still creating... [10m0s elapsed]
╷
│ Error: waiting for SageMaker Endpoint (ep-demo-sample-svrless) to be in service: timeout while waiting for state to become 'InService' (last state: 'Creating', timeout: 10m0s)
│ 
│   with module.tf_modules_sagemaker.module.endpoint.aws_sagemaker_endpoint.sagemaker_endpoint[0],
│   on .terraform/modules/tf_modules_sagemaker/modules/endpoint/main.tf line 55, in resource "aws_sagemaker_endpoint" "sagemaker_endpoint":
│   55: resource "aws_sagemaker_endpoint" "sagemaker_endpoint" {

Relations

Closes #0000

References

Output from Acceptance Testing

% make testacc TESTS=TestAccXXX PKG=ec2

...


Community Note

Voting for Prioritization

  • Please vote on this pull request by adding a 👍 reaction to the original post to help the community and maintainers prioritize this pull request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request.

For Submitters

  • Review the contribution guide relating to the type of change you are making to ensure all of the necessary steps have been taken.
  • For new resources and data sources, use skaff to generate scaffolding with comments detailing common expectations.
  • Whether or not the branch has been rebased will not impact prioritization, but doing so is always a welcome surprise.

@github-actions github-actions bot added service/sagemaker Issues and PRs that pertain to the sagemaker service. needs-triage Waiting for first response or review from a maintainer. labels Aug 29, 2024
@bruceadowns bruceadowns marked this pull request as ready for review August 29, 2024 23:17
@bruceadowns bruceadowns requested a review from a team as a code owner August 29, 2024 23:17

@Tradunsky Tradunsky left a comment

damn, 10 min startup time...

Have you found the reason why it takes so long? A heavy docker image or we download something during startup?

@justinretzolk justinretzolk added enhancement Requests to existing resources that expand the functionality or scope. timeouts Pertains to timeout increases. and removed needs-triage Waiting for first response or review from a maintainer. labels Aug 30, 2024
@bruceadowns
Contributor Author

damn, 10 min startup time...

Have you found the reason why it takes so long? A heavy docker image or we download something during startup?

(updated pr comment with...)

This primarily affects SageMaker serverless endpoints. As an example, a test endpoint that takes more than 10 minutes to deploy uses an xgboost model and its associated container, with 4 GB of memory.

@ewbankkit ewbankkit self-assigned this Sep 3, 2024
@github-actions github-actions bot added the prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. label Sep 3, 2024
Contributor

@ewbankkit ewbankkit left a comment

LGTM 🚀.

@ewbankkit
Contributor

@bruceadowns Thanks for the contribution 🎉 👏.

@ewbankkit ewbankkit merged commit a1f8260 into hashicorp:main Sep 3, 2024
24 checks passed
@github-actions github-actions bot added this to the v5.66.0 milestone Sep 3, 2024
@github-actions github-actions bot removed the prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. label Sep 5, 2024
github-actions bot commented Sep 5, 2024

This functionality has been released in v5.66.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions bot commented Oct 7, 2024

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 7, 2024
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/sagemaker Issues and PRs that pertain to the sagemaker service. timeouts Pertains to timeout increases.
5 participants