This section describes security controls and recommended practices implemented by the solution.
Amazon SageMaker offers a comprehensive set of security features (including infrastructure security, data protection, authorization, authentication, monitoring, and auditability) to help your organization meet the security requirements that apply to ML workloads. Using SageMaker, you can standardize security policies across the entire ML development process, strengthening your security posture and reducing the time it takes to give data scientists access to the data they need, while complying with your organization's data security requirements.
This solution implements an isolated data science environment deployed into your VPC and provisions the required infrastructure.
The main design principles and decisions are:
- The SageMaker Studio domain is deployed in a dedicated VPC. Each elastic network interface (ENI) used by the SageMaker domain is created within a dedicated private subnet and attached to the specified security groups
- The data science team VPC can be configured with internet access by attaching a NAT gateway. You can also run this VPC in internet-free mode without any inbound or outbound internet access
- All access to Amazon S3 is routed via S3 VPC endpoints
- All access to the SageMaker API and runtime, as well as to all other AWS public services in use, is routed via VPC endpoints
- AWS Service Catalog is used to deploy a data science environment and SageMaker project templates
- All user roles are deployed into data science account IAM
- Provisioning of all IAM roles is completely separated from the deployment of the data science environment. You can use your own processes to provision the needed IAM roles.
- All network traffic is transferred over private and secure network links
- All inbound internet access to the private subnets is blocked; outbound traffic is allowed only via the NAT gateway route
- Optionally, you can block all internet egress, creating a completely internet-free secure environment
- SageMaker endpoints with a trained, validated, and approved model are hosted in dedicated staging and production accounts in your private VPC
- All access is managed by IAM and can be compliant with your corporate authentication standards
- All user interfaces can be integrated with your Active Directory or SSO system
- Access to any resource is disabled by default (implicit deny) and must be explicitly authorized in permission or resource policies
- You can limit access to data, code and training resources by role and job function
- All data is encrypted in-transit and at-rest using customer-managed AWS KMS keys
- You can block access to public libraries and frameworks
- Code and model artifacts are securely persisted in AWS CodeCommit repositories
The following are common options for running a private installation or mirror of Python packages:
- A private PyPI server running on EC2 instances: example
- A private PyPI mirror running on Amazon ECS or AWS Fargate: example and workshop
- AWS CodeArtifact to host PyPI packages: example
- Amazon S3 to host custom channels for a Conda repository: example
- The solution can provide end-to-end auditability with AWS CloudTrail, AWS Config, and Amazon CloudWatch
- Network traffic can be captured at the individual network interface level
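As a quick sanity check of these design decisions on a provisioned domain, you can inspect the `DescribeDomain` output for VPC-only network access. The helper below is an illustrative sketch, not part of the solution; the function name is hypothetical, and in practice you would feed it the response of a real `describe_domain` call:

```python
def domain_is_isolated(domain: dict) -> bool:
    """Return True if a DescribeDomain response describes a VPC-only
    domain with at least one private subnet configured."""
    # AppNetworkAccessType must be 'VpcOnly' so all Studio traffic goes
    # through the domain's ENIs instead of a SageMaker-managed VPC
    return (domain.get("AppNetworkAccessType") == "VpcOnly"
            and bool(domain.get("SubnetIds")))

# In a real check you would fetch the description first, e.g.:
# domain = boto3.client("sagemaker").describe_domain(DomainId="d-xxxxxxxx")
print(domain_is_isolated({"AppNetworkAccessType": "VpcOnly",
                          "SubnetIds": ["subnet-0abc"]}))   # True
print(domain_is_isolated({"AppNetworkAccessType": "PublicInternetOnly"}))  # False
```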
We use an IAM role policy that enforces the use of specific security controls. For example, all SageMaker workloads must be created in a VPC with the specified security groups and subnets:
```json
{
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true"
        }
    },
    "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],
    "Resource": [
        "arn:aws:sagemaker:*:<ACCOUNT_ID>:*"
    ],
    "Effect": "Deny"
}
```
See the list of IAM policy condition keys for Amazon SageMaker; for more examples, refer to the developer guide.
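To see why this statement blocks non-VPC workloads, note the `Null` condition: it matches whenever the `sagemaker:VpcSubnets` key is absent from the request, which triggers the explicit `Deny`. A minimal sketch of that evaluation logic (the function and the request dictionaries are illustrative, not part of the solution):

```python
def is_denied(request_keys: dict) -> bool:
    """Mimic the IAM 'Null' condition operator: the Deny statement
    matches when sagemaker:VpcSubnets is missing (null) in the request."""
    # "Null": {"sagemaker:VpcSubnets": "true"} evaluates to true
    # if and only if the condition key is absent from the request
    return "sagemaker:VpcSubnets" not in request_keys

# A CreateTrainingJob call without VPC configuration is denied:
print(is_denied({}))  # True -> explicit Deny applies
# The same call with subnets supplied passes this statement:
print(is_denied({"sagemaker:VpcSubnets": ["subnet-0abc", "subnet-0def"]}))  # False
```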
We use an Amazon S3 bucket policy that explicitly denies all access that does not originate from the designated S3 VPC endpoints:
```json
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-name>/*",
                "arn:aws:s3:::<s3-bucket-name>"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:sourceVpce": ["<s3-vpc-endpoint-id1>", "<s3-vpc-endpoint-id2>"]
                }
            }
        }
    ]
}
```
The S3 VPC endpoint policy allows access only to the specified S3 project buckets (holding data, models, and CI/CD pipeline artifacts), the SageMaker-owned S3 bucket, and the S3 objects used for product provisioning.
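An endpoint policy along those lines might look like the following sketch. The bucket names are placeholders and the `sagemaker-*` resource pattern is an assumption for the SageMaker-owned bucket, not the exact policy shipped with the solution:

```json
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::<project-data-bucket>",
                "arn:aws:s3:::<project-data-bucket>/*",
                "arn:aws:s3:::<project-models-bucket>",
                "arn:aws:s3:::<project-models-bucket>/*",
                "arn:aws:s3:::<cicd-artifact-bucket>",
                "arn:aws:s3:::<cicd-artifact-bucket>/*",
                "arn:aws:s3:::sagemaker-*"
            ]
        }
    ]
}
```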
Logging and monitoring are not implemented in this version. You can use the following AWS services:
- Amazon CloudWatch
- AWS CloudTrail
- VPC Flow Logs
- AWS Security Hub
- Amazon GuardDuty
- Amazon Macie
Reactive correction of user actions is not implemented in this version. For example, you can stop ML instances if the instance type is not approved for use by the data scientist.
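Such a reactive control could, for example, be built from an EventBridge rule on the `CreateTrainingJob` CloudTrail event plus a Lambda function that stops non-compliant jobs. The sketch below shows only the decision logic; the allow list, the event shape, and the `should_stop` helper are assumptions for illustration:

```python
# Hypothetical allow list of approved training instance types
APPROVED_INSTANCE_TYPES = {"ml.m5.xlarge", "ml.m5.2xlarge"}

def should_stop(event: dict) -> bool:
    """Given a CloudTrail CreateTrainingJob event (as delivered via
    EventBridge), return True if the requested instance type is not
    on the approved list and the job should be stopped."""
    params = event["detail"]["requestParameters"]
    instance_type = params["resourceConfig"]["instanceType"]
    return instance_type not in APPROVED_INSTANCE_TYPES

event = {"detail": {"requestParameters": {
    "trainingJobName": "demo-job",
    "resourceConfig": {"instanceType": "ml.p3.16xlarge"}}}}
print(should_stop(event))  # True -> a Lambda handler could then call
# boto3.client("sagemaker").stop_training_job(TrainingJobName="demo-job")
```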
To verify access to the Amazon S3 buckets of the data science environment, you can run the following commands in the Studio terminal:

```sh
aws s3 ls
```

The S3 VPC endpoint policy blocks access to the S3 `ListBuckets` operation.

```sh
aws s3 ls s3://<sagemaker deployment data S3 bucket name>
```

You can access the data science environment's data and model S3 buckets.

```sh
aws s3 mb s3://<any available bucket name>
```

The S3 VPC endpoint policy blocks access to any other S3 bucket.

```sh
aws sts get-caller-identity
```

All operations are performed under the SageMaker execution role.
Try to start a training job without VPC attachment:
```python
import sagemaker

# Retrieve the managed XGBoost training container for the current region
container_uri = sagemaker.image_uris.retrieve(region=session.region_name,
                                              framework='xgboost',
                                              version='1.0-1',
                                              image_scope='training')

xgb = sagemaker.estimator.Estimator(image_uri=container_uri,
                                    role=sagemaker_execution_role,
                                    instance_count=2,
                                    instance_type='ml.m5.xlarge',
                                    output_path='s3://{}/{}/model-artifacts'.format(default_bucket, prefix),
                                    sagemaker_session=sagemaker_session,
                                    base_job_name='reorder-classifier',
                                    volume_kms_key=ebs_kms_id,
                                    output_kms_key=s3_kms_id)

xgb.set_hyperparameters(objective='binary:logistic',
                        num_round=100)

xgb.fit({'train': train_set_pointer, 'validation': validation_set_pointer})
```
You get an `AccessDeniedException` because of the explicit `Deny` in the IAM policy:
```json
{
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true",
            "sagemaker:VpcSecurityGroupIds": "true"
        }
    },
    "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],
    "Resource": [
        "arn:aws:sagemaker:*:<ACCOUNT_ID>:*"
    ],
    "Effect": "Deny"
}
```
Now add the secure network configuration to the `Estimator`:
```python
from sagemaker.network import NetworkConfig

network_config = NetworkConfig(enable_network_isolation=False,
                               security_group_ids=env_data["SecurityGroups"],
                               subnets=env_data["SubnetIds"],
                               encrypt_inter_container_traffic=True)

xgb = sagemaker.estimator.Estimator(image_uri=container_uri,
                                    role=sagemaker_execution_role,
                                    instance_count=2,
                                    instance_type='ml.m5.xlarge',
                                    output_path='s3://{}/{}/model-artifacts'.format(default_bucket, prefix),
                                    sagemaker_session=sagemaker_session,
                                    base_job_name='reorder-classifier',
                                    subnets=network_config.subnets,
                                    security_group_ids=network_config.security_group_ids,
                                    encrypt_inter_container_traffic=network_config.encrypt_inter_container_traffic,
                                    enable_network_isolation=network_config.enable_network_isolation,
                                    volume_kms_key=ebs_kms_id,
                                    output_kms_key=s3_kms_id)
```
You are able to create and run the training job.
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0