This section describes security controls and recommended practices implemented by the solution.
Amazon SageMaker offers a comprehensive set of security features (including infrastructure security, data protection, authorization, authentication, monitoring, and auditability) to help your organization meet the security requirements that apply to ML workloads. Using SageMaker, you can standardize security policies across the entire ML development process, strengthening your security posture and reducing the time it takes to give data scientists access to the data they need, while complying with your organization's data security requirements.
This solution implements an isolated data science environment deployed into your VPC and provisions the required infrastructure.
The main design principles and decisions are:
- The SageMaker Studio domain is deployed in a dedicated VPC. Each elastic network interface (ENI) used by the SageMaker domain is created within a dedicated private subnet and attached to the specified security groups
- The data science team VPC can be configured with internet access by attaching a NAT gateway. You can also run this VPC in internet-free mode without any inbound or outbound internet access
- All access to Amazon S3 is routed via S3 VPC endpoints
- All access to the SageMaker API and runtime, as well as to all other AWS public services in use, is routed via VPC endpoints
- AWS Service Catalog is used to deploy a data science environment and SageMaker project templates
- All user roles are deployed into data science account IAM
- Provisioning of all IAM roles is completely separated from the deployment of the data science environment. You can use your own processes to provision the needed IAM roles.
- All network traffic is transferred over private and secure network links
- All inbound internet access to the private subnets is blocked; outbound traffic is allowed only via the NAT gateway route
- Optionally, you can block all internet egress, creating a completely internet-free secure environment
- SageMaker endpoints with a trained, validated, and approved model are hosted in dedicated staging and production accounts in your private VPC
- All access is managed by IAM and can be compliant with your corporate authentication standards
- All user interfaces can be integrated with your Active Directory or SSO system
- Access to any resource is disabled by default (implicit deny) and must be explicitly authorized in permission or resource policies
- You can limit access to data, code and training resources by role and job function
- All data is encrypted in-transit and at-rest using customer-managed AWS KMS keys
- You can block access to public libraries and frameworks
- Code and model artifacts are securely persisted in AWS CodeCommit repositories
The following are common options for running a private installation or mirror of Python packages:
- A private PyPI server running on EC2 instances: example
- A private PyPI mirror running on Amazon ECS or AWS Fargate: example and workshop
- AWS CodeArtifact to host PyPI packages: example
- Amazon S3 to host custom channels for a Conda repository: example
- The solution can provide end-to-end auditability with AWS CloudTrail, AWS Config, and Amazon CloudWatch
- Network traffic can be captured at the individual network interface level
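As a quick sanity check of these design decisions on a provisioned domain, you can inspect the `DescribeDomain` output for VPC-only network access. The helper below is an illustrative sketch, not part of the solution; the function name is hypothetical, and in practice you would feed it the response of a real `describe_domain` call:

```python
def domain_is_isolated(domain: dict) -> bool:
    """Return True if a DescribeDomain response describes a VPC-only
    domain with at least one private subnet configured."""
    # AppNetworkAccessType must be 'VpcOnly' so all Studio traffic goes
    # through the domain's ENIs instead of a SageMaker-managed VPC
    return (domain.get("AppNetworkAccessType") == "VpcOnly"
            and bool(domain.get("SubnetIds")))

# In a real check you would fetch the description first, e.g.:
# domain = boto3.client("sagemaker").describe_domain(DomainId="d-xxxxxxxx")
print(domain_is_isolated({"AppNetworkAccessType": "VpcOnly",
                          "SubnetIds": ["subnet-0abc"]}))   # True
print(domain_is_isolated({"AppNetworkAccessType": "PublicInternetOnly"}))  # False
```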
We use an IAM role policy that enforces the use of specific security controls. For example, all SageMaker workloads must be created in a VPC with the specified security groups and subnets:
```json
{
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true"
        }
    },
    "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],
    "Resource": [
        "arn:aws:sagemaker:*:<ACCOUNT_ID>:*"
    ],
    "Effect": "Deny"
}
```
See the list of IAM policy condition keys for Amazon SageMaker; for more examples, refer to the developer guide.
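To see why this statement blocks non-VPC workloads, note the `Null` condition: it matches whenever the `sagemaker:VpcSubnets` key is absent from the request, which triggers the explicit `Deny`. A minimal sketch of that evaluation logic (the function and the request dictionaries are illustrative, not part of the solution):

```python
def is_denied(request_keys: dict) -> bool:
    """Mimic the IAM 'Null' condition operator: the Deny statement
    matches when sagemaker:VpcSubnets is missing (null) in the request."""
    # "Null": {"sagemaker:VpcSubnets": "true"} evaluates to true
    # if and only if the condition key is absent from the request
    return "sagemaker:VpcSubnets" not in request_keys

# A CreateTrainingJob call without VPC configuration is denied:
print(is_denied({}))  # True -> explicit Deny applies
# The same call with subnets supplied passes this statement:
print(is_denied({"sagemaker:VpcSubnets": ["subnet-0abc", "subnet-0def"]}))  # False
```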
We use an Amazon S3 bucket policy that explicitly denies all access that does not originate from the designated S3 VPC endpoints:
```json
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-name>/*",
                "arn:aws:s3:::<s3-bucket-name>"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:sourceVpce": ["<s3-vpc-endpoint-id1>", "<s3-vpc-endpoint-id2>"]
                }
            }
        }
    ]
}
```
The S3 VPC endpoint policy allows access only to the specified S3 project buckets (holding data, models, and CI/CD pipeline artifacts), the SageMaker-owned S3 bucket, and the S3 objects used for product provisioning.
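An endpoint policy along those lines might look like the following sketch. The bucket names are placeholders and the `sagemaker-*` resource pattern is an assumption for the SageMaker-owned bucket, not the exact policy shipped with the solution:

```json
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::<project-data-bucket>",
                "arn:aws:s3:::<project-data-bucket>/*",
                "arn:aws:s3:::<project-models-bucket>",
                "arn:aws:s3:::<project-models-bucket>/*",
                "arn:aws:s3:::<cicd-artifact-bucket>",
                "arn:aws:s3:::<cicd-artifact-bucket>/*",
                "arn:aws:s3:::sagemaker-*"
            ]
        }
    ]
}
```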
Logging and monitoring are not implemented in this version. You can use the following AWS services:
- Amazon CloudWatch
- AWS CloudTrail
- VPC Flow Logs
- AWS Security Hub
- Amazon GuardDuty
- Amazon Macie
Reactive correction of user actions is not implemented in this version. For example, you can stop ML instances if the instance type is not approved for use by the data scientist.
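Such a reactive control could, for example, be built from an EventBridge rule on the `CreateTrainingJob` CloudTrail event plus a Lambda function that stops non-compliant jobs. The sketch below shows only the decision logic; the allow list, the event shape, and the `should_stop` helper are assumptions for illustration:

```python
# Hypothetical allow list of approved training instance types
APPROVED_INSTANCE_TYPES = {"ml.m5.xlarge", "ml.m5.2xlarge"}

def should_stop(event: dict) -> bool:
    """Given a CloudTrail CreateTrainingJob event (as delivered via
    EventBridge), return True if the requested instance type is not
    on the approved list and the job should be stopped."""
    params = event["detail"]["requestParameters"]
    instance_type = params["resourceConfig"]["instanceType"]
    return instance_type not in APPROVED_INSTANCE_TYPES

event = {"detail": {"requestParameters": {
    "trainingJobName": "demo-job",
    "resourceConfig": {"instanceType": "ml.p3.16xlarge"}}}}
print(should_stop(event))  # True -> a Lambda handler could then call
# boto3.client("sagemaker").stop_training_job(TrainingJobName="demo-job")
```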
To verify access to the Amazon S3 buckets of the data science environment, you can run the following commands in the Studio terminal:

```sh
aws s3 ls
```

The S3 VPC endpoint policy blocks access to the S3 `ListBuckets` operation.

```sh
aws s3 ls s3://<sagemaker deployment data S3 bucket name>
```

You can access the data science environment's data and model S3 buckets.

```sh
aws s3 mb s3://<any available bucket name>
```

The S3 VPC endpoint policy blocks access to any other S3 bucket.

```sh
aws sts get-caller-identity
```

All operations are performed under the SageMaker execution role.
Try to start a training job without VPC attachment:
```python
import sagemaker

# Retrieve the managed XGBoost training container for the current region
container_uri = sagemaker.image_uris.retrieve(region=session.region_name,
                                              framework='xgboost',
                                              version='1.0-1',
                                              image_scope='training')

xgb = sagemaker.estimator.Estimator(image_uri=container_uri,
                                    role=sagemaker_execution_role,
                                    instance_count=2,
                                    instance_type='ml.m5.xlarge',
                                    output_path='s3://{}/{}/model-artifacts'.format(default_bucket, prefix),
                                    sagemaker_session=sagemaker_session,
                                    base_job_name='reorder-classifier',
                                    volume_kms_key=ebs_kms_id,
                                    output_kms_key=s3_kms_id)

xgb.set_hyperparameters(objective='binary:logistic',
                        num_round=100)

xgb.fit({'train': train_set_pointer, 'validation': validation_set_pointer})
```
You get an `AccessDeniedException` because of the explicit `Deny` in the IAM policy:
```json
{
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true",
            "sagemaker:VpcSecurityGroupIds": "true"
        }
    },
    "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],
    "Resource": [
        "arn:aws:sagemaker:*:<ACCOUNT_ID>:*"
    ],
    "Effect": "Deny"
}
```
Now add the secure network configuration to the `Estimator`:
```python
from sagemaker.network import NetworkConfig

network_config = NetworkConfig(enable_network_isolation=False,
                               security_group_ids=env_data["SecurityGroups"],
                               subnets=env_data["SubnetIds"],
                               encrypt_inter_container_traffic=True)

xgb = sagemaker.estimator.Estimator(image_uri=container_uri,
                                    role=sagemaker_execution_role,
                                    instance_count=2,
                                    instance_type='ml.m5.xlarge',
                                    output_path='s3://{}/{}/model-artifacts'.format(default_bucket, prefix),
                                    sagemaker_session=sagemaker_session,
                                    base_job_name='reorder-classifier',
                                    subnets=network_config.subnets,
                                    security_group_ids=network_config.security_group_ids,
                                    encrypt_inter_container_traffic=network_config.encrypt_inter_container_traffic,
                                    enable_network_isolation=network_config.enable_network_isolation,
                                    volume_kms_key=ebs_kms_id,
                                    output_kms_key=s3_kms_id)
```
You are able to create and run the training job.
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0