
Security


This section describes security controls and recommended practices implemented by the solution.

Amazon SageMaker offers a comprehensive set of security features—including infrastructure security, data protection, authorization, authentication, monitoring, and auditability—to help your organization with security requirements that may apply to ML workloads. Using SageMaker, you can standardize security policies across the entire ML development process to increase your security posture and reduce the time it takes to provide data scientists with access to the data they need, while complying with your organization’s data security requirements.

Network isolation

This solution implements an isolated data science environment deployed into your VPC and provisions the following infrastructure:

Figure: SageMaker deployment in a VPC

The main design principles and decisions are:

  • The SageMaker Studio domain is deployed in a dedicated VPC. Each elastic network interface (ENI) used by the SageMaker domain is created in a private dedicated subnet and attached to the specified security groups
  • The data science team VPC can be configured with internet access by attaching a NAT gateway. You can also run this VPC in internet-free mode without any inbound or outbound internet access
  • All access to Amazon S3 is routed via S3 VPC endpoints
  • All access to the SageMaker API and runtime, and to all other AWS public services used, is routed via VPC endpoints
  • AWS Service Catalog is used to deploy the data science environment and SageMaker project templates
  • All user roles are deployed into the data science account IAM
  • Provisioning of all IAM roles is completely separated from the deployment of the data science environment, so you can use your own processes to provision the needed IAM roles
  • All network traffic is transferred over private and secure network links
  • All ingress internet access is blocked for the private subnets and is only allowed via the NAT gateway route
  • Optionally, you can block all internet egress, creating a completely internet-free secure environment
  • SageMaker endpoints with a trained, validated, and approved model are hosted in dedicated staging and production accounts in your private VPC
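The endpoint-based routing above can be sketched with boto3. This is a minimal sketch, not the solution's provisioning code: the service list and function names below are illustrative assumptions.

```python
from typing import List, Optional

# Illustrative assumption: interface endpoints commonly needed by a
# private SageMaker environment (the solution's exact set may differ).
REQUIRED_SERVICES = [
    "sagemaker.api",
    "sagemaker.runtime",
    "sts",
    "logs",
    "ecr.api",
    "ecr.dkr",
    "codecommit",
]

def endpoint_service_names(region: str,
                           services: Optional[List[str]] = None) -> List[str]:
    """Build fully qualified interface-endpoint service names for a region."""
    return [f"com.amazonaws.{region}.{s}" for s in (services or REQUIRED_SERVICES)]

def provision_endpoints(vpc_id: str, subnet_ids: List[str],
                        sg_ids: List[str], region: str) -> None:
    """Create one interface endpoint per service (requires AWS credentials)."""
    import boto3  # deferred so the pure helper above has no dependencies
    ec2 = boto3.client("ec2", region_name=region)
    for name in endpoint_service_names(region):
        ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=name,
            SubnetIds=subnet_ids,
            SecurityGroupIds=sg_ids,
            PrivateDnsEnabled=True,  # resolve the public service DNS name privately
        )
```

With private DNS enabled, SDK and CLI calls to these services resolve to the endpoint ENIs inside the subnets, so no traffic leaves the VPC.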

Authentication

  • All access is managed by IAM and can be compliant with your corporate authentication standards
  • All user interfaces can be integrated with your Active Directory or SSO system

Authorization

  • Access to any resource is disabled by default (implicit deny) and must be explicitly authorized in permission or resource policies
  • You can limit access to data, code, and training resources by role and job function
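One way to scope access by team or job function is attribute-based access control. The statement below is a sketch, not part of the solution's policies: it denies SageMaker actions on resources whose `team` tag does not match the caller's principal tag (the tag key is an assumption; untagged resources would also be denied):

```json
{
    "Effect": "Deny",
    "Action": "sagemaker:*",
    "Resource": "*",
    "Condition": {
        "StringNotEquals": {
            "aws:ResourceTag/team": "${aws:PrincipalTag/team}"
        }
    }
}
```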

Data protection

Artifact management

  • You can block access to public libraries and frameworks
  • Code and model artifacts are securely persisted in AWS CodeCommit repositories

The following are common options for running a private installation or a mirror of Python packages:
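As one illustration (a sketch, assuming a private index reachable through the VPC, such as AWS CodeArtifact or a self-hosted mirror; the URL is a placeholder), pip can be pointed at the mirror instead of the public PyPI via `pip.conf`:

```ini
# ~/.pip/pip.conf — placeholder mirror URL, substitute your private index endpoint
[global]
index-url = https://my-private-mirror.example.com/simple/
trusted-host = my-private-mirror.example.com
```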

Auditability

Security controls

Preventive

We use an IAM role policy that enforces the use of specific security controls. For example, all SageMaker workloads must be created in a VPC with the specified security groups and subnets:

{
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true"
        }
    },
    "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],
    "Resource": [
        "arn:aws:sagemaker:*:<ACCOUNT_ID>:*"
    ],
    "Effect": "Deny"
}

For the full list of IAM policy condition keys for Amazon SageMaker and more examples, refer to the developer guide.

We use an Amazon S3 bucket policy that explicitly denies all access not originating from the designated S3 VPC endpoints:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-name>/*",
                "arn:aws:s3:::<s3-bucket-name>"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:sourceVpce": ["<s3-vpc-endpoint-id1>", "<s3-vpc-endpoint-id2>"]
                }
            }
        }
    ]
}

The S3 VPC endpoint policy allows access only to the specified S3 project buckets (holding data, models, and CI/CD pipeline artifacts), the SageMaker-owned S3 bucket, and the S3 objects used for product provisioning.
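A minimal sketch of such an endpoint policy is shown below. The bucket names are placeholders, and the additional statements covering the SageMaker-owned and provisioning buckets are omitted:

```json
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "AccessToProjectBuckets",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-name>",
                "arn:aws:s3:::<s3-bucket-name>/*"
            ]
        }
    ]
}
```

Together with the bucket policy above, this gives a two-sided control: the buckets accept traffic only from the endpoints, and the endpoints reach only the approved buckets.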

Detective

Logging and monitoring are not implemented in this version. You can use the following AWS services:

  • Amazon CloudWatch
  • AWS CloudTrail
  • VPC Flow Logs
  • AWS Security Hub
  • Amazon GuardDuty
  • Amazon Macie

Corrective

Reactive correction of user actions is not implemented in this version. For example, you could stop ML instances if the instance type is not approved for use by the data scientist.
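Such a corrective control could be sketched as follows. This is an assumption, not part of the solution: the allow-list and function names are illustrative, and the boto3 calls would typically run in a scheduled Lambda function.

```python
from typing import Iterable, List, Set

# Illustrative allow-list; a real deployment would load this from configuration.
APPROVED_INSTANCE_TYPES = {"ml.t3.medium", "ml.t3.large"}

def unapproved_instances(instances: Iterable[dict],
                         approved: Set[str] = APPROVED_INSTANCE_TYPES) -> List[str]:
    """Return names of notebook instances whose type is not on the allow-list."""
    return [i["NotebookInstanceName"] for i in instances
            if i["InstanceType"] not in approved]

def stop_unapproved(region: str) -> None:
    """Find running notebook instances and stop the unapproved ones (needs AWS creds)."""
    import boto3  # deferred so the pure helper above has no dependencies
    sm = boto3.client("sagemaker", region_name=region)
    running = sm.list_notebook_instances(StatusEquals="InService")["NotebookInstances"]
    for name in unapproved_instances(running):
        sm.stop_notebook_instance(NotebookInstanceName=name)
```

Splitting the pure decision logic from the AWS calls keeps the control easy to unit-test without credentials.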

Test secure S3 access

To verify the access to the Amazon S3 buckets for the data science environment, you can run the following commands in the Studio terminal:

aws s3 ls


The S3 VPC endpoint policy blocks the S3 ListBuckets operation, so this command fails.

aws s3 ls s3://<sagemaker deployment data S3 bucket name>


You can access the data science environment's data or models S3 buckets.

aws s3 mb s3://<any available bucket name>


The S3 VPC endpoint policy blocks access to any other S3 bucket.

aws sts get-caller-identity


All operations are performed under the SageMaker execution role.

Test preventive IAM policies

Try to start a training job without VPC attachment:

import sagemaker

# Resolve the built-in XGBoost training container for the current region
container_uri = sagemaker.image_uris.retrieve(region=session.region_name,
                                              framework='xgboost',
                                              version='1.0-1',
                                              image_scope='training')

# Note: no subnets or security groups are passed to the Estimator
xgb = sagemaker.estimator.Estimator(image_uri=container_uri,
                                    role=sagemaker_execution_role,
                                    instance_count=2,
                                    instance_type='ml.m5.xlarge',
                                    output_path='s3://{}/{}/model-artifacts'.format(default_bucket, prefix),
                                    sagemaker_session=sagemaker_session,
                                    base_job_name='reorder-classifier',
                                    volume_kms_key=ebs_kms_id,
                                    output_kms_key=s3_kms_id
                                   )

xgb.set_hyperparameters(objective='binary:logistic',
                        num_round=100)

xgb.fit({'train': train_set_pointer, 'validation': validation_set_pointer})

You get AccessDeniedException because of the explicit Deny in the IAM policy:


IAM policy:

{
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true",
            "sagemaker:VpcSecurityGroupIds": "true"
        }
    },
    "Action": [
        "sagemaker:CreateNotebookInstance",
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],
    "Resource": [
        "arn:aws:sagemaker:*:<ACCOUNT_ID>:*"
    ],
    "Effect": "Deny"
}

Now add the secure network configuration to the Estimator:

from sagemaker.network import NetworkConfig

# Network configuration with the environment's private subnets and security groups
network_config = NetworkConfig(
    enable_network_isolation=False,
    security_group_ids=env_data["SecurityGroups"],
    subnets=env_data["SubnetIds"],
    encrypt_inter_container_traffic=True)

xgb = sagemaker.estimator.Estimator(
    image_uri=container_uri,
    role=sagemaker_execution_role,
    instance_count=2,
    instance_type='ml.m5.xlarge',
    output_path='s3://{}/{}/model-artifacts'.format(default_bucket, prefix),
    sagemaker_session=sagemaker_session,
    base_job_name='reorder-classifier',
    subnets=network_config.subnets,
    security_group_ids=network_config.security_group_ids,
    encrypt_inter_container_traffic=network_config.encrypt_inter_container_traffic,
    enable_network_isolation=network_config.enable_network_isolation,
    volume_kms_key=ebs_kms_id,
    output_kms_key=s3_kms_id
)

You are able to create and run the training job.


Back to README


Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0