# EMR Serverless Samples

This repository contains example code for getting started with EMR Serverless and using it with Apache Spark and Apache Hive.

In addition, it provides container images for the Spark History Server and the Tez UI so you can debug your jobs.

For full details about using EMR Serverless, please see the EMR Serverless documentation.

## Pre-Requisites

These demos assume you are using an Administrator-level role in your AWS account.

  1. Amazon EMR Serverless is currently in preview. Please follow the sign-up steps at https://pages.awscloud.com/EMR-Serverless-Preview.html to request access.

  2. Set up CLI access for Amazon EMR Serverless:

```shell
aws s3 cp s3://elasticmapreduce/emr-serverless-preview/artifacts/latest/dev/cli/service.json ./service.json
aws configure add-model --service-model file://service.json

aws configure set region us-east-1
aws emr-serverless list-applications
aws emr-serverless help
```
  3. Create an Amazon S3 bucket in the us-east-1 region:

```shell
export S3_BUCKET=<your-bucket-name>
aws s3 mb s3://${S3_BUCKET} --region us-east-1
export SOURCE_ROOT=<path-to-root-of-source, e.g. ~/environment/emr-serverless-samples>
```

  4. Create an EMR Serverless execution role (the policies below reference the bucket you created above through the `${S3_BUCKET}` variable you exported).

This role provides S3 access to specific buckets as well as full read and write access to the Glue Data Catalog. A sketch of how the role is used when submitting a job follows this list.

```shell
aws iam create-role --role-name emr-serverless-job-role --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Service": "emr-serverless.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
    ]
  }'
aws iam put-role-policy --role-name emr-serverless-job-role --policy-name S3Access --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadFromOutputAndInputBuckets",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::*.elasticmapreduce",
                "arn:aws:s3:::*.elasticmapreduce/*",
                "arn:aws:s3:::noaa-gsod-pds",
                "arn:aws:s3:::noaa-gsod-pds/*",
                "arn:aws:s3:::'${S3_BUCKET}'",
                "arn:aws:s3:::'${S3_BUCKET}'/*"
            ]
        },
        {
            "Sid": "WriteToOutputDataBucket",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::'${S3_BUCKET}'/*"
            ]
        }
    ]
}'

aws iam put-role-policy --role-name emr-serverless-job-role --policy-name GlueAccess --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "GlueCreateAndReadDataCatalog",
        "Effect": "Allow",
        "Action": [
            "glue:CreateDatabase",
            "glue:GetDatabase",
            "glue:GetDataBases",
            "glue:CreateTable",
            "glue:GetTable",
            "glue:GetTables",
            "glue:DeleteTable",
            "glue:UpdateTable",
            "glue:GetPartition",
            "glue:GetPartitions",
            "glue:CreatePartition",
            "glue:DeletePartition",
            "glue:BatchCreatePartition",
            "glue:GetUserDefinedFunctions",
            "glue:BatchDeletePartition"
        ],
        "Resource": ["*"]
      }
    ]
  }'
```

  5. Optional: refer to the Spark UI instructions below to build the Spark UI package if you want to use the Spark History Server to monitor your Spark jobs.

  6. Optional: refer to the Tez UI instructions below to build the Tez UI package if you want to use the Tez UI to monitor your Hive jobs.
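
Once the role and bucket exist, you pass the role as the execution role when submitting work to an EMR Serverless application. The following is a minimal sketch (not taken from this repository's scripts) of creating a Spark application and starting a job run with the AWS CLI; the application name, release label, account ID, and script path are placeholder assumptions.

```shell
# Sketch only: the name, release label, account ID, and entryPoint below are
# placeholders -- substitute your own values (see the EMR Serverless docs for
# the current release labels).
aws emr-serverless create-application \
  --type SPARK \
  --name sample-spark-app \
  --release-label <release-label>

aws emr-serverless start-job-run \
  --application-id <application-id-from-create-application> \
  --execution-role-arn arn:aws:iam::<ACCOUNT-ID>:role/emr-serverless-job-role \
  --job-driver '{
    "sparkSubmit": {
      "entryPoint": "s3://'${S3_BUCKET}'/code/your_script.py"
    }
  }'
```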

## Examples

### Utilities

- Spark UI

    You can use this Dockerfile to run the Spark History Server in your container (a rough run sketch follows this list).

- Tez UI

    You can use this Dockerfile to run the Tez UI and the Application Timeline Server in your container.
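
The sketch below shows one way the Spark UI image might be run locally once it has been built; the image tag, environment variables, and log location are assumptions rather than instructions from this repository, so follow the Spark UI build documentation for the authoritative steps.

```shell
# Sketch only: build the Spark History Server image and point it at Spark event
# logs in your S3 bucket. The image tag, SPARK_HISTORY_OPTS wiring, and log path
# are assumptions; adjust them to match the actual Spark UI instructions.
docker build -t emr/spark-ui .

docker run --rm -it -p 18080:18080 \
  -e SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://${S3_BUCKET}/logs/" \
  -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN \
  emr/spark-ui

# Then browse to http://localhost:18080 to view the job history.
```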

## Security

See CONTRIBUTING for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.