EMR Serverless: Getting Started with Spark Job

Create Application, Spark Job with Word Count
1. Create Bucket and Role
2. Create Application, list application, get application
3. Start application, wait for application to start
4. Submit Spark job, note run id, query job status, wait until the job succeeds
5. Query S3 output, S3 logs, show S3 logs
6. Spark History Server
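
Steps 2 to 4 above can be sketched with the boto3 `emr-serverless` client. This is a minimal sketch, not the demo's own commands: `client` is assumed to be `boto3.client("emr-serverless")`, and the bucket layout (`scripts/wordcount.py`, `output/`, `logs/`) and the helper names are hypothetical placeholders.

```python
import time

# Job run states after which polling stops.
TERMINAL_JOB_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def create_spark_application(client, name, release_label):
    """Step 2: create a Spark application and return its id."""
    response = client.create_application(
        name=name, releaseLabel=release_label, type="SPARK"
    )
    return response["applicationId"]


def start_and_wait(client, application_id, poll_seconds=10, sleep=time.sleep):
    """Step 3: start the application and poll until it reports STARTED."""
    client.start_application(applicationId=application_id)
    while True:
        app = client.get_application(applicationId=application_id)["application"]
        if app["state"] == "STARTED":
            return app["state"]
        if app["state"] == "TERMINATED":
            raise RuntimeError("application was terminated before it started")
        sleep(poll_seconds)


def word_count_job_request(application_id, role_arn, bucket):
    """Build the start_job_run arguments (paths under the bucket are assumed)."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": f"s3://{bucket}/scripts/wordcount.py",
                "entryPointArguments": [f"s3://{bucket}/output/"],
            }
        },
        "configurationOverrides": {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": f"s3://{bucket}/logs/"}
            }
        },
    }


def submit_and_wait(client, request, poll_seconds=10, sleep=time.sleep):
    """Step 4: submit the job, then poll its state until it is terminal."""
    job_run_id = client.start_job_run(**request)["jobRunId"]
    while True:
        job_run = client.get_job_run(
            applicationId=request["applicationId"], jobRunId=job_run_id
        )["jobRun"]
        if job_run["state"] in TERMINAL_JOB_STATES:
            return job_run_id, job_run["state"]
        sleep(poll_seconds)
```

With real credentials, the flow would be roughly `app_id = create_spark_application(client, "demo", "emr-6.6.0")`, then `start_and_wait(client, app_id)`, then `submit_and_wait(client, word_count_job_request(app_id, role_arn, bucket))`. The release label here is only an example value. The `sleep` parameter exists so the polling loops can be exercised without waiting.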

EMR Serverless: Spark Integration with Glue Catalog

Spark Application with Glue and Athena integration
1. PySpark code with config for Glue integration. Copy the PySpark script to S3
2. Submit Spark job, note run id, query job status
3. Spark History Server
4. Wait until the job succeeds
5. Query S3 output, S3 logs, show S3 logs
6. Show Glue catalog and table
7. Query through Athena
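
The Glue integration and the Athena check can be sketched as follows. This is an illustration under stated assumptions rather than the demo's actual script: the database `demo_db`, the table `word_counts`, the S3 key `scripts/glue_demo.py`, and the helper names are all hypothetical. `s3_client` and `athena_client` are assumed to be `boto3.client("s3")` and `boto3.client("athena")`. The Glue Data Catalog is selected through the Hive metastore client factory setting passed in `sparkSubmitParameters`.

```python
import time

GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)

# PySpark script to copy to S3 (step 1). Database and table names are made up.
PYSPARK_SCRIPT = '''\
import sys
from pyspark.sql import SparkSession

output = sys.argv[1]  # S3 prefix where the database keeps its data
spark = SparkSession.builder.appName("glue-demo").enableHiveSupport().getOrCreate()
spark.sql(f"CREATE DATABASE IF NOT EXISTS demo_db LOCATION '{output}'")
rows = [("spark", 2), ("hive", 1)]
df = spark.createDataFrame(rows, ["word", "occurrences"])
df.write.mode("overwrite").saveAsTable("demo_db.word_counts")
'''


def upload_script(s3_client, bucket, key="scripts/glue_demo.py"):
    """Copy the PySpark script to S3 and return its s3:// URI."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=PYSPARK_SCRIPT.encode("utf-8"))
    return f"s3://{bucket}/{key}"


def glue_job_driver(script_uri, output_uri):
    """Build a jobDriver for start_job_run that points Spark at Glue (step 2)."""
    conf = f"--conf spark.hadoop.hive.metastore.client.factory.class={GLUE_FACTORY}"
    return {
        "sparkSubmit": {
            "entryPoint": script_uri,
            "entryPointArguments": [output_uri],
            "sparkSubmitParameters": conf,
        }
    }


def query_athena(athena_client, sql, database, results_uri,
                 poll_seconds=2, sleep=time.sleep):
    """Step 7: run a query in Athena and return the first page of rows."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": results_uri},
    )["QueryExecutionId"]
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in {"SUCCEEDED", "FAILED", "CANCELLED"}:
            break
        sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    result = athena_client.get_query_results(QueryExecutionId=query_id)
    # For SELECT statements the first row holds the column headers.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

The dictionary from `glue_job_driver` is meant to be passed as the `jobDriver` argument of `start_job_run`. `query_athena` reads only the first page of results, which is enough for a quick check that the table is visible.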

EMR Serverless: Getting Started with Hive Job

Hive
1. Create Application - HIVE, list application, get application
2. Start application, wait for application to start
3. Copy Hive Script to S3
4. Submit the Hive job, note run id, query job status
5. Launch Tez UI and show job in progress
6. Wait until the job succeeds
7. Query S3 output, S3 logs, show S3 logs
8. Show Glue catalog and table
9. Query through Athena
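
The Hive steps above differ from the Spark flow mainly in the application type and the shape of the job driver. The following is a minimal sketch under assumptions: `client` is assumed to be `boto3.client("emr-serverless")` and `s3_client` to be `boto3.client("s3")`. The script contents, the S3 keys, the database and table names, and the helper names are hypothetical. The scratch and warehouse directories are supplied through a `hive-site` classification.

```python
def hive_script(bucket):
    """Return a small Hive script; the database and table are made-up examples."""
    return (
        "CREATE DATABASE IF NOT EXISTS demo_hive_db;\n"
        "CREATE EXTERNAL TABLE IF NOT EXISTS demo_hive_db.words (word STRING)\n"
        f"LOCATION 's3://{bucket}/hive/words/';\n"
        "INSERT INTO demo_hive_db.words VALUES ('hive'), ('tez');\n"
    )


def create_hive_application(client, name, release_label):
    """Step 1: create a Hive application and return its id."""
    response = client.create_application(
        name=name, releaseLabel=release_label, type="HIVE"
    )
    return response["applicationId"]


def hive_job_request(application_id, role_arn, bucket, script_key):
    """Build start_job_run arguments for a Hive script stored in S3."""
    base = f"s3://{bucket}"
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {"hive": {"query": f"{base}/{script_key}"}},
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "hive-site",
                    "properties": {
                        "hive.exec.scratchdir": f"{base}/hive/scratch",
                        "hive.metastore.warehouse.dir": f"{base}/hive/warehouse",
                    },
                }
            ],
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": f"{base}/logs/"}
            },
        },
    }


def submit_hive_job(client, s3_client, application_id, role_arn, bucket,
                    script_key="scripts/hive_demo.sql"):
    """Steps 3 to 5: upload the script, submit the job, fetch a dashboard URL."""
    s3_client.put_object(
        Bucket=bucket, Key=script_key, Body=hive_script(bucket).encode("utf-8")
    )
    request = hive_job_request(application_id, role_arn, bucket, script_key)
    job_run_id = client.start_job_run(**request)["jobRunId"]
    # While the job is running this URL opens the live UI (Tez UI for Hive).
    dashboard = client.get_dashboard_for_job_run(
        applicationId=application_id, jobRunId=job_run_id
    )
    return job_run_id, dashboard["url"]
```

The application still has to be started before `submit_hive_job` is called, and the job state can be polled with `get_job_run` until it reaches `SUCCESS`, as in the Spark walkthrough.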