Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. The Docker images defined in this repository are used to build the pre-built container images that run Spark jobs on Amazon SageMaker via the SageMaker Python SDK. The pre-built images are available in Amazon Elastic Container Registry (Amazon ECR), and this repository serves as a reference for anyone who wants to build a customized Spark container for use in Amazon SageMaker.
For the list of available Spark images, see Available SageMaker Spark Images.
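If you prefer to resolve an image URI programmatically, recent versions of the SageMaker Python SDK can look up the pre-built Spark processing image for a given region and version. The sketch below assumes the SDK's `image_uris` helper supports the `spark` framework identifier; the region and version values are illustrative.

```python
from sagemaker import image_uris

# Resolve the ECR URI of a pre-built SageMaker Spark processing image.
# Region and version are placeholders; adjust to match your account setup.
uri = image_uris.retrieve(
    framework="spark",
    region="us-east-1",
    version="3.1",
    image_scope="processing",
)
print(uri)
```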
This project is licensed under the Apache-2.0 License.
The simplest way to get started with the SageMaker Spark Container is to use the pre-built images via the SageMaker Python SDK.
For details, see the Amazon SageMaker Processing section of the SageMaker Python SDK documentation.
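As a minimal sketch of that workflow, the example below submits a PySpark script to SageMaker Processing using the pre-built Spark image. The S3 paths, IAM role ARN, and script name are placeholders you would replace with your own values.

```python
from sagemaker.spark.processing import PySparkProcessor

# Configure a processor backed by the pre-built SageMaker Spark image.
spark_processor = PySparkProcessor(
    base_job_name="sm-spark-example",
    framework_version="3.1",  # selects the pre-built Spark image version
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder role ARN
    instance_count=2,
    instance_type="ml.m5.xlarge",
    max_runtime_in_seconds=1200,
)

# Submit a PySpark application; arguments are passed through to the script.
spark_processor.run(
    submit_app="./code/preprocess.py",  # placeholder PySpark script
    arguments=[
        "--input", "s3://my-bucket/input/",    # placeholder S3 input
        "--output", "s3://my-bucket/output/",  # placeholder S3 output
    ],
    spark_event_logs_s3_uri="s3://my-bucket/spark-event-logs",  # optional: persist Spark event logs
)
```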
To get started building and testing the SageMaker Spark container, you will need to set up a local development environment. See the instructions in DEVELOPMENT.md.
To contribute to this project, please read through CONTRIBUTING.md.