Description
When leveraging Spark or PySpark data delivery pipelines, s3a://spark-infrastructure is referenced in generated artifacts as the S3 bucket in which the Spark SQL warehouse, event logs, and other data are stored. Because S3 bucket names must be unique across all AWS accounts within an AWS partition, deploying Spark pipelines to a non-local environment will consistently fail and requires developers to update every reference to use a unique bucket name. To mitigate bucket collisions, consider generating S3 buckets with a different naming convention, such as prepending the project name (e.g. s3a://my-aissemble-project-spark-infrastructure).
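For illustration, a minimal PySpark sketch of the suggested convention. The exact configuration keys and paths used by the generated artifacts are assumptions here (they may be set in Helm charts or spark-defaults.conf instead), but spark.sql.warehouse.dir and spark.eventLog.dir are the standard Spark settings involved:

```python
from pyspark.sql import SparkSession

# Sketch only: the generated artifacts may set these values elsewhere;
# the keys below are the standard Spark settings for the warehouse and
# event log locations.
spark = (
    SparkSession.builder
    .appName("my-pipeline")  # hypothetical pipeline name
    # Current generated default: a fixed, collision-prone bucket name.
    # .config("spark.sql.warehouse.dir", "s3a://spark-infrastructure/warehouse")
    # Suggested convention: prepend the project name so the bucket is
    # likely unique within the AWS partition.
    .config("spark.sql.warehouse.dir",
            "s3a://my-aissemble-project-spark-infrastructure/warehouse")
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir",
            "s3a://my-aissemble-project-spark-infrastructure/spark-events")
    .getOrCreate()
)
```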
Steps to Reproduce
Clear, specific, and detailed steps taken to enable reproduction of the bug for investigation.
1. Create an aiSSEMBLE project with Spark or PySpark data delivery pipelines that rely on S3.
2. Deploy to a non-local environment (see the sketch below for the underlying collision).
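The failure is a standard S3 naming collision. A minimal boto3 sketch (not the actual aiSSEMBLE deployment code) of what happens when a second account tries to claim the generated default bucket name:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    # Whichever AWS account created "spark-infrastructure" first owns it
    # for the entire partition; every other account's attempt fails.
    # (Outside us-east-1, create_bucket also requires a
    # CreateBucketConfiguration with a LocationConstraint.)
    s3.create_bucket(Bucket="spark-infrastructure")
except ClientError as err:
    # Typically "BucketAlreadyExists", or "BucketAlreadyOwnedByYou" if
    # your own account had already created it.
    print(err.response["Error"]["Code"])
```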
Expected Behavior
While it is reasonable to expect developers to perform some manual changes to support non-local deployment (e.g. creating sealed secrets for AWS credentials), using S3 bucket names that are likely to be unique would improve deployment velocity and reduce potential sources of confusion.
Actual Behavior
Non-local deployment of Spark or PySpark data delivery pipelines that reference S3 will always fail without manual intervention.
Additional Context
N/A