
Change credentials for LocalStack between environments without replicating all of my environment properties. #28

Closed
carter-cundiff opened this issue May 1, 2024 · 5 comments · Fixed by #56


carter-cundiff commented May 1, 2024

Your Spark Application is set up by default to utilize two sets of credentials: LocalStack credentials provided in your dev values, and remote credentials pulled from a SealedSecret provided in your base values. Currently these credentials are passed to your Spark Application as env variables. Unfortunately, due to the way lists are handled in YAML, there is no good way to merge these env variables across YAML files, so you must duplicate all of your env variables in both your dev values and your base values.

This could be improved by utilizing the envFrom field within our Spark Application to pass the environment-specific credentials in the dev and base values, without duplicating all of the other env variables that we want regardless of environment.
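For illustration, under this approach the base values would carry the shared env variables plus a single secret reference, and only the referenced secret's contents would change per environment (a sketch, not the final template; secret name illustrative):

```yaml
# Spark Application base values snippet using envFrom (sketch)
driver:
  envFrom:
    - secretRef:
        name: remote-auth-config   # contents of this secret vary per environment
  env:
    # shared env vars are defined once, no longer duplicated per environment
    - name: KRAUSENING_BASE
      value: /opt/spark/krausening/base
```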

DOD

  • Update existing Spark Application base values template to utilize envFrom.secretRef ✔️
    • Remove hardcoded LocalStack credentials ✔️
  • Remove hardcoded LocalStack credentials from existing Mlflow dev values template ✔️
    • Update existing Baton migration to exclude adding those credentials ✔️
  • Create new LocalStack credentials secret within LocalStack V2 helm chart ✔️
    • Include config flag to enable/disable the secret ✔️
  • Baton migration for base values: ✔️
Feature: Migrate a spark application base values with Localstack S3 credentials to use a secret reference
  Scenario: Migrate a spark application base values with Localstack S3 credentials
    Given a project that has a spark application base values 
    And the base values contains hardcoded Localstack S3 credentials
    And the base values contains "<yaml-config>" configuration
    When the spark application base values credential migration executes
    Then the base values S3 credentials will be updated to use a secret reference
    And the hardcoded Localstack S3 credentials will be removed

  Examples:
    | yaml-config      |
    | driver           |
    | driver-envFrom   |
    | executor         |
    | executor-envFrom |

  Scenario: Skip spark application base values migration with secret based S3 credentials in the base values
    Given a project that has a spark application base values 
    And the base values contains secret based S3 credentials
    When the spark application base values credential migration executes
    Then the spark application base values credential migration is skipped

  Scenario: Skip spark application base values migration without any S3 credentials in the base values
    Given a project that has a spark application base values 
    And the base values does not contain any S3 credentials
    When the spark application base values credential migration executes
    Then the spark application base values credential migration is skipped

  Scenario: Skip spark application base values migration without any environment variables in the base values
    Given a project that has a spark application base values
    And the base values does not contain any environment variables
    When the spark application base values credential migration executes
    Then the spark application base values credential migration is skipped
  • Update migration table within the release notes ✔️
  • Stretch goal:
    • Update baton version ✔️
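As a hedged sketch of the DOD items above, the LocalStack V2 chart could template the credentials secret behind a config flag along these lines (the flag name, template filename, and placeholder values are assumptions, not the final chart API):

```yaml
# aissemble-localstack-chart values.yaml (flag name hypothetical)
localstackCredentialsSecret:
  enabled: true
```

```yaml
# templates/localstack-credentials-secret.yaml (sketch)
{{- if .Values.localstackCredentialsSecret.enabled }}
apiVersion: v1
kind: Secret
metadata:
  name: remote-auth-config
stringData:
  AWS_ACCESS_KEY_ID: "test"        # LocalStack accepts placeholder credentials
  AWS_SECRET_ACCESS_KEY: "test"
{{- end }}
```

Gating the secret on a flag lets environments that seal their own `remote-auth-config` disable the chart-provided one.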

Test Step Outline

Test New Project SparkApplication Generation

  1. Pull the latest commits from dev and build the following in aissemble:
mvn clean install -pl :build-parent,:aissemble-localstack-chart,:foundation-mda,:foundation-archetype,:foundation-upgrade,:aissemble-spark -Dmaven.build.cache.enabled=false
  2. Create a downstream project from the archetype using the following command:
mvn archetype:generate -DarchetypeGroupId=com.boozallen.aissemble -DarchetypeArtifactId=foundation-archetype -DarchetypeVersion=1.7.0-SNAPSHOT -DgroupId=com.example -DartifactId=example -DprojectGitUrl=url -DprojectName=example && cd example
  3. Add the attached MlPipelineTraining.json, PysparkPipeline.json, and SparkPipeline.json to example-pipeline-models/src/main/resources/pipelines/
  4. Run the following once:
mvn clean install
  5. Run the following until all the manual actions are complete:
mvn clean generate-sources
  6. Add the following to the save_model function within example-pipelines/ml-pipeline-training/pipeline-training-step/src/pipeline_training_step/impl/ml_pipeline_training.py:
mlflow.log_dict({"success": "true"}, "TestLocalStack.json")
  7. WSL ONLY: Add the following to example-docker/example-pipeline-training-step-docker/pom.xml:
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <id>unpack</id>
                        <phase>prepare-package</phase>
                        <goals>
                            <goal>unpack</goal>
                        </goals>
                        <configuration>
                            <artifactItems>
                                <artifactItem>
                                    <groupId>com.boozallen.aissemble</groupId>
                                    <artifactId>extensions-docker-cacerts</artifactId>
                                    <version>1.7.0-SNAPSHOT</version>
                                    <overWrite>true</overWrite>
                                    <outputDirectory>${project.build.directory}/cacerts</outputDirectory>
                                    <type>zip</type>
                                </artifactItem>
                            </artifactItems>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

and the following to example-docker/example-pipeline-training-step-docker/src/main/resources/docker/Dockerfile (after the FROM command):

COPY ./target/cacerts/* /usr/local/share/ca-certificates/
RUN update-ca-certificates
  8. Run the following once:
mvn clean install
  9. Update the example-deploy/src/main/resources/apps/s3-local/Chart.yaml dependencies section to the following:
    Note: this relative path assumes the aiSSEMBLE repo lives alongside this project in the same directory
dependencies:
  - name: aissemble-localstack-chart
    version: "1.0.0"
    repository: "file://../../../../../../../aissemble/extensions/extensions-helm/aissemble-localstack-chart"
  10. Replace aissemble-localstack with aissemble-localstack-chart in example-deploy/src/main/resources/apps/s3-local/values.yaml
  11. Delete the cached repositories:
rm -rf example-deploy/src/main/resources/apps/s3-local/charts
rm -rf example-deploy/src/main/resources/apps/s3-local/Chart.lock
  12. Run the following to pull the local chart:
helm dependency build example-deploy/src/main/resources/apps/s3-local/
  13. tilt up; tilt down
  14. Once all the pods are ready, run the spark-pipeline resource and wait for the pipeline to finish running
  15. Then run the pyspark-pipeline resource and wait for the pipeline to finish running
  16. Verify there are now two events in the s3a://spark-infrastructure/spark-events/ bucket with the following command in a new terminal:
awslocal s3 ls spark-infrastructure/spark-events/
  17. Run the following in a new terminal:
curl --location 'http://localhost:5001/training-jobs?pipeline_step=pipeline-training-step' --header 'Content-Type: application/json' --data '{}'
  18. Wait for the pipeline to finish running (around 10 seconds). It will show as complete in the output of:
kubectl get jobs
  19. Navigate to http://localhost:5005/#/experiments/1 and select the latest run
  20. Within the run, verify that a file TestLocalStack.json with the following content exists in the artifacts tab:
{
  "success": "true"
}

Test upgrading a project

  1. Create a downstream project from the archetype using the following command:
mvn archetype:generate -DarchetypeGroupId=com.boozallen.aissemble -DarchetypeArtifactId=foundation-archetype -DarchetypeVersion=1.6.1 -DgroupId=com.example -DartifactId=example-upgrade -DprojectGitUrl=url -DprojectName=example-upgrade && cd example-upgrade
  2. Add the attached MlPipelineTraining.json, PysparkPipeline.json, and SparkPipeline.json to example-pipeline-models/src/main/resources/pipelines/
  3. Run the following once:
mvn clean install -pl :example-upgrade-pipeline-models
  4. Run the following until all the manual actions are complete:
mvn clean generate-sources
  5. Remove <version>${version.clean.plugin}</version> from example-upgrade-deploy/pom.xml
  6. Update the root pom.xml build-parent version to 1.7.0-SNAPSHOT
  7. Run the following:
mvn baton:1.0.0:baton-migrate -pl :example-upgrade
  8. Verify the example-upgrade-pipelines/pyspark-pipeline/src/pyspark_pipeline/resources/apps/pyspark-pipeline-base-values.yaml and example-upgrade-pipelines/spark-pipeline/src/main/resources/apps/spark-pipeline-base-values.yaml now contain the following:
    driver:
      # Setup these secret key references within your SealedSecret
      envFrom:
        - secretRef:
            name: remote-auth-config
      cores: 1
      coreLimit: "1200m"
      memory: "512m"
      env:
        - name: KRAUSENING_BASE
          value: /opt/spark/krausening/base
    executor:
      envFrom:
        - secretRef:
            name: remote-auth-config
      cores: 1
      memory: "512m"
      env:
        - name: KRAUSENING_BASE
          value: /opt/spark/krausening/base

Note: the spark-pipeline-base-values.yaml will also contain javaOptions.
  9. Verify the example-upgrade-deploy/src/main/resources/apps/mlflow-ui/values-dev.yaml now has the following:

aissemble-mlflow-chart:
  mlflow:
    externalS3:
      host: "s3-local"
      port: 4566
      protocol: http
      existingSecretAccessKeyIDKey: "AWS_ACCESS_KEY_ID"
      existingSecretKeySecretKey: "AWS_SECRET_ACCESS_KEY"
    tracking:
      service:
        type: LoadBalancer
  10. Verify the example-upgrade-deploy/src/main/resources/apps/mlflow-ui/values.yaml now has the following:
aissemble-mlflow-chart:
  mlflow:
    externalS3:
      existingSecret: remote-auth-config
      bucket: mlflow-models/mlflow-storage

      # Update these keys with your external S3 details and credentials defined here:
      # [YOUR-PROJECT]-deploy/src/main/resources/templates/sealed-secret.yaml
      # existingSecretAccessKeyIDKey: 
      # existingSecretKeySecretKey: 
      # host:
@carter-cundiff carter-cundiff added the enhancement New feature or request label May 1, 2024
@carter-cundiff carter-cundiff added this to the 1.7.0 milestone May 1, 2024
@carter-cundiff carter-cundiff self-assigned this May 1, 2024
@carter-cundiff
Contributor Author

Discussed DOD with @ewilkins-csi and @jacksondelametter

@carter-cundiff
Contributor Author

carter-cundiff commented May 1, 2024

Decided to use one secret ref in the base values:

envFrom:
  - secretRef:
      name: remote-auth-config

The contents of the remote-auth-config secret will then vary based on your deployment environment.
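For example (a sketch; the exact manifests are assumptions, not the final templates): in dev the secret can hold the placeholder credentials LocalStack accepts, while deployed environments seal real credentials under the same name:

```yaml
# dev: plain Secret with LocalStack placeholder credentials (sketch)
apiVersion: v1
kind: Secret
metadata:
  name: remote-auth-config
stringData:
  AWS_ACCESS_KEY_ID: "test"
  AWS_SECRET_ACCESS_KEY: "test"
---
# deployed environments: same secret name, delivered as a SealedSecret
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: remote-auth-config
spec:
  encryptedData:
    AWS_ACCESS_KEY_ID: AgB...      # encrypted placeholder, not real ciphertext
    AWS_SECRET_ACCESS_KEY: AgB...
```

Because the Spark Application only references the secret by name, nothing in the base values changes between environments.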

@carter-cundiff
Contributor Author

Adjusted the DOD: the LocalStack secret will now be included as part of the LocalStack V2 helm chart, and Mlflow will be updated to utilize this secret as well.

@carter-cundiff carter-cundiff changed the title Change credentials for Spark between environments without replicating all of my environment properties. Change credentials for LocalStack between environments without replicating all of my environment properties. May 2, 2024
aaron-gary added a commit that referenced this issue May 2, 2024
# This is the 1st commit message:

# This is a combination of 4 commits.
# This is the 1st commit message:

#2 Add Maven build workflow

# This is the commit message #2:

#2 update build workflow

# This is the commit message #3:

#2 add branch checkout to build workflow

# This is the commit message #4:

#2 Add on event to build workflow

# This is the commit message #2:

#2 Remove unused executions

# This is the commit message #3:

#2 Focus on build

# This is the commit message #4:

#2 Remove build execution where not needed

# This is the commit message #5:

#2 debug failing module

# This is the commit message #6:

#2 Remove unused target folder copy

# This is the commit message #7:

#2 Build spark and jenkins docker images

# This is the commit message #8:

#2 Retry full build

# This is the commit message #9:

#2 Omitting module

# This is the commit message #10:

#2 Fix for out of disk space

# This is the commit message #11:

#2 tagging docker images

# This is the commit message #12:

#2 Remove Temporarily remove Docker module

# This is the commit message #13:

#2 Build update

# This is the commit message #14:

#2 Build update

# This is the commit message #15:

#2 move chart dry-runs to IT profile

# This is the commit message #16:

#2 curl delta-hive assembly in docker build

# This is the commit message #17:

#2 cache m2 repo

# This is the commit message #18:

#2 prune docker build cache between images to save space

# This is the commit message #19:

#2 add maven build-cache to GH cache

# This is the commit message #20:

#2 run clean goal in build to clear docker cache

# This is the commit message #21:

#2 set maven caches to always save even if the build failed

# This is the commit message #22:

#2 adjust number of docker modules built

# This is the commit message #23:

#2 use the same cache for .m2 even if poms change

# This is the commit message #24:

#2 change from `save-always` flag to `if: always()` see actions/cache#1315

# This is the commit message #25:

#2 further reduce docker images being built

# This is the commit message #26:

#2 disable modules that depend on helm charts

# This is the commit message #27:

#2 use maven wrapper

# This is the commit message #28:

#2 restore modules to test build-cache

# This is the commit message #29:

#2 fix build of modules with intra-project chart dependencies

# This is the commit message #30:

#2 use explict .m2 repo cache so we can fall-back to older caches

# This is the commit message #31:

#2 save maven caches on build failure
aaron-gary added a commit that referenced this issue May 2, 2024
@ewilkins-csi
Contributor

OTS looks good!

carter-cundiff added a commit that referenced this issue May 9, 2024
…plicating all of my environment properties
carter-cundiff added a commit that referenced this issue May 9, 2024
…plicating all of my environment properties
carter-cundiff added a commit that referenced this issue May 10, 2024
…plicating all of my environment properties
carter-cundiff added a commit that referenced this issue May 10, 2024
…plicating all of my environment properties
carter-cundiff added a commit that referenced this issue May 10, 2024
…plicating all of my environment properties
carter-cundiff added a commit that referenced this issue May 10, 2024
#28 Change credentials for LocalStack between environments without replicating all of my environment properties
@Cho-William
Contributor

Testing passed, closing issue.
