
As a downstream consumer of aiSSEMBLE, I want my SparkApplications to interface with the v2 SparkHistory chart by default. #95

Closed · 5 tasks done
peter-mcclonski opened this issue May 23, 2024 · 1 comment
Labels: enhancement (New feature or request)
peter-mcclonski commented May 23, 2024

Definition of Done

  • Add an alternate values file within the SparkApplication base chart that:
    • Updates the SparkApplication dev template to mount the Spark History event volume by default.
    • Defaults the SparkApplication dev template to writing Spark event logs to the Spark History event volume.
  • Create the Spark Ecosystem V2 chart template for downstream projects.
  • Ensure that the Spark Ecosystem V2 chart for downstream projects supports the new Spark History V2 chart.

Test Steps

  1. Generate a new project with the following command:
mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.7.0-SNAPSHOT \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
&& cd test-project
  2. Add the following pipeline definitions to test-project-pipeline-models/src/main/resources/pipelines/:
{
	"name": "SimplePipeline",
	"package": "org.test",
	"type": {
		"name": "data-flow",
		"implementation": "data-delivery-spark"
	},
	"steps": [
		{
			"name": "Ingest",
			"type": "synchronous",
			"alerting": {
				"enabled": false
			},
			"dataProfiling": {
				"enabled": false
			},
			"provenance": {
				"enabled": false
			}
		}
	]
}
{
	"name": "SimplePipelinePython",
	"package": "org.test",
	"type": {
		"name": "data-flow",
		"implementation": "data-delivery-pyspark"
	},
	"steps": [
		{
			"name": "Ingest",
			"type": "synchronous",
			"alerting": {
				"enabled": false
			},
			"dataProfiling": {
				"enabled": false
			},
			"provenance": {
				"enabled": false
			}
		}
	]
}
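Before running the build, it can be worth confirming that the pipeline definitions are well-formed JSON with the expected top-level fields. The sketch below is a hypothetical sanity check; the required fields are inferred from the two example definitions above, not from the official aiSSEMBLE metamodel schema.

```python
import json

# Abbreviated copy of the first definition above; the check applies equally
# to SimplePipelinePython.
pipeline_json = """
{
  "name": "SimplePipeline",
  "package": "org.test",
  "type": {"name": "data-flow", "implementation": "data-delivery-spark"},
  "steps": [{"name": "Ingest", "type": "synchronous"}]
}
"""
pipeline = json.loads(pipeline_json)

# Fields inferred from the examples, not an authoritative schema.
for field in ("name", "package", "type", "steps"):
    assert field in pipeline, f"missing top-level field: {field}"
assert pipeline["type"]["implementation"].startswith("data-delivery-")
```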
  3. Execute mvn clean install repeatedly, resolving all manual actions until none remain.
  4. Modify the contents of test-project-deploy/pom.xml, replacing <profile>aissemble-spark-infrastructure-deploy</profile> with <profile>aissemble-spark-infrastructure-deploy-v2</profile>.
  5. Delete the directory test-project-deploy/src/main/resources/apps/spark-infrastructure along with its contents.
  6. Execute mvn clean install.
  7. _OTS only:_ Replace the repository in the generated Chart.yaml with the absolute path to the spark-history chart in the local aissemble repository, then re-execute mvn clean install.
  8. Use kubectl apply -f to apply the following yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-config
data: {}
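Since kubectl apply -f accepts JSON manifests as well as YAML, the same empty ConfigMap can also be generated programmatically and piped to kubectl; a minimal sketch:

```python
import json

# The same empty ConfigMap as above, built as a Python dict and serialized
# to JSON (kubectl treats JSON and YAML manifests interchangeably).
configmap = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "spark-config"},
    "data": {},
}
manifest = json.dumps(configmap, indent=2)
# e.g.: echo "$manifest" | kubectl apply -f -
```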
  9. Update the contents of test-project-pipelines/simple-pipeline/src/main/resources/apps/simple-pipeline-base-values.yaml and test-project-pipelines/simple-pipeline-python/src/simple_pipeline_python/resources/apps/simple-pipeline-python-base-values.yaml to remove references to hadoop-aws and aws-java-sdk-bundle.
  10. Update the contents of test-project-pipelines/simple-pipeline/src/main/resources/apps/simple-pipeline-dev-values.yaml and test-project-pipelines/simple-pipeline-python/src/simple_pipeline_python/resources/apps/simple-pipeline-python-dev-values.yaml to remove their spark.eventLog configurations.
  11. Save the following content at the root of test-project with the name values-migrate-dev.yaml:
########################################
## CONFIG | Spark Configs
########################################
metadata:
  namespace: default
sparkApp:
  spec:
    sparkConf:
      spark.eventLog.enabled: "true"
      spark.eventLog.dir: "/opt/spark/spark-events"
    type: "placeholder" # required for the dry-run test to pass; this should always be overridden
    mode: cluster
    imagePullPolicy: IfNotPresent
    restartPolicy:
      type: Never
    sparkVersion: "3.4.0"
    sparkConfigMap: spark-config
    dynamicAllocation:
      enabled: true
      initialExecutors: 0
      minExecutors: 0
      maxExecutors: 4
    volumes:
      - name: ivy-cache
        persistentVolumeClaim:
          claimName: ivy-cache
      - name: spark-events
        persistentVolumeClaim:
          claimName: spark-events-claim
    driver:
      cores: 1
      coreLimit: "1200m"
      memory: "512m"
      serviceAccount: spark
      volumeMounts:
        - name: ivy-cache
          mountPath: "/opt/spark/.ivy2"
        - name: spark-events
          mountPath: "/opt/spark/spark-events"
    executor:
      cores: 1
      coreLimit: "1200m"
      memory: "512m"
      labels:
        version: 3.4.0
      volumeMounts:
        - name: ivy-cache
          mountPath: "/opt/spark/.ivy2"
        - name: spark-events
          mountPath: "/opt/spark/spark-events"
service:
  enabled: false
  spec:
    ports:
      - name: "debug"
        port: 4747
        targetPort: 4747
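When this file is passed via --values, Helm layers it over the chart's default values with a recursive map merge. A minimal sketch of that merge semantics (a simplification of Helm's full coalescing rules, using an illustrative subset of the values, not the actual chart defaults):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, as Helm does for --values files."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative subset: assumed chart defaults vs. values-migrate-dev.yaml
defaults = {"sparkApp": {"spec": {"mode": "client", "sparkConf": {}}}}
override = {"sparkApp": {"spec": {
    "mode": "cluster",
    "sparkConf": {"spark.eventLog.enabled": "true",
                  "spark.eventLog.dir": "/opt/spark/spark-events"},
}}}
result = deep_merge(defaults, override)
```

The override file only needs to contain the keys being changed; everything else falls through from the chart defaults.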
  12. Search for spark-application in your Tiltfile, adding --values values-migrate-dev.yaml after --version %s in both cases.
  13. Execute mvn clean install -Dmaven.build.cache.skipCache=true, resolving any lingering manual actions.
  14. Execute tilt up and wait for all resources to be ready.
  15. Trigger execution of the simple-pipeline resource, ensuring it completes successfully.
  16. Trigger execution of the simple-pipeline-python resource, ensuring it completes successfully.
  17. Navigate to localhost:18080.
  18. Ensure that Spark events from both simple-pipeline and simple-pipeline-python are visible in the spark-history UI.
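The final UI check can also be scripted: the Spark History Server exposes a REST API, and /api/v1/applications lists the recorded applications. The sketch below parses a sample response in the documented shape rather than hitting a live server; the URL and the application names are illustrative assumptions.

```python
import json
from urllib.request import urlopen  # for querying a live server

# Assumes the history server from the test steps is reachable here.
HISTORY_URL = "http://localhost:18080/api/v1/applications"

def app_names(payload: str) -> list[str]:
    """Extract application names from an /api/v1/applications response."""
    return [app["name"] for app in json.loads(payload)]

# Illustrative response in the documented shape (not captured output; the
# actual app names depend on how the pipelines configure spark.app.name):
sample = ('[{"id": "spark-0001", "name": "SimplePipeline"},'
          ' {"id": "spark-0002", "name": "SimplePipelinePython"}]')
names = app_names(sample)
# Against the live server: app_names(urlopen(HISTORY_URL).read().decode())
```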
@peter-mcclonski peter-mcclonski self-assigned this May 23, 2024
@d-ryan-ashcraft d-ryan-ashcraft added the enhancement New feature or request label May 23, 2024
@d-ryan-ashcraft d-ryan-ashcraft added this to the 1.7.0 milestone May 23, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue May 23, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue May 24, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue May 24, 2024
peter-mcclonski added a commit that referenced this issue May 24, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue May 24, 2024
peter-mcclonski added a commit to peter-mcclonski/aissemble that referenced this issue May 24, 2024
peter-mcclonski added a commit that referenced this issue May 24, 2024
csun-cpointe (Contributor) commented:
Test passed!

(Three screenshots attached, taken 2024-05-24 at 2:53:45 PM, 3:01:38 PM, and 3:02:34 PM)

liangyun123 pushed a commit that referenced this issue Jun 12, 2024
liangyun123 pushed a commit that referenced this issue Jun 12, 2024