BUG: Pyspark pipeline can't run the pipeline with split records #420

Open
csun-cpointe opened this issue Oct 18, 2024 · 0 comments
Labels
bug Something isn't working

Description

After creating a PySpark pipeline project with combined data records and switching it to split data records, the pipeline cannot start because of a split data record dependency error.

Steps to Reproduce


  1. Create a new project on 1.10.0-SNAPSHOT.
    mvn archetype:generate '-DarchetypeGroupId=com.boozallen.aissemble' \
                           '-DarchetypeArtifactId=foundation-archetype' \
                           '-DarchetypeVersion=1.10.0-SNAPSHOT' \
                           '-DgroupId=org.test' \
                           '-Dpackage=org.test' \
                           '-DprojectGitUrl=test.org/test.git' \
                           '-DprojectName=Test combine records' \
                           '-DartifactId=test-combine-records' \
    && cd test-combine-records
  2. Set your Java version to 17 if it is not already.
  3. Unzip resources.zip and replace the resources folder in the -pipeline-models/src/main/ directory.
  4. Fully generate the project by running mvn clean install and following the manual actions.
  5. Unzip krausening.zip and replace the krausening folder in the -docker/test-combine-record-spark-worker-docker/src/main/resources directory.
  6. Build the project without the cache and follow the last manual action.
    mvn clean install -Dmaven.build.cache.skipCache
  7. In the -shared/pom.xml, use the aissemble-data-records-separate-module profile for split records:
      <configuration>
          <basePackage>com.boozallen</basePackage>
 -        <profile>aissemble-data-records-combined-module</profile>
 +        <profile>aissemble-data-records-separate-module</profile>
      </configuration>
  8. Build the project without the cache and follow the last manual action.
    mvn clean install -Dmaven.build.cache.skipCache
  9. In the spark-pipeline/pom.xml, update the data-record artifact name:
        <dependency>
            <groupId>${project.groupId}</groupId>
-           <artifactId>test-combine-record-data-records-java</artifactId>
+           <artifactId>test-combine-record-data-records-spark-java</artifactId>
            <version>${project.version}</version>
        </dependency>
  10. In the pyspark-pipeline/pom.xml, update the data-record artifact name:
        <dependency>
            <groupId>${project.groupId}</groupId>
-           <artifactId>test-combine-record-data-records-python</artifactId>
+           <artifactId>test-combine-record-data-records-spark-python</artifactId>
            <version>${project.version}</version>
        </dependency>
  11. In the pyspark-pipeline/pyproject.toml, update the test-combine-record-data-records-python dependency package name to include spark, as follows:
    test-combine-record-data-records-spark-python = {path = "../../test-combine-record-shared/test-combine-record-data-records-spark-python", develop = true}
  12. Build the project without the cache and follow the last manual action.
    mvn clean install -Dmaven.build.cache.skipCache
  13. Tilt up all services.
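
Because the steps above rename a path dependency in pyspark-pipeline/pyproject.toml, a quick sanity check before running Tilt is to confirm that every path dependency still points at a directory that exists. The following is a minimal sketch, not part of aiSSEMBLE: the function names are hypothetical, and it assumes dependencies are written as single-line inline tables like the one in the pyproject.toml step. It does not diagnose the error in the screenshot; it only rules out a mistyped or missing module path.

```python
import re
from pathlib import Path

# Naive matcher for single-line entries such as:
#   name = {path = "../some/dir", develop = true}
# Assumption: path dependencies are written as one-line inline tables.
PATH_DEP = re.compile(r'^\s*([A-Za-z0-9_.-]+)\s*=\s*\{[^}]*\bpath\s*=\s*"([^"]+)"')


def find_path_dependencies(pyproject_text):
    """Return {dependency name: relative path} for inline-table path dependencies."""
    deps = {}
    for line in pyproject_text.splitlines():
        match = PATH_DEP.match(line)
        if match:
            deps[match.group(1)] = match.group(2)
    return deps


def missing_path_dependencies(pyproject_file):
    """Return the sorted names of path dependencies whose target directory is missing."""
    pyproject_file = Path(pyproject_file)
    deps = find_path_dependencies(pyproject_file.read_text())
    base = pyproject_file.parent  # paths are resolved relative to pyproject.toml
    return sorted(name for name, rel in deps.items() if not (base / rel).is_dir())
```

Calling `missing_path_dependencies` on the pyspark-pipeline/pyproject.toml file returns an empty list when every path dependency resolves. Any name it returns is a dependency whose directory was not generated or was renamed inconsistently.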

Expected Behavior

All services are running in ready state.

Actual Behavior

spark-worker-image fails with the error shown in the screenshot below.
[Screenshot 2024-10-18 at 9 27 28 AM]

Additional Context

  • Log output
  • Screenshots (if applicable)
  • Solution Baseline Version
  • Environment details (local, cloud, Azure, AWS, etc.)
csun-cpointe added the bug label on Oct 18, 2024