From b9c3a607a67d4c549651319f88fd15898dd5a6df Mon Sep 17 00:00:00 2001
From: Robert Delametter <18638044+jacksondelametter@users.noreply.github.com>
Date: Tue, 30 Apr 2024 10:57:13 -0500
Subject: [PATCH] #15: Adding Poetry Python package migration in
foundation-upgrade
---
RELEASE_NOTES_DRAFT.md | 177 ++++++++++++++++++
.../upgrade/migration/PoetryMigration.java | 115 ++++++++++++
.../src/main/resources/migrations.json | 9 +
.../migration/AbstractMigrationTest.java | 12 ++
.../migration/PoetryMigrationSteps.java | 122 ++++++++++++
.../specifications/poetry-migration.feature | 16 ++
6 files changed, 451 insertions(+)
create mode 100644 RELEASE_NOTES_DRAFT.md
create mode 100644 foundation/foundation-upgrade/src/main/java/com/boozallen/aissemble/upgrade/migration/PoetryMigration.java
create mode 100644 foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/PoetryMigrationSteps.java
create mode 100644 foundation/foundation-upgrade/src/test/resources/specifications/poetry-migration.feature
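
Note on the rename rule: the migration in this patch derives each new package
name by prepending "aissemble-" to the old one. The sketch below illustrates
that rule on an in-memory map of dependencies. It is only an illustration
under stated assumptions: the class name PoetryRenameSketch, the method
renameDependencies, and the shortened OLD_NAMES set are hypothetical and are
not part of the patch, which edits the pyproject.toml file on disk instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of the patch.
final class PoetryRenameSketch {

    // Same prefix the patch uses for renamed packages.
    static final String PREFIX = "aissemble-";

    // Shortened, illustrative subset of the old package names.
    static final Set<String> OLD_NAMES = Set.of(
            "foundation-core-python",
            "foundation-model-lineage");

    // Returns a copy of the dependencies with old names prefixed.
    // Entries that are not in OLD_NAMES are copied unchanged.
    public static Map<String, String> renameDependencies(Map<String, String> dependencies) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : dependencies.entrySet()) {
            String name = entry.getKey();
            String newName = OLD_NAMES.contains(name) ? PREFIX + name : name;
            renamed.put(newName, entry.getValue());
        }
        return renamed;
    }

    public static void main(String[] args) {
        Map<String, String> dependencies = new LinkedHashMap<>();
        dependencies.put("python", "^3.11");
        dependencies.put("foundation-core-python", "1.7.0");
        dependencies.put("aissemble-foundation-model-lineage", "1.7.0");
        System.out.println(renameDependencies(dependencies));
    }
}
```

Matching on the exact dependency key, rather than on a substring, leaves
already-renamed entries untouched, which is the situation the "old and new"
scenario in the feature file is meant to cover.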
diff --git a/RELEASE_NOTES_DRAFT.md b/RELEASE_NOTES_DRAFT.md
new file mode 100644
index 000000000..a46be5300
--- /dev/null
+++ b/RELEASE_NOTES_DRAFT.md
@@ -0,0 +1,177 @@
+# Major Additions
+
+* Python modules were renamed to reflect aiSSEMBLE. These include the following.
+| Old Python Module | New Python Module |
+|--------------------------------------------|----------------------------------------------------------------|
+| foundation-core-python | aissemble-foundation-core-python |
+| foundation-model-training-api | aissemble-foundation-model-training-api |
+| foundation-versioning-service | aissemble-foundation-versioning-service |
+| foundation-drift-detection-client | aissemble-foundation-drift-detection-client |
+| foundation-encryption-policy-python | aissemble-foundation-encryption-policy-python |
+| foundation-model-lineage | aissemble-foundation-model-lineage |
+| foundation-data-lineage-python | aissemble-foundation-data-lineage-python |
+| foundation-messaging-python-client | aissemble-foundation-messaging-python-client |
+| foundation-pdp-client-python | aissemble-foundation-pdp-client-python |
+| foundation-transform-core-python | aissemble-foundation-transform-core-python |
+| extensions-model-training-api-sagemaker | aissemble-extensions-model-training-api-sagemaker |
+| extensions-data-delivery-spark-py | aissemble-extensions-data-delivery-spark-py |
+| extensions-encryption-vault-python | aissemble-extensions-encryption-vault-python |
+| extensions-transform-spark-python | aissemble-extensions-transform-spark-python |
+| test-data-delivery-pyspark-model | aissemble-test-data-delivery-pyspark-model |
+| test-data-delivery-pyspark-model-basic | aissemble-test-data-delivery-pyspark-model-basic |
+| machine-learning-inference | aissemble-machine-learning-inference |
+| machine-learning-training | aissemble-machine-learning-training |
+| machine-learning-training-base | aissemble-machine-learning-training-base |
+| machine-learning-sagemaker-training | aissemble-machine-learning-sagemaker-training |
+
+
+## OpenLineage Namespace Conventions
+Conventions for setting namespaces when leveraging `Data Lineage` have been updated to better follow [OpenLineage's guidelines](https://openlineage.io/docs/spec/naming/). Moving forward, namespaces should be defined in the `data-lineage.properties` file, such that Jobs are tied to pipelines and Datasets are tied to data sources. This is a departure from the old pattern of a single namespace property (`data.lineage.namespace`) being leveraged for an entire project. Refer to the [GitHub docs](https://boozallen.github.io/aissemble/current-dev/lineage-medatada-capture-overview.html#_configuration) for updated guidance. Usage of the `data.lineage.namespace` property in a project's `data-lineage.properties` file will be supported as a fallback but should not be used in practice.
+
+# Breaking Changes
+There are no breaking changes in the 1.7.0 release.
+
+## DataLineage and ModelLineage Event Changes
+To tie each pipeline step's lineage events to the pipeline that ran it, we have added a pipeline-level lineage run event
+that every step-level lineage event can reference as its parent.
+
+Lineage event customization is now specific to the event type. The following functions have been removed and replaced
+by event type-specific functions:
+
+| Python Method Signature | Java Method Signature |
+|--------------------------------------------------------------------|----------------------------------------------------------|
+| create_run(self) → Run | Run createRun() |
+| create_job(self) → Job | Job createJob() |
+| create_run_event(self, run: Run, job: Job, status: str) → RunEvent | RunEvent createRunEvent(Run run, Job job, String status) |
+
+If you have overridden these functions in your project, refer to the Customize Lineage Event section below and update your overrides accordingly.
+
+The default producer value that will be generated into the data-lineage.properties file is now pulled from the scm url tag in the project's
+root pom.xml file.
+
+# Known Issues
+There are no known issues with the 1.7.0 release.
+
+# Known Vulnerabilities
+| Date identified | Vulnerability | Severity | Package | Affected versions | CVE | Fixed in |
+|---------------------|-----------------------------------------|------------|------------|------------------------|-----|---------------|
+
+# How to Upgrade
+The following steps will upgrade your project to 1.7. These instructions consist of multiple phases:
+- Automatic Upgrades - no manual action required
+- Precondition Steps - needed in all situations
+- Conditional Steps (e.g., Python steps, Java steps, if you use Metadata, etc.)
+- Final Steps - needed in all situations
+
+## Automatic Upgrades
+To reduce the burden of upgrading aiSSEMBLE, the Baton project is used to automate the migration of some files to the new version. These migrations run automatically when you build your project, and are included by default when you update the `build-parent` version in your root POM. Below is a description of all of the Baton migrations that are included with this version of aiSSEMBLE.
+
+| Migration Name | Description |
+|------------------------------------------------------|--------------------------------------------------------------|
+| upgrade-tiltfile-aissemble-version-migration | Updates the aiSSEMBLE version within your project's Tiltfile |
+| upgrade-v2-chart-files-aissemble-version-migration | Updates the Helm chart dependencies within your project's deployment resources (-deploy/src/main/resources/apps/) to use the latest version of aiSSEMBLE |
+| upgrade-v1-chart-files-aissemble-version-migration | Updates the Docker image tags within your project's deployment resources (-deploy/src/main/resources/apps/) to use the latest version of aiSSEMBLE |
+| upgrade-mlflow-v2-external-s3-migration | Updates the MLflow V2 deployment (if present) in your project to utilize Localstack for local development and SealedSecrets for remote deployments |
+| upgrade-poetry-aissemble-python-package-migration | Updates the aiSSEMBLE dependencies in your pipeline `pyproject.toml` files to the newly renamed Python modules |
+
+To deactivate any of these migrations, add the following configuration to the `baton-maven-plugin` within your root `pom.xml`:
+
+```diff
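+    <plugin>
+        <groupId>org.technologybrewery.baton</groupId>
+        <artifactId>baton-maven-plugin</artifactId>
+        <dependencies>
+            <dependency>
+                <groupId>com.boozallen.aissemble</groupId>
+                <artifactId>foundation-upgrade</artifactId>
+                <version>${version.aissemble}</version>
+            </dependency>
+        </dependencies>
++        <configuration>
++            <deactivateMigrations>
++                <deactivateMigration>NAME_OF_MIGRATION</deactivateMigration>
++                <deactivateMigration>NAME_OF_MIGRATION</deactivateMigration>
++            </deactivateMigrations>
++        </configuration>
+    </plugin>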
+```
+
+## Precondition Steps
+
+### Beginning the Upgrade - Required for All Projects
+To start your aiSSEMBLE upgrade, update your project's pom.xml to use the 1.7.0 version of the build-parent:
+ ```xml
+    <parent>
+        <groupId>com.boozallen.aissemble</groupId>
+        <artifactId>build-parent</artifactId>
+        <version>1.7.0</version>
+    </parent>
+ ```
+
+## Conditional Steps
+
+### Upgrade Steps for Projects Leveraging Data Lineage
+
+#### Updated Namespace Conventions with Data Lineage
+In order to follow standards for defining namespaces for OpenLineage Jobs and Datasets, the following steps can be taken to leverage proper namespace conventions:
+1. [Optional] If you are already setting the `data.lineage.namespace` value in your `-docker/-spark-worker-docker/src/main/resources/krausening/base/data-lineage.properties` file, it is recommended that you follow the [configuration documentation](https://boozallen.github.io/aissemble/current-dev/lineage-medatada-capture-overview.html#_configuration) and set `data.lineage..namespace` and `data.lineage...namespace` instead, then remove the `data.lineage.namespace` property.
+2. If your project does not have a `data-lineage.properties` file, one will be generated during your next build.
+3. If your pipeline leverages any lineage Datasets, you must define a namespace for each dataset, per the [GitHub docs guidance](https://boozallen.github.io/aissemble/current-dev/lineage-medatada-capture-overview.html#_configuration):
+```text
+data.lineage..namespace=
+```
+Note: An exception will be thrown if neither the dataset's namespace nor `data.lineage.namespace` is configured.
+
+#### Associate Step Lineage Events to Pipeline
+Data lineage now supports a pipeline-level lineage run event. It provides the parent run facet for all step-level lineage events, which preserves the pipeline-step job hierarchy and ties the step-level lineage events' jobs together.
+
+##### pyspark pipeline driver class
+* Add the `PipelineBase` import
+* Call `PipelineBase().record_pipeline_lineage_start_event()` before any step executes
+* Call `PipelineBase().record_pipeline_lineage_complete_event()` after all steps have executed
+```text
+  from krausening.logging import LogManager
++ from first_process.generated.pipeline.pipeline_base import PipelineBase
+
+  if __name__ == "__main__":
+      logger.info("STARTED: FirstProcess driver")
++     PipelineBase().record_pipeline_lineage_start_event()
+      Ingest1().execute_step()
+      ...
+      Ingest4().execute_step()
++     PipelineBase().record_pipeline_lineage_complete_event()
+```
+
+##### spark pipeline driver class
+* Add the `PipelineBase` import
+* Call `PipelineBase.getInstance().recordPipelineLineageStartEvent();` before any step executes
+* Call `PipelineBase.getInstance().recordPipelineLineageCompleteEvent();` after all steps have executed
+```text
++ import com.boozallen.pipeline.PipelineBase;
+ import org.slf4j.Logger;
+ ...
+
+ public static void main(String[] args) {
+ logger.info("STARTED: {} driver", "SparkPipeline");
+ SparkPipelineBaseDriver.main(args);
+
++ PipelineBase.getInstance().recordPipelineLineageStartEvent();
+ ...
+ final Step2 step2 = CDI.current().select(Step2.class, new Any.Literal()).get();
+ CompletionStage step2Result = step2.executeStep();
+ ...
++ PipelineBase.getInstance().recordPipelineLineageCompleteEvent();
+```
+
+#### Customize Lineage Event
+Please follow the [generated code](https://boozallen.github.io/aissemble/current/lineage-medatada-capture-overview.html#_what_gets_generated) instructions to customize the lineage event accordingly.
+
+
+## Final Steps
+
+### Finalizing the Upgrade - Required for All Projects
+1. Run `mvn clean install` and resolve any manual actions that are suggested
+ - **NOTE:** This will update any aiSSEMBLE dependencies in 'pyproject.toml' files automatically
+2. Repeat the previous step until all manual actions are resolved
+
+# What's Changed
diff --git a/foundation/foundation-upgrade/src/main/java/com/boozallen/aissemble/upgrade/migration/PoetryMigration.java b/foundation/foundation-upgrade/src/main/java/com/boozallen/aissemble/upgrade/migration/PoetryMigration.java
new file mode 100644
index 000000000..8340a4a42
--- /dev/null
+++ b/foundation/foundation-upgrade/src/main/java/com/boozallen/aissemble/upgrade/migration/PoetryMigration.java
@@ -0,0 +1,115 @@
+package com.boozallen.aissemble.upgrade.migration;
+
+/*-
+ * #%L
+ * aiSSEMBLE::Foundation::Upgrade
+ * %%
+ * Copyright (C) 2021 Booz Allen
+ * %%
+ * This software package is licensed under the Booz Allen Public License. All Rights Reserved.
+ * #L%
+ */
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Optional;
+import java.util.Set;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.boozallen.aissemble.upgrade.util.FileUtils;
+import com.electronwill.nightconfig.core.Config;
+import com.electronwill.nightconfig.core.file.FileConfig;
+
+/**
+ *
+ * Baton migration used to migrate aiSSEMBLE python packages to the new naming convention
+ */
+public class PoetryMigration extends AbstractAissembleMigration {
+
+ protected static final Logger logger = LoggerFactory.getLogger(PoetryMigration.class);
+ protected static final String DEPENDENCIES_KEY = "tool.poetry.dependencies";
+ protected static final String PYTHON_PACKAGE_PREFIX = "aissemble-";
+
+    protected static final Set<String> OLD_PYTHON_PACKAGES = Set.of(
+ "foundation-core-python",
+ "foundation-pdp-client-python",
+ "foundation-model-lineage",
+ "foundation-encryption-policy-python",
+ "extensions-encryption-vault-python",
+ "extensions-data-delivery-spark-py",
+ "foundation-data-lineage-python"
+ );
+
+ @Override
+ protected boolean shouldExecuteOnFile(File file) {
+        if(file == null || !file.exists()) {
+            // Log the File reference itself: calling getAbsolutePath() on a null file would throw
+            logger.error("Unable to read file {} to check if migration should be executed", file);
+            return false;
+        }
+        boolean shouldExecute = false;
+        FileConfig poetryConfig = FileConfig.of(file);
+        poetryConfig.load();
+        Optional<Config> dependenciesOpt = poetryConfig.getOptional(DEPENDENCIES_KEY);
+        if(dependenciesOpt.isPresent()) {
+            shouldExecute = hasOldPackages(dependenciesOpt.get().valueMap().keySet());
+        } else {
+            logger.warn("Could not get dependencies for file {}", file.getAbsolutePath());
+        }
+        return shouldExecute;
+ }
+
+    private boolean hasOldPackages(Set<String> packages) {
+ boolean hasOldPackages = false;
+ for(String oldPackage : OLD_PYTHON_PACKAGES) {
+ if(packages.contains(oldPackage)) {
+ hasOldPackages = true;
+ break;
+ }
+ }
+ return hasOldPackages;
+ }
+
+ @Override
+ protected boolean performMigration(File file) {
+        if(file == null || !file.exists()) {
+            logger.error("Unable to read file {} for migration", file);
+            return false;
+        }
+        boolean performedSuccessfully = false;
+        FileConfig pyproject = FileConfig.of(file);
+        pyproject.load();
+        Optional<Config> dependenciesOpt = pyproject.getOptional(DEPENDENCIES_KEY);
+        if(dependenciesOpt.isPresent()) {
+            try {
+                performedSuccessfully = migrateOldPythonPackages(dependenciesOpt.get(), file);
+            } catch (IOException e) {
+                logger.error("Failed to migrate Python packages in {}", file.getAbsolutePath(), e);
+            }
+        } else {
+            logger.warn("Could not get dependencies for file {}", file.getAbsolutePath());
+        }
+        return performedSuccessfully;
+ }
+
+ private boolean migrateOldPythonPackages(Config config, File pyproject) throws IOException {
+ boolean success = false;
+ for(String oldPackageName : OLD_PYTHON_PACKAGES) {
+            Optional<Object> packageName = config.getOptional(oldPackageName);
+ if(packageName.isPresent()) {
+ String newPackageName = getNewPackageName(oldPackageName);
+ FileUtils.replaceInFile(pyproject, oldPackageName, newPackageName);
+ logger.info("Replacing python package {} -> {}", oldPackageName, newPackageName);
+ success = true;
+ }
+ }
+ return success;
+ }
+
+ protected static String getNewPackageName(String oldPackageName) {
+ return String.format("%s%s", PYTHON_PACKAGE_PREFIX, oldPackageName);
+ }
+
+}
diff --git a/foundation/foundation-upgrade/src/main/resources/migrations.json b/foundation/foundation-upgrade/src/main/resources/migrations.json
index b8de1b11c..96146f89a 100644
--- a/foundation/foundation-upgrade/src/main/resources/migrations.json
+++ b/foundation/foundation-upgrade/src/main/resources/migrations.json
@@ -17,6 +17,15 @@
}
]
},
+ {
+ "name": "upgrade-poetry-aissemble-python-package-migration",
+ "implementation": "com.boozallen.aissemble.upgrade.migration.PoetryMigration",
+ "fileSets": [
+ {
+ "includes": ["*-pipelines/**/pyproject.toml"]
+ }
+ ]
+ },
{
"name": "upgrade-v1-chart-files-aissemble-version-migration",
"implementation": "com.boozallen.aissemble.upgrade.migration.HelmChartsV1Migration",
diff --git a/foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/AbstractMigrationTest.java b/foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/AbstractMigrationTest.java
index 8df302dc8..7553fc8be 100644
--- a/foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/AbstractMigrationTest.java
+++ b/foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/AbstractMigrationTest.java
@@ -11,6 +11,7 @@
*/
import java.io.File;
+import java.io.IOException;
import java.nio.file.Paths;
public class AbstractMigrationTest {
@@ -18,6 +19,17 @@ public class AbstractMigrationTest {
protected boolean shouldExecute;
protected boolean successful;
+ private static final String TEST_FILES_FOLDER = Paths.get("target", "test-classes", "test-files").toString();
+
+ protected static void addTestFile(String subPath) throws IOException {
+ File testFile = Paths.get(TEST_FILES_FOLDER, subPath).toFile();
+ if(testFile.exists()) {
+ throw new RuntimeException(String.format("Test file at %s already exists", subPath));
+ }
+ testFile.getParentFile().mkdirs();
+ testFile.createNewFile();
+ }
+
protected static File getTestFile(String subPath) {
File testFile = Paths.get("target", "test-classes", "test-files", subPath).toFile();
File dir = testFile.getParentFile();
diff --git a/foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/PoetryMigrationSteps.java b/foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/PoetryMigrationSteps.java
new file mode 100644
index 000000000..2beef1e05
--- /dev/null
+++ b/foundation/foundation-upgrade/src/test/java/com/boozallen/aissemble/upgrade/migration/PoetryMigrationSteps.java
@@ -0,0 +1,122 @@
+package com.boozallen.aissemble.upgrade.migration;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+
+import com.electronwill.nightconfig.core.Config;
+import com.electronwill.nightconfig.core.file.FileConfig;
+
+/*-
+ * #%L
+ * aiSSEMBLE::Foundation::Upgrade
+ * %%
+ * Copyright (C) 2021 Booz Allen
+ * %%
+ * This software package is licensed under the Booz Allen Public License. All Rights Reserved.
+ * #L%
+ */
+
+import io.cucumber.java.en.Given;
+import io.cucumber.java.en.Then;
+import io.cucumber.java.en.When;
+
+public class PoetryMigrationSteps extends AbstractMigrationTest {
+
+ private static final String PYPROJECT_FILE = Paths.get("pyproject", "pyproject.toml").toString();
+
+ @BeforeClass
+ public void createPyproject() throws IOException {
+ addTestFile(PYPROJECT_FILE);
+ }
+
+ @AfterClass
+ public void deletePyproject() {
+ getTestFile(PYPROJECT_FILE).delete();
+ }
+
+ @Given("a pyproject.toml file with old aiSSEMBLE Python dependency naming conventions")
+ public void a_pyproject_toml_file_with_all_old_ai_ssemble_python_dependencies() throws IOException {
+ createPyproject(PoetryMigration.OLD_PYTHON_PACKAGES);
+ }
+
+ @Given("a pyproject.toml file with old and new aiSSEMBLE Python dependency naming conventions")
+ public void a_pyproject_toml_file_with_old_ai_ssemble_python_dependencies() throws IOException {
+ createPyproject(getPartialOldPythonPackages());
+ }
+
+ @Given("a pyproject.toml file with new aiSSEMBLE Python dependencies")
+ public void a_pyproject_toml_file_with_new_ai_ssemble_python_dependencies() {
+ createPyproject(getNewPackages());
+ }
+
+    private void createPyproject(Set<String> pythonPackages) {
+ FileConfig pyproject = FileConfig.of(getTestFile(PYPROJECT_FILE));
+ Config config = Config.inMemory();
+ pythonPackages.stream()
+ .forEach(packageName -> config.add(packageName, "1.0.0"));
+ pyproject.add(PoetryMigration.DEPENDENCIES_KEY, config);
+ pyproject.save();
+ pyproject.close();
+ }
+
+ @When("the pyproject.toml file migration is executed")
+ public void the_pyproject_toml_file_migration_is_executed() {
+ testFile = getTestFile(PYPROJECT_FILE);
+ performMigration(new PoetryMigration());
+ }
+
+ @Then("the migration is skipped")
+ public void the_migration_is_skipped() {
+ assertFalse(shouldExecute);
+ }
+
+ @Then("the dependencies are updated to the newest naming convention")
+ public void the_dependencies_are_updated_to_the_newest_naming_convention() {
+ FileConfig pyproject = FileConfig.of(getTestFile(PYPROJECT_FILE));
+ pyproject.load();
+        Optional<Config> dependenciesOpt = pyproject.getOptional(PoetryMigration.DEPENDENCIES_KEY);
+ if(dependenciesOpt.isPresent()) {
+ Config config = dependenciesOpt.get();
+ for(String newPackageName : getNewPackages()) {
+ assertTrue(config.getOptional(newPackageName).isPresent());
+ }
+ }
+ }
+
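+    private Set<String> getPartialOldPythonPackages() {
+        Set<String> pythonPackages = new HashSet<>();
+        List<String> oldPoetryPackages = new ArrayList<>(PoetryMigration.OLD_PYTHON_PACKAGES);
+        // NOTE: the original loop body was lost; this reconstruction assumes it alternates old and new names
+        for(int i=0;i<oldPoetryPackages.size();i++) {
+            String oldPackageName = oldPoetryPackages.get(i);
+            if(i % 2 == 0) {
+                pythonPackages.add(oldPackageName);
+            }
+            else {
+                pythonPackages.add(PoetryMigration.getNewPackageName(oldPackageName));
+            }
+        }
+        return pythonPackages;
+    }
+
+    private Set<String> getNewPackages() {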
+ Set newPackages = new HashSet<>();
+ for(String oldPackageName : PoetryMigration.OLD_PYTHON_PACKAGES) {
+ String newPackageName = PoetryMigration.getNewPackageName(oldPackageName);
+ newPackages.add(newPackageName);
+ }
+ return newPackages;
+ }
+
+}
diff --git a/foundation/foundation-upgrade/src/test/resources/specifications/poetry-migration.feature b/foundation/foundation-upgrade/src/test/resources/specifications/poetry-migration.feature
new file mode 100644
index 000000000..3c964646b
--- /dev/null
+++ b/foundation/foundation-upgrade/src/test/resources/specifications/poetry-migration.feature
@@ -0,0 +1,16 @@
+Feature: As an aiSSEMBLE user, I want my aiSSEMBLE Python packages updated to the latest naming convention automatically so upgrade errors are minimized
+
+Scenario: Upgrade aiSSEMBLE Python packages to latest naming convention
+ Given a pyproject.toml file with old aiSSEMBLE Python dependency naming conventions
+ When the pyproject.toml file migration is executed
+ Then the dependencies are updated to the newest naming convention
+
+Scenario: Upgrade aiSSEMBLE Python packages to latest naming convention with partially upgraded Python packages
+ Given a pyproject.toml file with old and new aiSSEMBLE Python dependency naming conventions
+ When the pyproject.toml file migration is executed
+ Then the dependencies are updated to the newest naming convention
+
+Scenario: Skip upgrade
+ Given a pyproject.toml file with new aiSSEMBLE Python dependencies
+ When the pyproject.toml file migration is executed
+ Then the migration is skipped
\ No newline at end of file