#15: Adding Poetry Python package migration in foundation-upgrade
1 parent 8828ab4 commit b9c3a60
Showing 6 changed files with 451 additions and 0 deletions.
@@ -0,0 +1,177 @@
# Major Additions

* Python modules were renamed to reflect aiSSEMBLE. These include the following.
| Old Python Module | New Python Module |
|--------------------------------------------|----------------------------------------------------------------|
| foundation-core-python | aissemble-core-python |
| foundation-model-training-api | aissemble-foundation-model-training-api |
| foundation-versioning-service | aissemble-foundation-versioning-service |
| foundation-drift-detection-client | aissemble-foundation-drift-detection-client |
| foundation-encryption-policy-python | aissemble-foundation-encryption-policy-python |
| foundation-model-lineage | aissemble-foundation-model-lineage |
| foundation-data-lineage-python | aissemble-foundation-data-lineage-python |
| foundation-messaging-python-client | aissemble-foundation-messaging-python-client |
| foundation-pdp-client-python | aissemble-foundation-pdp-client-python |
| foundation-transform-core-python | aissemble-foundation-transform-core-python |
| extensions-model-training-api-sagemaker | aissemble-extensions-model-training-api-sagemaker |
| extensions-data-delivery-spark-py | aissemble-extensions-data-delivery-spark-py |
| extensions-encryption-vault-python | aissemble-extensions-encryption-vault-python |
| extensions-transform-spark-python | aissemble-extensions-transform-spark-python |
| test-data-delivery-pyspark-model | aissemble-test-data-delivery-pyspark-model |
| test-data-delivery-pyspark-model-basic | aissemble-test-data-delivery-pyspark-model-basic |
| machine-learning-inference | aissemble-machine-learning-inference |
| machine-learning-training | aissemble-machine-learning-training |
| machine-learning-training-base | aissemble-machine-learning-training-base |
| machine-learning-sagemaker-training | aissemble-machine-learning-sagemaker-training |


## OpenLineage Namespace Conventions
The conventions for setting namespaces when leveraging `Data Lineage` have been updated to better follow [OpenLineage's guidelines](https://openlineage.io/docs/spec/naming/). Moving forward, namespaces should be defined in the `data-lineage.properties` file such that Jobs are tied to pipelines and Datasets are tied to data sources. This departs from the old pattern of a single namespace property (`data.lineage.namespace`) being used for an entire project. Refer to the [GitHub docs](https://boozallen.github.io/aissemble/current-dev/lineage-medatada-capture-overview.html#_configuration) for updated guidance. The `data.lineage.namespace` property in a project's `data-lineage.properties` file is still supported as a fallback but should not be used in practice.

# Breaking Changes
There are no breaking changes in the 1.7.0 release.

## DataLineage and ModelLineage Event Changes
To associate each pipeline step's lineage event with its pipeline, we have added a pipeline-level lineage event and a way
for each step's lineage event to be associated with the pipeline's lineage run event.

We have also changed how lineage events are customized so that the customization can depend on the event type.
The functions below have been removed and replaced by event type-specific functions:

| Python Method Signature | Java Method Signature |
|--------------------------------------------------------------------|----------------------------------------------------------|
| create_run(self) → Run | Run createRun() |
| create_job(self) → Job | Job createJob() |
| create_run_event(self, run: Run, job: Job, status: str) → RunEvent | RunEvent createRunEvent(Run run, Job job, String status) |

If you have overridden these functions in your project, please refer to the [Customize Lineage Event] section below to make changes accordingly.

The default producer value generated into the `data-lineage.properties` file is now pulled from the scm url tag in the project's
root `pom.xml` file.

# Known Issues
There are no known issues with the 1.7.0 release.

# Known Vulnerabilities
| Date<br/>identified | Vulnerability | Severity | Package | Affected <br/>versions | CVE | Fixed <br/>in |
|---------------------|-----------------------------------------|------------|------------|------------------------|-----|---------------|

# How to Upgrade
The following steps will upgrade your project to 1.7. These instructions consist of multiple phases:
- Automatic Upgrades - no manual action required
- Precondition Steps - needed in all situations
- Conditional Steps (e.g., Python steps, Java steps, if you use Metadata, etc.)
- Final Steps - needed in all situations

## Automatic Upgrades
To reduce the burden of upgrading aiSSEMBLE, the Baton project is used to automate the migration of some files to the new version. These migrations run automatically when you build your project and are included by default when you update the `build-parent` version in your root POM. Below is a description of all the Baton migrations included with this version of aiSSEMBLE.

| Migration Name | Description |
|------------------------------------------------------|--------------------------------------------------------------|
| upgrade-tiltfile-aissemble-version-migration | Updates the aiSSEMBLE version within your project's Tiltfile |
| upgrade-v2-chart-files-aissemble-version-migration | Updates the Helm chart dependencies within your project's deployment resources (`<YOUR_PROJECT>-deploy/src/main/resources/apps/`) to use the latest version of aiSSEMBLE |
| upgrade-v1-chart-files-aissemble-version-migration | Updates the Docker image tags within your project's deployment resources (`<YOUR_PROJECT>-deploy/src/main/resources/apps/`) to use the latest version of aiSSEMBLE |
| upgrade-mlflow-v2-external-s3-migration | Updates the MLflow V2 deployment (if present) in your project to use Localstack for local development and SealedSecrets for remote deployments |
| <pyproject-migration> | Migrates `pyproject.toml` files to reflect the newly named Python modules |

To deactivate any of these migrations, add the following configuration to the `baton-maven-plugin` within your root `pom.xml`:

```diff
    <plugin>
        <groupId>org.technologybrewery.baton</groupId>
        <artifactId>baton-maven-plugin</artifactId>
        <dependencies>
            <dependency>
                <groupId>com.boozallen.aissemble</groupId>
                <artifactId>foundation-upgrade</artifactId>
                <version>${version.aissemble}</version>
            </dependency>
        </dependencies>
+       <configuration>
+           <deactivateMigrations>
+               <deactivateMigration>NAME_OF_MIGRATION</deactivateMigration>
+               <deactivateMigration>NAME_OF_MIGRATION</deactivateMigration>
+           </deactivateMigrations>
+       </configuration>
    </plugin>
```

## Precondition Steps

### Beginning the Upgrade - Required for All Projects
To start your aiSSEMBLE upgrade, update your project's `pom.xml` to use the 1.7.0 version of the build-parent:
```xml
<parent>
    <groupId>com.boozallen.aissemble</groupId>
    <artifactId>build-parent</artifactId>
    <version>1.7.0</version>
</parent>
```

## Conditional Steps

### Upgrade Steps for Projects Leveraging Data Lineage

#### Updated Namespace Conventions with Data Lineage
To follow the standards for defining namespaces for OpenLineage Jobs and Datasets, take the following steps:
1. [Optional] If you are already setting the `data.lineage.namespace` value in your `<project-name>-docker/<project-name>-spark-worker-docker/src/main/resources/krausening/base/data-lineage.properties` file, it is recommended that you follow the [configuration documentation](https://boozallen.github.io/aissemble/current-dev/lineage-medatada-capture-overview.html#_configuration) and set `data.lineage.<pipeline>.namespace` and `data.lineage.<pipeline>.<step>.namespace` instead, then remove the `data.lineage.namespace` property.
2. If your project does not have a `data-lineage.properties` file, one will be generated during your next build.
3. If your pipeline leverages any lineage Datasets, you must define a namespace for each dataset, per the [GitHub docs guidance](https://boozallen.github.io/aissemble/current-dev/lineage-medatada-capture-overview.html#_configuration):
   ```text
   data.lineage.<dataset-name>.namespace=<dataset's-source-name>
   ```
   Note: An exception will be thrown if neither the dataset's namespace nor `data.lineage.namespace` is configured.
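
The lookup order described in the steps above can be sketched in Python. This is an illustration only, not the aiSSEMBLE implementation: the function name `resolve_dataset_namespace`, the use of a plain `dict` for the loaded properties, and the `ValueError` exception type are all assumptions. Only the property keys come from the steps above.

```python
from typing import Mapping


def resolve_dataset_namespace(properties: Mapping[str, str], dataset_name: str) -> str:
    """Return the namespace for a lineage Dataset (hypothetical helper).

    Prefers the dataset-specific key and falls back to the legacy
    project-wide key, mirroring the fallback described in the release notes.
    """
    specific_key = f"data.lineage.{dataset_name}.namespace"
    fallback_key = "data.lineage.namespace"

    if specific_key in properties:
        return properties[specific_key]
    if fallback_key in properties:
        # Supported as a fallback, but discouraged in practice.
        return properties[fallback_key]
    # Assumption: the real exception type is not stated in the notes.
    raise ValueError(
        f"No namespace configured for dataset '{dataset_name}': "
        f"set '{specific_key}' in data-lineage.properties"
    )
```

With a properties mapping that defines both keys, the dataset-specific value wins. With neither key present, the sketch raises, which corresponds to the exception mentioned in the note.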

#### Associate Step Lineage Events to Pipeline
Data lineage now supports a pipeline-level lineage run event. It provides the parent run facet for all step-level lineage events, which preserves the pipeline-step job hierarchy and ties the jobs of all step-level lineage events together.
##### PySpark pipeline driver class
* Add the `PipelineBase` import
* Call `PipelineBase().record_pipeline_lineage_start_event()` before any step executes
* Call `PipelineBase().record_pipeline_lineage_complete_event()` after all steps have executed
```text
  from krausening.logging import LogManager
+ from first_process.generated.pipeline.pipeline_base import PipelineBase

  if __name__ == "__main__":
      logger.info("STARTED: FirstProcess driver")
+     PipelineBase().record_pipeline_lineage_start_event()
      Ingest1().execute_step()
      ...
      Ingest4().execute_step()
+     PipelineBase().record_pipeline_lineage_complete_event()
```
##### Spark pipeline driver class
* Add the `PipelineBase` import
* Call `PipelineBase.getInstance().recordPipelineLineageStartEvent();` before any step executes
* Call `PipelineBase.getInstance().recordPipelineLineageCompleteEvent();` after all steps have executed
```text
+ import com.boozallen.pipeline.PipelineBase;
  import org.slf4j.Logger;
  ...
  public static void main(String[] args) {
      logger.info("STARTED: {} driver", "SparkPipeline");
      SparkPipelineBaseDriver.main(args);
+     PipelineBase.getInstance().recordPipelineLineageStartEvent();
      ...
      final Step2 step2 = CDI.current().select(Step2.class, new Any.Literal()).get();
      CompletionStage<Void> step2Result = step2.executeStep();
      ...
+     PipelineBase.getInstance().recordPipelineLineageCompleteEvent();
```
#### Customize Lineage Event
Please follow the [generated code](https://boozallen.github.io/aissemble/current/lineage-medatada-capture-overview.html#_what_gets_generated) instructions to customize the lineage event accordingly.
## Final Steps
### Finalizing the Upgrade - Required for All Projects
1. Run `mvn clean install` and resolve any manual actions that are suggested
    - **NOTE:** This will update any aiSSEMBLE dependencies in `pyproject.toml` files automatically
2. Repeat the previous step until all manual actions are resolved
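
As a rough illustration of what the automatic `pyproject.toml` update does to a dependency entry, the following Python sketch prefixes old package names with `aissemble-`, mirroring `getNewPackageName` in the `PoetryMigration` class of this commit. The function name `migrate_pyproject_text`, the sample versions, and the line-anchored regular expression are assumptions of this sketch. The actual migration is the Java class and relies on `FileUtils.replaceInFile`, whose behavior is not shown here.

```python
import re

# Old package names, copied from OLD_PYTHON_PACKAGES in PoetryMigration.
OLD_PYTHON_PACKAGES = (
    "foundation-core-python",
    "foundation-pdp-client-python",
    "foundation-model-lineage",
    "foundation-encryption-policy-python",
    "extensions-encryption-vault-python",
    "extensions-data-delivery-spark-py",
    "foundation-data-lineage-python",
)
PYTHON_PACKAGE_PREFIX = "aissemble-"


def migrate_pyproject_text(text: str) -> str:
    """Rename old dependency keys in pyproject.toml text (hypothetical helper).

    Only keys at the start of a line are rewritten, so names that already
    carry the prefix are left alone and the function is safe to run twice.
    """
    for old_name in OLD_PYTHON_PACKAGES:
        pattern = re.compile(
            rf"^([ \t]*){re.escape(old_name)}([ \t]*=)", re.MULTILINE
        )
        text = pattern.sub(rf"\g<1>{PYTHON_PACKAGE_PREFIX}{old_name}\g<2>", text)
    return text


if __name__ == "__main__":
    # Sample content is made up for illustration.
    sample = (
        "[tool.poetry.dependencies]\n"
        'python = ">=3.11"\n'
        'foundation-core-python = "1.7.0"\n'
    )
    print(migrate_pyproject_text(sample))
```

Anchoring the match on the dependency key is a choice made here to keep the sketch idempotent; it is not a statement about how the Java migration matches text.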
# What's Changed
115 changes: 115 additions & 0 deletions
...tion-upgrade/src/main/java/com/boozallen/aissemble/upgrade/migration/PoetryMigration.java
@@ -0,0 +1,115 @@
package com.boozallen.aissemble.upgrade.migration;

/*-
 * #%L
 * aiSSEMBLE::Foundation::Upgrade
 * %%
 * Copyright (C) 2021 Booz Allen
 * %%
 * This software package is licensed under the Booz Allen Public License. All Rights Reserved.
 * #L%
 */

import java.io.File;
import java.io.IOException;
import java.util.Optional;
import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.boozallen.aissemble.upgrade.util.FileUtils;
import com.electronwill.nightconfig.core.Config;
import com.electronwill.nightconfig.core.file.FileConfig;

/**
 *
 * Baton migration used to migrate aiSSEMBLE python packages to the new naming convention
 */
public class PoetryMigration extends AbstractAissembleMigration {

    protected static final Logger logger = LoggerFactory.getLogger(PoetryMigration.class);
    protected static final String DEPENDENCIES_KEY = "tool.poetry.dependencies";
    protected static final String PYTHON_PACKAGE_PREFIX = "aissemble-";

    protected static final Set<String> OLD_PYTHON_PACKAGES = Set.of(
        "foundation-core-python",
        "foundation-pdp-client-python",
        "foundation-model-lineage",
        "foundation-encryption-policy-python",
        "extensions-encryption-vault-python",
        "extensions-data-delivery-spark-py",
        "foundation-data-lineage-python"
    );

    @Override
    protected boolean shouldExecuteOnFile(File file) {
        if (file == null || !file.exists()) {
            // Log the File itself so a null file does not cause a NullPointerException,
            // and stop here because there is nothing to inspect
            logger.error("Unable to read file {} to check if migration should be executed", file);
            return false;
        }
        boolean shouldExecute = false;
        FileConfig poetryConfig = FileConfig.of(file);
        poetryConfig.load();
        Optional<Config> dependenciesOpt = poetryConfig.getOptional(DEPENDENCIES_KEY);
        if (dependenciesOpt.isPresent()) {
            // Only the dependency names (the keys) matter for this check
            Set<String> dependencies = dependenciesOpt.get().valueMap().keySet();
            shouldExecute = hasOldPackages(dependencies);
        }
        else {
            logger.warn("Could not get dependencies for file {}", file.getAbsolutePath());
        }
        return shouldExecute;
    }

    private boolean hasOldPackages(Set<String> packages) {
        boolean hasOldPackages = false;
        for (String oldPackage : OLD_PYTHON_PACKAGES) {
            if (packages.contains(oldPackage)) {
                hasOldPackages = true;
                break;
            }
        }
        return hasOldPackages;
    }

    @Override
    protected boolean performMigration(File file) {
        if (file == null || !file.exists()) {
            // Log the File itself so a null file does not cause a NullPointerException,
            // and stop here because there is nothing to migrate
            logger.error("Unable to read file {} for migration", file);
            return false;
        }
        boolean performedSuccessfully = false;
        FileConfig pyproject = FileConfig.of(file);
        pyproject.load();
        Optional<Config> dependenciesOpt = pyproject.getOptional(DEPENDENCIES_KEY);
        if (dependenciesOpt.isPresent()) {
            try {
                performedSuccessfully = migrateOldPythonPackages(dependenciesOpt.get(), file);
            } catch (IOException e) {
                logger.error(e.getMessage());
            }
        }
        else {
            logger.warn("Could not get dependencies for file {}", file.getAbsolutePath());
        }
        return performedSuccessfully;
    }

    private boolean migrateOldPythonPackages(Config config, File pyproject) throws IOException {
        boolean success = false;
        for (String oldPackageName : OLD_PYTHON_PACKAGES) {
            // Only the presence of the dependency is checked; its value is not used
            Optional<Object> packageEntry = config.getOptional(oldPackageName);
            if (packageEntry.isPresent()) {
                String newPackageName = getNewPackageName(oldPackageName);
                FileUtils.replaceInFile(pyproject, oldPackageName, newPackageName);
                logger.info("Replacing python package {} -> {}", oldPackageName, newPackageName);
                success = true;
            }
        }
        return success;
    }

    protected static String getNewPackageName(String oldPackageName) {
        return String.format("%s%s", PYTHON_PACKAGE_PREFIX, oldPackageName);
    }

}