Version 1.8 Release Notes

David Freels Sr edited this page Jul 6, 2022 · 20 revisions

1.8.0

(#174) Split Step

The split and merge step types allow pipeline designers to run different step sequences in parallel. The merge step marks the point where the parallel sequences end and normal processing resumes.
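As a rough, library-independent illustration of the split/merge idea (this is not the Metalus API), the Python sketch below runs two step sequences in parallel and resumes normal processing only after both have finished. The names `run_sequence`, `flow_a`, `flow_b`, and `split_and_merge` are assumptions made purely for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequence(steps, value):
    """Run a list of step functions in order, feeding each result to the next."""
    for step in steps:
        value = step(value)
    return value


# Two hypothetical step sequences that start from the same split point.
flow_a = [lambda x: x + 1, lambda x: x * 10]
flow_b = [lambda x: x - 1]


def split_and_merge(value):
    # Split: submit each sequence so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_sequence, flow, value) for flow in (flow_a, flow_b)]
        # Merge: wait for every sequence before normal processing resumes.
        results = [future.result() for future in futures]
    return results


merged = split_and_merge(5)
```

Here `merged` holds one result per sequence, in the order the sequences were submitted.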

Fork Steps

  • (#208) Allow Embedded Fork Steps

(#232) Delta Lake Steps

The Metalus Delta Lake project includes step objects for updating, deleting, and merging Delta Lake data sources.

Metadata Extractor

  • (#203) Create Application and Execution Template Metadata Extractor
  • (#206) Metadata Extract Fails for certain versions
  • (#204) Metalus GCP fails during Step Metadata Extraction
  • (#205) Maven Dependency Resolver Should verify MD5 if available
  • (#209) Implement Custom Form Support for Metadata Extractors
  • Added scopes to the metalus-aws dependencies JSON to allow easier classpath configuration

Streaming

  • (#215) Streaming Drivers Not Stopping on Failure
  • (#211) Implement Event Based PipelineListener (Kinesis, Kafka and Pub/Sub Implementations)
  • (#211) Created CombinedPipelineListener to allow more than one listener within an application
  • (#202) Expose results from streaming executions to be shared between runs
  • Fixed an issue with the KinesisPipelineDriver where the consumer-streams could not be parsed properly

Enhancements

  • Audit report is now printed out by the DefaultPipelineListener
  • Added AWS and GCP Secrets Manager Credential Providers
  • Added CredentialSteps to make working with credentials easier
  • (#212) Added Credential Mapping (%) character
  • Added step to authenticate (S3) DataFrame outside of normal steps.
  • Added new step object CatalogSteps that exposes many of the methods in the Spark Catalog class.
  • (#233) Added support for accessing Arrays and Lists elements via index in pipeline mappings
  • New step functions added to DataSteps: count, rename column and drop duplicate records
  • Added Spark configuration and Spark settings to ExecutorAudits
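To make the index-based access from #233 more concrete, here is a minimal Python sketch of resolving a dotted path that may contain list indexes. The path syntax (`files[1].name`), the `resolve` function, and the `globals_map` data are all hypothetical; the actual Metalus mapping syntax may differ.

```python
import re

# One path segment: a name optionally followed by [index], e.g. "files[1]".
_SEGMENT = re.compile(r"^(\w+)(?:\[(\d+)\])?$")


def resolve(path, data):
    """Walk a dotted path through nested dicts, honoring optional list indexes."""
    current = data
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"Unsupported path segment: {segment}")
        name, index = match.groups()
        current = current[name]
        if index is not None:
            # Index into the list found under this name.
            current = current[int(index)]
    return current


# Hypothetical data standing in for values available to a mapping.
globals_map = {"files": [{"name": "a.csv"}, {"name": "b.csv"}]}
second_name = resolve("files[1].name", globals_map)
```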

Bug Fixes

  • (#175) Remove list/object parsing from mapByValue
  • (#207) Remove Support for Unsupported Spark (Spark 2.3 and Spark 2.4 with Scala 2.12)
  • (#231) Support auto-casting primitives in PipelineStepMapper.castToType

1.8.1

Supported Versions

  • Spark 2.4 Scala 2.12 has been restored
  • Spark 3.1 Scala 2.12 has been added

Metadata

  • New execution templates and pipelines have been added to ingest data into Bronze

Metalus AWS

  • Added support for role based authentication to the AWSCredential trait.
  • Added support for role based authentication to the KinesisPipelineDriver.

Bug Fixes

  • (#238) Inline string concatenation fails when mapped parameters are wrapped in multiple options
  • (#243) Comma in datatype of scalascript type parameter causes issues

1.8.2

General

  • (#248) Added the ability to escape mapping characters within string templates
  • Added PipelineContext to exceptions
  • Performance improvements for the S3OutputStream

Metalus AWS

  • (#247) Fixed an issue related to consumerStreams with KinesisPipelineDriver.

Metalus GCP

  • Created an experimental BigQuerySteps step object

1.8.3

  • (#255) New Connectors Architecture
  • (#252) Generic read and write step that uses the new connectors architecture
  • (#258) Create a new copy pipeline and load to bronze that uses the new connector architecture

Steps/Pipelines

  • (#254) Generic retry of a step on error, plus a step that will retry a specified number of times.
  • (#259) Spark Configuration Steps
  • (#256) Streaming Pipeline Drivers (Kinesis and GCP PubSub) now take a credential name from the command line
  • (#260) Moved Fork Step value validation to allow the forkMethod to be mapped
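The retry behavior described in #254 can be pictured with a small, generic Python sketch. This is an illustration of the general pattern only, not the Metalus implementation; `retry`, `flaky_step`, `max_attempts`, and `delay_seconds` are names invented for the example.

```python
import time


def retry(step, max_attempts, delay_seconds=0.0):
    """Call step() until it succeeds or max_attempts calls have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            # Re-raise the last error once the attempt budget is used up.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


calls = {"count": 0}


def flaky_step():
    """Hypothetical step that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = retry(flaky_step, max_attempts=5)
```

The sketch assumes `max_attempts` is at least 1; a real implementation would likely also restrict which exception types are retried.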

Bug Fixes

  • (#253) Secrets Manager Credential Providers not adding default credential parser
  • (#257) Fork Step now allows a limit on the number of concurrent threads to be used in parallel
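As a sketch of what a concurrency limit on a fork means in practice (#257), the Python example below processes each value in parallel while capping the number of worker threads. It is a generic illustration, not Metalus code; `fork`, `step`, and `max_threads` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fork(values, step, max_threads):
    """Apply step to every value in parallel, using at most max_threads workers."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map preserves input order, so results line up with values.
        return list(pool.map(step, values))


squares = fork([1, 2, 3, 4], lambda v: v * v, max_threads=2)
```

With `max_threads=2`, no more than two values are processed at the same time, yet the results still come back in input order.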

1.8.4

  • #263 JDBCDataConnector

Structured Streaming

  • #266 Validated streaming connectors
  • #270 Fixes to streaming connectors
  • #272 Updated DefaultPipelineDriver to simulate a streaming driver when using structured streaming

Application

  • #275 Added ability to start an application at specific executions

FileManager

  • #274 Implemented recursive file listings
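To show the difference between a flat and a recursive listing (#274), here is a small Python sketch using only the standard library. It does not use the Metalus FileManager; `list_files` and its `recursive` flag are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def list_files(root, recursive=False):
    """Return sorted file paths relative to root, optionally descending into subdirectories."""
    base = Path(root)
    entries = base.rglob("*") if recursive else base.glob("*")
    return sorted(p.relative_to(base).as_posix() for p in entries if p.is_file())


# Build a tiny directory tree to list.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_text("a")
    (Path(tmp) / "sub" / "b.txt").write_text("b")
    top_level = list_files(tmp)
    everything = list_files(tmp, recursive=True)
```

The flat listing sees only the files directly under the root, while the recursive listing also includes files in nested directories.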

Bug Fixes

  • #267 Fixed an issue with the Big Query dependency
  • #268 Fixed an issue with the Big Query dependency
  • #269 Fixed a bug with application globals mapping introduced in 1.8.3

1.8.5

Connectors

  • #278 Fixed issues with MongoDataConnector

Split Step

  • #279 SplitFlow now pushes global updates

GCP

  • #281 Fixes made while testing metalus-gcp against GCP Dataproc and Databricks GCP

Core

  • #284 Moved JavascriptSteps and ScalaSteps to the core library
  • Added new ExceptionSteps to make throwing exceptions in a pipeline easier.

Bug Fixes

  • #286 Added support for nullable fields in the Schema object
  • Bugfix related to S3OutputStream not properly draining the buffer.

1.8.6

General

  • #308 Allow steps to register global links
  • Added ability to pass hints to the JSON4S deserializer code within applications
  • Added new getStatus function to FileManager and implementations
  • Added a new JSON API Connector for basic data interactions
  • Created a new step to determine if a value is empty
  • Updated Schema to support long, float and boolean types
  • Pipeline now includes a description field

GCP BigQuery

  • #303 Additional bug fixes for BigQuery support

Streaming

  • #306 Enhancements to streaming support
  • Added streaming monitor pipeline to allow executions to easily implement streaming connectors

1.8.7

Bug Fixes

  • Fixed an issue where execution forks do not run the join pipelines
  • Added a new property named executionForkValueIndex to each fork execution to indicate the index within the value list of the current execution