Versatile Data Kit 0.6

@antoniivanov released this 23 Aug 13:03
· 1436 commits to main since this release
7d3da40

Summary

Major features include:

Configuration auto-wiring improvement: detect non vdk_ prefixed environment variables

Previously, a configuration option had to be prefixed with "vdk_" when set as an environment variable in order to be recognized.
This was very error-prone, since the options are documented without the prefix.

Now they can be set without a prefix as well.

The following are equivalent:

export VDK_DB_DEFAULT_TYPE='impala'
export DB_DEFAULT_TYPE='impala'

If both are set, the "prefixed" variable has a higher priority.
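The precedence rule above can be sketched in a few lines of Python. This is only an illustration, not the vdk-core implementation: `resolve_option` is a hypothetical helper, and upper-casing the option name before the lookup is an assumption made for the sketch.

```python
import os
from typing import Mapping, Optional

PREFIX = "VDK_"  # the prefix mentioned in the release notes


def resolve_option(name: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Look up a configuration option in environment variables.

    Hypothetical helper (not part of vdk-core). The prefixed variable
    wins when both the prefixed and the unprefixed one are set.
    """
    env = os.environ if env is None else env
    key = name.upper()  # assumption: variables are matched in upper case
    prefixed = env.get(PREFIX + key)
    if prefixed is not None:
        return prefixed
    return env.get(key)


if __name__ == "__main__":
    both = {"DB_DEFAULT_TYPE": "impala", "VDK_DB_DEFAULT_TYPE": "trino"}
    print(resolve_option("db_default_type", both))
    print(resolve_option("db_default_type", {"DB_DEFAULT_TYPE": "impala"}))
```

Passing the environment as a mapping keeps the lookup easy to test without touching the real process environment.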

New plugin/library: vdk-lineage-model

The VDK Lineage Model plugin aims to abstract the emitting of lineage data from VDK data jobs, so that different lineage loggers can be configured at run time in any plugin that supports emitting lineage data.

Check out more at the plugin page.
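To illustrate the idea of swapping lineage loggers at run time, here is a minimal sketch. All names in it (`LineageRecord`, `LineageLogger`, `InMemoryLineageLogger`, `emit_lineage`) are hypothetical and chosen for this example; they are not the classes or functions exposed by vdk-lineage-model.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical description of one executed query."""
    query: str
    input_tables: List[str] = field(default_factory=list)
    output_table: str = ""


class LineageLogger(Protocol):
    """Anything with a send() method can act as a lineage logger."""

    def send(self, record: LineageRecord) -> None:
        ...


class InMemoryLineageLogger:
    """Collects records in a list; handy for tests."""

    def __init__(self) -> None:
        self.records: List[LineageRecord] = []

    def send(self, record: LineageRecord) -> None:
        self.records.append(record)


class PrintLineageLogger:
    """Writes each record to standard output."""

    def send(self, record: LineageRecord) -> None:
        print(f"{record.input_tables} -> {record.output_table}: {record.query}")


def emit_lineage(record: LineageRecord, loggers: Iterable[LineageLogger]) -> None:
    """Forward one record to every configured logger."""
    for logger in loggers:
        logger.send(record)


if __name__ == "__main__":
    record = LineageRecord("insert into b select * from a", ["a"], "b")
    emit_lineage(record, [PrintLineageLogger(), InMemoryLineageLogger()])
```

The code that produces lineage only depends on the `send` method, so the set of loggers can be decided by configuration rather than hard-coded.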

New export-csv command

This command complements vdk ingest-csv, which lets users import (or ingest) CSV data into a table.
Users can now export the result of a SQL query to a CSV file with a simple command:

vdk export-csv -q "select * from my_table" --file 'output.csv'

Check out more at the plugin page.
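Conceptually, exporting a query to CSV means running the query and writing the result rows to a file. The sketch below shows that idea using only the Python standard library (sqlite3 and csv); it is not the plugin's implementation. `export_query_to_csv` and `demo` are hypothetical names, and writing a header row with the column names is an assumption of this sketch.

```python
import csv
import io
import sqlite3
from typing import TextIO


def export_query_to_csv(connection: sqlite3.Connection, query: str, out: TextIO) -> int:
    """Run a query and write its rows as CSV; returns the number of data rows."""
    cursor = connection.execute(query)
    writer = csv.writer(out)
    # Assumption: the first line holds the column names.
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


def demo() -> str:
    """Build a tiny in-memory table and export it to a CSV string."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
    connection.executemany("INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (2, "b")])
    buffer = io.StringIO()
    export_query_to_csv(connection, "select * from my_table order by id", buffer)
    connection.close()
    return buffer.getvalue()


if __name__ == "__main__":
    print(demo())
```

To write to a real file instead of a string buffer, open it with `open("output.csv", "w", newline="")` and pass the file object as `out`.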

In memory properties client

Until now, properties required the Control Service in order to work. For prototyping and testing purposes, you sometimes do not need to connect to external services.

  • A new configuration value can be set.

In a specific job's config file (config.ini):

[vdk]
properties_default_type = memory

Or as an environment variable:

export properties_default_type="memory"

  • The properties are now held entirely in memory. That means they are "deleted" after the job's run.
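The behaviour described above can be pictured with a small sketch: a properties store backed by a plain dictionary, so nothing survives once the process ends. `InMemoryProperties` and its method names are hypothetical and used here for illustration only; this is not the client added to vdk-core.

```python
from typing import Any, Dict, Optional


class InMemoryProperties:
    """Hypothetical properties store that lives only as long as the process."""

    def __init__(self) -> None:
        self._properties: Dict[str, Any] = {}

    def get_all_properties(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the internal state by accident.
        return dict(self._properties)

    def get_property(self, name: str, default_value: Optional[Any] = None) -> Any:
        return self._properties.get(name, default_value)

    def set_all_properties(self, properties: Dict[str, Any]) -> None:
        # Replace the whole set of properties in one step.
        self._properties = dict(properties)


if __name__ == "__main__":
    properties = InMemoryProperties()
    properties.set_all_properties({"last_ingested_id": 42})
    print(properties.get_property("last_ingested_id"))
    # A new instance (for example, a new job run) starts empty again.
    print(InMemoryProperties().get_all_properties())
```

Because no external service is involved, a store like this suits prototyping and tests, at the cost of losing state between runs.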

New example: Ingest and anonymize

An example of how to anonymize any data being ingested with VDK by using a plugin.

Check out more at the example page.

New example: Airflow integration

An example of how to create dependencies between data jobs in Airflow.

Check out more at the example page.

Package versions

See installation instructions here.
The versions of VDK components released under VDK 0.6 are:

Main components

control-service 1.5.620438292
vdk-core==0.3.620677184

Plugins

airflow-provider-vdk==0.0.602273476
vdk-lineage-model==0.0.581430542
vdk-kerberos-auth==0.3.584577337
vdk-ingest-http==0.2.616713987
vdk-impala==0.4.613570906
vdk-lineage==0.3.604201902
vdk-trino==0.4.605101952

What's Changed

  • airflow-provider-vdk: Add hidden fields to VDK Connection by @doks5 in #883
  • control-service: Atomic job cancellation by @gageorgiev in #860
  • control-service: Fluentd integration for data jobs by @mivanov1988 in #940
  • control-service: Secure job builder image by @gageorgiev in #936
  • control-service: add default jwt jwk uri by @mrMoZ1 in #873
  • control-service: fix the examples in swagger by @tozka in #945
  • control-service: fix vdk-server startup issues by @mrMoZ1 in #908
  • control-service: increase integration test builder memory by @mrMoZ1 in #929
  • control-service: upgrade docker container used in cicd by @mrMoZ1 in #911
  • vdk-airflow: populate readme by @tozka in #924
  • vdk-control-cli: remove hidden flag for CLI commands by @tozka in #902
  • vdk-control-cli: use latest dependencies version during build by @tozka in #903
  • vdk-core,vdk-impala,vdk-lineage,vdk-trino: Support for pluggy 1.0 by @gageorgiev in #931
  • vdk-core: Add printed output to set-default and reset-default by @gageorgiev in #884
  • vdk-core: BaseVdkError exception propagation flaw fix by @ivakoleva in #917
  • vdk-core: Improve ingestion error logging by @gageorgiev in #930
  • vdk-core: add memory properties client by @tozka in #921
  • vdk-core: add option to disable version check by @tozka in #876
  • vdk-core: detect non vdk_ prefixed environment values for config by @tozka in #874
  • vdk-core: execution result missing exception and blamee fix by @ivakoleva in #938
  • vdk-core: hide native cursor from execute hook by @tozka in #875
  • vdk-core: make db_default_type case insensitive by @tozka in #935
  • vdk-core: show log_level_vdk in help by @tozka in #905
  • vdk-core: step loading failure misclassified as Platform error fix by @ivakoleva in #920
  • vdk-core: termination message now idempotent by @mrMoZ1 in #909
  • vdk-core: vdk_exception hook exit code fix by @ivakoleva in #912
  • vdk-core: vdk_exception hook exit code fix by @ivakoleva in #915
  • vdk-csv: add export-csv command by @duyguHsnHsn in #934
  • vdk-examples: add ingest and anonymize example by @tozka in #922
  • vdk-impala, vdk-trino: Remove deprecated use of result field by @gageorgiev in #933
  • vdk-impala: Add performance logs by @VladimirPetkov1 in #939
  • vdk-impala: Add support for lineage in vdk-impala by @VladimirPetkov1 in #932
  • vdk-ingest-http: reduce verbosity of ingestion logs by @tozka in #943
  • vdk-kerberos-auth: Separate async event loop by @doks5 in #885
  • vdk-lineage-model: Extract Lineage Model in separate plugin by @VladimirPetkov1 in #896
  • vdk-server: Pin kubernetes API version by @doks5 in #919
  • vdk-server: fix for vdk server crashing on startup by @mrMoZ1 in #907
  • vdk-trino, vdk-lineage: Switch to vdk-lineage-model by @VladimirPetkov1 in #898
  • vdk-trino: fix broken tests by @tozka in #900
  • versatile-data-kit: Add Data lifecycle image and minor changes by @zverulacis in #887
  • versatile-data-kit: Add getting started, ask for help, PR checklist by @zverulacis in #881
  • versatile-data-kit: Add intro part to contributing.md from the template by @zverulacis in #880
  • versatile-data-kit: Airflow Documentation by @gageorgiev in #857
  • versatile-data-kit: add link to csv example doc by @tozka in #893
  • versatile-data-kit: add logo image by @tozka in #877
  • versatile-data-kit: make easier slack instructions by @tozka in #925
  • versatile-data-kit: update link in examples by @tozka in #892
  • versatile-data-kit: update logo for dark mode by @tozka in #878

New Contributors

Full Changelog: v0.5...v0.6