News
We have released Version 0.10.2 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
We have changed the materialization logic of materializeOnce views such that they no longer ask their child views to materialize if the materializeOnce views have already been materialized. This improves performance.
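To illustrate the new rule, here is a minimal sketch in plain Scala. The View case class, shouldAskChildren, and requested are hypothetical names chosen for this example and are not Schedoscope's actual types or API; the sketch only models which views receive a materialization request.

```scala
object MaterializeOnceSketch {
  // Hypothetical, simplified view model (not Schedoscope's View class).
  final case class View(
      name: String,
      materializeOnce: Boolean,
      alreadyMaterialized: Boolean,
      children: List[View] = Nil)

  // A materializeOnce view that is already materialized stops the request here.
  def shouldAskChildren(v: View): Boolean =
    !(v.materializeOnce && v.alreadyMaterialized)

  // Names of all views that receive a materialization request, starting at root
  // and only descending into children where the rule above allows it.
  def requested(root: View): List[String] =
    root.name :: (if (shouldAskChildren(root)) root.children.flatMap(requested) else Nil)
}
```

With this rule, an already materialized materializeOnce view no longer triggers requests for its whole dependency subtree, which is where the performance gain comes from.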
We have released Version 0.10.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This is a bugfix release correcting the order of the TBLPROPERTIES and LOCATION clauses in the Hive DDL generated for views. Please note that if you use the tblProperties clause in some views, this change affects the DDL checksum, making Schedoscope drop and recreate the respective tables. Hence the version bump to 0.10.1.
Thanks to Julian Keppel for reporting the issue and providing the fix.
We have released Version 0.9.13 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Removed derelict indirect CDH5.12.0 dependencies incurred by Cloudera's Spark 2.2.0-Cloudera2 dependency.
We have released Version 0.9.11 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Added the configuration parameter schedoscope.export.disableAll to globally disable all view exports. This is useful in test environments.
We have released Version 0.9.10 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Upgraded Cloudera dependencies to CDH 5.14.0.
We have released Version 0.9.9 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Export your views to Google Cloud Platform's BigQuery via a simple exportAs() statement.
BigQuery export now compresses view data before sending it off to Google Cloud Storage.
We have released Version 0.9.7 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Optimized performance of BigQuery export by moving more work to the map phase of the export job.
We have released Version 0.9.6 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Corrected a problem with command line argument construction within BigQuery exportAs() clauses in a Kerberized cluster.
We have released Version 0.9.5 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Export your views to Google Cloud Platform's BigQuery via a simple exportAs() statement.
We have released Version 0.9.4 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Emergency bug fix for Schedoscope crashing upon exports. Do not use 0.9.3!
We have released Version 0.9.3 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Minor bug fix: the view name is now also shown in the resource manager for transformations of views that have exportAs statements.
We have released Version 0.9.2 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Minor bug fixes. Improved Metascope performance by optionally circumventing the Hive Metastore API and accessing the Metastore DB directly.
We have released Version 0.9.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
We fixed a bug in the Spark driver that could lead to incomplete consumption of the error stream of the Spark submit subprocess resulting in transformation freezes.
We have released Version 0.9.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This release upgrades Spark transformations from Spark version 1.6.0 to Spark version 2.2.0 based on Cloudera's CDH 5.12 Spark 2.2 beta parcel. As a consequence, Schedoscope has been lifted to Scala 2.11 and JDK8 as well.
This is an incompatible change likely requiring adaptation of the Spark jobs, dependencies, and build pipelines of existing Schedoscope projects, hence the increment of the minor release number.
We have released Version 0.8.9 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This release contains the following enhancements and changes:
- Cloudera client libraries updated to CDH-5.12.0;
- a DistCp transformation for view materialization by parallel, cross-cluster file copying;
- a new development mode setup that helps developers easily copy data from a production environment to the direct dependencies of the view they are developing;
- shell transformations had to be moved back into schedoscope-core to facilitate development mode;
- a versioning issue with the Scala Maven compiler plugin with regard to Scala 2.10 was fixed so that Schedoscope finally compiles and runs under JDK8 as well.
We have released Version 0.8.7 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This version contains a critical Metascope fix for a bug, introduced with the previous version, that prevented startup. Also, Metascope field lineage documentation has finally been provided in the View DSL Primer and the Metascope Primer.
We have released Version 0.8.6 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This version includes support for field-level data lineage in Metascope, automatically inferred from Hive transformations and declaratively specifiable for other transformations. Also, Metascope lineage graph rendering has been reworked. Extensive documentation to come.
Schedoscope now fails immediately if a driver specified in schedoscope.conf cannot be found on the classpath.
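The fail-fast idea can be sketched as follows. This is only an illustration of the principle, assuming drivers are identified by fully qualified class names; DriverClassCheck and its methods are hypothetical and do not reflect how Schedoscope actually reads schedoscope.conf.

```scala
import scala.util.Try

// Hypothetical sketch: verify that every configured driver class is loadable
// at startup instead of failing later when a transformation is first run.
object DriverClassCheck {
  // Returns the class names that cannot be loaded from the current classpath.
  def missingClasses(classNames: Seq[String]): Seq[String] =
    classNames.filter(name => Try(Class.forName(name)).isFailure)

  // Throws immediately if any configured driver class is missing.
  def requireDriversOnClasspath(classNames: Seq[String]): Unit = {
    val missing = missingClasses(classNames)
    if (missing.nonEmpty)
      throw new IllegalStateException(
        s"Driver classes not found on classpath: ${missing.mkString(", ")}")
  }
}
```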
We have released Version 0.8.5 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This version adds support for float view fields to JDBC exports.
We have released Version 0.8.4 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This version removes a race condition in the file system driver initialization that seems to have been introduced with CDH-5.10. Also, we have changed how we delete and recreate output folders for Map/Reduce transformations to avoid Hive partitions pointing to temporarily non-existing folders.
We have released Version 0.8.3 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This version has been built against Cloudera's CDH 5.10.1 client libraries. The test framework no longer artificially sets the storage formats of views under test to text, making testing of Spark jobs writing Parquet files simpler. The robustness of the Schedoscope HTTP service has been improved in the face of invalid view parameters.
We have released Version 0.8.2 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This version provides significant performance improvements when initializing the scheduling state for a large number of views.
We have released Version 0.8.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This fixes a critical bug that could result in applying commands to all views in a table and not just the ones addressed. Do not use Release 0.8.0!
We have released Version 0.8.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Schedoscope 0.8.0 includes, among other things:
- significant rework of Schedoscope's actor system that supports testing and uses significantly fewer actors reducing stress for poor Akka;
- support for a lot more Hive storage formats;
- definition of arbitrary Hive table properties / SerDes;
- stability, performance, and UI improvements to Metascope;
- the names of views being transformed appear as the job name in the Hadoop resource manager.
Please note that Metascope's database schema has changed with this release, so back up your database before deploying.
We have released Version 0.7.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This release includes a fix removing bad default values for the driver setting location for some transformation types. Moreover, it now includes the config setting schedoscope.hadoop.viewDataHdfsRoot, which allows one to set a root folder other than /hdp for view table data without having to register a new dbPathBuilder builder function for each view.
Spark transformations, finally! Build views based on Scala and Python Spark 1.6.0 jobs or run your Hive transformations on Spark. Test them using the Schedoscope test framework like any other transformation type. HiveContext is supported.
We have also upgraded Schedoscope's dependencies to CDH-5.8.3. There is a catch, though: we had to backport Schedoscope 0.7.0 to Scala 2.10 for compatibility with Cloudera's Spark 1.6.0 dependencies.
We have released Version 0.7.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Minor improvements to test framework.
We have released Version 0.6.6 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
The test framework has received some love. There are two new testing styles that can make your tests look prettier and run faster:
- compute a view once and execute multiple tests on its data;
- create the Hive structures for input views and views under test once and load these with different data within each test case saving Hive environment setup overhead and keeping input data and assertions next to each other within each test.
We have released Version 0.6.5 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
We have factored out Oozie, Pig, and shell transformations and their drivers into separate modules and removed knowledge about which transformation types exist from schedoscope-core. Thus, one can now extend Schedoscope with new transformation types without touching the core.
We have fixed a bug in the test framework where sorting results with null values yielded a null pointer exception.
We have released Version 0.6.4 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
We have added:
- simple parallel (S)FTP exporting of views;
- the ability to manually assign versions to transformations with defineVersion in order to avoid unnecessary recomputations in complex cases where the automatic transformation logic change detection generates too many false positives.
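The effect of a manually assigned version can be sketched like this: a defined version takes precedence over the automatically computed checksum, so edits to the logic alone no longer count as a change. This is a hypothetical illustration; the object, the method names, and the use of MD5 are assumptions, not Schedoscope's implementation of defineVersion.

```scala
import java.security.MessageDigest

object TransformationVersionSketch {
  // Hypothetical automatic change detection: an MD5 hex digest of the logic.
  def checksum(logic: String): String =
    MessageDigest.getInstance("MD5")
      .digest(logic.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // A manually defined version, if present, overrides the checksum.
  def effectiveVersion(logic: String, definedVersion: Option[String]): String =
    definedVersion.getOrElse(checksum(logic))
}
```

Under this model, a view is only recomputed when the defined version string is changed by hand, regardless of how often its logic text is touched.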
We have released Version 0.6.3 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
We have fixed a security issue with Metascope that allowed non-admin users to edit taxonomies.
We have released Version 0.6.2 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Hadoop dependencies have been updated to CDH-5.7.1. A critical bug has been fixed that could result in no more views being transformed while dependent views were still waiting. Reliability of Metascope has been improved.
We have released Version 0.6.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Hive transformations are no longer submitted to the cluster via Hive Server 2 but directly via the hive-exec library. The reasons for this change are the stability and resource leakage issues commonly encountered when operating Hive Server 2. Please note that Hive transformations are now issued with hive.auto.convert.join set to false by default to limit heap consumption in Schedoscope due to involuntary local map join operations. Refer to Hive Transformation for more information on how to re-enable map joins for queries that need them.
Also: quite a few bug fixes, better error messages when using the CLI client, improved parallelization of JDBC exports.
We have released Version 0.6.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
We have updated the checksumming algorithm for Hive transformations such that changes to comments, settings, and formatting no longer affect the checksum. This should significantly reduce operational worries. However, the checksums of all your Hive queries will change compared to Release 0.5.0. Take care to issue a materialization request with mode RESET_TRANSFORMATION_CHECKSUMS (see Scheduling Command Reference) when switching to this version to avoid unwanted view recomputations! Hence the switch of the minor release number.
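The principle of a checksum that ignores comments, settings, and formatting can be sketched as "normalize first, then hash". The normalization below is a deliberately naive, hypothetical one (it would, for example, mishandle "--" inside string literals) and is not Schedoscope's actual algorithm; the MD5 choice is likewise an assumption.

```scala
import java.security.MessageDigest

object HiveChecksumSketch {
  // Naive normalization: strip "--" line comments, drop SET statements,
  // and collapse all whitespace runs to a single space.
  def normalize(query: String): String = {
    val withoutComments = query
      .split("\n")
      .map { line =>
        val i = line.indexOf("--")
        if (i >= 0) line.substring(0, i) else line
      }
      .mkString("\n")
    withoutComments
      .split(";")
      .map(_.trim.replaceAll("\\s+", " "))
      .filter(stmt => stmt.nonEmpty && !stmt.toUpperCase.startsWith("SET "))
      .mkString(";")
  }

  // Checksum over the normalized query text.
  def checksum(query: String): String =
    MessageDigest.getInstance("MD5")
      .digest(normalize(query).getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString
}
```

For example, prepending a SET statement or a comment to a query, or reflowing its whitespace, leaves this checksum unchanged, while changing the selected columns does not.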
The test framework now automatically checks whether there is an ON condition for each JOIN clause in your Hive queries. Also, it checks whether each input view you provide in basedOn is also declared as a dependency.
We have released Version 0.5.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This is a biggie. We have added Metascope to our distribution. Metascope is a collaborative metadata management, documentation, exploration, and data lineage tracing tool that exploits the integrated specification of data structure, dependencies, and computation logic in Schedoscope views. See the tutorial and the Metascope primer for more information.
We have released Version 0.4.3 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This release makes exportTo support the isPrivacySensitive clause of the View DSL. Fields and partition parameters marked with isPrivacySensitive are hashed during export.
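A minimal sketch of what hashing privacy-sensitive fields during export means for a single exported row, assuming rows are represented as field-name-to-value maps. The object, the method names, and the choice of MD5 are assumptions for illustration only; the hash function Schedoscope uses is not stated here.

```scala
import java.security.MessageDigest

object PrivacyHashingSketch {
  // Hex-encoded MD5 digest (hash choice is an assumption for this sketch).
  private def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // Replaces values of privacy-sensitive fields with their hash and passes
  // all other fields through unchanged.
  def hashSensitive(row: Map[String, String], sensitiveFields: Set[String]): Map[String, String] =
    row.map { case (field, value) =>
      if (sensitiveFields.contains(field)) field -> md5Hex(value) else field -> value
    }
}
```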
We have released Version 0.4.2 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This is a bugfix release solving an issue with an overly pedantic view pattern checker in the HTTP-API sabotaging the views command.
We have released Version 0.4.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This is a big release including:
- a complete overhaul of the scheduling state machine with significant improvement of test coverage;
- the exportTo clause for simple, seamless, and parallel export of views to relational databases, Redis key-value stores, and Kafka topics (see View DSL Primer);
- new materialization modes SET_ONLY and TRANSFORMATION_ONLY for more flexible ops (see Scheduling Command Reference).
We have released Version 0.3.5 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This release migrates Schedoscope's Hadoop dependencies to CDH-5.5.1. Furthermore, the test framework has been ported to Hive 1.1.0. Finally, Schedoscope's resilience against Metastore failures has been improved: it can now reconnect and resume work in more error cases after the Metastore has become unavailable.
We have released Version 0.3.4 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This release fixes a bug in Schedoscope that led to ViewActors not being instantiated correctly for newly appearing dependencies such as date changes. Moreover, the checksum versioning code has been cleaned up. Note that checksumming is not backwards compatible; you might want to execute your next materializations with the -m RESET_TRANSFORMATION_CHECKSUMS option.
We have released Version 0.3.3 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This release brings some order into the logging framework mess inherited from the various libraries used. It does so by routing Java util logging and Apache Commons Logging through SLF4J, and SLF4J to logback. By muting log4j and setting an appropriate logback-test.xml, test outputs are now a lot less chatty.
We have released Version 0.3.2 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This fixes a nasty resource leak in the Touch FileSystemTransformation.
We have released Version 0.3.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Fields can now be given comments as well: val id = fieldOf[String]("An ID.")
We have released Version 0.3.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This is a big release, with the following major changes:
- Migration to Scala 2.11 and Akka 2.3.14
- Support of Hive 1.1.0 in test framework
- Significant code cleanup
- Significant round of Scaladoc documentation
- Significant performance improvements when dealing with many views / partitions
Please note that the cleanup incurred some breaking of the API. In particular, the storage format classes have been moved to a separate package org.schedoscope.dsl.storageformats. Moreover, the various path builders for views have been renamed in a more systematic way. See Storage Paths.
We have released Version 0.2.2 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
Notable changes:
- materializeOnce clause for views (see Schedoscope View DSL Primer / Materialize Once);
- RESET_TRANSFORMATION_CHECKSUMS_AND_TIMESTAMPS materialization mode (see Command Reference / Materialize).
We have released Version 0.2.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
We have added a configurable DriverRunCompletionHandler mechanism. These handlers are called after a driver run has finished, which can later be exploited for monitoring. See reference.conf.
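The general shape of such a completion-callback mechanism can be sketched as below. All names here (RunOutcome, CompletionHandler, CountingHandler, notifyHandlers) are hypothetical and deliberately simplified; they are not the actual DriverRunCompletionHandler interface, whose configuration is described in reference.conf.

```scala
object CompletionHandlerSketch {
  // Hypothetical outcome of a driver run.
  sealed trait RunOutcome
  case object Succeeded extends RunOutcome
  final case class Failed(reason: String) extends RunOutcome

  // Hypothetical handler interface, invoked once a driver run has finished.
  trait CompletionHandler {
    def runCompleted(transformation: String, outcome: RunOutcome): Unit
  }

  // Example monitoring handler that simply counts outcomes.
  final class CountingHandler extends CompletionHandler {
    var succeeded = 0
    var failed = 0
    def runCompleted(transformation: String, outcome: RunOutcome): Unit = outcome match {
      case Succeeded => succeeded += 1
      case Failed(_) => failed += 1
    }
  }

  // The driver calls every configured handler after a run has finished.
  def notifyHandlers(handlers: Seq[CompletionHandler], transformation: String, outcome: RunOutcome): Unit =
    handlers.foreach(_.runCompleted(transformation, outcome))
}
```

Keeping handlers behind a small interface lets monitoring code be plugged in via configuration without changing the drivers themselves.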
We have released Version 0.2.0 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
The two major changes involve:
- producing the no-data state instead of the materialized state for views with filesystem transformations that do not actually result in any data;
- the MonthlyParameterization trait now includes an additional automatically generated partition value month_id of the form YYYYMM to offer additional partition selection options in queries. This is similar to date_id of DailyParameterization.
Note that the change to MonthlyParameterization results in recomputation of views that implement it.
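For illustration, a month_id value of the form YYYYMM can be derived from a year and a month as sketched below. Representing the parameters as Ints and the name monthId are assumptions made for this example; the sketch does not reproduce how MonthlyParameterization generates the value internally.

```scala
object MonthIdSketch {
  // Builds a zero-padded YYYYMM value, analogous to date_id's YYYYMMDD form.
  def monthId(year: Int, month: Int): String = {
    require(month >= 1 && month <= 12, s"invalid month: $month")
    f"$year%04d$month%02d"
  }
}
```

For example, September 2015 maps to 201509, so a query can select a month's partitions with a single condition on month_id.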
We are happy to present Schedoscope as a system demo at Strata NYC on Wednesday, 30th September 2015.
We have released Version 0.1.1 as a Maven artifact to our Bintray repository (see Setting Up A Schedoscope Project for an example pom).
This is a minor release that comprises some code cleanup and performance optimizations with regard to view initialization and Morphline transformations. We also implemented a shell transformation (more documentation to come).