- New data processing graph and new graph-based `Pipeline` with customizable execution policy and with pre-execution tests
- New MongoDB backend with a store, data object and a few demo operations
- New XLS backend with a store and data object
- New operations (see below)
New operations:

- `filter_by_range`, `filter_not_empty`: rows, sql
- `split_date`: rows, sql
- `field_filter`: mongo (without rename)
- `distinct`: mongo
- `insert`: (rows, sql) and (sql, sql)
- `assert_contains`, `assert_missing`: sql
- `empty_to_missing`: rows – experimental
- `string_to_date`: rows – still experimental, the format will change to the SQL date format
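To illustrate the kind of transformation a rows-backend operation performs, here is a minimal, hypothetical sketch of a range filter over iterable rows. This is not the library's actual implementation – the real operations work on data objects, and the function signature here is purely illustrative:

```python
def filter_by_range(rows, field_index, low, high):
    """Yield only rows whose value at field_index falls within [low, high).

    Illustrative sketch of a rows-backend range filter; the real
    operation operates on data objects, not plain sequences.
    """
    for row in rows:
        if low <= row[field_index] < high:
            yield row

data = [("a", 1), ("b", 5), ("c", 9)]
result = list(filter_by_range(data, 1, 2, 8))
# result == [("b", 5)]
```

The generator style mirrors how rows backends typically stream records instead of materializing them.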
Changed and fixed operations:

- `aggregate` accepts an empty measure list – yields only the record count
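A hedged sketch of the new `aggregate` behavior: with no measures given, the aggregation reduces to a per-key record count. This is illustrative code only, not the library's implementation, and the signature is an assumption:

```python
from collections import Counter

def aggregate(rows, key_index, measures=()):
    """Group rows by the key field; with an empty measure list,
    yield only (key, record_count) pairs, mirroring the changelog note."""
    if not measures:
        counts = Counter(row[key_index] for row in rows)
        return sorted(counts.items())
    raise NotImplementedError("measure aggregation omitted in this sketch")

rows = [("x", 1), ("y", 2), ("x", 3)]
# aggregate(rows, 0) == [("x", 2), ("y", 1)]
```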
- Added `document` storage data type to represent JSON-like objects
- An object store can be cloned using `clone()`, which provides another store with a different configuration whose objects remain mutually composable
- New `FieldError` exception
- An object's data consumability is now taken into account on object use (naive implementation for the time being)
- `CSVStore` (`csv`) is now able to create CSV targets with the `csv_target` factory name
- New `Resource` class representing file-like resources with an optional call to `close()`
- Added `FileSystemStore` for read-only CSV and XLS files with default settings
- Added `Store.exists()`, implemented in the SQL backend
- `ProbeAssertionError` has a `reason` attribute
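The `Resource` idea can be pictured as a thin wrapper that tracks whether it opened the underlying file-like object itself, and therefore whether `close()` should actually close it. This is a hypothetical sketch of that pattern, not the library's class:

```python
import io

class Resource:
    """Sketch of a file-like resource wrapper with an optional close().

    If the caller passed in an already-open file-like object, closing
    remains the caller's responsibility; if the resource was opened
    here from a path, close() closes it.
    """
    def __init__(self, source):
        if hasattr(source, "read"):       # already a file-like object
            self.handle = source
            self.should_close = False
        else:                             # a path: open it ourselves
            self.handle = open(source)
            self.should_close = True

    def close(self):
        if self.should_close:
            self.handle.close()

r = Resource(io.StringIO("a,b\n1,2\n"))
header = r.handle.readline().strip()   # "a,b"
r.close()                              # no-op: the stream was passed in open
```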
Pipeline and execution:

- `Graph` and `Node` structures for building operation processing graphs
- The operation list has an operation prototype that includes the operation's operand and parameter names
- Added `ExecutionEngine`, currently semi-private, but it will serve as the basis for future custom graph execution policies
- Added `Pipeline.execution_plan`
- Added `thread_local` – thread-local variable storage
- Added `retry_deny` and `retry_allow` to the operation context
- Added an insert operation accessible through `Pipeline.insert_into` and `Pipeline.insert_into_object`
- Added `test_if_needed()` and `test_if_satisfied()` methods, which are `fork()`-like but executed before running the pipeline (see the documentation for more information)
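The `Graph`/`Node` structure for building processing graphs can be pictured with a minimal sketch like the following. The class and method names are illustrative assumptions; the library's actual classes have richer interfaces:

```python
class Node:
    """A graph node wrapping an operation name and its parameters."""
    def __init__(self, opname, **params):
        self.opname = opname
        self.params = params

class Graph:
    """A directed graph of operation nodes; connections carry the data flow."""
    def __init__(self):
        self.nodes = []
        self.connections = []    # (source_node, target_node) pairs

    def add(self, node):
        self.nodes.append(node)
        return node

    def connect(self, source, target):
        self.connections.append((source, target))

    def successors(self, node):
        return [t for s, t in self.connections if s is node]

g = Graph()
src = g.add(Node("csv_source", resource="data.csv"))
filt = g.add(Node("filter_by_range", field="amount", low=0, high=100))
g.connect(src, filt)
# g.successors(src) == [filt]
```

Building such a graph first, rather than executing immediately, is what allows an execution engine to choose a policy (ordering, retries, tests) before any data moves.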
Changes:

- Original `Pipeline` implementation was replaced – instead of immediate execution, a graph is built and an explicit `run()` is required
- Calling operations decorated with `@experimental` will cause a warning to be logged
- Renamed module `doc` to `dev`; it will contain more development tools in the future, such as operation auditing or data object API conformance checking
- `default_context` is now a thread-local variable, created on first use
- `open_resource` now returns a `Resource` object
- Renamed the engine's `prepare_execution_plan` to `execution_plan`
- The operation context's `o` accessor was renamed to `op` and now also supports getitem access: `context.op["duplicates"]` is equal to `context.op.duplicates`
- Data objects should respond to `retained()` and `is_consumable()`
- Default field storage type is now `string` instead of `unknown` for convenience
- Removed the default setting for debug logging; the warning level is used
- Renamed the namespace object-name customization class variable `_ns_object_name` to `__identifier__`
- The problem described in Issue #4 is fixed and now works as expected
- Fixed a problem with `filter_by_value`
- Fixed `aggregate` key issues
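The dual attribute/getitem access of the renamed `op` accessor can be mimicked with a small helper. This is a sketch of the behavior described above, not the library's own code:

```python
class OperationAccessor:
    """Resolve operations either as attributes or by subscript, so that
    accessor.duplicates and accessor["duplicates"] are equivalent."""
    def __init__(self, operations):
        self._operations = operations

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so
        # self._operations itself is found the usual way.
        try:
            return self._operations[name]
        except KeyError:
            raise AttributeError(name)

    def __getitem__(self, name):
        return self._operations[name]

ops = {"duplicates": lambda rows: set(rows)}
op = OperationAccessor(ops)
assert op.duplicates is op["duplicates"]
```

Supporting both forms lets callers use the readable attribute style for fixed names and the subscript style when the operation name is held in a variable.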