Releases: rstudio/pointblank
v0.12.2
This release provides a few minor improvements along with many bug fixes.
- New argument `extract_tbl_checked` added to `interrogate()`. When `FALSE`, the `$tbl_checked` column from the validation set will be dropped before returning the agent. This may be helpful in reducing object size for large agents (#542). (#554)
- The new argument `na_rm` in `snip_list()` suppresses any `NA` values so that they won't be included in the snippet's list of items (#547). (#556)
- Improved readability of error messages rendered as tooltips in the agent report. (#543)
- `col_vals_expr()` shows used columns in the agent report when interrogated. (#570)
- Improved the matching of rows between `agent$validation_step` and the rows of the agent report (#563). (#565)
- Functions accepting `...` now use `rlang::list2()`, enabling dynamic dots. For example, a multiagent can now be constructed from a `list()` of agents using `create_multiagent(!!!list_of_agents)` (#552). (#553)
- Fixed bug with non-standard column names in some validation functions (#545, #546). (#555)
- Fixed a regression in `col_vals_*()` functions, where `vars("col")` was evaluating to the string `"col"`. Behavior of `vars("col")` is now aligned with `vars(col)` again: both evaluate to the column name `col`. (#535)
- Problems arising from comparing `columns` to a `value` of a different class (for example, comparing a datetime column to a date value `Sys.Date()` instead of another datetime value `Sys.time()`) are now signalled appropriately at `interrogate()` (#536, #537). (#539)
- Fixed bug in `has_columns()` failing to detect non-existing columns when supplied as a character vector. (#540)
- Replace uses of `crayon::make_style()` with `cli::make_ansi_style()`, removing the `crayon` dependency. (#559, thanks @olivroy!)
- Use `rlang::check_installed()` to perform checks of optional package installs. (#559, @olivroy)
- Modernized CI workflows with dedicated linting action. (#560, @olivroy)
- Avoid unwanted equation formatting in agent report arising from arbitrary `"$"` characters (#561). (#562)
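
As a rough illustration of the dynamic-dots change and the new `extract_tbl_checked` argument, the sketch below builds two small agents on the package's bundled `small_table` dataset and splices them into a multiagent. The object names (`agent_1`, `agent_2`, `slim_agent`, `list_of_agents`) are made up for this example, and it assumes pointblank 0.12.2 or later with R 4.1+ for the native pipe.

```r
library(pointblank)

# Two small agents over the bundled `small_table` dataset
agent_1 <- create_agent(tbl = small_table, label = "agent 1") |>
  col_vals_gt(columns = a, value = 0) |>
  interrogate()

agent_2 <- create_agent(tbl = small_table, label = "agent 2") |>
  col_vals_not_null(columns = d) |>
  interrogate()

# Dynamic dots: splice a list of agents into `create_multiagent()`
list_of_agents <- list(agent_1, agent_2)
multiagent <- create_multiagent(!!!list_of_agents)

# Ask `interrogate()` to drop the `tbl_checked` column from the
# validation set, which should keep the returned agent smaller
slim_agent <- create_agent(tbl = small_table) |>
  col_vals_gt(columns = a, value = 0) |>
  interrogate(extract_tbl_checked = FALSE)
```

Splicing with `!!!` is mainly useful when the agents are generated programmatically, for example by mapping over a set of tables.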
v0.12.1
- Ensured that the column string is a symbol before constructing the expression for the `col_vals_*()` functions.
- No longer resolve columns with tidyselect when the target table cannot be materialized.
- Relaxed tests on tidyselect error messages.
v0.12.0
New features
- Complete `{tidyselect}` support for the `columns` argument of all validation functions, as well as in `has_columns()` and `info_columns()`. The `columns` argument can now take familiar column-selection expressions as one would use inside `dplyr::select()`. This also begins a process of deprecation: `columns = vars(...)` will continue to work, but `c()` now supersedes `vars()`.
  - If passing an external vector of column names, it should be wrapped in `all_of()`.
- The `label` argument of validation functions now exposes the following string variables via `{glue}` syntax:
  - `"{.step}"`: The validation step name
  - `"{.col}"`: The current column name
  - `"{.seg_col}"`: The current segment's column name
  - `"{.seg_val}"`: The current segment's value/group

  These dynamic values may be useful for validations that get expanded into multiple steps.
- `interrogate()` gains two new options for printing progress in the console output:
  - `progress`: Whether interrogation progress should be printed to the console (`TRUE` for interactive sessions, same as before)
  - `show_step_label`: Whether each validation step's label value should be printed alongside the progress.
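
The three features above can be combined in one short validation plan. This is only a sketch against the bundled `small_table` dataset: the `cols_to_check` vector and the label text are invented for illustration, and `tidyselect` is attached explicitly so that the selection helpers are available regardless of what pointblank itself re-exports.

```r
library(pointblank)
library(tidyselect)  # attached so starts_with() / all_of() are in scope

# An external character vector of column names (hypothetical name);
# it needs to be wrapped in all_of() when passed to `columns`
cols_to_check <- c("c", "d")

agent <- create_agent(tbl = small_table) |>
  # c() instead of vars(); the label is expanded once per generated step
  col_vals_gt(
    columns = c(a, d),
    value = 0,
    label = "{.step} on column {.col}"
  ) |>
  # A tidyselect helper, as one would use inside dplyr::select()
  col_vals_not_null(columns = starts_with("date")) |>
  col_is_numeric(columns = all_of(cols_to_check)) |>
  # Print progress and each step's label while interrogating
  interrogate(progress = TRUE, show_step_label = TRUE)
```

Because `col_vals_gt()` expands into one step per selected column, the `{.col}` placeholder gives each of those steps its own label.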
Minor improvements and bug fixes
- Fixes issue with rendering reports in Quarto HTML documents.
- When no columns are returned from a `{tidyselect}` expression in `columns`, the agent's report now displays the originally supplied expression instead of simply blank (e.g., in `create_agent(small_table) |> col_vals_null(matches("z"))`).
- Fixes issue with the hashing implementation to improve performance and alignment of validation steps in the multiagent.
v0.11.4
- Fixes issue with gt `0.9.0` compatibility.
v0.11.3
- Fixes issue with tables not rendering due to interaction with the gt package.
v0.11.2
- Internal changes were made to ensure compatibility with an in-development version of R.
v0.11.1
- Updated all help files to pass HTML validation.
v0.11.0
New features
- The `row_count_match()` function can now match the count of rows in the target table to a literal value (in addition to comparing row counts to a secondary table).
- The analogous `col_count_match()` function was added to compare column counts in the target table to a secondary table, or to match on a literal value.
- Substitution syntax has been added to the `tbl_store()` function with `{{ <name> }}`. This is a great way to make table-prep more concise, readable, and less prone to errors.
- The `get_informant_report()` function has been enhanced with more `width` options. Aside from the `"standard"` and `"small"` sizes, we can now supply any pixel- or percent-based width to precisely size the reporting.
- Added support for validating data in BigQuery tables.
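
For the literal-value matching described above, a minimal sketch might look like the following. It relies on the bundled `small_table` dataset having 13 rows and 8 columns, and assumes that the literal value is supplied through the `count` argument.

```r
library(pointblank)

# Match the row and column counts of the target table to literal values
agent <- create_agent(tbl = small_table) |>
  row_count_match(count = 13) |>
  col_count_match(count = 8) |>
  interrogate()

# The test variants return a single logical value, which can be
# convenient in scripts or conditional logic
rows_ok <- test_row_count_match(small_table, count = 13)
cols_ok <- test_col_count_match(small_table, count = 8)
```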
Documentation
- All functions in the package now have better usage examples.
v0.10.0
New features
- The new function `row_count_match()` (plus `expect_row_count_match()` and `test_row_count_match()`) checks for exact matching of row counts across two tables (the target table and a comparison table of your choosing). Works equally well for local tables and for database and Spark tables.
- The new `tbl_match()` function (along with `expect_tbl_match()` and `test_tbl_match()`) checks for an exact matching of the target table with a comparison table. It will check for a strict match on table schemas, on equivalent row counts, and then exact matches on cell values across the two tables.
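
A small sketch of `tbl_match()` through its test variant, using the bundled `small_table` dataset. The `changed` table is a made-up copy with one cell altered, so that the second comparison has something to detect.

```r
library(pointblank)

# A table compared with itself should match on schema, row count,
# and cell values
same <- test_tbl_match(small_table, tbl_compare = small_table)

# Alter a single cell in a copy (hypothetical `changed` table) and
# compare again; the cell-value check is the one expected to differ
changed <- small_table
changed$a[1] <- changed$a[1] + 1L
differs <- test_tbl_match(small_table, tbl_compare = changed)
```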
Minor improvements and bug fixes
- The `set_tbl()` function was given the `tbl_name` and `label` arguments to provide an opportunity to set metadata on the new target table.
- Support for `mssql` tables has been restored and works exceedingly well for the majority of validation functions (the few that are incompatible provide messaging about not being supported).
Documentation
- All functions in the package now have usage examples.
- An RStudio Cloud project has been prepared with .Rmd files that contain explainers and runnable examples for each function in the package. Look at the project README for a link to the project.
Breaking changes
- The `read_fn` argument in `create_agent()` and `create_informant()` has been deprecated in favor of an enhanced `tbl` argument. Now, we can supply a variety of inputs to `tbl` for associating a target table to an agent or an informant. With `tbl`, it's now possible to provide a table (e.g., `data.frame`, `tbl_df`, `tbl_dbi`, `tbl_spark`, etc.), an expression (a table-prep formula or a function) to read in the table only at interrogation time, or a table source expression to get table preparations from a table store (as an in-memory object or as defined in a YAML file).
- The `set_read_fn()`, `remove_read_fn()`, and `remove_tbl()` functions were removed since the `read_fn` argument has been deprecated (and there's virtually no need to remove a table from an object with `remove_tbl()` now).
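
To illustrate the enhanced `tbl` argument, the sketch below contrasts passing a table directly with passing a table-prep formula, which defers reading the table until interrogation. It uses the bundled `small_table` dataset; the `tbl_name` and `label` values are arbitrary.

```r
library(pointblank)

# Supplying the table object directly
agent_direct <- create_agent(tbl = small_table, label = "direct table") |>
  col_vals_gt(columns = vars(d), value = 0) |>
  interrogate()

# Supplying a table-prep formula: the right-hand side is evaluated
# to obtain the table when it is needed for interrogation
agent_lazy <- create_agent(
  tbl = ~ small_table,
  tbl_name = "small_table",
  label = "table-prep formula"
) |>
  col_vals_gt(columns = vars(d), value = 0) |>
  interrogate()
```

The formula form is what replaces the old `read_fn` workflow: the agent stores how to get the table rather than the table itself.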
v0.9.0
New features
- The new `rows_complete()` validation function (along with the `expect_rows_complete()` and `test_rows_complete()` expectation and test variants) checks whether rows contain any `NA`/`NULL` values (optionally constrained to a selection of specified `columns`).
- The new function `serially()` (along with `expect_serially()` and `test_serially()`) allows for a series of tests to run in sequence before either culminating in a final validation step or simply exiting the series. This construction allows for pre-testing that may make sense before a validation step. For example, there may be situations where it's vital to check a column type before performing a validation on the same column.
- The `specially()`/`expect_specially()`/`test_specially()` functions enable custom validations/tests/expectations with a user-defined function. We still have `preconditions` and other common args available for convenience. The great thing about this is that because we require the UDF to return a logical vector of passing/failing test units (or a table where the rightmost column is logical), we can incorporate the results quite easily in the standard pointblank reporting.
- The `info_columns_from_tbl()` function is a super-convenient wrapper for the `info_columns()` function. Say you're making a data dictionary with an informant and you already have the table metadata somewhere as a table: you can use that here and not have to call `info_columns()` many, many times.
- Added the `game_revenue_info` dataset which contains metadata for the extant `game_revenue` dataset. Both datasets pair nicely together in examples that create a data dictionary with `create_informant()` and `info_columns_from_tbl()`.
- Added the table transformer function `tt_tbl_colnames()` to get a table's column names for validation.
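
The `serially()` and `specially()` ideas can be sketched as follows on the bundled `small_table` dataset. The particular checks are arbitrary choices for illustration: a column-type test that gates a value check on the same column, and a user-defined function that returns one logical value per row.

```r
library(pointblank)

agent <- create_agent(tbl = small_table) |>
  # Run the type test first; the final validation step is reached
  # only if the preceding test passes
  serially(
    ~ test_col_is_numeric(., columns = vars(d)),
    ~ col_vals_gt(., columns = vars(d), value = 0)
  ) |>
  # A custom validation: the function receives the table and returns
  # a logical vector of passing/failing test units
  specially(fn = function(x) !is.na(x$date)) |>
  interrogate()
```

`vars()` is used here because it is the column-selection form that works across versions from this release onward.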
Minor improvements and bug fixes
- Input data tables with `label` attribute values in their columns will be displayed in the 'Variables' section of the `scan_data()` report. This is useful when scanning imported SAS tables (which often have labeled variables).
- The `all_passed()` function has been improved such that failed validation steps (that return an evaluation error, perhaps because of a missing column) result in `FALSE`; the `i` argument has been added to `all_passed()` to optionally get a subset of validation steps before evaluation.
- For those `expect_*()` functions that can handle multiple columns, pointblank now correctly stops at the first failure and provides the correct reporting for that. Passing multiple columns really should mean processing multiple steps in serial, and previously this was handled incorrectly.