
Releases: rstudio/pointblank

v0.12.2

23 Oct 13:58
a37333e

This release provides a few minor improvements along with many bug fixes.

  • New argument extract_tbl_checked added to interrogate(). When FALSE, the $tbl_checked column from the validation set will be dropped before returning the agent. This may be helpful in reducing object size for large agents (#542). (#554)

  • The new argument na_rm in snip_list() suppresses any NA values so that they won't be included in the snippet's list of items (#547). (#556)

  • Improved readability of error messages rendered as tooltips in the agent report. (#543)

  • col_vals_expr() shows used columns in the agent report when interrogated. (#570)

  • Improved the matching of rows between agent$validation_step and the rows of the agent report (#563). (#565)

  • Functions accepting ... now use rlang::list2(), enabling dynamic dots. For example, a multiagent can now be constructed from a list() of agents using create_multiagent(!!!list_of_agents) (#552). (#553)

  • Fixed bug with non-standard column names in some validation functions (#545, #546). (#555)

  • Fixed a regression in col_vals_*() functions, where vars("col") was evaluating to the string "col". Behavior of vars("col") is now aligned back with vars(col) - both evaluate to the column name col. (#535)

  • Problems arising from comparing columns to a value of different class (for example, comparing a datetime column to a date value Sys.Date() instead of another datetime value Sys.time()) are now signalled appropriately at interrogate() (#536, #537). (#539)

  • Fixed bug in has_columns() failing to detect non-existing columns when supplied as a character vector. (#540)

  • Replaced uses of crayon::make_style() with cli::make_ansi_style(), removing the crayon dependency. (#559, thanks @olivroy!)

  • Now uses rlang::check_installed() to check for optional package installs. (#559, @olivroy)

  • Modernized CI workflows with dedicated linting action. (#560, @olivroy)

  • Avoided unwanted equation formatting in the agent report arising from arbitrary "$" characters (#561). (#562)
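
The extract_tbl_checked and dynamic-dots items above might be exercised as in the following minimal sketch. It assumes the small_table dataset bundled with pointblank; the agent labels and the object names (agent_1, agent_2, list_of_agents) are illustrative only.

```r
library(pointblank)

# Interrogate while dropping the per-step checked tables to keep the agent small
agent_1 <- create_agent(tbl = small_table, label = "First agent") |>
  col_vals_gt(columns = d, value = 100) |>
  interrogate(extract_tbl_checked = FALSE, progress = FALSE)

agent_2 <- create_agent(tbl = small_table, label = "Second agent") |>
  col_vals_not_null(columns = a) |>
  interrogate(extract_tbl_checked = FALSE, progress = FALSE)

# Per the release note, `tbl_checked` should be absent from the validation set
"tbl_checked" %in% names(agent_1$validation_set)

# Dynamic dots: splice a list of agents into create_multiagent()
list_of_agents <- list(agent_1, agent_2)
multiagent <- create_multiagent(!!!list_of_agents)
```

Splicing with !!! is convenient when the number of agents is only known at run time, for example when agents are built in a loop.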

v0.12.1

25 Mar 19:26
46c9ff5
  • Ensured that the column string is a symbol before constructing the expression for the col_vals_*() functions.

  • No longer resolve columns with tidyselect when the target table cannot be materialized.

  • Relaxed tests on tidyselect error messages.

v0.12.0

01 Mar 13:26
e131437

New features

  • Complete {tidyselect} support for the columns argument of all validation functions, as well as in has_columns() and info_columns(). The columns argument can now take familiar column-selection expressions as one would use inside dplyr::select(). This also begins a process of deprecation:

    • columns = vars(...) will continue to work, but c() now supersedes vars().
    • If passing an external vector of column names, it should be wrapped in all_of().
  • The label argument of validation functions now exposes the following string variables via {glue} syntax:

    • "{.step}": The validation step name
    • "{.col}": The current column name
    • "{.seg_col}": The current segment's column name
    • "{.seg_val}": The current segment's value/group

    These dynamic values may be useful for validations that get expanded into multiple steps.

  • interrogate() gains two new options for printing progress in the console output:

    • progress: Whether interrogation progress should be printed to the console (TRUE for interactive sessions, same as before)
    • show_step_label: Whether each validation step's label value should be printed alongside the progress.
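
As a rough illustration of the three features above, the sketch below combines a {tidyselect} column selection, a {glue}-style label, and the new progress options. It assumes the bundled small_table dataset; numeric_cols is a hypothetical external vector, and all_of() is called through the tidyselect namespace to avoid assuming it is re-exported.

```r
library(pointblank)

# Hypothetical external vector of column names
numeric_cols <- c("a", "d")

agent <- create_agent(tbl = small_table) |>
  # c() instead of vars(); the label is filled in for each expanded step
  col_vals_not_null(
    columns = c(a, b),
    label = "No missing values in {.col}"
  ) |>
  # An external character vector is wrapped in all_of()
  col_vals_gt(
    columns = tidyselect::all_of(numeric_cols),
    value = 0,
    label = "{.col} is positive"
  ) |>
  # Print progress, with each step's label shown alongside it
  interrogate(progress = TRUE, show_step_label = TRUE)

# Each validation function expands into one step per selected column
nrow(agent$validation_set)
```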

Minor improvements and bug fixes

  • Fixes issue with rendering reports in Quarto HTML documents.

  • When no columns are returned from a {tidyselect} expression in columns, the agent's report now displays the originally supplied expression instead of a blank entry (e.g., in create_agent(small_table) |> col_vals_null(matches("z"))).

  • Fixes issue with the hashing implementation to improve performance and alignment of validation steps in the multiagent.

v0.11.4

25 Apr 15:15
47834e5
  • Fixes issue with gt 0.9.0 compatibility.

v0.11.3

09 Feb 21:56
5e3e60a
  • Fixes issue with tables not rendering due to interaction with the gt package.

v0.11.2

09 Oct 17:00
6d328d3
  • Internal changes were made to ensure compatibility with an in-development version of R.

v0.11.1

06 Sep 15:40
d87d55b
  • Updated all help files to pass HTML validation.

v0.11.0

14 Jul 02:50
b056ce3

New features

  • The row_count_match() function can now match the count of rows in the target table to a literal value (in addition to comparing row counts to a secondary table).

  • The analogous col_count_match() function was added to compare column counts in the target table to those of a secondary table, or to match on a literal value.

  • Substitution syntax has been added to the tbl_store() function with {{ <name> }}. This is a great way to make table-prep more concise, readable, and less prone to errors.

  • The get_informant_report() function has been enhanced with more width options. Aside from the "standard" and "small" sizes, we can now supply any pixel- or percent-based width to precisely size the report.

  • Added support for validating data in BigQuery tables.
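
A minimal sketch of matching on literal counts, assuming the bundled small_table dataset (13 rows, 8 columns) and the count argument found in current versions of row_count_match() and col_count_match():

```r
library(pointblank)

agent <- create_agent(tbl = small_table) |>
  row_count_match(count = 13) |>  # literal row count
  col_count_match(count = 8) |>   # literal column count
  interrogate(progress = FALSE)

# TRUE only if every validation step passed
all_passed(agent)
```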

Documentation

  • All functions in the package now have better usage examples.

v0.10.0

23 Jan 22:09
4ef8d6b

New features

  • The new function row_count_match() (plus expect_row_count_match() and test_row_count_match()) checks for an exact match of row counts across two tables (the target table and a comparison table of your choosing). Works equally well for local tables and for database and Spark tables.

  • The new tbl_match() function (along with expect_tbl_match() and test_tbl_match()) checks for an exact matching of the target table with a comparison table. It will check for a strict match on table schemas, on equivalent row counts, and then exact matches on cell values across the two tables.
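
The table-matching check described above could be tried out as follows. This is only a sketch: the tables target, same, and shorter are made up for illustration, and dplyr is assumed to be installed.

```r
library(pointblank)

# Small illustrative tables
target <- dplyr::tibble(id = 1:3, grp = c("x", "y", "z"))
same <- target
shorter <- target[1:2, ]

# The test_*() variant returns a single logical value
test_tbl_match(target, tbl_compare = same)
test_tbl_match(target, tbl_compare = shorter)

# The same check as a validation step in an agent
agent <- create_agent(tbl = target) |>
  tbl_match(tbl_compare = same) |>
  interrogate(progress = FALSE)

all_passed(agent)
```

The comparison against shorter is expected to fail because the row counts differ, even though the schemas agree.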

Minor improvements and bug fixes

  • The set_tbl() function was given the tbl_name and label arguments to provide an opportunity to set metadata on the new target table.

  • Support for mssql tables has been restored and works exceedingly well for the majority of validation functions (the few that are incompatible provide messaging about not being supported).

Documentation

  • All functions in the package now have usage examples.

  • An RStudio Cloud project has been prepared with .Rmd files that contain explainers and runnable examples for each function in the package. Look at the project README for a link to the project.

Breaking changes

  • The read_fn argument in create_agent() and create_informant() has been deprecated in favor of an enhanced tbl argument. Now, we can supply a variety of inputs to tbl for associating a target table to an agent or an informant. With tbl, it's now possible to provide a table (e.g., data.frame, tbl_df, tbl_dbi, tbl_spark, etc.), an expression (a table-prep formula or a function) to read in the table only at interrogation time, or a table source expression to get table preparations from a table store (as an in-memory object or as defined in a YAML file).

  • The set_read_fn(), remove_read_fn(), and remove_tbl() functions were removed since the read_fn argument has been deprecated (and there's virtually no need to remove a table from an object with remove_tbl() now).
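
As a sketch of the enhanced tbl argument, the example below supplies a table-prep formula so that the table is only read at interrogation time. It assumes the bundled small_table dataset; the tbl_name and label values are illustrative.

```r
library(pointblank)

# A table-prep formula (note the leading `~`) takes the place of read_fn
agent <- create_agent(
  tbl = ~ small_table,
  tbl_name = "small_table",
  label = "Table read at interrogation time"
) |>
  col_vals_gt(columns = a, value = 0) |>
  interrogate(progress = FALSE)

all_passed(agent)
```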

v0.9.0

28 Oct 17:57
88772ee

New features

  • The new rows_complete() validation function (along with the expect_rows_complete() and test_rows_complete() expectation and test variants) checks whether rows contain any NA/NULL values (optionally constrained to a selection of specified columns).

  • The new function serially() (along with expect_serially() and test_serially()) allows for a series of tests to run in sequence before either culminating in a final validation step or simply exiting the series. This construction allows for pre-testing that may make sense before a validation step. For example, there may be situations where it's vital to check a column type before performing a validation on the same column.

  • The specially()/expect_specially()/test_specially() functions enable custom validations/tests/expectations with a user-defined function. We still have preconditions and other common args available for convenience. The great thing about this is that because we require the UDF to return a logical vector of passing/failing test units (or a table where the rightmost column is logical), we can incorporate the results quite easily in the standard pointblank reporting.

  • The info_columns_from_tbl() function is a super-convenient wrapper for the info_columns() function. Say you're making a data dictionary with an informant and you already have the table metadata somewhere as a table: you can use that here and not have to call info_columns() many, many times.

  • Added the game_revenue_info dataset which contains metadata for the extant game_revenue dataset. Both datasets pair nicely together in examples that create a data dictionary with create_informant() and info_columns_from_tbl().

  • Added the table transformer function tt_tbl_colnames() to get a table's column names for validation.
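
The test_*() variants of the new functions above can be tried directly on a table, as in this minimal sketch. It assumes the bundled small_table dataset and uses vars() for column selection; the particular checks are only illustrative.

```r
library(pointblank)

# Are columns `a` and `b` free of NA values in every row?
complete_ab <- test_rows_complete(small_table, columns = vars(a, b))

# Run tests in sequence: confirm the column type first, then check its values
serial_ok <- test_serially(
  small_table,
  ~ test_col_is_numeric(., columns = vars(d)),
  ~ test_col_vals_gt(., columns = vars(d), value = 0)
)

# Custom check with a user-defined function returning a logical value
special_ok <- test_specially(
  small_table,
  fn = function(x) nrow(x) == 13
)

c(complete_ab, serial_ok, special_ok)
```

In the serially() family, a later test only runs if the earlier ones pass, which is why the type check is placed ahead of the value check.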

Minor improvements and bug fixes

  • Input data tables with label attribute values in their columns will be displayed in the 'Variables' section of the scan_data() report. This is useful when scanning imported SAS tables (which often have labeled variables).

  • The all_passed() function has been improved such that failed validation steps (that return an evaluation error, perhaps because of a missing column) result in FALSE; the i argument has been added to all_passed() to optionally get a subset of validation steps before evaluation.

  • For those expect_*() functions that can handle multiple columns, pointblank now correctly stops at the first failure and provides the correct reporting for that. Passing multiple columns really should mean processing multiple steps in serial, and previously this was handled incorrectly.
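
The i argument of all_passed() might be used as in this sketch, which assumes the bundled small_table dataset (where column a contains values of 5 and above, so the second step fails):

```r
library(pointblank)

agent <- create_agent(tbl = small_table) |>
  col_vals_gt(columns = a, value = 0) |>  # step 1: every value passes
  col_vals_lt(columns = a, value = 5) |>  # step 2: some values fail
  interrogate(progress = FALSE)

# Considers every step
all_passed(agent)

# Considers only the first step
all_passed(agent, i = 1)
```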