v0.1.5

Pre-release
@BiomedDAR BiomedDAR released this 02 Apr 02:18

Description

CleanData Updates

  • Previously, when reading .csv data files, all NA-type strings were automatically removed. As a result, columns of object datatype could be converted to float, which might conflict with the user's definition in the data dictionary. This is still the default behaviour, but a new option allows such values to be loaded as they are, unless a data field/column is explicitly defined as numeric in the dictionary.
  • The Generate Data Report feature now includes additional fields. Readers can now identify fields that are out-of-range (numerical types) or not-defined (categorical types), based on what is defined in the data dictionary.
  • The attribute var_list was not available when discrepancies were detected between the data fields listed in the dictionary and those in the data files. It is now available, and defaults to the fields found in the data file.
  • Previously, the CleanData module generated the required output data folders as promised, but only after it had tried to record its actions in a log file inside an output data folder that did not yet exist. It now does the sensible thing and ensures the data folders exist first.
  • An additional option is now available to modify the dataframe "index" by concatenating existing "Index"-type data fields in the data dictionary, so that rows are uniquely identified when the existing "Index"-type columns do not already do so. This is useful when generating reports and pinpointing the exact rows that are problematic. To activate this option, set CREATE_UNIQUE_INDEX to True in definitions.py. Related settings include UNIQUE_INDEX_COMPOSITION_LIST and UNIQUE_INDEX_DELIMITER.
  • If OUTPUT_TYPE_DATA was set to xlsx in the definitions.py file, converting_ascii crashed when there were <NA>-type values in the data. This is now fixed: ASCII conversion is skipped for <NA>-type entries.
  • A new function, add_dictionary_row, is now available to add entries to the data dictionary. This is useful when creating secondary variables, as it keeps the data dictionary in sync with the newly created variables.
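
To make the unique-index option above more concrete, here is a minimal pandas sketch of the general idea: joining the string values of several index-like columns with a delimiter so that every row gets a distinct label. It is not bdarpack's implementation. The helper build_unique_index, the sample column names, and the value formats given to the three settings are all assumptions made for illustration; only the setting names come from these notes.

```python
import pandas as pd

# Setting names are taken from the release notes; the value formats shown
# here are assumptions, not the documented format of definitions.py.
CREATE_UNIQUE_INDEX = True
UNIQUE_INDEX_COMPOSITION_LIST = ["patient_id", "visit_date"]  # hypothetical columns
UNIQUE_INDEX_DELIMITER = "_"


def build_unique_index(df, columns, delimiter):
    """Hypothetical helper: return a copy of df whose index is the
    delimiter-joined string values of the given columns."""
    joined = df[columns].astype(str).apply(delimiter.join, axis=1)
    out = df.copy()
    out.index = pd.Index(joined, name=delimiter.join(columns))
    return out


df = pd.DataFrame({
    "patient_id": ["P01", "P01", "P02"],
    "visit_date": ["2023-01-05", "2023-02-10", "2023-01-05"],
    "weight_kg": [70.2, 69.8, 82.0],
})

# Neither column identifies a row on its own, but the combination does.
indexed = (
    build_unique_index(df, UNIQUE_INDEX_COMPOSITION_LIST, UNIQUE_INDEX_DELIMITER)
    if CREATE_UNIQUE_INDEX
    else df
)
print(indexed)
```

With a composite label like this, a report can point to one specific row even when a single identifier column repeats.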

TabulaCopula Updates

  • Bug fix for data paths on non-Windows systems.

Constraints Updates

  • Updated functions "multiparent_conditions" and "evaluate_df_column" with new options. They can now create secondary columns whose names carry appended suffixes, instead of replacing the original variables. They also generate more comprehensive logs of the rows that have been replaced.
  • Updated function "convertBlankstoValue" to also convert empty strings, in addition to null values.
  • New function "find_mismatch" finds mismatches between any two columns in a dataframe.
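
As a rough illustration of what a two-column mismatch check can look like, the pandas sketch below returns the rows where two columns disagree. It does not reproduce the signature or behaviour of "find_mismatch" in bdarpack. The function name find_mismatch_sketch, the sample columns, and the choice to treat two missing values as a match are all assumptions.

```python
import pandas as pd


def find_mismatch_sketch(df, col_a, col_b):
    """Hypothetical helper: return the rows where col_a and col_b differ.

    Two missing values are treated as a match (an assumption); a missing
    value paired with a real value counts as a mismatch.
    """
    a, b = df[col_a], df[col_b]
    both_missing = a.isna() & b.isna()
    differs = a.ne(b) & ~both_missing
    return df.loc[differs, [col_a, col_b]]


df = pd.DataFrame({
    "sex_recorded": ["M", "F", None, "F"],
    "sex_derived": ["M", "M", None, None],
})

mismatches = find_mismatch_sketch(df, "sex_recorded", "sex_derived")
print(mismatches)
```

Returning the offending rows with their original index keeps the result easy to cross-reference against the source data.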

Utils Updates

  • New function extract_year_month_day is available to extract the year, month, and day from a given string-type date using a specified format.
  • Minor bug fixes in "mapping_dictDateFormatConversion".
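
For readers unfamiliar with format-driven date parsing, the following standard-library sketch shows the general technique of pulling the year, month, and day out of a date string. The function name extract_year_month_day_sketch, its parameters, its default format, and its tuple return value are assumptions; the actual signature of extract_year_month_day in bdarpack may differ.

```python
from datetime import datetime


def extract_year_month_day_sketch(date_string, date_format="%d/%m/%Y"):
    """Hypothetical helper: parse date_string with a strptime-style format
    and return (year, month, day) as integers."""
    parsed = datetime.strptime(date_string, date_format)
    return parsed.year, parsed.month, parsed.day


year, month, day = extract_year_month_day_sketch("02/04/2023", "%d/%m/%Y")
print(year, month, day)
```

The same string read with a month-first format such as "%m/%d/%Y" would give a different month and day, which is why the format is passed explicitly.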

VisualPlot Updates

  • Added "bins" option to histogram plots.

Package URL

https://pypi.org/project/bdarpack/0.1.5/