Releases: BiomedDAR/copula-tabular

v0.1.6

03 Apr 08:59
Pre-release

Description

Constraints update

  • minor bug fixes

CleanData bug fixes

  • gen_data_report minor bug fixes for variable mismatch

Package URL

https://pypi.org/project/bdarpack/0.1.6/

v0.1.5

02 Apr 02:18
Pre-release

Description

CleanData Updates

  • Previously, when reading .csv data files, all NA-type strings were automatically removed. As a result, columns of object datatype could be converted to float, which could conflict with the user's definition in the data dictionary. This is still the default behaviour, but a new option allows such values to be loaded as they are, unless a data field/column is explicitly defined as numeric in the dictionary.
  • The Generate Data Report feature now includes additional fields. Readers can now identify fields that are out-of-range (numerical types) or not-defined (categorical types), based on what is defined in the data dictionary.
  • The attribute var_list was previously unavailable when discrepancies were detected between the data fields listed in the dictionary and those in the data files. It is now available, and defaults to the fields found in the data file.
  • Previously, the CleanData module generated the required output data folders as promised, but only after it tried to record its actions in a log file inside a non-existent output data folder. Now it does the sensible thing and ensures the data folders exist first.
  • An additional option is now available to modify the dataframe "index" by concatenating existing "Index"-type columns from the data dictionary, so that rows are uniquely identified when the existing "Index"-type columns do not already do so. This is useful when generating reports and pinpointing the exact rows that are problematic. To activate this option, set CREATE_UNIQUE_INDEX to True in definitions.py. Other settings include UNIQUE_INDEX_COMPOSITION_LIST and UNIQUE_INDEX_DELIMITER.
  • If the value of OUTPUT_TYPE_DATA was xlsx in the definitions.py file, converting_ascii crashed when there were <NA>-type values in the data. ASCII conversion is now skipped for <NA>-type entries.
  • A new function, add_dictionary_row, is now available to add entries to the data dictionary. This is useful when creating secondary variables, as it keeps the data dictionary in sync with them.
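
The CSV loading and unique-index behaviours described above can be illustrated with plain pandas. This is a minimal sketch and not the CleanData implementation: the function names read_csv_keep_na_strings and build_unique_index, their parameters, and the sample data are all assumptions made for illustration.

```python
import io

import pandas as pd


def read_csv_keep_na_strings(csv_text, numeric_columns):
    """Hypothetical helper: load every value as text, leaving strings such
    as "NA" untouched, then convert only the columns declared numeric."""
    df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
    for col in numeric_columns:
        # Non-numeric entries in a declared numeric column become NaN.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


def build_unique_index(df, columns, delimiter="_"):
    """Hypothetical helper: join the given columns into one string index
    and fail loudly if the result still does not identify rows uniquely."""
    index = df[columns].astype(str).agg(delimiter.join, axis=1)
    if index.duplicated().any():
        raise ValueError("composite index is not unique; add more columns")
    return df.set_index(index.rename(delimiter.join(columns)))


sample = "id,visit,code,age\nA,1,NA,30\nA,2,,NA\n"
df = read_csv_keep_na_strings(sample, numeric_columns=["age"])
indexed = build_unique_index(df, ["id", "visit"], delimiter="-")
print(indexed)
```

Here "id" alone repeats, so the sketch combines it with "visit" to obtain one label per row, which is the situation the CREATE_UNIQUE_INDEX option is described as addressing.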

TabulaCopula Updates

  • Bug fix for data paths on non-Windows systems.

Constraints Updates

  • Updated functions "multiparent_conditions" and "evaluate_df_column" with new options. They can now create secondary columns whose names carry an appended suffix, instead of replacing the original variables. They also generate more comprehensive logs of the rows that have been replaced.
  • Updated function "convertBlankstoValue" to also convert empty strings, in addition to null values.
  • New functionality "find_mismatch" to find mismatches between any two columns in a dataframe.
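
As a rough illustration of the blank-conversion and mismatch ideas above, the following pandas sketch shows one way such checks could be written. It is not the Constraints module's code: blanks_to_value and mismatched_rows are hypothetical names, and treating two nulls as a match is an assumption.

```python
import pandas as pd


def blanks_to_value(series, value):
    """Hypothetical helper: replace nulls and empty strings with `value`."""
    is_blank = series.isna() | series.eq("")
    return series.mask(is_blank, value)


def mismatched_rows(df, col_a, col_b):
    """Hypothetical helper: return the rows where two columns differ.
    Assumption: a null in both columns counts as a match."""
    a, b = df[col_a], df[col_b]
    same = a.eq(b) | (a.isna() & b.isna())
    return df.loc[~same, [col_a, col_b]]


df = pd.DataFrame({"a": ["x", "", None, "z"], "b": ["x", "y", None, "w"]})
filled = blanks_to_value(df["a"], "missing")
bad = mismatched_rows(df, "a", "b")
print(filled.tolist())
print(bad)
```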

Utils Updates

  • New function extract_year_month_day is available to extract the year, month, and day from a given string-type date using a specified format.
  • Minor bug fixes in "mapping_dictDateFormatConversion".
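
Extracting date parts from a string with a given format can be sketched with the standard library alone. This is an illustration of the idea, not the signature or behaviour of extract_year_month_day; the name split_date and its default format are assumptions.

```python
from datetime import datetime


def split_date(date_string, date_format="%d/%m/%Y"):
    """Hypothetical helper: parse a date string with the given strptime
    format and return its (year, month, day) as integers."""
    parsed = datetime.strptime(date_string, date_format)
    return parsed.year, parsed.month, parsed.day


print(split_date("31/12/1999"))
print(split_date("1999-12-31", "%Y-%m-%d"))
```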

VisualPlot Updates

  • Added "bins" option to histogram plots.

Package URL

https://pypi.org/project/bdarpack/0.1.5/

v0.1.4

23 Feb 07:21
Pre-release

Description

Utilities update

  • new function gen_interpolation for creating new datapoints via interpolation
  • new function conversionFromTIMSTxtToCSV for reading oddly delimited .txt files and converting them to .csv format
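
The release note does not say which interpolation scheme gen_interpolation uses. Assuming simple linear interpolation between known points, creating new datapoints could look like the sketch below; interpolate_linear is a hypothetical name and is not the package's function.

```python
def interpolate_linear(x_known, y_known, x_new):
    """Hypothetical helper: linearly interpolate y at each value in x_new.
    Assumes x_known is strictly increasing and x_new lies within its range."""
    points = list(zip(x_known, y_known))
    results = []
    for x in x_new:
        if not points[0][0] <= x <= points[-1][0]:
            raise ValueError(f"{x} is outside the known range")
        # Find the segment that contains x and interpolate along it.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                results.append(y0 + t * (y1 - y0))
                break
    return results


new_points = interpolate_linear([0, 10, 20], [0.0, 100.0, 50.0], [5, 10, 15])
print(new_points)
```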

CleanData bug fixes

  • gen_data_report no longer ignores TYPE categories in data dictionary when they come with trailing spaces
  • gen_data_report now accepts a variety of TYPE categories in data dictionary, on top of the standard numeric, string, date, bool.
  • CleanData now allows users to define the sheet name for Excel outputs, using the RAWDICTXLSX_SHEETNAME attribute in definitions.

Package URL

https://pypi.org/project/bdarpack/0.1.4/

v0.1.3

09 Jan 02:00
Pre-release

First release to PyPI

Description

Package includes

  • Data cleaning tools
  • Transformation tools for converting non-numeric data into numeric equivalents
  • Univariate Marginal Distribution modelling from raw data
  • Conditional-Copula Implementations for generating synthetic data
  • Privacy Metric evaluation wrapper

Package URL

https://pypi.org/project/bdarpack/0.1.3/