- #154 PR: Added YAML option to allow HTML elements when choosing PDF report.
- #165: Added `geom_jitter` option to `plot_boxplot` and `plot_scatterplot`.
- #176 PR: Improved legend ordering in `plot_missing`.
- #177 PR: Added group color customization in `plot_missing`.
- #139: Added `by` argument to `plot_bar`.
- #148: Address CRAN removal due to vignette build failure.
- #111: Continuous distributions can now be plotted with different scales, i.e., histogram, density, boxplot, scatterplot.
- #126: Cleaned up labels in legend guide.
- #127 (PR): Added option to plot only columns with missing values in `plot_missing`.
- Cleaned up code for `create_report`.
- #109: Fixed a bug causing unordered bar charts.
- #114: Removed redundant message in `dummify`.
- #116: Fixed pandoc document conversion error 99.
- #120: Fixed type `logical` being parsed as `symbol` in `configure_report`.
- #121: Fixed missing value bug when `split_columns(..., binary_as_factor = TRUE)`.
- #130 (PR): `plot_prcomp` now drops columns with zero variance.
- #92: Added `update_columns` to transform any selected columns.
- #87: Added `configure_report` function to customize report content.
- #89: Added option to customize `geom_text` and `geom_label` arguments.
- #91: `create_report` now displays full report directory after completion.
- #95: Added better exception handling for `plot_bar`.
- #98: Added band customization to `plot_missing`.
- #100: Switched `geom_text` to `geom_label`.
- #103: Report title can now be customized in `create_report`.
- #108: Added option to treat binary features as discrete in `plot_bar`, `plot_histogram`, `plot_density` and `plot_boxplot`.
- Updated d3.min.js to v5.9.2.
- #88: Added `plot_intro` to report config.
- #90: Added first plot in `plot_prcomp` to output and `page_0`.
- #94: Fixed typo for PCA.
- #86: Replaced `gridExtra::grid.arrange` with facets.
- Added seeds to vignette and README for reproducible examples.
- Hid all internal functions.
- #42: Applied S3 methods for plotting functions.
- #77: `dummify` now works on selected columns.
- #78: All ggplot objects from `plot_*` are now invisibly returned. As a result, extracted `profile_missing` from `plot_missing` for missing value profiles.
- #83: Removed all deprecated functions.
- #85: Users can now specify number of rows/columns for plot page layout.
- `plot_prcomp` now passes `scale. = TRUE` to `prcomp` by default.
- Added `sampled_rows` argument to `plot_scatterplot`.
- Added option to parallelize plot object construction.
- Updated default config for `create_report`.
- #74: Fixed a bug causing `create_report` failure due to zero complete rows.
- #75: Fixed a bug in `plot_str` when plotting data.frame with more than 100 columns.
- #82: Removed hard-coded scales from all plot functions.
- Fixed a bug causing wrong column indices in `split_columns`.
- Fixed a bug using standard deviation instead of variance in `plot_prcomp`.
- Updated vignette for better clarity.
- #71: Added better error handler for `plot_prcomp`.
- #69: Fixed bug causing `create_report` failure (specifically from `plot_prcomp`) when `y` is specified.
- Added more unit tests for `create_report` and `plot_prcomp`.
- #15: Added `plot_prcomp` to visualize principal component analysis.
- #54: Extracted `dummify` from `plot_correlation` as a new function.
- #59: Added `introduce` for basic metadata.
- #41: `create_report` can now be customized.
- #53: Added page number for plots that span multiple pages.
- #56: Added support for theme and customization for individual components.
- #62: `plot_bar` now supports optional measures (in addition to categorical frequency) using argument `with`.
- #66: Feature engineering functions now work on other classes in addition to data.table.
- `plot_missing`:
  - Percentage text labels in the output plot now have 2 decimals to prevent small percentages from being truncated to 0%.
  - Added example to quickly drop columns with too many missing values.
- Added `.ignoreCat` and `.getAllMissing` to helper.
- #55: Fixed bugs and updated vignette with latest functions.
- #57: Fixed `plot_str` bug for not supporting S4 objects.
- #63: Fixed `plot_histogram` and `plot_density` not working with column names containing spaces.
- #48: Added `plot_scatterplot` to visualize relationship of one feature against all others.
- #50: Added `plot_boxplot` to visualize continuous distributions broken down by another feature.
- #44: Added option to exclude categories in `group_category`.
- #45: Added title option for all plots.
- #46: Added option to exclude columns in `set_missing`.
- #49 [Breaking Change]: Switched package to tidyverse style. All old functions are in `.Deprecated` mode. List of name changes in alphabetical order:
  - `BarDiscrete` -> `plot_bar`
  - `CollapseCategory` -> `group_category`
  - `CorrelationContinuous` -> `plot_correlation(..., type = "continuous")`
  - `CorrelationDiscrete` -> `plot_correlation(..., type = "discrete")`
  - `DensityContinuous` -> `plot_density`
  - `DropVar` -> `drop_columns`
  - `GenerateReport` -> `create_report`
  - `HistogramContinuous` -> `plot_histogram`
  - `PlotMissing` -> `plot_missing`
  - `PlotStr` -> `plot_str`
  - `SetNaTo` -> `set_missing`
  - `SplitColType` -> `split_columns`
- #52: Combined `CorrelationContinuous` and `CorrelationDiscrete` into one function, and added option to view correlation of all features at once.
- Optimized layout for multiple plots.
- #47: Fixed color scale for correlation heatmap.
- #32: Fixed pandoc requirement error in unit test on CRAN.
- #34: Fixed error message when `quiet` is not supplied. In addition, the report directory is now printed through `message()` instead of `cat()`.
- #35: Fixed rprojroot not found error.
- #12: Added vignette: dataexplorer-intro.
- #36: Fixed warnings from data.table in `DropVar`.
- #37: Changed all `cat()` to `message()`.
- #38: Added option to order bars in `BarDiscrete`.
- #39: Extended `SetNaTo` to discrete features.
- Added more examples to README.md.
- #25: Added `SetNaTo` to quickly reset missing numerical values.
- #29: Added `DropVar` to quickly drop variables by either name or column position.
- #24: `CorrelationDiscrete` now displays all factor levels instead of full rank matrix from `model.matrix`.
- #11: Functions with return values will now match the input class and set it back.
- #22: Added documentation for `num_all_missing` in `SplitColType`.
- #23: Added additional measures (in addition to frequency) to `CollapseCategory`.
- #26: Removed density estimation section from report template.
- #31: Added flexibility to name the new category in `CollapseCategory`.
- #30: In `CollapseCategory`, `update = TRUE` will only work with input data as `data.table`. However, it is still possible to view the frequency distribution with any input data class, as long as `update = FALSE`.
- #20: Fixed permission denied bug due to intermediates_dir argument in `knitr::render`.
- #16: Improved handling of missing values.
- #18: `GenerateReport` now handles data without discrete or continuous features.
- #14: Updated rmarkdown template for `GenerateReport`.
- #1: Features with all `NA` values will be ignored in `BarDiscrete`.
- Fixed a major bug in `GenerateReport` function due to package renaming.
- `GenerateReport` will now print the directory of the report to console.
- Added function `CollapseCategory` to collapse sparse categories for discrete features.
- Added correlation heatmap for both continuous and discrete features.
- Added density plot for continuous features.
- Fixed a bug in `BarDiscrete` and `CorrelationDiscrete` for not plotting non-factor class.
- Minor changes for CRAN re-submission.
- Changed grid layout for `BarDiscrete` and `HistogramContinuous`.
- Features with all missing values will be ignored.
- Switched position between continuous and discrete features in report template.
- Renamed package name to DataExplorer.
- Added NEWS.md.
- Removed `BoxplotContinuous`.
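
The renaming listed under #49 can be illustrated with a short migration sketch. It assumes a DataExplorer release in which the snake_case names are available (the old CamelCase names were later removed under #83), and it uses R's built-in `airquality` data purely as an illustrative input; the variable names `df`, `df_small` and `df_filled` are hypothetical.

```r
# Sketch only: assumes the DataExplorer package is installed.
library(DataExplorer)

# Illustrative data: the built-in airquality set contains missing values.
df <- airquality

# Old name (see the #49 list)      New name
# PlotMissing(df)               -> plot_missing(df)
# HistogramContinuous(df)       -> plot_histogram(df)
plot_missing(df)
plot_histogram(df)

# DropVar(df, "Solar.R")        -> drop_columns(df, "Solar.R")
df_small <- drop_columns(df, "Solar.R")

# SetNaTo(df_small, 0)          -> set_missing(df_small, 0)
# A single numeric value is applied to continuous features.
df_filled <- set_missing(df_small, 0)
```

Because the input here is a plain `data.frame`, the results are assigned to new objects rather than relying on in-place updates.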