Fixed #92; Refixed #108; Updated D3
boxuancui committed Mar 15, 2019
1 parent e320d4a commit c565f16
Showing 16 changed files with 135 additions and 33 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -23,6 +23,7 @@ export(plot_str)
export(profile_missing)
export(set_missing)
export(split_columns)
+export(update_columns)
import(data.table)
import(ggplot2)
import(gridExtra)
14 changes: 9 additions & 5 deletions NEWS.md
@@ -1,8 +1,6 @@
# DataExplorer 0.7.1.9000
-## Bug Fixes
-* [#88](https://github.com/boxuancui/DataExplorer/issues/88): Added `plot_intro` to report config.
-* [#90](https://github.com/boxuancui/DataExplorer/issues/90): Added first plot in `plot_prcomp` to output and `page_0`.
-* [#94](https://github.com/boxuancui/DataExplorer/issues/94): Fixed typo for PCA.
+## New Features
+* [#92](https://github.com/boxuancui/DataExplorer/issues/92): Added `update_columns` to transform any selected columns.

## Enhancements
* [#89](https://github.com/boxuancui/DataExplorer/issues/89): Added option to customize `geom_text` and `geom_label` arguments.
@@ -11,7 +9,13 @@
* [#98](https://github.com/boxuancui/DataExplorer/issues/98): Added band customization to `plot_missing`.
* [#100](https://github.com/boxuancui/DataExplorer/issues/100): Switched `geom_text` to `geom_label`.
* [#103](https://github.com/boxuancui/DataExplorer/issues/103): Report title can now be customized in `create_report`.
-* [#108](https://github.com/boxuancui/DataExplorer/issues/108): Added option to treat binary features as discrete in `plot_bar`, `plot_histogram` and `plot_density`.
+* [#108](https://github.com/boxuancui/DataExplorer/issues/108): Added option to treat binary features as discrete in `plot_bar`, `plot_histogram`, `plot_density` and `plot_boxplot`.
+* Updated d3.min.js to v5.9.2.
+
+## Bug Fixes
+* [#88](https://github.com/boxuancui/DataExplorer/issues/88): Added `plot_intro` to report config.
+* [#90](https://github.com/boxuancui/DataExplorer/issues/90): Added first plot in `plot_prcomp` to output and `page_0`.
+* [#94](https://github.com/boxuancui/DataExplorer/issues/94): Fixed typo for PCA.

# DataExplorer 0.7.1
## Enhancements
4 changes: 2 additions & 2 deletions R/drop_columns.r
@@ -1,12 +1,12 @@
#' Drop selected variables
#'
-#' Quickly drop variables by either name or column position.
+#' Quickly drop variables by either column names or positions.
#' @param data input data
#' @param ind a vector of either names or column positions of the variables to be dropped.
#' @keywords drop_columns
#' @details \bold{This function updates \link{data.table} object directly.} Otherwise, output data will be returned matching input object class.
#' @import data.table
-#' @export drop_columns
+#' @export
#' @examples
#' # Load packages
#' library(data.table)
5 changes: 3 additions & 2 deletions R/plot_boxplot.r
@@ -3,6 +3,7 @@
#' This function creates boxplot for each continuous feature based on a selected feature.
#' @param data input data
#' @param by feature name to be broken down by. If selecting a continuous feature, boxplot will be grouped by 5 equal ranges, otherwise, all existing categories for a discrete feature.
+#' @param binary_as_factor treat binary as categorical? Default is \code{TRUE}.
#' @param geom_boxplot_args a list of other arguments to \link{geom_boxplot}
#' @param title plot title
#' @param ggtheme complete ggplot2 themes. The default is \link{theme_gray}.
@@ -20,13 +21,13 @@
#' plot_boxplot(iris, by = "Species", nrow = 2L, ncol = 2L)
#' plot_boxplot(iris, by = "Species", geom_boxplot_args = list("outlier.color" = "red"))

-plot_boxplot <- function(data, by, geom_boxplot_args = list(), title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 3L, ncol = 4L, parallel = FALSE) {
+plot_boxplot <- function(data, by, binary_as_factor = TRUE, geom_boxplot_args = list(), title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 3L, ncol = 4L, parallel = FALSE) {
## Declare variable first to pass R CMD check
variable <- by_f <- value <- NULL
## Check if input is data.table
if (!is.data.table(data)) data <- data.table(data)
## Stop if no continuous features
-split_obj <- split_columns(data)
+split_obj <- split_columns(data, binary_as_factor = binary_as_factor)
if (split_obj$num_continuous == 0) stop("No Continuous Features!")
## Get continuous features
continuous <- split_obj$continuous
4 changes: 2 additions & 2 deletions R/plot_density.r
@@ -2,8 +2,8 @@
#'
#' Plot density estimates for each continuous feature
#' @param data input data
-#' @param geom_density_args a list of other arguments to \link{geom_density}
#' @param binary_as_factor treat binary as categorical? Default is \code{TRUE}.
+#' @param geom_density_args a list of other arguments to \link{geom_density}
#' @param title plot title
#' @param ggtheme complete ggplot2 themes. The default is \link{theme_gray}.
#' @param theme_config a list of configurations to be passed to \link{theme}.
@@ -28,7 +28,7 @@
#' # Add color to density area
#' plot_density(data, geom_density_args = list("fill" = "black", "alpha" = 0.6))

-plot_density <- function(data, geom_density_args = list(), binary_as_factor = TRUE, title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 4L, ncol = 4L, parallel = FALSE) {
+plot_density <- function(data, binary_as_factor = TRUE, geom_density_args = list(), title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 4L, ncol = 4L, parallel = FALSE) {
## Declare variable first to pass R CMD check
variable <- value <- NULL
## Check if input is data.table
4 changes: 2 additions & 2 deletions R/plot_histogram.r
@@ -2,8 +2,8 @@
#'
#' Plot histogram for each continuous feature
#' @param data input data
-#' @param geom_histogram_args a list of other arguments to \link{geom_histogram}
#' @param binary_as_factor treat binary as categorical? Default is \code{TRUE}.
+#' @param geom_histogram_args a list of other arguments to \link{geom_histogram}
#' @param title plot title
#' @param ggtheme complete ggplot2 themes. The default is \link{theme_gray}.
#' @param theme_config a list of configurations to be passed to \link{theme}.
@@ -25,7 +25,7 @@
#' data <- data.frame(replicate(16L, rnorm(50)))
#' plot_histogram(data)

-plot_histogram <- function(data, geom_histogram_args = list("bins" = 30L), binary_as_factor = TRUE, title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 4L, ncol = 4L, parallel = FALSE) {
+plot_histogram <- function(data, binary_as_factor = TRUE, geom_histogram_args = list("bins" = 30L), title = NULL, ggtheme = theme_gray(), theme_config = list(), nrow = 4L, ncol = 4L, parallel = FALSE) {
## Declare variable first to pass R CMD check
variable <- value <- NULL
## Check if input is data.table
39 changes: 39 additions & 0 deletions R/update_columns.r
@@ -0,0 +1,39 @@
#' Update variable types or values
#'
#' Quickly update selected variables using column names or positions.
#' @param data input data
#' @param ind a vector of either names or column positions of the variables to be updated.
#' @param what either a function or a non-empty character string naming the function to be called. See \link{do.call}.
#' @keywords update_columns
#' @details \bold{This function updates \link{data.table} object directly.} Otherwise, output data will be returned matching input object class.
#' @import data.table
#' @export
#' @examples
#' str(update_columns(iris, 1L, as.factor))
#' str(update_columns(iris, c("Sepal.Width", "Petal.Length"), "as.integer"))
#'
#' ## Apply log transformation to all columns
#' summary(airquality)
#' summary(update_columns(airquality, names(airquality), log))
#'
#' ## Force set factor to numeric
#' df <- data.frame("a" = as.factor(sample.int(10L)))
#' str(df)
#' str(update_columns(df, "a", function(x) as.numeric(levels(x))[x]))

update_columns <- function(data, ind, what) {
## Check if input is data.table
is_data_table <- is.data.table(data)
## Detect input data class
data_class <- class(data)
## Set data to data.table
if (!is_data_table) data <- data.table(data)
## Transform columns
if (is.numeric(ind)) ind <- as.integer(ind)
for (j in ind) set(x = data, j = j, value = do.call(what, list(data[[j]])))
## Set data class back to original
if (!is_data_table) {
class(data) <- data_class
return(data)
}
}
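
Because `what` is handed to `do.call`, it can be either a function object or the name of one. The following base-R sketch illustrates that dispatch, along with the factor-to-numeric idiom from the examples. `apply_what` is a hypothetical helper introduced here for illustration only; it is not part of the package.

```r
# Hypothetical helper (not part of DataExplorer): applies `what` to a single
# column the same way the loop body in update_columns does, via do.call().
apply_what <- function(x, what) do.call(what, list(x))

x <- c(1.2, 2.7, 3.1)

# A function object and a character name resolve to the same call.
by_function <- apply_what(x, as.integer)
by_name <- apply_what(x, "as.integer")
identical(by_function, by_name)

# Factor to numeric: as.numeric() on a factor yields the internal level codes,
# so index the numeric levels by the factor instead.
f <- factor(c("10", "30", "20"))
codes <- as.numeric(f)
values <- apply_what(f, function(v) as.numeric(levels(v))[v])
```

Here `codes` holds the level positions while `values` holds the numbers the labels spell out, which is why the roxygen example uses the `levels(x)[x]` form rather than a plain `as.numeric`.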
4 changes: 4 additions & 0 deletions README.Rmd
@@ -209,6 +209,10 @@ df <- data.frame("a" = rnorm(260), "b" = rep(letters, 10))
df[sample.int(260, 50), ] <- NA
set_missing(df, list(0L, "unknown"))
+## Update columns
+update_columns(airquality, c("Month", "Day"), as.factor)
+update_columns(airquality, 1L, function(x) x^2)
## Drop columns
drop_columns(diamonds, 8:10)
drop_columns(diamonds, "clarity")
4 changes: 4 additions & 0 deletions README.md
@@ -116,6 +116,10 @@ df <- data.frame("a" = rnorm(260), "b" = rep(letters, 10))
df[sample.int(260, 50), ] <- NA
set_missing(df, list(0L, "unknown"))

+## Update columns
+update_columns(airquality, c("Month", "Day"), as.factor)
+update_columns(airquality, 1L, function(x) x^2)
+
## Drop columns
drop_columns(diamonds, 8:10)
drop_columns(diamonds, "clarity")
10 changes: 2 additions & 8 deletions inst/rmd_template/d3.min.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion man/drop_columns.Rd

Some generated files are not rendered by default.

8 changes: 5 additions & 3 deletions man/plot_boxplot.Rd

Some generated files are not rendered by default.

6 changes: 3 additions & 3 deletions man/plot_density.Rd

Some generated files are not rendered by default.

11 changes: 6 additions & 5 deletions man/plot_histogram.Rd

Some generated files are not rendered by default.

35 changes: 35 additions & 0 deletions man/update_columns.Rd

Some generated files are not rendered by default.

17 changes: 17 additions & 0 deletions tests/testthat/test-update-columns.r
@@ -0,0 +1,17 @@
context("update variables")

test_that("test basic functionality", {
iris_dt <- data.table(iris)
update_columns(iris_dt, 1L, as.character)
update_columns(iris_dt, c("Sepal.Width", "Petal.Length"), as.factor)
update_columns(iris_dt, "Petal.Width", as.integer)
expect_is(iris_dt[[1]], "character")
expect_is(iris_dt$Sepal.Width, "factor")
expect_is(iris_dt$Petal.Length, "factor")
expect_is(iris_dt$Petal.Width, "integer")
expect_is(iris_dt, "data.table")
})

test_that("test non-data.table objects", {
expect_is(update_columns(iris, 1L, as.character), "data.frame")
})
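
The two tests rely on different semantics: a `data.table` input is modified in place, while any other input is converted first and the result is returned. The sketch below shows that difference. It assumes `data.table` is installed and calls `data.table()` and `set()` directly rather than `update_columns` itself.

```r
library(data.table)

# data.table input: set() replaces the column by reference, so no
# assignment is needed to see the change.
dt <- data.table(a = 1:3, b = c(0.5, 1.5, 2.5))
set(x = dt, j = "a", value = as.character(dt[["a"]]))
class(dt$a)

# data.frame input: data.table() builds a new object, so the original
# data.frame keeps its column types and the result has to be captured.
df <- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5))
tmp <- data.table(df)
set(x = tmp, j = "a", value = as.character(tmp[["a"]]))
class(df$a)
class(tmp$a)
```

This is why the first test inspects `iris_dt` after the calls, whereas the second test checks the value returned for `iris`.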
