Fixes #383 histogram can plot bars as frequency #384
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,15 +3,21 @@ | |
#' Producing Histograms | ||
#' | ||
#' @inheritParams addScatter | ||
#' @param frequency Logical defining if the histogram displays relative frequency on the y axis | ||
#' @param bins Number or edges of bins. | ||
#' If `bins` is provided as a single numeric value, `bins` corresponds to the number of bins. | ||
#' The bin edges are then equally spaced within the range of data. | ||
#' If `bins` is provided as an array of numeric values, `bins` corresponds to the bin edges. | ||
#' Default value, `bins=NULL`, uses the value defined by `dataMapping` | ||
#' @param binwidth Numerical value defining the width of each bin. | ||
#' If defined, `binwidth` can overwrite `bins` if `bins` was not provided or simply provided as a single value. | ||
#' Default value, `binwidth=NULL`, uses the value defined by `dataMapping` | ||
#' @param stack Logical defining if the bars of multiple histograms are stacked | ||
#' Default value, `stack=NULL`, uses the value defined by `dataMapping` | ||
#' @param distribution Name of distribution to fit to the data. | ||
#' Only 2 distributions are currently available: `"normal"` and `"logNormal"` | ||
#' Use `distribution="none"` to prevent fit of distribution | ||
#' Default value, `distribution=NULL`, uses the value defined by `dataMapping` | ||
#' @param dataMapping | ||
#' A `HistogramDataMapping` object mapping `x` and aesthetic groups to their variable names of `data`. | ||
#' @param plotConfiguration | ||
|
@@ -27,6 +33,9 @@ | |
#' # Produce histogram of normally distributed data | ||
#' plotHistogram(x = rnorm(100)) | ||
#' | ||
#' # Produce histogram of normally distributed data normalized in y axis | ||
#' plotHistogram(x = rnorm(100), frequency = TRUE) | ||
#' | ||
#' # Produce histogram of log-normally distributed data with many bins | ||
#' plotHistogram(x = rlnorm(100), bins = 21) | ||
#' | ||
|
@@ -37,6 +46,7 @@ plotHistogram <- function(data = NULL, | |
metaData = NULL, | ||
x = NULL, | ||
dataMapping = NULL, | ||
frequency = NULL, | ||
same |
In plot functions, I had made the choice of leaving |
||
bins = NULL, | ||
binwidth = NULL, | ||
stack = NULL, | ||
|
@@ -61,16 +71,26 @@ plotHistogram <- function(data = NULL, | |
validateIsNumeric(binwidth, nullAllowed = TRUE) | ||
validateIsLogical(stack, nullAllowed = TRUE) | ||
validateIsIncluded(distribution, c("normal", "logNormal", "none"), nullAllowed = TRUE) | ||
validateIsLogical(frequency, nullAllowed = TRUE) | ||
|
||
dataMapping$frequency <- frequency %||% dataMapping$frequency | ||
dataMapping$stack <- stack %||% dataMapping$stack | ||
dataMapping$distribution <- distribution %||% dataMapping$distribution | ||
dataMapping$bins <- bins %||% dataMapping$bins | ||
dataMapping$binwidth <- binwidth %||% dataMapping$binwidth | ||
|
||
# Check for default labeling to update plotConfiguration after using .setPlotConfiguration | ||
ylabel <- NULL | ||
if (isEmpty(plotConfiguration)) { | ||
ylabel <- ifelse(dataMapping$frequency, "Relative frequency", "Count") | ||
} | ||
|
||
plotConfiguration <- .setPlotConfiguration( | ||
plotConfiguration, HistogramPlotConfiguration, | ||
data, metaData, dataMapping | ||
) | ||
# Update default ylabel based on frequency | ||
plotConfiguration$labels$ylabel <- ylabel %||% plotConfiguration$labels$ylabel | ||
plotObject <- .setPlotObject(plotObject, plotConfiguration) | ||
|
||
mapData <- dataMapping$checkMapData(data) | ||
|
@@ -89,6 +109,23 @@ plotHistogram <- function(data = NULL, | |
if (length(dataMapping$bins) > 1) { | ||
edges <- dataMapping$bins | ||
} | ||
# Manage ggplot aes_string property depending on stack and frequency options | ||
# geom_histogram can use computed variables defined between two dots | ||
# see https://ggplot2.tidyverse.org/reference/geom_histogram.html for more info | ||
yAes <- "..count.." | ||
|
||
if (dataMapping$frequency) { | ||
# If histogram bars are not stacked, calculate frequency within each data group | ||
# Since there is no direct computed variable | ||
# ncount variable is scaled by binwidth*dnorm(0) to get an area of ~1 | ||
yAes <- paste0("..ncount..*max(..width..)*", stats::dnorm(0)) | ||
if (dataMapping$stack) { | ||
# If histogram bars are stacked, | ||
# Calculate overall frequency as count per bin / total | ||
# This results in same histogram shapes no matter the data groups | ||
yAes <- "..count../sum(..count..)" | ||
} | ||
} | ||
|
||
aestheticValues <- .getAestheticValuesFromConfiguration( | ||
n = 1, | ||
|
@@ -101,6 +138,7 @@ plotHistogram <- function(data = NULL, | |
data = mapData, | ||
mapping = ggplot2::aes_string( | ||
x = mapLabels$x, | ||
y = yAes, | ||
fill = mapLabels$fill | ||
), | ||
position = position, | ||
|
@@ -202,9 +240,10 @@ plotHistogram <- function(data = NULL, | |
binwidth <- dataMapping$binwidth %||% binwidth | ||
|
||
if (dataMapping$stack) { | ||
yScaling <- binwidth * ifelse(dataMapping$frequency, 1, length(x)) | ||
dataFit <- data.frame( | ||
x = xFit, | ||
y = length(x) * binwidth * switch(dataMapping$distribution, | ||
y = yScaling * switch(dataMapping$distribution, | ||
"normal" = stats::dnorm(xFit, mean = mean(x, na.rm = TRUE), sd = stats::sd(x, na.rm = TRUE)), | ||
"logNormal" = stats::dlnorm(xFit, meanlog = mean(log(x), na.rm = TRUE), sdlog = stats::sd(log(x), na.rm = TRUE)) | ||
), | ||
|
@@ -216,11 +255,12 @@ plotHistogram <- function(data = NULL, | |
dataFit <- NULL | ||
for (groupLevel in levels(data$legendLabels)) { | ||
selectedGroup <- data$legendLabels %in% groupLevel | ||
yScaling <- binwidth * ifelse(dataMapping$frequency, 1, length(x[selectedGroup])) | ||
dataFit <- rbind.data.frame( | ||
dataFit, | ||
data.frame( | ||
x = xFit, | ||
y = length(x[selectedGroup]) * binwidth * switch(dataMapping$distribution, | ||
y = yScaling * switch(dataMapping$distribution, | ||
"normal" = stats::dnorm( | ||
xFit, | ||
mean = mean(x[selectedGroup], na.rm = TRUE), | ||
|
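The stacked branch of the diff divides each bin count by the total count, and the fitted curve is rescaled with `binwidth * ifelse(frequency, 1, length(x))`. The sketch below reproduces both computations in base R so they can be checked outside ggplot2. It is only an illustration under assumptions: `hist()` stands in for the `count` variable computed by `geom_histogram`, and the names `edges`, `centers`, `fitFrequency` and `fitCount` are hypothetical and do not appear in the package.

```r
# Stand-alone illustration; not part of the PR
set.seed(42)
x <- rnorm(1000)

# Equally spaced bin edges within the range of the data (21 bins)
nBins <- 21
edges <- seq(min(x), max(x), length.out = nBins + 1)
binwidth <- diff(edges)[1]

# Base R stand-in for the count computed per bin by geom_histogram
counts <- hist(x, breaks = edges, plot = FALSE)$counts

# Stacked case from the diff: count per bin / total count
relativeFrequency <- counts / sum(counts)

# Fitted normal density evaluated at the bin centers
centers <- edges[-1] - binwidth / 2
fitDensity <- stats::dnorm(centers, mean = mean(x), sd = stats::sd(x))

# Scaling from the diff: binwidth * ifelse(frequency, 1, length(x))
fitFrequency <- binwidth * 1 * fitDensity
fitCount <- binwidth * length(x) * fitDensity

# The bar heights on the frequency scale add up to 1 (up to rounding)
print(sum(relativeFrequency))
```

On the frequency scale the bars and the fitted curve are both expressed as a fraction of the data per bin, which is why the factor `length(x)` is dropped from the fit when `frequency` is `TRUE`.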
Shouldn't it be `frequency = FALSE,` instead of `NULL`?

Here, both work, as the actual value is defined during the initialization by `frequency=FALSE`. Unless `unlock_object=TRUE`, a field named `frequency` has to be available before the initialization of the object, no matter its value.
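The reply can be illustrated with a minimal base R sketch. It assumes that the object locking mentioned above behaves like `lockEnvironment()`: bindings that already exist can still be assigned, but new ones cannot be added. The environment `mapping` is a hypothetical stand-in for the data mapping object, and `%||%` is defined locally on the assumption that the package's own operator returns the right-hand side only when the left-hand side is `NULL`.

```r
# Local definition for illustration; the package's own `%||%` is assumed to behave the same way
`%||%` <- function(lhs, rhs) if (is.null(lhs)) rhs else lhs

# Hypothetical stand-in for the data mapping object:
# the field is declared (here as NULL) before the environment is locked
mapping <- new.env()
assign("frequency", NULL, envir = mapping)
lockEnvironment(mapping)

# Initialization step: assigning to an already declared field is allowed
assign("frequency", FALSE, envir = mapping)

# Adding a field that was never declared fails once the environment is locked
undeclared <- tryCatch(
  {
    assign("undeclaredField", TRUE, envir = mapping)
    "added"
  },
  error = function(e) "error"
)

# In plotHistogram, an argument left as NULL falls back to the mapping value,
# while an explicit argument takes precedence
resolved <- NULL %||% mapping$frequency
overridden <- TRUE %||% mapping$frequency
```

Under these assumptions the declared value of the field (`NULL` or `FALSE`) does not matter, since initialization overwrites it. What matters is that the field exists before locking.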