Implement SVD entropy in R #123

Merged 4 commits on May 5, 2018
1 change: 1 addition & 0 deletions NAMESPACE
@@ -12,6 +12,7 @@ export(generate_component_matrix)
export(normalize)
export(replicate_correlation)
export(sparse_random_projection)
export(svd_entropy)
export(transform)
export(variable_importance)
export(variable_select)
64 changes: 64 additions & 0 deletions R/svd_entropy.R
@@ -0,0 +1,64 @@
utils::globalVariables(c(".", "i"))
#' Feature importance based on data entropy.
#'
#' \code{svd_entropy} scores each feature by how much the singular value entropy of the data drops when that feature is left out.
#'
#' @param variables character vector specifying observation variables.
#' @param sample tbl containing sample used to estimate parameters.
#' @param cores optional integer specifying number of CPU cores used for parallel computing using \code{doParallel}.
#'
#' @return data frame with columns \code{variable} and \code{svd_entropy}, where \code{svd_entropy} is the contribution of each feature to the data entropy.
#' Higher values indicate more information.
#'
#' @importFrom foreach %dopar%
#' @importFrom magrittr %>% %<>%
#'
#' @examples
#' sample <- tibble::data_frame(
#' AreaShape_MinorAxisLength = c(10, 12, 15, 16, 8, 8, 7, 7, 13, 18),
#' AreaShape_MajorAxisLength = c(35, 18, 22, 16, 9, 20, 11, 15, 18, 42),
#' AreaShape_Area = c(245, 151, 231, 179, 50, 112, 53, 73, 164, 529)
#' )
#' variables <- c("AreaShape_MinorAxisLength", "AreaShape_MajorAxisLength", "AreaShape_Area")
#' svd_entropy(variables, sample, cores = 1)
#'
#' @export
svd_entropy <- function(variables, sample, cores = NULL) {
  doParallel::registerDoParallel(cores = cores)

  singular_value_entropy <- function(A) {
    singular_values <- svd(A, 0, 0)$d

    # normalize
    singular_values <- singular_values / sum(singular_values)

    # entropy
    -sum(singular_values * log10(singular_values))
  }

  entropy_score <- function(data) {

    # calculate contribution of each feature to the entropy by leaving that feature out
    sv_entropy <-
      foreach::foreach(i = seq_len(ncol(data)), .combine = c) %dopar% singular_value_entropy(data[-i, -i])

    singular_value_entropy(data) - sv_entropy
  }

  sample %<>%
    dplyr::select(dplyr::one_of(variables)) %>%
    dplyr::collect()

  # to ensure the ordering is captured
  variables <- colnames(sample)

  entropy_scores <-
    as.matrix(sample) %>%
    crossprod(., .) %>%
    entropy_score()

  dplyr::data_frame(variable = variables,
                    svd_entropy = entropy_scores)

}
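
For readers who want to check the leave-one-out computation by hand, here is a minimal sequential sketch in base R. It is not part of this PR: the helper names `sv_entropy` and `loo_entropy_scores` are hypothetical, it skips `foreach`/`doParallel`, and it assumes that exactly zero singular values should contribute nothing to the entropy (the convention 0 * log(0) = 0), which the function in the diff does not do explicitly.

```r
# Hypothetical helper: entropy (base 10) of the normalized singular values of A.
sv_entropy <- function(A) {
  d <- svd(A, nu = 0, nv = 0)$d
  p <- d / sum(d)
  p <- p[p > 0]  # assumption: treat 0 * log(0) as 0
  -sum(p * log10(p))
}

# Hypothetical helper: entropy of the full Gram matrix minus the entropy
# obtained after dropping feature i (row i and column i of the Gram matrix).
loo_entropy_scores <- function(X) {
  G <- crossprod(X)  # p x p matrix t(X) %*% X
  full <- sv_entropy(G)
  vapply(
    seq_len(ncol(G)),
    function(i) full - sv_entropy(G[-i, -i, drop = FALSE]),
    numeric(1)
  )
}

# Same toy data as the roxygen example in the diff.
X <- cbind(
  AreaShape_MinorAxisLength = c(10, 12, 15, 16, 8, 8, 7, 7, 13, 18),
  AreaShape_MajorAxisLength = c(35, 18, 22, 16, 9, 20, 11, 15, 18, 42),
  AreaShape_Area = c(245, 151, 231, 179, 50, 112, 53, 73, 164, 529)
)

scores <- loo_entropy_scores(X)
names(scores) <- colnames(X)
print(scores)
```

Running the loop sequentially makes it easy to compare against `svd_entropy(variables, sample, cores = 1)` on the same data; the two should agree whenever no singular value is exactly zero.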

Generated documentation files (diffs not rendered):
4 changes: 2 additions & 2 deletions docs/LICENSE.html
281 changes: 131 additions & 150 deletions docs/articles/cytominer-pipeline.html
4 changes: 2 additions & 2 deletions docs/articles/index.html
4 changes: 2 additions & 2 deletions docs/authors.html
30 changes: 13 additions & 17 deletions docs/index.html
11 changes: 5 additions & 6 deletions docs/news/index.html
14 changes: 7 additions & 7 deletions docs/reference/aggregate.html
20 changes: 10 additions & 10 deletions docs/reference/correlation_threshold.html
4 changes: 2 additions & 2 deletions docs/reference/count_na_rows.html
4 changes: 2 additions & 2 deletions docs/reference/covariance.html
4 changes: 2 additions & 2 deletions docs/reference/drop_na_columns.html