Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 support for phyloseq and TreeSE #17

Merged
merged 12 commits into from
Aug 14, 2021
Merged
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -4,3 +4,4 @@ README.md
TODO.txt
^.*\.DS_Store
docs
inst/extras
38 changes: 27 additions & 11 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
Package: philr
Type: Package
Title: Phylogenetic partitioning based ILR transform for metagenomics data
Version: 1.13.1
Date: 2017-04-07
Authors@R: c(person("Justin", "Silverman", role=c("aut", "cre"),
email = "[email protected]"))
Author: Justin Silverman
Maintainer: Justin Silverman <[email protected]>
Version: 1.19.1
Date: 2021-08-11
Authors@R:
c(person("Justin", "Silverman", role=c("aut", "cre"),
email = "[email protected]"),
person(given = "Leo", family = "Lahti", role = c("ctb"),
comment = c(ORCID = "0000-0001-5537-637X")))
Description: PhILR is short for Phylogenetic Isometric Log-Ratio Transform.
This package provides functions for the analysis of compositional
data (e.g., data representing proportions of different variables/parts).
Expand All @@ -15,11 +16,26 @@ Description: PhILR is short for Phylogenetic Isometric Log-Ratio Transform.
data) and makes available the Isometric Log Ratio transform built from the
phylogenetic tree and utilizing a weighted reference measure.
License: GPL-3
LazyData: TRUE
RoxygenNote: 6.1.1
Imports: ape, phangorn, tidyr, ggplot2, ggtree
Depends:
Suggests: testthat, knitr, rmarkdown, BiocStyle, phyloseq, glmnet, dplyr
RoxygenNote: 7.1.1
Imports:
ape,
phangorn,
tidyr,
ggplot2,
ggtree,
methods
Suggests:
testthat,
knitr,
ecodist,
rmarkdown,
BiocStyle,
phyloseq,
SummarizedExperiment,
TreeSummarizedExperiment,
glmnet,
dplyr,
mia
VignetteBuilder: knitr
biocViews: ImmunoOncology, Sequencing, Microbiome, Metagenomics, Software
URL: https://github.com/jsilve24/philr
Expand Down
9 changes: 9 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
# Generated by roxygen2: do not edit by hand

S3method(philr,TreeSummarizedExperiment)
S3method(philr,array)
S3method(philr,data.frame)
S3method(philr,default)
S3method(philr,matrix)
S3method(philr,numeric)
S3method(philr,phyloseq)
export(annotate_balance)
export(buildilrBasep)
export(calculate.blw)
Expand All @@ -24,5 +31,7 @@ importFrom(ape,rtree)
importFrom(ggplot2,annotate)
importFrom(ggtree,get_clade_position)
importFrom(ggtree,ggtree)
importFrom(methods,as)
importFrom(methods,is)
importFrom(phangorn,Children)
importFrom(tidyr,gather_)
67 changes: 39 additions & 28 deletions R/branch_length_calculations.R
Original file line number Diff line number Diff line change
Expand Up @@ -3,23 +3,28 @@
#' Calculates the weightings for ILR coordinates based on branch lenghts of
#' a phylogenetic tree via a few different methods (see details).
#' @aliases calc.blw
#' @param tree a \code{phylo} class tree object that is binary (see \code{\link[ape]{multi2di}})
#' @param method options include: (default) \code{'sum.children'} and \code{'mean.descendants'}
#' @param tree a \code{phylo} class tree object that is binary (see
#' \code{\link[ape]{multi2di}})
#' @param method options include: (default) \code{'sum.children'} and
#' \code{'mean.descendants'}
#' see details for more information.
#' @return vector of weightings for ILR coordinates produced via specified method.
#' @return vector of weightings for ILR coordinates produced via specified
#' method.
#' @details
#' ILR balances built from a binary partition of a phylogenetic tree
#' can be imbued with branch length information. This function is helpful in
#' calculating those weightings.\cr \cr
#' There are a number of methods for calculating these weightings, the default
#' \code{'sum.children'} calculates the weighting for a given balance as the sum
#' of its two direct children's branch length. An alternative that has been as yet less
#' studied is \code{'mean.descendants'} to calculate the weighting for a given balance
#' as the sum of its two direct children's branch lengths PLUS for each child the average
#' distance from it to its descendant tips. \cr \cr
#' \emph{Note:} That some trees contain tips with branch lengths of zero length. This can result
#' in that tip being unreasonably downweighted, as such this function automatically
#' adds a small pseudocount to those tips with zero length (equal to the smallest non-zero)
#' of its two direct children's branch length. An alternative that has been
#' as yet less studied is \code{'mean.descendants'} to calculate the weighting
#' for a given balance as the sum of its two direct children's branch lengths
#' PLUS for each child the average distance from it to its descendant tips.
#' \cr \cr
#' \emph{Note:} That some trees contain tips with branch lengths of zero length.
#' This can result in that tip being unreasonably downweighted, as such this
#' function automatically adds a small pseudocount to those tips with zero
#' length (equal to the smallest non-zero)
#' branch length on the tree.
#' @author Justin Silverman
#' @export
Expand All @@ -35,20 +40,24 @@ calculate.blw <- function(tree, method='sum.children'){
# In these cases will replace those zero values with
# the minimum branch length (greater than 0) found in the tree.
# Note this is like adding a psudocount to branch lengths.
min.nonzero <- min(tree$edge.length[tree$edge.length>0 & !is.na(tree$edge.length)])
min.nonzero <- min(tree$edge.length[tree$edge.length>0 &
!is.na(tree$edge.length)])
# Find the edges that end in tips and have zero length
tip.edges.zero <- (tree$edge[,2] <= nTips) & (tree$edge.length == 0)
# print warning/note statement
n.replace <- sum(tip.edges.zero)
if (n.replace >0 ){
warning(paste('Note: a total of', sum(tip.edges.zero), 'tip edges with zero length',
warning(paste('Note: a total of', sum(tip.edges.zero),
'tip edges with zero length',
'have been replaced with a small pseudocount of the minimum',
'non-zero edge length (',min.nonzero,').',sep=" "))
tree$edge.length[tip.edges.zero] <- min.nonzero
}

if (method=='sum.children')return(blw.sum.children(tree))
else if (method=='mean.descendants')return(blw.mean.descendants.sum.children(tree))
else if (method=='mean.descendants') {
return(blw.mean.descendants.sum.children(tree))
}
}

# Now calculate Branch Length Weighting as the sum of a nodes
Expand All @@ -64,26 +73,27 @@ blw.sum.children <- function(tree){
res
}

# Calculates the sum of the children's nodes average distance to descendant tips
# Calculates the sum of the children's nodes average distance to
# descendant tips
# Which should be a slightly better calculation than MDTT when we are
# primarily interested in weightings on ILR coordinates
blw.mean.descendants.sum.children <- function(tree){
nTips = ape::Ntip(tree)
X <- phangorn::Children(tree, (nTips+1):(nTips+tree$Nnode))
nTips = ape::Ntip(tree)
X <- phangorn::Children(tree, (nTips+1):(nTips+tree$Nnode))

# Children's average branch length to tips (zero for tips)
BMD <- mean_dist_to_tips(tree)
names(BMD) <- NULL
BMD <- c(numeric(nTips), BMD)
# Children's average branch length to tips (zero for tips)
BMD <- mean_dist_to_tips(tree)
names(BMD) <- NULL
BMD <- c(numeric(nTips), BMD)

# Each child's edge length
EL <- numeric(max(tree$edge))
EL[tree$edge[,2]] <- tree$edge.length
# Each child's edge length
EL <- numeric(max(tree$edge))
EL[tree$edge[,2]] <- tree$edge.length

fun <- function(x, el, bmd)sum(el[x] + bmd[x])
res <- sapply(X, fun, EL, BMD)
if(!is.null(tree$node.label)) names(res) <- tree$node.label
return(res)
fun <- function(x, el, bmd)sum(el[x] + bmd[x])
res <- sapply(X, fun, EL, BMD)
if(!is.null(tree$node.label)) names(res) <- tree$node.label
return(res)
}

#' Mean distance from internal nodes to descendant tips
Expand All @@ -105,7 +115,8 @@ mean_dist_to_tips <- function(tree){

dist <- ape::dist.nodes(tree)
node.numbers <- (nTips+1):(nTips+tree$Nnode)
chld.tips <- phangorn::Descendants(tree, (nTips+1):(nTips+tree$Nnode), "tips")
chld.tips <- phangorn::Descendants(tree, (nTips+1):(nTips+tree$Nnode),
"tips")

# Turn dist into a nodes (rows) x tips (columns) matrix
dist <- dist[node.numbers, 1:nTips]
Expand Down
74 changes: 38 additions & 36 deletions R/general_utils.R
Original file line number Diff line number Diff line change
@@ -1,22 +1,21 @@

# Generally Helpful for Validation ----------------------------------------

# check for zeros and throw error if present
# target is just a name for the warning message
check.zeroes <- function(x, target){
if (any(x==0)){
warning(paste(target, "should not contain zeroes"))
}
if (any(x==0)){
warning(paste(target, "should not contain zeroes"))
}
}

# convert vector to row vector.
vec_to_mat <- function(x){
if (is.vector(x)) {
n <- names(x)
x <- matrix(x, nrow = 1)
colnames(x) <- n
}
x
if (is.vector(x)) {
n <- names(x)
x <- matrix(x, nrow = 1)
colnames(x) <- n
}
x
}

# ACCESSOR FUNCTIONS FROM NAMES TO NUMBERS --------------------------------
Expand All @@ -41,30 +40,33 @@ NULL
#' @rdname name_nodenumber_conversion
#' @export
nn.to.name <- function(tr, x){
if(!is.numeric(x)) stop('node numbers must be numeric')
labels <- c(tr$tip.label, tr$node.label)
labels[x]
if(!is.numeric(x)) stop('node numbers must be numeric')
labels <- c(tr$tip.label, tr$node.label)
labels[x]
}

#' @rdname name_nodenumber_conversion
#' @export
name.to.nn <- function(tr, x){
if(!is.character(x)) stop('node/tip names (x) should be a character vector')
labels <- c(tr$tip.label, tr$node.label)
match(x, labels)
if(!is.character(x)) stop('node/tip names (x) should be a character vector')
labels <- c(tr$tip.label, tr$node.label)
match(x, labels)
}


# Generally Helpful for Plotting ------------------------------------------

#' Converts wide format ILR transformed data to long format
#'
#' Converts wide format ILR transformed data (see \code{\link{philr}}) to long format
#' useful in various plotting functions where long format data is required.
#' Converts wide format ILR transformed data (see \code{\link{philr}}) to
#' long format useful in various plotting functions where long format data is
#' required.
#'
#' @param df PhILR transformed data in wide format (samples by balances) (see \code{\link{philr}})
#' @param labels vector (of length \code{nrow(df)}) with labels to group samples by
#' @return \code{df} in long format with columns
#' @param x PhILR transformed data in wide format (samples by balances)
#' (see \code{\link{philr}})
#' @param labels vector (of length \code{nrow(x)}) with labels to group
#' samples by
#' @return \code{x} in long format with columns
#' \itemize{
#' \item sample
#' \item labels
Expand All @@ -75,20 +77,20 @@ name.to.nn <- function(tr, x){
#' @importFrom tidyr gather_
#' @examples
#' tr <- named_rtree(5)
#' df <- t(rmultinom(10,100,c(.1,.6,.2,.3,.2))) + 0.65 # add a small pseudocount
#' colnames(df) <- tr$tip.label
#' x <- t(rmultinom(10,100,c(.1,.6,.2,.3,.2))) + 0.65 # add small pseudocount
#' colnames(x) <- tr$tip.label
#'
#' df.philr <- philr(df, tr, part.weights='enorm.x.gm.counts',
#' ilr.weights='blw.sqrt', return.all=FALSE)
#' convert_to_long(df.philr, rep(c('a','b'), 5))
convert_to_long <- function(df, labels){
coord.names <- colnames(df)
df.long <- as.data.frame(df)
df.long$sample <- rownames(df.long)
df.long$labels <- labels
df.long <- gather_(df.long, "coord", "value",
setdiff(names(df.long), c("labels", "sample")))
return(df.long)
#' x.philr <- philr(x, tree=tr, part.weights='enorm.x.gm.counts',
#' ilr.weights='blw.sqrt', return.all=FALSE)
#' convert_to_long(x.philr, rep(c('a','b'), 5))
convert_to_long <- function(x, labels){
coord.names <- colnames(x)
x.long <- as.data.frame(x)
x.long$sample <- rownames(x.long)
x.long$labels <- labels
x.long <- gather_(x.long, "coord", "value",
setdiff(names(x.long), c("labels", "sample")))
return(x.long)
}


Expand All @@ -107,6 +109,6 @@ convert_to_long <- function(df, labels){
#' @examples
#' named_rtree(5)
named_rtree <- function(n){
tr <- rtree(n)
makeNodeLabel(tr, 'number', 'n')
tr <- rtree(n)
makeNodeLabel(tr, 'number', 'n')
}
Loading