Skip to content

Commit

Permalink
Merge pull request #144 from satijalab/matrix-update
Browse files Browse the repository at this point in the history
Updates to handle Matrix 1.4-2 related changes
  • Loading branch information
saketkc authored Aug 20, 2022
2 parents 7e9c955 + 3cbc836 commit 0b52965
Show file tree
Hide file tree
Showing 10 changed files with 75 additions and 67 deletions.
8 changes: 4 additions & 4 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
Package: sctransform
Type: Package
Title: Variance Stabilizing Transformations for Single Cell UMI Data
Version: 0.3.3
Date: 2022-01-10
Version: 0.3.4
Date: 2022-08-19
Authors@R: c(
person(given = "Christoph", family = "Hafemeister", email = "[email protected]", role = "aut", comment = c(ORCID = "0000-0001-6365-8254")),
person(given = "Saket", family = "Choudhary", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-5202-7633")),
Expand All @@ -13,7 +13,7 @@ Description: A normalization method for single-cell UMI count data using a
negative binomial regression model with regularized parameters. As part of the
same regression framework, this package also provides functions for
batch correction, and data correction. See Hafemeister and Satija (2019)
<doi:10.1186/s13059-019-1874-1>, and Choudhary and Satija (2021) <doi:10.1101/2021.07.07.451498>
<doi:10.1186/s13059-019-1874-1>, and Choudhary and Satija (2022) <doi:10.1186/s13059-021-02584-9>
for more details.
URL: https://github.com/satijalab/sctransform
BugReports: https://github.com/satijalab/sctransform/issues
Expand Down Expand Up @@ -42,4 +42,4 @@ Suggests:
knitr
Enhances:
glmGamPoi
RoxygenNote: 7.1.2
RoxygenNote: 7.2.1
14 changes: 11 additions & 3 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,22 @@
# News
All notable changes will be documented in this file.

## [0.3.3] - UNRELEASED
## [0.3.4] - 2022-08-19

### Added
- Add `make.sparse` to handle `dgCMatrix` coercsions

### Fixed
- Convert bitwise operators to boolean operators in utils.cpp

## [0.3.3] - 2022-01-13

### Added
- `vst.flavor` argument to `vst()` to allow for invoking running updated regularization (sctransform v2, proposed in [Satija and Choudhary, 2021](https://doi.org/10.1101/2021.07.07.451498). See paper for details.
- `scale_factor` to `correct()` to allow for a custom library size when correcting counts


## [0.3.2.9008] - 2021-07-28
## [0.3.2] - 2021-07-28
### Added
- Add future.seed = TRUE to all `future_lapply()` calls

Expand All @@ -18,7 +26,7 @@ All notable changes will be documented in this file.
### Fixed
- Fix logical comparison of vectors of length one in `diff_mean_test()`

## [0.3.2.9003] - 2020-02-11
## [0.3.2] - 2020-02-11
### Added
- `compare` argument to the nonparametric differential expression test `diff_mean_test()` to allow for multiple comparisons and various ways to specify which groups to compare
- Input checking at various places in `vst()` and `diff_mean_test()`
Expand Down
2 changes: 1 addition & 1 deletion R/denoise.R
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,7 @@ correct_counts <- function(x, umi, cell_attr = x$cell_attr, scale_factor = NA, v
y.res <- mu + pearson_residual * sqrt(variance)
y.res <- round(y.res, 0)
y.res[y.res < 0] <- 0
corrected_data[[length(corrected_data) + 1]] <- as(y.res, Class = 'dgCMatrix')
corrected_data[[length(corrected_data) + 1]] <- make.sparse(mat = y.res)
if (verbosity > 1) {
setTxtProgressBar(pb, i)
}
Expand Down
2 changes: 1 addition & 1 deletion R/generate.R
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,6 @@ generate <- function(vst_out, genes = rownames(vst_out$model_pars_fit),
x <- MASS::rnegbin(n = length(gene.mu), mu = gene.mu, theta = theta[gene])
return(x)
}))
x.sim <- as(x.sim, Class = 'dgCMatrix')
x.sim <- make.sparse(mat = x.sim)
return(x.sim)
}
9 changes: 9 additions & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -524,3 +524,12 @@ get_nz_median2 <- function(umi){
return (median(umi@x))
}

#' Convert a given matrix to dgCMatrix
#'
#' @param mat Input matrix
#'
#' @return A dgCMatrix
make.sparse <- function(mat){
mat <- as(object = mat, Class = "Matrix")
return (as(object = as(object = as(object = mat, Class = "dMatrix"), Class = "generalMatrix"), Class = "CsparseMatrix"))
}
2 changes: 1 addition & 1 deletion R/vst.R
Original file line number Diff line number Diff line change
Expand Up @@ -432,7 +432,7 @@ vst <- function(umi,
} else {
rv$umi_corrected <- sctransform::correct(rv, do_round = TRUE, do_pos = TRUE, scale_factor = scale_factor,
verbosity = verbosity)
rv$umi_corrected <- as(object = rv$umi_corrected, Class = 'dgCMatrix')
rv$umi_corrected <- make.sparse(mat = rv$umi_corrected)
}
}

Expand Down
10 changes: 4 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
# sctransform
## R package for normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

The sctransform package was developed by Christoph Hafemeister in [Rahul Satija's lab](https://satijalab.org/) at the New York Genome Center and described in [Hafemeister and Satija, Genome Biology 2019](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1874-1). Recent updates are described in [(Choudhary and Satija, Genome Biology 2022)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02584-9). Core functionality of this package has been integrated into [Seurat](https://satijalab.org/seurat/), an R package designed for QC, analysis, and exploration of single cell RNA-seq data.
The sctransform package was developed by Christoph Hafemeister in [Rahul Satija's lab](https://satijalab.org/) at the New York Genome Center and described in [Hafemeister and Satija, Genome Biology 2019](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1874-1). Recent updates are described in [(Choudhary and Satija, Genome Biology, 2022)](https://doi.org/10.1186/s13059-021-02584-9).
Core functionality of this package has been integrated into [Seurat](https://satijalab.org/seurat/), an R package designed for QC, analysis, and exploration of single cell RNA-seq data.

## Quick start

Expand Down Expand Up @@ -43,15 +44,12 @@ Available vignettes:
- [Using sctransform in Seurat](https://htmlpreview.github.io/?https://github.com/satijalab/sctransform/blob/supp_html/supplement/seurat.html)
- [Examples of how to perform normalization, feature selection, integration, and differential expression with sctransform v2 regularization](https://satijalab.org/seurat/articles/sctransform_v2_vignette.html)

## Known Issues

* `node stack overflow` error when Rfast package is loaded. The Rfast package does not play nicely with the future.apply package. Try to avoid loading the Rfast package. See discussions: https://github.com/RfastOfficial/Rfast/issues/5 https://github.com/satijalab/sctransform/issues/108

Please use [the issue tracker](https://github.com/satijalab/sctransform/issues) if you encounter a problem

## References

- Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20, 296 (December 23, 2019). [https://doi.org/10.1186/s13059-019-1874-1](https://doi.org/10.1186/s13059-019-1874-1). An early version of this work was used in the paper [Developmental diversification of cortical inhibitory interneurons, Nature 555, 2018](https://github.com/ChristophH/in-lineage).
- Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biology 20, 296 (2019). [https://doi.org/10.1186/s13059-019-1874-1](https://doi.org/10.1186/s13059-019-1874-1). An early version of this work was used in the paper [Developmental diversification of cortical inhibitory interneurons, Nature 555, 2018](https://github.com/ChristophH/in-lineage).

- Choudhary, S. & Satija, R. Comparison and evaluation of statistical error models for scRNA-seq. Genome Biology (2022). [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02584-9](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02584-9)
- Choudhary, S. & Satija, R. Comparison and evaluation of statistical error models for scRNA-seq. Genome Biology 23.1 (2022). [https://doi.org/10.1186/s13059-021-02584-9](https://doi.org/10.1186/s13059-021-02584-9)

26 changes: 1 addition & 25 deletions cran-comments.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,29 +11,5 @@

## R CMD check results

0 errors | 0 warnings | 2 notes
0 errors | 0 warnings | 0 notes

```
* checking CRAN incoming feasibility ... NOTE
```

There is change in maintaine status:
New maintainer:
Saket Choudhary <[email protected]>
Old maintainer(s):
Christoph Hafemeister <[email protected]>

```
* checking package dependencies ... NOTE
Package which this enhances but not available for checking: ‘glmGamPoi’
S
```
`glmGamPoi` is an entirely optional package that is not required for core functionality, but only needed for alternative/faster implementations of the methods in this package. It is only available on Bioconductor.

## Reverse dependencies

Tested using `revdepcheck::revdep_check`:
We checked 4 reverse dependencies (1 from CRAN + 3 from Bioconductor), comparing R CMD check results across CRAN and dev versions of this package.

* We saw 0 new problems
* We failed to check 0 packages
17 changes: 17 additions & 0 deletions man/make.sparse.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

52 changes: 26 additions & 26 deletions src/utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ NumericVector row_mean_dgcmatrix(S4 matrix) {
IntegerVector dim = matrix.slot("Dim");
int rows = dim[0];
int cols = dim[1];

NumericVector ret(rows, 0.0);
int x_length = x.length();
for (int k=0; k<x_length; ++k) {
Expand All @@ -34,7 +34,7 @@ NumericVector row_mean_dgcmatrix(S4 matrix) {
}

// [[Rcpp::export]]
NumericMatrix row_mean_grouped_dgcmatrix(S4 matrix, IntegerVector group,
NumericMatrix row_mean_grouped_dgcmatrix(S4 matrix, IntegerVector group,
bool shuffle) {
NumericVector x = matrix.slot("x");
IntegerVector i = matrix.slot("i");
Expand All @@ -47,12 +47,12 @@ NumericMatrix row_mean_grouped_dgcmatrix(S4 matrix, IntegerVector group,
NumericMatrix ret(rows, groups);
IntegerVector groupsize(groups, 0);
int x_length = x.length();

if (shuffle) {
group = clone(group);
std::random_shuffle(group.begin(), group.end(), randWrapper);
}

int col = 0;
for (int k=0; k<x_length; ++k) {
while (k>=p[col]) {
Expand All @@ -65,7 +65,7 @@ NumericMatrix row_mean_grouped_dgcmatrix(S4 matrix, IntegerVector group,
++col;
++groupsize[group[col-1]-1];
}

for (int j=0; j<groups; ++j) {
if (groupsize[j] == 0) {
ret(_, j) = rep(NumericVector::get_na(), rows);
Expand All @@ -88,12 +88,12 @@ NumericVector row_gmean_dgcmatrix(S4 matrix, double eps) {
IntegerVector dim = matrix.slot("Dim");
int rows = dim[0];
int cols = dim[1];

NumericVector ret(rows, 0.0);
IntegerVector nzero(rows, cols);
int x_length = x.length();
double log_eps = log(eps);

for (int k=0; k<x_length; ++k) {
ret[i[k]] += log(x[k] + eps);
nzero[i[k]] -= 1;
Expand All @@ -109,7 +109,7 @@ NumericVector row_gmean_dgcmatrix(S4 matrix, double eps) {
}

// [[Rcpp::export]]
NumericMatrix row_gmean_grouped_dgcmatrix(S4 matrix, IntegerVector group,
NumericMatrix row_gmean_grouped_dgcmatrix(S4 matrix, IntegerVector group,
double eps, bool shuffle) {
NumericVector x = matrix.slot("x");
IntegerVector i = matrix.slot("i");
Expand All @@ -124,7 +124,7 @@ NumericMatrix row_gmean_grouped_dgcmatrix(S4 matrix, IntegerVector group,
int x_length = x.length();
IntegerMatrix nonzero(rows, groups);
double log_eps = log(eps);

if (shuffle) {
group = clone(group);
std::random_shuffle(group.begin(), group.end(), randWrapper);
Expand Down Expand Up @@ -162,13 +162,13 @@ IntegerVector row_nonzero_count_dgcmatrix(S4 matrix) {
IntegerVector i = matrix.slot("i");
IntegerVector dim = matrix.slot("Dim");
int rows = dim[0];

IntegerVector ret(rows, 0);
int i_len = i.length();
for(int k = 0; k < i_len; ++k) {
ret[i[k]]++;
}

List dn = matrix.slot("Dimnames");
if (dn[0] != R_NilValue) {
ret.attr("names") = as<CharacterVector>(dn[0]);
Expand All @@ -186,15 +186,15 @@ IntegerMatrix row_nonzero_count_grouped_dgcmatrix(S4 matrix, IntegerVector group
CharacterVector levs = group.attr("levels");
int groups = levs.length();
IntegerMatrix ret(rows, groups);

int col = 0;
for (int k=0; k<i_length; ++k) {
while (k>=p[col]) {
++col;
}
ret(i[k], group[col-1]-1)++;
}

colnames(ret) = levs;
List dn = matrix.slot("Dimnames");
if (dn[0] != R_NilValue) {
Expand Down Expand Up @@ -234,12 +234,12 @@ NumericVector grouped_mean_diff_per_row(NumericMatrix x, IntegerVector group, bo
NumericMatrix tmp(2, nrows);
IntegerVector groupsize(2);
NumericVector ret(nrows, 0.0);

if (shuffle) {
group = clone(group);
std::random_shuffle(group.begin(), group.end(), randWrapper);
}

for (int i = 0; i < ncols; i++) {
++groupsize(group(i));
for (int j = 0; j < nrows; j++) {
Expand All @@ -252,7 +252,7 @@ NumericVector grouped_mean_diff_per_row(NumericMatrix x, IntegerVector group, bo
return ret;
}

// Bootstrapped mean
// Bootstrapped mean
// [[Rcpp::export]]
NumericVector mean_boot(NumericVector x, int N, int S) {
NumericVector ret(N);
Expand All @@ -270,7 +270,7 @@ NumericMatrix mean_boot_grouped(NumericVector x, IntegerVector group, int N, int
int groups = max(group) + 1;
// we need as many columns
NumericMatrix ret(N, groups);

for (int g = 0; g < groups; g++) {
NumericVector xg = x[group == g];
ret(_, g) = mean_boot(xg, N, S);
Expand Down Expand Up @@ -344,33 +344,33 @@ NumericVector distribution_shift(NumericMatrix x) {
// with kind permission from the authors.
// It has been slightly adopted for our use case here.
// [[Rcpp::export]]
List qpois_reg(NumericMatrix X, NumericVector Y, const double tol, const int maxiters,
List qpois_reg(NumericMatrix X, NumericVector Y, const double tol, const int maxiters,
const double minphi, const bool returnfit){
const unsigned int n=X.nrow(), pcols=X.ncol(), d=pcols;

arma::colvec b_old(d, arma::fill::zeros), b_new(d), L1(d), yhat(n), y(Y.begin(), n, false), m(n), phi(n);
arma::vec unique_vals;
arma::mat L2, x(X.begin(), n, pcols, false), x_tr(n, pcols);
double dif;

// Identify the intercept term(s) and initialize the coefficients
for(int i=0;i<pcols;++i){
unique_vals = arma::unique(x.unsafe_col(i));

if(unique_vals.n_elem==1){
b_old(i)=log(mean(y));
break;
}
if((unique_vals.n_elem==2) & ((unique_vals[0] == 0) | (unique_vals[1] == 0))){
if((unique_vals.n_elem==2) && ((unique_vals[0] == 0) || (unique_vals[1] == 0))){
b_old(i)=arma::as_scalar(y.t()*x.unsafe_col(i));
b_old(i)=b_old(i)/sum(x.unsafe_col(i));
b_old(i)=log(std::max(1e-9, b_old(i)));
}
}

x_tr=x.t();
int ij=2;

for(dif=1.0;dif>tol;){
yhat=x*b_old;
m=(exp(yhat));
Expand All @@ -387,14 +387,14 @@ List qpois_reg(NumericMatrix X, NumericVector Y, const double tol, const int max
double p=sum(arma::square(phi)/m)/(n-pcols);
NumericVector coefs = NumericVector(b_new.begin(), b_new.end());
coefs.names() = colnames(X);

List l;
l["coefficients"]=coefs;
l["phi"]=p;
l["theta.guesstimate"]=mean(m)/(std::max(p, minphi)-1);
if(returnfit){
l["fitted"]=NumericVector(m.begin(), m.end());
}

return l;
}

0 comments on commit 0b52965

Please sign in to comment.