Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using Bulk RNASeq as reference #17

Open
Rohit-Satyam opened this issue Apr 14, 2022 · 0 comments
Open

Using Bulk RNASeq as reference #17

Rohit-Satyam opened this issue Apr 14, 2022 · 0 comments

Comments

@Rohit-Satyam
Copy link

Rohit-Satyam commented Apr 14, 2022

Hi Developers!!

I have Time-series bulk data (with two biological replicates for each time) and now I wish to use it to annotate my single-cell data. My Single-cell data contains two samples, One control and one drug-treated.

Your vignette is quite straight forward but can you tell me how I combine two biological replicates per time point into one column so that I have one column for each time point and use it for assigning labels. Will log normalizing both replicates and then take average per gene per time point work? Do you have any guidelines for using Bulk RNASeq datasets to label single-cell data? Can you check if the code I used below is valid?

Note: My cells are highly synchronized at 10 hours post-infection and I wish to confirm it by the time tag transfer using Bulk RNASeq.

library(Seurat)
library(dplyr)
c <- read.csv("counts.csv", header = T, row.names = 1)
ribo <- read.csv("ribosomal_genes.csv", header = F)
## removing ribo genes 225
c <- c[!rownames(c) %in% ribo$V1,]
> c %>% head()
T0_rep1 T0_rep2 T2_rep1 T2_rep2 T4_rep1 T4_rep2 T6_rep1
Gene1      46      48     202     298     357     428     125
Gene2     130     137     693     844    1713    1504     630
Gene3       0       0       0       0       1       0       0
Gene4       1       1       1       0       0       1       1
Gene5     412     524     994    1569    1959    2095     771
Gene6   16909   16966   16490   23660   30364   30484   10008

meta <- read.csv("meta.csv", header = T)
> meta %>% head()
Sample_id Prefix Replicate Time timetag
T0_rep1   T0_rep1     T0      rep1    4    4hpi
T0_rep2   T0_rep2     T0      rep2    4    4hpi
T2_rep1   T2_rep1     T2      rep1    6    6hpi
T2_rep2   T2_rep2     T2      rep2    6    6hpi
T4_rep1   T4_rep1     T4      rep1    8    8hpi
T4_rep2   T4_rep2     T4      rep2    8    8hpi

rownames(meta) <- meta$Sample_id
bulk <- CreateSeuratObject(counts = c, project = "Time-Series")
bulk$Sample <- "T-series"
bulk$batch <- "T-series"
bulk$batch2 <- "T-series"
bulk@meta.data <- cbind(bulk@meta.data,meta)
bulk <- NormalizeData(bulk)
prefix <- unique(meta$Prefix)
mat.temp <- data.frame(GetAssayData(bulk, slot="data"))
l <- list()
biorepavg <- lapply(1:length(prefix),function(x){
  l[[x]] <- mat.temp[,grep(prefix[x], colnames(mat.temp))] %>% rowMeans()
  
})

df <- data.frame(sapply(biorepavg,c))
colnames(df) <- prefix
newmeta <- meta[,c(2,4,5)] %>% unique() %>% `rownames<-`(.$Prefix)
> newmeta %>% head
Prefix Time timetag
T0      T0    4    4hpi
T2      T2    6    6hpi
T4      T4    8    8hpi
T6      T6   10   10hpi
T8      T8   12   12hpi
T10    T10   14   14hpi
reference <- SingleCellExperiment(list(counts=df),colData=DataFrame(celltypes =newmeta$timetag))

library(CHETAH)
input <- SingleCellExperiment(assays = list(counts = GetAssayData(p10,slot = "counts")),colData=DataFrame(p10@meta.data))
input <- CHETAHclassifier(input = input, ref_cells = reference)
> table(input$celltype_CHETAH)

  4hpi  Node1 Node14 Node15 Node16 Node18  Node2  Node3  Node4 
  1398      9      3      3     17      3     56   9113    336 
> classify_all = Classify(input, 0)
> table(classify_all$celltype_CHETAH)

10hpi 32hpi 42hpi  4hpi  8hpi 
 9458     3    17  1454     6 

The bulk reference correlation map is as follows:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant