single-cell dataset (not 10x) running issue with genome version hg38 #75

lifan18 · 2022-08-03T04:50:19Z

Dear Dr. Akdes,

CaSpER is a wonderful tool for discovering CNVs in single-cell datasets. I am following your example (https://rpubs.com/akdes/673120) to run it.

My dataset is single-cell sequencing of human brain (not 10x), and I use hg38 as the reference.

However, I have run into some errors I cannot get past.

1. Annotation
annotation <- generateAnnotation(id_type="hgnc_symbol", genes=genes, centromere=centromere, ishg19 = T)
The example uses ishg19 = T, but I used ishg19 = F because I use hg38. I am not sure whether F is correct.
I generated the centromere information as in your example, curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz" | gunzip -c | grep acen > centromere.txt , but passing it as centromere="centromere.txt" does not seem to work.
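A minimal R sketch of the centromere step, under two assumptions not confirmed in this thread: that generateAnnotation() expects the centromere bands as a data.frame rather than a file name, and that the hg38 cytoBand file sits on UCSC at the same path as the hg19 one with the assembly swapped. The helper read_centromere() is hypothetical; it does in R what `grep acen` does in the shell command.

```r
# Sketch, not CaSpER's own code. Assumption: this is the UCSC hg38 cytoBand
# file (the hg19 URL from the example with "hg19" replaced by "hg38").
hg38_cytoband_url <-
  "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/cytoBand.txt.gz"

# Read a UCSC cytoBand file (gzipped or plain text; gzfile() handles both)
# and keep only the centromeric "acen" bands, like `grep acen` in the shell.
# UCSC cytoBand columns: chrom, chromStart, chromEnd, name, gieStain,
# so with header = FALSE the stain ends up in column V5.
read_centromere <- function(path) {
  cyto <- read.delim(gzfile(path), header = FALSE, stringsAsFactors = FALSE)
  cyto[cyto$V5 == "acen", , drop = FALSE]
}
```

For example, after `download.file(hg38_cytoband_url, "cytoBand.hg38.txt.gz", mode = "wb")`, one could run `centromere <- read_centromere("cytoBand.hg38.txt.gz")` and pass `centromere = centromere` to generateAnnotation(), presumably together with `ishg19 = FALSE` for hg38 data.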
2. readBAFExtractOutput

> loh <- readBAFExtractOutput ( path="w1.baf", sequencing.type="single-cell")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'w1.baf/NA': Not a directory

It seems the BAF path should be a directory. However, even after I created a directory for the BAFExtract output (it contains only one file), it still cannot read the file.

3. My other question is how to generate the loh.name.mapping file.

Sorry for the long list of questions. I hope you can help.

Thank you very much!!!


lifan18 · 2022-08-03T07:08:26Z

Hi,

Here is an update. My question 2 was solved by #62, with this solution:
set path to the folder that contains your file instead of the file name. Your file also needs to be a ".snp" file, not an "_baf" file; change both.
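Following that advice, a minimal R sketch that copies a single BAFExtract output file into its own folder with a ".snp" extension. The helper prepare_baf_dir() and the file and folder names are hypothetical. The `w1.baf/NA` in the earlier warning looks like what happens when the function searches `path` for matching files and finds none, which fits the #62 explanation.

```r
# Hypothetical helper: copy one BAFExtract output file into its own folder,
# renamed to "<name>.snp", and return the new file path.
prepare_baf_dir <- function(baf_file, baf_dir) {
  dir.create(baf_dir, showWarnings = FALSE, recursive = TRUE)
  # Drop a trailing ".baf" or "_baf" from the file name, then add ".snp".
  base <- sub("[._]baf$", "", basename(baf_file))
  snp_file <- file.path(baf_dir, paste0(base, ".snp"))
  if (!file.copy(baf_file, snp_file, overwrite = TRUE)) {
    stop("could not copy ", baf_file, " to ", snp_file)
  }
  snp_file
}
```

For instance, `prepare_baf_dir("w1.baf", "baf_output")` followed by `loh <- readBAFExtractOutput(path = "baf_output", sequencing.type = "single-cell")`.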

I still don't understand what the loh.name.mapping file should contain or how to pass in the centromere file.
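A hedged sketch of the loh.name.mapping input, assuming (not confirmed in this thread) that CaSpER expects a data.frame with a `loh.name` column holding the LOH sample name, as in names(loh), and a `sample.name` column holding the cell names of the expression matrix. With one BAF file for a single-cell run, every cell maps to the same LOH sample. make_loh_name_mapping() is a hypothetical helper.

```r
# Hypothetical helper. Assumed layout: one row per cell, with `loh.name`
# (the LOH/BAF sample name, as in names(loh)) and `sample.name` (a column
# name of the expression matrix). With a single BAF file, every cell gets
# the same loh.name.
make_loh_name_mapping <- function(loh_name, cell_names) {
  data.frame(loh.name = rep(loh_name, length(cell_names)),
             sample.name = cell_names,
             stringsAsFactors = FALSE)
}
```

For example, `loh.name.mapping <- make_loh_name_mapping("w1", colnames(data))`, where "w1" should match names(loh) once the ".snp" suffix has been removed.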

Thank you very much!

lifan18 · 2022-08-03T08:04:13Z

Dear Dr. Akdes,

One more question: how do I generate the control.sample.ids file? I could not find it described in the CaSpER documentation.

object <- CreateCasperObject(raw.data=data,loh.name.mapping=loh.name.mapping, sequencing.type="bulk", 
  cnv.scale=3, loh.scale=3, matrix.type="normalized", expr.cutoff=4.5,
  annotation=annotation, method="iterative", loh=loh, filter="median",  
  control.sample.ids=control.sample.ids, cytoband=cytoband)

Error in new(Class = "casper", raw.data = raw.data, loh = loh, annotation = annotation,  :
  argument "control.sample.ids" is missing, with no default
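A minimal R sketch of building control.sample.ids, assuming it is a character vector of cell names (column names of the expression matrix) for the cells used as the normal reference. The cluster vector and select_control_ids() are hypothetical. Note also that the call above uses sequencing.type="bulk"; for a single-cell matrix, "single-cell" (the value used with readBAFExtractOutput earlier) is presumably the intended setting.

```r
# Hypothetical helper. Assumption: control.sample.ids is a character vector of
# cell names (column names of the expression matrix) used as the reference.
# `cell_clusters` is a named vector: names are cell names, values are labels.
select_control_ids <- function(cell_clusters, control_labels, expr_colnames) {
  ids <- names(cell_clusters)[cell_clusters %in% control_labels]
  if (length(ids) == 0) stop("no cells carry the control label(s)")
  # Fail early if a control cell is not a column of the expression matrix.
  unknown <- setdiff(ids, expr_colnames)
  if (length(unknown) > 0) {
    stop("control ids not in the expression matrix: ",
         paste(unknown, collapse = ", "))
  }
  ids
}
```

For example, `control.sample.ids <- select_control_ids(clusters, "normal", colnames(data))`, where `clusters` and the label "normal" stand in for your own cluster assignments.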

Thank you!

lifan18 · 2022-08-30T09:24:43Z

All issues are solved now. The annotation file can be generated in a local queue. control.sample.ids can be assigned from a data.frame or list and customized as needed.

lifan18 · 2022-08-30T09:52:11Z

BTW, I would like to share a lesson I learned from a bug I ran into.

When I ran samples with the control and the other clusters I had assigned, I got this error:

Performing HMM segmentation...
Processing cnv.scale:1 loh.scale:1...
Error in value[[jvseq[[jjj]]]] : subscript out of bounds
Calls: runCaSpER ... calculateLOHShiftsForEachSegment -> [<- -> [<-.data.frame

I checked all the input files and found that it was caused by one missing step involving names(loh).

Even though I had a correct loh.name.mapping file, the extra line names(loh) <- gsub(".snp", "", names(loh)) was still needed for the later steps. Otherwise the names of loh in CaSpER keep the .snp suffix, which CaSpER does not accept.
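A small R sketch of the same renaming, with an anchored, escaped pattern, plus a hypothetical check that the names line up with loh.name.mapping before the object is created. Both helper names are illustrative, not part of CaSpER.

```r
# Remove a literal ".snp" only at the end of each name. In the pattern
# ".snp" used with gsub() above, "." matches any character, so e.g.
# "asnp_1.snp" would lose more than the suffix; "\\.snp$" avoids that.
strip_snp_suffix <- function(x) sub("\\.snp$", "", x)

# Hypothetical check: every loh.name in the mapping should be a name of loh.
check_loh_names <- function(loh, loh.name.mapping) {
  absent <- setdiff(unique(loh.name.mapping$loh.name), names(loh))
  if (length(absent) > 0) {
    stop("loh.name not found in names(loh): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}
```

Usage: `names(loh) <- strip_snp_suffix(names(loh))`, then `check_loh_names(loh, loh.name.mapping)` before calling CreateCasperObject().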

Hope this helps the next person ;)

Best,

Fan

44REAM · 2022-12-30T13:07:30Z

@lifan18 Hi, I have the same question about the annotation part. Should I use ishg19 = F in generateAnnotation or not?
Thank you

lifan18 closed this as completed Aug 30, 2022
