Created merged matrices of gene raw counts for data release #389
Comments
@jharenza where can we find the GTEx RNA-seq counts file? Also the TCGA one? I don't think we have those. |
@afarrel can you inform please? Perhaps the GTEx counts (we are using v8) can be the GENCODE raw counts from here: https://gtexportal.org/home/datasets For TCGA, you will have to get them from GDC (maybe the ones we previously released are raw counts and not expected counts, but I'm not sure). I believe you had queried the portal. Maybe @yuankunzhu can help with this. It's possible that only PBTA+GMKF were using expected counts, but we want to be sure all datasets are using raw counts. Thanks! |
@jharenza I have prepared the CWL tool, and once the GTEx and TCGA data are ready I can start the process. One thing: please clarify which count value you want to keep in the merged matrices. As explained in the STAR manual, STAR outputs read counts per gene into
|
Hi @HuangXiaoyan0106. I suspect we will need to use a custom column based on the library types, of which we have:
@yuankunzhu or @zhangb1 or @afarrel can you tell us which columns go with which library types? |
Only the poly-A samples are unstranded, so they use column 2. @jharenza why does poly-A have 27900 samples? The others are rf-stranded, so they need column 4. |
Ha! TCGA + GTEX + TARGET, I believe! |
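As a rough illustration of the column rule discussed above (unstranded uses column 2, rf-stranded uses column 4), here is a minimal sketch that reads one STAR `ReadsPerGene.out.tab` file and keeps a single count column. The function name, the strandedness labels, and the example data are hypothetical and are not taken from the actual CWL tool; the sketch only assumes the standard STAR layout of four tab-separated columns with summary rows whose names start with `N_`.

```python
import csv
import io

# 0-based indices into a ReadsPerGene.out.tab row:
# 0 = gene ID, 1 = unstranded counts (column 2),
# 2 = counts for read 1 strand (column 3),
# 3 = counts for read 2 strand, i.e. reverse-stranded (column 4).
# The labels used as keys here are illustrative, not an official vocabulary.
COLUMN_FOR_STRANDEDNESS = {"unstranded": 1, "rf-stranded": 3}


def read_star_counts(handle, strandedness):
    """Return {gene_id: raw_count} using the column that matches the library."""
    col = COLUMN_FOR_STRANDEDNESS[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the N_unmapped / N_multimapping /
        # N_noFeature / N_ambiguous summary rows at the top of the file.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[col])
    return counts


# Made-up example content, for illustration only.
EXAMPLE = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t1\t20\t2\n"
    "N_ambiguous\t0\t0\t0\n"
    "ENSG00000000003\t12\t1\t11\n"
    "ENSG00000000005\t0\t0\t0\n"
)

unstranded = read_star_counts(io.StringIO(EXAMPLE), "unstranded")
reverse = read_star_counts(io.StringIO(EXAMPLE), "rf-stranded")
print(unstranded)
print(reverse)
```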
@HuangXiaoyan0106 can you set a rule, if |
Sure, I can set it, but I need a manifest that includes all related sample_id and RNA_library info. |
You can use the v11 histologies file for this @HuangXiaoyan0106 https://cavatica.sbgenomics.com/u/cavatica/opentarget/files/62cc6541baf2a418322dd179/ |
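A manifest-driven version of the rule could look roughly like the sketch below. It assumes a tab-delimited manifest with one row per sample, and it uses `sample_id` and `RNA_library` as column names only because those are the terms used in this thread; the real histologies file may name its ID column differently. The exact-match test on `"poly-A"` is also an assumption based on the comment that only poly-A samples are unstranded, and all helper names and data are hypothetical.

```python
import csv
import io


def load_library_types(manifest_handle, id_col="sample_id", library_col="RNA_library"):
    """Map each sample ID to its RNA library type from a tab-delimited manifest."""
    reader = csv.DictReader(manifest_handle, delimiter="\t")
    return {row[id_col]: row[library_col] for row in reader}


def star_column(rna_library):
    """1-based STAR column: 2 for poly-A (unstranded), 4 otherwise (rf-stranded).

    This mirrors the rule stated in the thread; other library types may
    need their own handling.
    """
    return 2 if rna_library == "poly-A" else 4


def merge_counts(per_sample_rows, library_types):
    """Build {gene_id: {sample_id: count}} from per-sample STAR rows."""
    samples = sorted(per_sample_rows)
    matrix = {}
    for sample in samples:
        col = star_column(library_types[sample]) - 1  # convert to 0-based index
        for row in per_sample_rows[sample]:
            if row[0].startswith("N_"):  # skip STAR summary rows
                continue
            matrix.setdefault(row[0], {})[sample] = int(row[col])
    return samples, matrix


# Made-up manifest and counts, for illustration only.
MANIFEST = "sample_id\tRNA_library\nS1\tpoly-A\nS2\tstranded\n"
ROWS = {
    "S1": [["N_unmapped", "1", "1", "1"], ["GENE_A", "7", "3", "4"]],
    "S2": [["GENE_A", "9", "2", "8"]],
}

library_types = load_library_types(io.StringIO(MANIFEST))
samples, matrix = merge_counts(ROWS, library_types)
print(samples, matrix)
```

Keeping the library-to-column mapping in one small function means the rule can be adjusted in a single place if more library types turn up in the manifest.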
@yuankunzhu can you inform / gather a link for @zhangb1 to generate a raw count matrix for TCGA? |
Not able to find raw counts on GDC; @chinwallaa will also look and explore GDC for this. |
@yuankunzhu @zhangb1 Are these the STAR counts (raw counts, Gene Expression Quantification) data from GDC that we were looking to download? |
Matrices need to be created and subsetted (waiting for new matrix generation) to the other expression files for the v12 (GENCODE 36) folder in the S3 bucket. Marking as blocked until the PBTA X01 release. |
@chinwallaa , the files I download for v11 is |
What data file(s) does this issue pertain to?
We currently only have gene expected counts and not raw counts in the data releases.
What release are you using?
v12
Put your question or report your issue here.
Create:
cc @afarrel in case I am missing anything