Generate merged RNA isoform files #341
Comments
@zhangb1 can someone from your team do this merge today? I also haven't checked the files, but presumably this merge file would have the ENSEMBL transcript ids as well as the gene id, correct? |
@jharenza , I just came back from vacation today. I will manage that. It won't be finished today, though. |
Ok. Can it be done by Monday? Also, do you know if the merges are a "set it and forget it" type of thing at this point (e.g., the files don't need to be downloaded)? If they are still being downloaded, maybe we can brainstorm how to make this more of a real-time, push-of-a-button process? Cc @yuankunzhu |
@jharenza , we used the code from the GitHub repo to merge the RSEM gene files: Also, we made a CWL tool based on that to merge the monthly files: https://cavatica.sbgenomics.com/u/d3b-bixu-ops/tumor-broad-monthly-release/apps/#d3b-bixu-ops/tumor-broad-monthly-release/merge-rsem-gene/0 Do you think we can have an R script somewhere for merging the isoform files, so we can make a CWL tool to merge them monthly like the RSEM gene files? |
We don't have one for isoform. In the past, for OpenPBTA, I think bix eng did a merge. We have not included that yet for OpenPedCan, but would like to now. |
@zhangb1 maybe @kgaonkar6 has done this in the past. I found this code: https://github.com/d3b-center/D3b-codes/blob/master/OpenPBTA_v20_release_QC/code/00-create-and-add-rsem-isoforms-files.R But this also entails a download, which I think we want to get away from so this can be more automated. |
@HuangXiaoyan0106 ^ maybe you can check the R script Jo Lynne posted above? It seems very similar to the RSEM gene one. |
@zhangb1 @jharenza |
Thank you @HuangXiaoyan0106 !! @runjin326 can you check this file and add to s3 please |
@afarrel can you inform please |
TPM |
@jharenza - TPM file is uploaded + md5sum updated. |
@HuangXiaoyan0106 can you update your code to only add ENST id to the
# split gene id and symbol
isoforms <- isoforms %>%
  mutate(transcript_id = str_replace(transcript_id, "_PAR_Y_", "_")) %>%
  separate(transcript_id, c("transcript_id", "gene_symbol"), sep = "\\_", extra = "merge") %>%
  unique()
and then we would remove the gene symbol column completely. Can you update this matrix when you have a chance please? cc @zhangb1 @afarrel @sangeetashukla |
@jharenza I only merged them with
And here is my script:
|
Ah, then we would like to remove gene symbols. Can you use the above code to remove those and only keep ENST ids please? |
Ah? The merged isoform file doesn't contain the gene symbols; the format is as follows.
|
Can there be a separate column for gene symbols? |
@HuangXiaoyan0106 please hold on this until we regroup |
@HuangXiaoyan0106 will you please add the following to your code, which will split the ENSEMBL transcript ids (ENST ids) and the Hugo Gene Symbols and then also de-duplicate those in pseudoautosomal regions (PAR labels)?
# split gene id and symbol
merged_isoform <- merged_isoform %>%
  mutate(transcript_id = str_replace(transcript_id, "_PAR_Y_", "_")) %>%
  separate(transcript_id, c("transcript_id", "gene_symbol"), sep = "\\_", extra = "merge") %>%
  unique()
Also, if it is easier, we only need the TPM file. Thank you |
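For readers following along, here is a small worked example of what the snippet above does. It is only a sketch on a toy table: the transcript ids, gene symbols, sample column name (BS_TOY0001), and TPM values are all made up for illustration, and the real merged matrix may look different.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy merged matrix. All ids, symbols, and values here are hypothetical.
merged_isoform <- tibble(
  transcript_id = c(
    "ENST00000000001.1_GENEA",        # regular entry
    "ENST00000000001.1_PAR_Y_GENEA",  # pseudoautosomal (PAR_Y) copy of the same transcript
    "ENST00000000002.1_GENEB",
    "ENST00000000003.1_GENE_C"        # symbol that itself contains an underscore
  ),
  BS_TOY0001 = c(1.5, 1.5, 20.1, 0.0)
)

split_isoform <- merged_isoform %>%
  # Drop the PAR_Y label so both copies share the same id string
  mutate(transcript_id = str_replace(transcript_id, "_PAR_Y_", "_")) %>%
  # Split at the first underscore only; extra = "merge" keeps the rest
  # (e.g. "GENE_C") together in gene_symbol
  separate(transcript_id, c("transcript_id", "gene_symbol"), sep = "\\_", extra = "merge") %>%
  # Remove rows that are now identical in every column
  unique()

print(split_isoform)
```

Note that unique() only collapses rows that match in every column, so a PAR_Y row whose expression values differ from its non-PAR counterpart would remain as a second row with the same transcript id.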
@jharenza The task has been completed as per your request. |
@jharenza - I checked the file, it is as expected with transcript and gene separated. Also |
Closed with d3b-center/OpenPedCan-analysis#188 |
What data file(s) does this issue pertain to?
Generate a merged RNA isoform file for v11 data release
What release are you using?
v11
Put your question or report your issue here.
Please merge all the RNA isoform files into one (maybe call it rna-isoform.tsv?) and add it to v11. The RNA BS ids are here: v11_rna_bsids.txt
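A minimal sketch of the kind of merge being requested, in base R. This is not the script used for the release. It assumes each sample has one RSEM isoform results file with transcript_id and TPM columns; the file suffix (.rsem.isoforms.results), the helper read_isoform_tpm, the BS_TOY ids, and all values are hypothetical. Toy files are written to a temporary directory so the example runs without any download.

```r
# Hypothetical helper: read one per-sample file and keep only the TPM column,
# renamed to the sample (BS) id.
read_isoform_tpm <- function(path, sample_id) {
  df <- read.delim(path, stringsAsFactors = FALSE, check.names = FALSE)
  df <- df[, c("transcript_id", "TPM")]
  names(df)[2] <- sample_id
  df
}

# Toy inputs (made-up ids and values) written to a temp directory.
tmp <- tempfile("isoforms_")
dir.create(tmp)
toy <- list(
  BS_TOY0001 = data.frame(
    transcript_id = c("ENST00000000001.1_GENEA", "ENST00000000002.1_GENEB"),
    TPM = c(1.5, 20.1)
  ),
  BS_TOY0002 = data.frame(
    transcript_id = c("ENST00000000001.1_GENEA", "ENST00000000002.1_GENEB"),
    TPM = c(0.0, 7.25)
  )
)
for (id in names(toy)) {
  write.table(toy[[id]], file.path(tmp, paste0(id, ".rsem.isoforms.results")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Collect the per-sample files and derive sample ids from the file names.
suffix <- "\\.rsem\\.isoforms\\.results$"
files <- list.files(tmp, pattern = suffix, full.names = TRUE)
sample_ids <- sub(suffix, "", basename(files))

# Read each file, then join all samples on transcript_id (outer join, so a
# transcript missing from one sample shows up as NA rather than being dropped).
per_sample <- Map(read_isoform_tpm, files, sample_ids)
merged_isoform <- Reduce(
  function(x, y) merge(x, y, by = "transcript_id", all = TRUE),
  per_sample
)

# Write the merged matrix using the file name suggested in this issue.
write.table(merged_isoform, file.path(tmp, "rna-isoform.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)
```

With many samples, a repeated pairwise merge like this can be slow; binding columns after checking that every file lists the same transcripts in the same order would be one alternative.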