Fixed path in qc script, removed unwanted symbols from ensg mapping file #146

Merged
merged 3 commits into from
Oct 22, 2021

sangeetashukla
Collaborator

Purpose/implementation Section

Updated the gene_match module to use the correct path for the data files, and fixed the ensg-hugo-pmtl-mapping.tsv file by filtering out ensg_id = Symbol_Not_Found and gene_symbol = NA.

What was your approach?

  • Remove the version (v10) specification from the data path that the qc_ensg_hugo_pmtl_mapping.Rmd script uses to read in data files
  • Add a filter in the add_pmtl_ens_hugo.Rmd script to remove unwanted ensg_id and gene_symbol values before writing to ensg-hugo-pmtl-mapping.tsv
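
The filtering step described above could look roughly like the following. This is a minimal base R sketch, not the code in add_pmtl_ens_hugo.Rmd: the column names ensg_id and gene_symbol come from this pull request, while the toy rows, the placeholder IDs, and the temporary output file are assumptions made for illustration.

```r
# Toy stand-in for the mapping table; all rows and IDs are made up for illustration.
mapping <- data.frame(
  ensg_id = c("ENSG_EXAMPLE_1", "Symbol_Not_Found", "ENSG_EXAMPLE_2"),
  gene_symbol = c("GENE_A", "GENE_B", NA),
  stringsAsFactors = FALSE
)

# Keep only rows with a real ensg_id and a non-missing gene_symbol.
keep <- !is.na(mapping$ensg_id) &
  mapping$ensg_id != "Symbol_Not_Found" &
  !is.na(mapping$gene_symbol)
filtered <- mapping[keep, , drop = FALSE]

# Write the filtered table as tab-separated text.
# A temporary file is used here instead of the real results path.
out_file <- tempfile(fileext = ".tsv")
write.table(filtered, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Filtering before the write keeps the unwanted values out of the results file, so downstream scripts do not have to handle them.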

What GitHub issue does your pull request address?

Issue 231

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Please test if the scripts run as expected with symlink set up to read data files.

Is there anything that you want to discuss further?

No

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

ensg-hugo-pmtl-mapping.tsv

What is your summary of the results?

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@ewafula ewafula left a comment

@sangeetashukla, please change the OpenPedCan-analysis/data directory path to be relative to the qc_ensg_hugo_pmtl_mapping.Rmd location, as I have shown in the code comment. Not everyone running the module will have a local clone of the OpenPedCan-analysis repo in their home directory (~/). Otherwise, the results are OK. The Symbol_Not_Found and NA values in the gene_symbol and ensg_id columns, respectively, have been filtered out of the results/ensg-hugo-pmtl-mapping.tsv file.

Quitting from lines 17-46 (qc_ensg_hugo_pmtl_mapping.Rmd)
Error: ‘~/OpenPedCan-analysis/data//snv-consensus-plus-hotspots.maf.tsv.gz’ does not exist.

Execution halted

@@ -14,7 +14,7 @@ suppressPackageStartupMessages({
## Read ensg-hugo-pmtl-mapping.tsv and OpenPedCan SNV, CNV, TPM, and fusion data

```{r read_open_ped_can_data}
-open_ped_can_data_path <- '~/OpenPedCan-analysis/data/v10'
+open_ped_can_data_path <- '~/OpenPedCan-analysis/data/'

Change to:
open_ped_can_data_path <- '../../data'
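
To illustrate why the relative path is more portable: by default, knitr evaluates chunks with the working directory set to the directory of the .Rmd file, so '../../data' resolves from the module location rather than from the user's home directory. The sketch below imitates that with a throwaway directory tree. The layout (module two levels below the repository root) and the names example-repo and example.tsv are assumptions for illustration only.

```r
# Hypothetical layout mirroring <repo>/analyses/<module>/ and <repo>/data/.
repo <- file.path(tempdir(), "example-repo")
module_dir <- file.path(repo, "analyses", "gene_match")
data_dir <- file.path(repo, "data")
dir.create(module_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, "example.tsv")))

# Imitate knitting from the module directory, then resolve the relative path.
old_wd <- setwd(module_dir)
open_ped_can_data_path <- "../../data"
found <- file.exists(file.path(open_ped_can_data_path, "example.tsv"))
setwd(old_wd)  # restore the previous working directory
```

Because nothing in the path depends on where the repository was cloned, the same chunk should work for any user as long as the module sits at that depth in the tree.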

@sangeetashukla
Collaborator Author

Thank you @ewafula for catching that. I have updated the file path appropriately and tested the script.

@@ -14,7 +14,7 @@ suppressPackageStartupMessages({
## Read ensg-hugo-pmtl-mapping.tsv and OpenPedCan SNV, CNV, TPM, and fusion data

```{r read_open_ped_can_data}
-open_ped_can_data_path <- '~/OpenPedCan-analysis/data/'
+open_ped_can_data_path <- '../../data/'

@sangeetashukla, the trailing slash is unnecessary. R will add a separator when joining components to create a path. It will still work, but you will get a double slash, as shown in the error message that I posted previously.
Set it to: open_ped_can_data_path <- '../../data'
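
A small sketch of the double-slash behavior, assuming the script joins the directory and file name with file.path() (the pull request does not show the joining call, so that is an assumption). The file name is the one from the error message above.

```r
maf_file <- "snv-consensus-plus-hotspots.maf.tsv.gz"

# file.path() inserts "/" between components, so on Unix-alikes a trailing
# slash on the directory ends up doubled in the result.
with_slash <- file.path("../../data/", maf_file)
without_slash <- file.path("../../data", maf_file)

print(with_slash)
print(without_slash)
```

On Unix-alike systems both forms usually point to the same file, which is why the script still runs; dropping the trailing slash simply keeps paths in logs and error messages clean.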

@runjin326 runjin326 left a comment

Thanks @sangeetashukla for preparing this! Everything looks good to me now (thanks @ewafula for the detailed review).
@sangeetashukla, could you please do a dev merge and I think this should be ready for merge.

@runjin326 runjin326 merged commit 10fd9d5 into d3b-center:dev Oct 22, 2021
@sangeetashukla
Collaborator Author

@runjin326 Done

@sangeetashukla sangeetashukla deleted the update-gene-match branch October 22, 2021 15:18
Labels
bug, enhancement, ready for review