v12 hgg subtyping (10/N) #331

Merged
merged 247 commits into from
Apr 29, 2023
Commits
c17d939
update hgg for v12
Mar 2, 2023
45e77a9
update README
Mar 2, 2023
5fb55c2
update README
Mar 2, 2023
3dcaed5
v12 atrt subtyping
jharenza Mar 2, 2023
b6b2596
rework code to one specimen per line
jharenza Mar 2, 2023
499ac5b
update LGAT for v12
Mar 3, 2023
d101fca
update compile v12
jharenza Mar 3, 2023
c24794d
Merge branch 'v12-epn' into v12-path
jharenza Mar 3, 2023
df112dc
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
jharenza Mar 3, 2023
46f6d6e
resolve repeated FGFR subtypes strings
Mar 3, 2023
04d0e8b
Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd
ewafula Mar 3, 2023
9e97d9e
run path subtyping
jharenza Mar 3, 2023
d9c92ac
regenerate results
Mar 4, 2023
33ebfee
update json terms, cell line composition, rerun
jharenza Mar 4, 2023
add4035
fix some typos, stopped at script 07
jharenza Mar 4, 2023
959f678
adding backticks to non-standard gene symbol variables
Mar 4, 2023
a0823d1
major overhaul, reorder subtyping
jharenza Mar 5, 2023
15a3c46
fix typos, rerun
jharenza Mar 5, 2023
794c876
update column names
jharenza Mar 5, 2023
107835c
fix missing samples missing due to not selecting na cols
jharenza Mar 5, 2023
7d8cce4
fix fusion summary path to data dir, not local!
jharenza Mar 5, 2023
1a7e1d9
rerun all
jharenza Mar 5, 2023
e4fa3c9
fix missing samples with a different tactic
jharenza Mar 6, 2023
ed29d0c
rerun
jharenza Mar 6, 2023
87a9ed4
add missing pt and sample ids to subtyping table
jharenza Mar 6, 2023
61ccf96
Merge branch 'v12-hgg' into v12-atrt
jharenza Mar 9, 2023
ed0d19d
Merge branch 'v12-atrt' into v12-lgg
jharenza Mar 9, 2023
5e89007
Merge branch 'v12-lgg' into v12-path
jharenza Mar 9, 2023
f8ae6b0
update path
jharenza Mar 9, 2023
b32ca3d
Merge branch 'v12-epn' into v12-path
jharenza Mar 9, 2023
ea9e26e
update files
jharenza Mar 9, 2023
e559e34
Merge branch 'v12-nbl' into v12-hgg
jharenza Mar 9, 2023
bffa90e
Merge branch 'v12-hgg' into v12-atrt
jharenza Mar 9, 2023
1bcc4e9
Merge branch 'v12-atrt' into v12-lgg
jharenza Mar 9, 2023
855c226
Merge branch 'v12-lgg' into v12-path
jharenza Mar 9, 2023
cfd6d3f
initial run of integrate
jharenza Mar 9, 2023
2bc0e4b
update wildtype of methyl subtype exist
Mar 9, 2023
7d922e1
Merge branch 'v12-lgg' of github.com:PediatricOpenTargets/OpenPedCan-…
Mar 9, 2023
b674c54
update efo mondo file, few cancer groups
jharenza Mar 9, 2023
25c7d58
Merge remote-tracking branch 'origin/v12-nbl' into v12-hgg
Mar 19, 2023
37bffbb
rerun with updated v12 data release
Mar 19, 2023
e0e7aea
Merge remote-tracking branch 'origin/v12-hgg' into v12-atrt
Mar 19, 2023
162d6d0
rerun with updated v12 data release
Mar 19, 2023
e603e14
Merge remote-tracking branch 'origin/v12-atrt' into v12-lgg
Mar 19, 2023
1359366
rerun with updated v12 data release
Mar 19, 2023
b86c852
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
Mar 20, 2023
20fddf8
rerun with updated v12 data release
Mar 20, 2023
8708691
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Mar 20, 2023
d2d7359
rerun with updated v12 release
Mar 20, 2023
698498f
rerun with v12 ef-mondo-ncit map
Mar 22, 2023
d2b69e6
v12 independent sample lists
Mar 22, 2023
15deb37
use match id, rerun, update 3 samples with discordant subtypes due to…
jharenza Apr 13, 2023
b5e7b8c
update patient with multiple subtypes
Apr 13, 2023
e0afc4f
merge v12-nbl
zzgeng Apr 13, 2023
84b1d00
re-run the module
zzgeng Apr 13, 2023
2073db6
merge v12-hgg
zzgeng Apr 13, 2023
a1eb5fb
re-run the module
zzgeng Apr 13, 2023
004d789
merge v12-atrt
zzgeng Apr 13, 2023
71d40a4
merge v12-lgg
zzgeng Apr 13, 2023
baaa3bf
re-run the module
zzgeng Apr 13, 2023
b6cd446
merge v12-path
zzgeng Apr 13, 2023
9983ca9
re-run the module
zzgeng Apr 13, 2023
d631498
merge v12-integrate
zzgeng Apr 13, 2023
6dd8b3d
re-run the module
zzgeng Apr 13, 2023
6b9ec8b
update Other tumors (all histiocytic JXG)
jharenza Apr 13, 2023
6955143
update SEGA subtype SEGA, To be classified
Apr 13, 2023
74666fb
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
jharenza Apr 13, 2023
a9f7a7b
rerun
jharenza Apr 13, 2023
5a55eed
Merge branch 'v12-path' into v12-integrate
jharenza Apr 13, 2023
53c2678
rerun module
jharenza Apr 13, 2023
b9f5e3a
add missing methyl samples, rerun
jharenza Apr 14, 2023
ea37734
Merge branch 'v12-hgg' into v12-atrt
jharenza Apr 14, 2023
eb1f2eb
Merge branch 'v12-atrt' into v12-path
jharenza Apr 14, 2023
f3f325c
rerun with hgg changes
jharenza Apr 14, 2023
f72feed
Merge branch 'v12-path' into v12-integrate
jharenza Apr 14, 2023
bc46a14
rerun with hgg changes
jharenza Apr 14, 2023
7643718
get rid of discrepancies, rerun
jharenza Apr 14, 2023
b8ac633
Merge branch 'v12-hgg' into v12-atrt
jharenza Apr 14, 2023
dcab142
Merge branch 'v12-atrt' into v12-path
jharenza Apr 14, 2023
5f44f10
rerun
jharenza Apr 14, 2023
a3e6867
Merge branch 'v12-path' into v12-integrate
jharenza Apr 14, 2023
045c297
add mol subtype methyl in compiled file
jharenza Apr 14, 2023
893ad37
Merge branch 'v12-path' into v12-integrate
jharenza Apr 14, 2023
21c2d83
include non-subtyped LGG methyl
Apr 14, 2023
bae82b3
include non-subtyped LGG methyl
Apr 14, 2023
c5b19ce
update 00-ATRT_subtyping.R
zzgeng Apr 17, 2023
6e44949
merge v12-nbl
zzgeng Apr 17, 2023
f52958a
re-run the module
zzgeng Apr 17, 2023
0a22e5a
merge v12-hgg
zzgeng Apr 17, 2023
df658f4
update script and re-run
zzgeng Apr 17, 2023
897a3c5
merge v12-atrt
zzgeng Apr 17, 2023
78499f3
merge v12-lgg
zzgeng Apr 17, 2023
a0efc37
re-run the module
zzgeng Apr 17, 2023
104428d
merge v12-path
zzgeng Apr 17, 2023
133db74
re-run the module
zzgeng Apr 17, 2023
9c7f7ce
merge v12-integrate
zzgeng Apr 17, 2023
2d4bc26
re-run the module
zzgeng Apr 17, 2023
f0c84a1
update subtyping script to remove duplicates
zzgeng Apr 18, 2023
c9c62a4
Merge remote-tracking branch 'origin/v12-nbl' into v12-hgg
Apr 18, 2023
aa9d9ea
PR stacking rerun
Apr 18, 2023
ee0a633
Merge remote-tracking branch 'origin/v12-hgg' into v12-atrt
Apr 18, 2023
0102458
Merge remote-tracking branch 'origin/v12-atrt' into v12-lgg
Apr 18, 2023
5e5b74c
PR stacking reruns
Apr 18, 2023
8e937aa
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
Apr 18, 2023
7561d51
PR stacking reruns
Apr 18, 2023
7c18fcd
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 18, 2023
878eb6f
PR stacking reruns
Apr 18, 2023
f28b36c
update the subtype script
zzgeng Apr 18, 2023
51e6627
merge v12-atrt
zzgeng Apr 18, 2023
16bf0e9
Merge remote-tracking branch 'origin/v12-nbl' into v12-hgg
Apr 18, 2023
eace7bf
Merge remote-tracking branch 'origin/v12-hgg' into v12-atrt
Apr 18, 2023
1277a5f
Merge branch 'v12-atrt' of github.com:PediatricOpenTargets/OpenPedCan…
Apr 18, 2023
a4d05db
Merge remote-tracking branch 'origin/v12-atrt' into v12-lgg
Apr 18, 2023
558bc16
set unmatched low confidence methyl to NA
Apr 18, 2023
1eeb01c
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
Apr 18, 2023
4924a43
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 18, 2023
5e38dae
set unmatched low confidence methyl to NA
Apr 18, 2023
189e582
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 18, 2023
6030c01
set unmatched low confidence methyl to NA
Apr 18, 2023
2bd1257
resolve methyl duplicates
Apr 18, 2023
2aaaf59
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 18, 2023
6cb6880
resolve methyl duplicates
Apr 18, 2023
e6bb01e
add GNT NOS to path dx for gnts
jharenza Apr 18, 2023
ea012d6
add GNT NOS to path subtypes --> GNT in subtype
jharenza Apr 19, 2023
18cc101
Merge branch 'v12-lgg' into v12-path
jharenza Apr 19, 2023
0ae8846
remove NOS from tumors that have subtypes!
jharenza Apr 19, 2023
0dbd494
get rid of dnets and fix methyl blank subtypes
jharenza Apr 19, 2023
1d40b28
Merge branch 'v12-lgg' into v12-path
jharenza Apr 19, 2023
df7e2ba
rerun
jharenza Apr 19, 2023
ba5178e
Merge branch 'v12-path' into v12-integrate
jharenza Apr 19, 2023
72b5efd
rerun
jharenza Apr 19, 2023
664d963
remove duplicates
jharenza Apr 19, 2023
da3dfef
Merge branch 'v12-path' into v12-integrate
jharenza Apr 19, 2023
c5d78b2
remove discrepancies
jharenza Apr 19, 2023
1a9484e
bring in DGD data
Apr 19, 2023
7bfb3ed
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
Apr 19, 2023
7eeb8d2
rerun
Apr 19, 2023
d23e4fe
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 19, 2023
7ad8512
rerun
Apr 19, 2023
327beae
Merge remote-tracking branch 'origin/v12-emb' into v12-path
Apr 19, 2023
c4b8caf
rerun
Apr 19, 2023
a437e72
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 19, 2023
1b6185c
rerun
Apr 19, 2023
781c413
rerun without DGD samples
jharenza Apr 20, 2023
d3ced26
Merge branch 'v12-lgg' into v12-path
jharenza Apr 20, 2023
8f5d38a
rerun without DGD samples
jharenza Apr 20, 2023
e72c887
rerun without DGD samples
jharenza Apr 20, 2023
f930a1e
Merge branch 'v12-path' into v12-integrate
jharenza Apr 20, 2023
f3ecc0d
rerun without DGD samples
jharenza Apr 20, 2023
bde5938
rework LGG using match id
jharenza Apr 20, 2023
d9c3731
Merge branch 'v12-lgg' into v12-path
jharenza Apr 20, 2023
baa29b2
rerun with updated LGG
jharenza Apr 20, 2023
736bd6d
Merge branch 'v12-path' into v12-integrate
jharenza Apr 20, 2023
6bcb43a
rerun with LGG update
jharenza Apr 20, 2023
0054b2b
exclude DGD SNV MAF
Apr 20, 2023
0974b0c
rerun with new base histologies
jharenza Apr 21, 2023
dcf3d10
Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd
jharenza Apr 21, 2023
2715c6a
remove commented code
jharenza Apr 21, 2023
defa57e
merge v12-hgg
jharenza Apr 21, 2023
4f9b133
rerun without dgd
jharenza Apr 21, 2023
6a2e924
Merge branch 'v12-atrt' into v12-lgg
jharenza Apr 21, 2023
7d5b198
Merge branch 'v12-lgg' into v12-path
jharenza Apr 21, 2023
7c6e013
rerun
jharenza Apr 21, 2023
dcd606d
Merge branch 'v12-path' into v12-integrate
jharenza Apr 21, 2023
bb330e2
rerun
jharenza Apr 21, 2023
025a44c
update cancer group
jharenza Apr 21, 2023
bdc40a1
Merge remote-tracking branch 'origin/v12-emb' into v12-path
Apr 21, 2023
432a825
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 21, 2023
a2f170b
Merge remote-tracking branch 'origin/v12-nbl' into v12-hgg
Apr 25, 2023
f722faa
rerun with updated pre-release data
Apr 25, 2023
50a5a5e
Merge remote-tracking branch 'origin/v12-hgg' into v12-atrt
Apr 25, 2023
5a7665a
Merge remote-tracking branch 'origin/v12-atrt' into v12-lgg
Apr 25, 2023
f5721c8
rerun with updated pre-release data
Apr 25, 2023
a874a90
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
Apr 25, 2023
8b0f1a9
rerun with updated pre-release data
Apr 25, 2023
4451f2e
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 25, 2023
a724b1e
rerun with updated pre-release data
Apr 25, 2023
6ed4494
Merge remote-tracking branch 'origin/v12-integrate' into v12-independ…
Apr 25, 2023
1f99842
rerun with updated final histologies
Apr 25, 2023
603963f
update to compile new epn subtyping file
Apr 25, 2023
3c00345
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 25, 2023
3de4f91
rerun with new compiled epn subtyping file
Apr 25, 2023
fb31455
update path dx selection
jharenza Apr 25, 2023
0c70072
Merge branch 'v12-epn' into v12-path
jharenza Apr 25, 2023
1af6af6
rerun with epn update
jharenza Apr 25, 2023
52f048b
Merge branch 'v12-path' into v12-integrate
jharenza Apr 25, 2023
c0e76fa
rerun hist with epn update
jharenza Apr 25, 2023
5dda43b
fix path dx and free text strings, rerun
jharenza Apr 25, 2023
5c7267a
Merge branch 'v12-hgg' into v12-path
jharenza Apr 25, 2023
7ff4ab0
rerun with hgg updates for path dx subset
jharenza Apr 25, 2023
9e56d2d
Merge branch 'v12-path' into v12-integrate
jharenza Apr 25, 2023
27aea21
rerun with latest hgg
jharenza Apr 25, 2023
7927e7e
also fix script 02!
jharenza Apr 25, 2023
7c7e04f
Merge branch 'v12-hgg' into v12-path
jharenza Apr 25, 2023
afe4e23
rerun with hgg update
jharenza Apr 25, 2023
e4e6006
Merge branch 'v12-path' into v12-integrate
jharenza Apr 25, 2023
f56cdee
update HGG cancer groups
jharenza Apr 25, 2023
dd027e5
rerun with new JSON terms
jharenza Apr 25, 2023
7cd69b5
Merge branch 'v12-emb' into v12-path
jharenza Apr 25, 2023
33a8235
get those last few embryonal stragglers!
jharenza Apr 25, 2023
bca2c46
Merge branch 'v12-emb' into v12-path
jharenza Apr 25, 2023
25a94a1
rerun with updated emb
jharenza Apr 25, 2023
1ac97f0
rerun
jharenza Apr 25, 2023
1c76d6e
Merge branch 'v12-path' into v12-integrate
jharenza Apr 25, 2023
3e6903f
rerun with updated emb
jharenza Apr 25, 2023
3cfd353
one more mixed path dx update in JSON
jharenza Apr 25, 2023
09da6fd
Merge branch 'v12-emb' into v12-path
jharenza Apr 25, 2023
89a4f73
rerun with emb
jharenza Apr 25, 2023
b2c2cb4
Merge branch 'v12-path' into v12-integrate
jharenza Apr 25, 2023
fa13d4c
rerun
jharenza Apr 25, 2023
d7d1cc4
exclude large input subset files
Apr 25, 2023
0463a72
rerun with updated pre-release
Apr 25, 2023
eafe189
Merge remote-tracking branch 'origin/v12-epn' into v12-cranio
Apr 26, 2023
565affd
Merge remote-tracking branch 'origin/v12-cranio' into v12-chordoma
Apr 26, 2023
c51c742
Merge remote-tracking branch 'origin/v12-chordoma' into v12-ews
Apr 26, 2023
54a311c
rerun with update pre-release data
Apr 26, 2023
8418fd5
Merge remote-tracking branch 'origin/v12-epn' into v12-cranio
Apr 26, 2023
ff68f69
Merge remote-tracking branch 'origin/v12-cranio' into v12-chordoma
Apr 26, 2023
b6b4af8
Merge remote-tracking branch 'origin/v12-chordoma' into v12-ews
Apr 26, 2023
3dd592f
Merge remote-tracking branch 'origin/v12-ews' into v12-emb
Apr 26, 2023
cf1f91b
Merge remote-tracking branch 'origin/v12-emb' into v12-neuro
Apr 26, 2023
91329ac
Merge remote-tracking branch 'origin/v12-neuro' into v12-hgg
Apr 26, 2023
590bc5e
Merge branch 'v12-hgg' of github.com:PediatricOpenTargets/OpenPedCan-…
Apr 26, 2023
183fa5d
rerun with updated pre-release data
Apr 26, 2023
6d0524c
Merge remote-tracking branch 'origin/v12-hgg' into v12-atrt
Apr 26, 2023
7fb3d1b
Merge remote-tracking branch 'origin/v12-atrt' into v12-lgg
Apr 26, 2023
a5bb66f
Merge remote-tracking branch 'origin/v12-lgg' into v12-path
Apr 26, 2023
71e456c
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 26, 2023
c14d5c3
update samples 7316-6047
Apr 26, 2023
2fa7968
update samples 7316-6047
Apr 26, 2023
7fb9c92
update samples 7316-6047
Apr 26, 2023
2b0db64
Merge remote-tracking branch 'origin/v12-path' into v12-integrate
Apr 26, 2023
5ee4b93
update samples 7316-6047
Apr 26, 2023
aa08934
Merge remote-tracking branch 'origin/v12-integrate' into v12-independ…
Apr 26, 2023
64d20a3
rerun with updates final histologies
Apr 26, 2023
1552211
v12 CI subset files
Apr 26, 2023
5171fff
update dockerfile to use debian stretch archive
Apr 27, 2023
3105cbe
update molecular-subtyping-MB to run with CI subsets
Apr 27, 2023
ff817f8
update subtyping module to run with CI subsets
Apr 28, 2023
dbad3aa
update tp53 to run with CI subsets
Apr 28, 2023
0c1eb9b
Merge pull request #355 from PediatricOpenTargets/v12-subset-files
jharenza Apr 28, 2023
1dd2a39
exclude immune deconvo from GA checks
Apr 28, 2023
a278add
Merge pull request #344 from PediatricOpenTargets/v12-independent-sam…
jharenza Apr 29, 2023
68ed62e
Merge pull request #336 from PediatricOpenTargets/v12-integrate
jharenza Apr 29, 2023
646986a
Merge pull request #335 from PediatricOpenTargets/v12-path
jharenza Apr 29, 2023
a917d2b
Merge pull request #333 from PediatricOpenTargets/v12-lgg
jharenza Apr 29, 2023
719ab9b
Merge pull request #332 from PediatricOpenTargets/v12-atrt
jharenza Apr 29, 2023
7 changes: 2 additions & 5 deletions .github/workflows/continuous_integration.yml
@@ -66,12 +66,10 @@ jobs:

 - name: Molecular Subtyping - EPN
 entrypoint: molecular-subtyping-EPN/run-molecular-subtyping-EPN.sh
-openpbta_subset: 0

 - name: Molecular Subtyping - EMBRYONAL
 entrypoint: molecular-subtyping-embryonal/run-embryonal-subtyping.sh
 openpbta_testing: 1
-openpbta_subset: 0

 - name: Molecular Subtyping - CHORDOMA
 entrypoint: molecular-subtyping-chordoma/run-molecular-subtyping-chordoma.sh
@@ -85,7 +83,6 @@ jobs:

 - name: Molecular Subtyping - HGG
 entrypoint: molecular-subtyping-HGG/run-molecular-subtyping-HGG.sh
-openpbta_subset: 0

 - name: Molecular Subtyping - LGG
 entrypoint: molecular-subtyping-LGAT/run_subtyping.sh
@@ -134,8 +131,8 @@ jobs:
 - name: TMB calculation
 entrypoint: tmb-calculation/run_tmb_calculation.sh

-- name: Immune Deconvolution
-entrypoint: immune-deconv/run-immune-deconv.sh
+#- name: Immune Deconvolution
+# entrypoint: immune-deconv/run-immune-deconv.sh

 #- name: EFO/MONDO annotation
 # entrypoint: efo-mondo-mapping/run_search_and_qc.sh
3 changes: 3 additions & 0 deletions Dockerfile
@@ -13,6 +13,9 @@ COPY scripts/install_bioc.r .

 ### Install apt-getable packages to start
 #########################################
+
+# stretch is EOL, so we need to use the archive
+RUN echo "deb http://archive.debian.org/debian stretch main" > /etc/apt/sources.list
 RUN apt-get update && apt-get install -y --no-install-recommends apt-utils dialog

 # Add curl, bzip2 and some dev libs
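The added `RUN echo ... > /etc/apt/sources.list` line uses a single `>` redirection, so it replaces the image's existing apt sources rather than appending to them. The sketch below illustrates that behaviour on a temporary file. The temporary path and the two "stale" entries are assumptions made up for the illustration; they are not taken from the actual base image.

```sh
#!/bin/sh
set -eu

# Stand-in for /etc/apt/sources.list so the sketch runs without root (hypothetical path).
sources_file=$(mktemp)

# Illustrative stale entries; the real base image's contents are not shown in this diff.
printf '%s\n' \
  "deb http://deb.debian.org/debian stretch main" \
  "deb http://deb.debian.org/debian stretch-updates main" > "$sources_file"

# Same redirection as the Dockerfile line: '>' truncates the file first,
# so only the archive entry remains afterwards.
echo "deb http://archive.debian.org/debian stretch main" > "$sources_file"

line_count=$(wc -l < "$sources_file" | tr -d ' ')
echo "entries after overwrite: $line_count"
cat "$sources_file"
```

Using `>>` instead would keep the earlier entries, which is presumably not what is wanted here, since the comment in the diff says the release is EOL and the archive should be used.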
10 changes: 5 additions & 5 deletions analyses/create-subset-files/01-get_biospecimen_identifiers.R
@@ -79,7 +79,9 @@ get_biospecimen_ids <- function(filename, id_mapping_df) {
 biospecimen_ids <- unique(bed_file$Kids_First_Biospecimen_ID)
 } else if (grepl("cnv", filename)) {
 # the two CNV files now have different structures
-cnv_file <- readr::read_tsv(filename)
+if (stringr::str_detect(filename, "gistic", negate = TRUE)) {
+cnv_file <- readr::read_tsv(filename)
+}
 if (grepl("controlfreec|cnvkit_with_status", filename)) {
 biospecimen_ids <- unique(cnv_file$Kids_First_Biospecimen_ID)
 } else if (grepl("consensus_wgs_plus_cnvkit_wxs", filename)) {
@@ -96,10 +98,8 @@ get_biospecimen_ids <- function(filename, id_mapping_df) {
 fusion_file <- readr::read_tsv(filename)
 # the biospecimen IDs in the filtered/prioritize fusion list included with
 # the download are in a column called 'Sample'
-if (grepl("putative-oncogenic", filename)) {
+if (grepl("putative-oncogenic|dgd|annoFuse", filename)) {
 biospecimen_ids <- unique(fusion_file$Sample)
-} else if(grepl("dgd", filename)) {
-biospecimen_ids <- unique(fusion_file$Tumor_Sample_Barcode)
 } else if (grepl("fusion_summary", filename)) {
 biospecimen_ids <- unique(fusion_file$Kids_First_Biospecimen_ID)
 } else {
@@ -502,7 +502,7 @@ matched_participant_id_list <- purrr::map(
 nonmatched_participant_id_list <-
 purrr::map(participant_id_list,
 ~ setdiff(.x, matched_participant_id_list)) %>%
-purrr::map(~ sample(.x, num_nonmatched_participants))
+purrr::map(~ sample(.x, min(length(.x),num_nonmatched_participants)))

 # combine matched and nonmatched lists of ids for subsetting
 participant_ids_for_subset <-
58 changes: 34 additions & 24 deletions analyses/create-subset-files/02-subset_files.R
@@ -74,29 +74,35 @@ subset_files <- function(filename, biospecimen_ids, output_directory) {
 # filtering strategy depends on the file type, mostly because how the sample
 # IDs change based on the file type -- that's why this logic is required
 if (grepl("snv", filename)) {
-if (grepl("hotspots", filename)) {
-snv_file <- data.table::fread(filename,
-skip = 1, # skip version string
-data.table = FALSE,
-showProgress = FALSE)
-# we need to obtain the version string from the first line of the MAF file
-version_string <- readLines(filename, n = 1)
-# filter + write to file with custom function
-snv_file %>%
-dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
-write_maf_file(file_name = output_file,
-version_string = version_string)
-snv_file %>%
-dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
-readr::write_tsv(output_file)
-} else {
-# in a column 'Tumor_Sample_Barcode'
-snv_file <- data.table::fread(filename, data.table = FALSE,
-showProgress = FALSE)
-snv_file %>%
-dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
-readr::write_tsv(output_file)
-}
+# if (grepl("hotspots", filename)) {
+# snv_file <- data.table::fread(filename,
+# skip = 1, # skip version string
+# data.table = FALSE,
+# showProgress = FALSE)
+# # we need to obtain the version string from the first line of the MAF file
+# version_string <- readLines(filename, n = 1)
+# # filter + write to file with custom function
+# snv_file %>%
+# dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
+# write_maf_file(file_name = output_file,
+# version_string = version_string)
+# snv_file %>%
+# dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
+# readr::write_tsv(output_file)
+# } else {
+# # in a column 'Tumor_Sample_Barcode'
+# snv_file <- data.table::fread(filename, data.table = FALSE,
+# showProgress = FALSE)
+# snv_file %>%
+# dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
+# readr::write_tsv(output_file)
+# }
+# in a column 'Tumor_Sample_Barcode'
+snv_file <- data.table::fread(filename, data.table = FALSE,
+showProgress = FALSE)
+snv_file %>%
+dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
+readr::write_tsv(output_file)
 } else if (grepl("biospecimen", filename)) {
 # in a column 'Kids_First_Biospecimen_ID'
 bed_file <- readr::read_tsv(filename)
@@ -130,8 +136,12 @@ subset_files <- function(filename, biospecimen_ids, output_directory) {
 readr::write_tsv(output_file)
 } else if (grepl("dgd", filename)) {
 fusion_file %>%
-dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
+dplyr::filter(Sample %in% biospecimen_ids) %>%
 readr::write_tsv(output_file)
+} else if (grepl("annoFuse", filename)) {
+fusion_file %>%
+dplyr::filter(Sample %in% biospecimen_ids) %>%
+readr::write_tsv(output_file)
 } else if (grepl("fusion_summary", filename)) {
 fusion_file %>%
 dplyr::filter(Kids_First_Biospecimen_ID %in% biospecimen_ids) %>%
Binary file modified analyses/create-subset-files/biospecimen_ids_for_subset.RDS
Binary file not shown.
6 changes: 2 additions & 4 deletions analyses/create-subset-files/create_subset_files.sh
@@ -8,7 +8,7 @@ set -o pipefail

 # Set defaults for release and biospecimen file name
 BIOSPECIMEN_FILE=${BIOSPECIMEN_FILE:-biospecimen_ids_for_subset.RDS}
-RELEASE=${RELEASE:-v11}
+RELEASE=${RELEASE:-v12}
 NUM_MATCHED=${NUM_MATCHED:-15}

 # This option controls whether or not the two larger MAF files are skipped as
@@ -76,9 +76,7 @@ cp $FULL_DIRECTORY/histologies-base.tsv $SUBSET_DIRECTORY
 cp $FULL_DIRECTORY/uberon-map-gtex-*.tsv $SUBSET_DIRECTORY
 cp $FULL_DIRECTORY/efo-mondo-map.tsv $SUBSET_DIRECTORY
 cp $FULL_DIRECTORY/ensg-hugo-pmtl-mapping.tsv $SUBSET_DIRECTORY
-cp $FULL_DIRECTORY/infinium-annotation-mapping.tsv $SUBSET_DIRECTORY
-cp $FULL_DIRECTORY/infinium-methylationepic-v-1-0-b5-manifest-file-csv.zip $SUBSET_DIRECTORY
-cp $FULL_DIRECTORY/UCSC_hg19-GRCh37_Ensembl2RefSeq.tsv $SUBSET_DIRECTORY
+cp $FULL_DIRECTORY/infinium.gencode.v39.probe.annotations.tsv.gz $SUBSET_DIRECTORY

 # GISTIC output
 cp $FULL_DIRECTORY/cnv-consensus-gistic.zip $SUBSET_DIRECTORY
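The defaults at the top of `create_subset_files.sh` use the `${VAR:-default}` parameter expansion, so bumping `v11` to `v12` only changes what happens when `RELEASE` is unset or empty; a caller can still supply another value. The following is a minimal sketch of that pattern. The `show_settings` helper is hypothetical and exists only for this illustration; it is not part of the script.

```sh
#!/bin/sh
set -eu

# Hypothetical helper mirroring the script's default-with-override pattern.
show_settings() {
  echo "release=${RELEASE:-v12} num_matched=${NUM_MATCHED:-15}"
}

# Start from a clean slate so the defaults apply.
unset RELEASE NUM_MATCHED

# No values set: the fallbacks after ':-' are used.
default_out=$(show_settings)

# Values set inside the command substitution's subshell override the fallbacks
# and do not leak back into the calling shell.
override_out=$(RELEASE=v11; NUM_MATCHED=5; show_settings)

echo "$default_out"
echo "$override_out"
```

Assuming the script is run directly, an override would presumably look like `RELEASE=v11 NUM_MATCHED=5 bash create_subset_files.sh`. Note that `:-` also falls back to the default when the variable is set but empty.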
38 changes: 22 additions & 16 deletions analyses/efo-mondo-mapping/results/efo-mondo-map-prefill.tsv
@@ -6,6 +6,9 @@ Adenocarcinoma EFO_0000228 MONDO_0004970 NCIT_C2852
Adrenocortical Carcinoma EFO_1000796 MONDO_0006639 NCIT_C9325
Anaplastic Large Cell Lymphoma EFO_0003032 MONDO_0020325 NCIT_C3720
Angiosarcoma EFO_0003968 MONDO_0016982 NCIT_C3088
Astroblastoma NA NA NA
Astrocytoma NA NA NA
Atypical choroid plexus papilloma NA NA NA
Atypical Teratoid Rhabdoid Tumor EFO_1002008 MONDO_0020560 NCIT_C6906
B Acute Lymphoblastic Leukemia/Lymphoma EFO_0000094 MONDO_0004967 NCIT_C8644
Bladder Urothelial Carcinoma EFO_0006544 MONDO_0005611 NCIT_C39851
@@ -15,58 +18,60 @@ Cavernoma EFO_1000151 MONDO_0003155 NCIT_C3086
Central neurocytoma EFO_1000856 MONDO_0019134 NCIT_C3791
Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma EFO_1000162 MONDO_0006143 NCIT_C157526
Cholangiocarcinoma EFO_0005221 MONDO_0019087 NCIT_C4436
Chondromyxoid fibroma EFO_0000332 MONDO_0018447 NCIT_C3830
Chordoma Orphanet_178 MONDO_0008978 NCIT_C2947
Choroid plexus carcinoma MONDO_0016718 MONDO_0016718 NCIT_C4715
Choroid plexus papilloma EFO_1000177 MONDO_0009837 NCIT_C3698
Choroid plexus tumor EFO_0007206 MONDO_0016717 NCIT_C4533
Chromophobe renal cell carcinoma EFO_0000335 MONDO_0017885 NCIT_C4146
Chronic Myelogenous Leukemia EFO_0000339 MONDO_0011996 NCIT_C3174
CIC-DUX4 Sarcoma EFO_0000691 MONDO_0005089 NCIT_C165663
Clear cell sarcoma of the kidney EFO_0000350 MONDO_0005006 NCIT_C4264
CNS Burkitt's lymphoma EFO_1000157 Orphanet_46135 NCIT_C9189
CNS Burkitt's lymphoma NA NA NA
CNS Embryonal tumor EFO_0005784 MONDO_0018843 NCIT_C5398
CNS Melanoma EFO_0002617 MONDO_0005191 NCIT_C133504
CNS neuroblastoma EFO_0000621 MONDO_0006130 NCIT_C4826
Colon Adenocarcinoma EFO_1001949 MONDO_0002271 NCIT_C4349
Craniopharyngioma EFO_1000209 MONDO_0002787 NCIT_C2964
Cutaneous Melanoma EFO_0000389 MONDO_0005012 NCIT_C3510
Desmoid-type fibromatosis EFO_0009907 Orphanet_873 NCIT_C9182
Desmoplastic infantile astrocytoma and ganglioglioma MONDO_0016731 MONDO_0016731 NCIT_C4747
Diffuse fibrillary astrocytoma Orphanet_251601 MONDO_0016688 NCIT_C4322
Diffuse fibrillary astrocytoma NA NA NA
Diffuse hemispheric glioma MONDO_0016680 MONDO_0016680 NA
Diffuse intrinsic pontine glioma EFO_1000026 MONDO_0006033 NCIT_C94764
Diffuse leptomeningeal glioneuronal tumor MONDO_0016745 MONDO_0016745 NCIT_C129424
Diffuse midline glioma EFO_1000026 MONDO_0006033 NCIT_C129309
Dysembryoplastic neuroepithelial tumor EFO_0005551 MONDO_0005505 NCIT_C9505
Dysgerminoma MONDO_0003002 MONDO_0003002 NCIT_C2996
Embryonal tumor EFO_0005784 MONDO_0005564 NCIT_C3264
Embryonal Tumor NOS EFO_0005784 MONDO_0005564 NCIT_C5398
Embryonal tumor with multilayer rosettes MONDO_0016715 MONDO_0016715 NCIT_C129499
Ependymoma EFO_1000028 MONDO_0016698 NCIT_C3017
Epstein-Barr virus-related tumor MONDO_0017342 MONDO_0017342 NA
Esophageal Carcinoma EFO_0002916 MONDO_0019086 NCIT_C3513
Ewing sarcoma EFO_0000174 MONDO_0012817 NCIT_C4817
Extraventricular neurocytoma EFO_1000856 MONDO_0016727 NCIT_C92555
Extraventricular neurocytoma MONDO_0016727 MONDO_0016727 NCIT_C92555
Fibromyxoid lesion MONDO_0037745 MONDO_0037745 NCIT_C66760
Ganglioglioma EFO_0003094 MONDO_0016733 NCIT_C3788
Ganglioneuroblastoma EFO_0000502 MONDO_0005035 NCIT_C3790
Ganglioneuroma EFO_0000500 MONDO_0005033 NCIT_C3049
Germ Cell Tumor EFO_0000514 MONDO_0005040 NCIT_C3708
Germinoma MONDO_0020580 MONDO_0020580 NCIT_C121618
Glial-neuronal tumor NOS MONDO_0016729 MONDO_0016729 NCIT_C4747
Glial-neuronal tumor MONDO_0016729 MONDO_0016729 NCIT_C4747
Glioblastoma NA NA NA
Glioblastoma Multiforme EFO_0000519 MONDO_0018177 NCIT_C3058
Gliomatosis Cerebri MONDO_0016683 MONDO_0016683 NCIT_C4318
Gliosarcoma EFO_1001465 MONDO_0016681 NCIT_C3796
Head and Neck Squamous Cell Carcinoma EFO_0000181 MONDO_0010150 NCIT_C34447
Hemangioblastoma MONDO_0016748 MONDO_0016748 NCIT_C3801
Hepatoblastoma EFO_1000292 MONDO_0018666 NCIT_C3728
Hepatocellular Carcinoma EFO_0000182 MONDO_0007256 NCIT_C3099
High-grade glioma/astrocytoma MONDO_0016680 MONDO_0016680 NCIT_C102897
High-grade glioma MONDO_0100342 MONDO_0100342 NCIT_C4822
Histiocytic tumor MONDO_0020081 MONDO_0020081 NCIT_C9294
Hodgkin's lymphoma EFO_0000183 MONDO_0004952 NCIT_C9357
Infant-type hemispheric glioma EFO_0005543 MONDO_0014695 NCIT_C185471
Infantile Fibrosarcoma MONDO_0002678 MONDO_0002678 NCIT_C4244
Inflammatory Myofibroblastic Tumor MONDO_0015798 MONDO_0015798 NCIT_C6481
Intrahepatic Cholangiocarcinoma EFO_1001961 MONDO_0003210 NCIT_C35417
Intraneural perineuroma MONDO_0015032 MONDO_0015032 NCIT_C6911
Juvenile xanthogranuloma EFO_1000311 MONDO_0015534 NCIT_C3451
Langerhans Cell histiocytosis EFO_1000318 MONDO_0018310 NCIT_C3107
Low-grade glioma/astrocytoma MONDO_0016685 MONDO_0016685 NCIT_C116342
Low-grade glioma MONDO_0021637 MONDO_0021637 NCIT_C132067
Lung Adenocarcinoma EFO_0000571 MONDO_0005061 NCIT_C3512
Lung Squamous Cell Carcinoma EFO_0000708 MONDO_0005097 NCIT_C3493
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma EFO_0000403 MONDO_0018905 NCIT_C8851
@@ -75,24 +80,25 @@ Medulloblastoma EFO_0002939 MONDO_0007959 NCIT_C3222
Melanocytic tumor MONDO_0003222 MONDO_0003222 NCIT_C5504
Melanoma EFO_0000756 MONDO_0005105 NCIT_C3224
Meningioma Orphanet_2495 MONDO_0016642 NCIT_C3230
Mesenchymal tumor EFO_1000473 MONDO_0003512 NCIT_C7059
Mesothelioma EFO_0000588 MONDO_0005065 NCIT_C3234
Metastatic secondary tumors EFO_0009812 MONDO_0024883 NCIT_C4968
Mixed germ cell tumor MONDO_0015864 MONDO_0015864 NCIT_C4290
Myeloid Neoplasm EFO_0002427 MONDO_0005170 NCIT_C9290
Neuroblastoma EFO_0000621 MONDO_0005072 NCIT_C3270
Neuroepithelial neoplasm MONDO_0021193 MONDO_0021193 NCIT_C3787
Neurofibroma/Plexiform EFO_0000658 MONDO_0003304 NCIT_C3797
Non-germinomatous germ cell tumor MONDO_0020580 MONDO_0020580 NCIT_C121619
Non-Hodgkin Lymphoma EFO_0005952 MONDO_0018908 NCIT_C3211
Oligodendroglioma EFO_0000632 MONDO_0016695 NCIT_C3288
Osteosarcoma EFO_0000637 MONDO_0009807 NCIT_C9145
Ovarian Serous Cystadenocarcinoma EFO_1000043 MONDO_0006046 NCIT_C7978
Pancreatic Adenocarcinoma EFO_1000044 MONDO_0006047 NCIT_C8294
Perineuroma MONDO_0019404 MONDO_0019404 NCIT_C4973
Pheochromocytoma and Paraganglioma EFO_0020005 MONDO_0035540 NA
Pilocytic astrocytoma MONDO_0016691 MONDO_0016691 NCIT_C4047
Pineal tumor MONDO_0021232 MONDO_0021232 NCIT_C3328
Pilocytic astrocytoma NA NA NA
Pineoblastoma EFO_1000475 MONDO_0016722 NCIT_C9344
Pineocytoma EFO_1000476 MONDO_0016723 NCIT_C6966
Pleomorphic xanthoastrocytoma MONDO_0016690 MONDO_0016690 NCIT_C4323
Primary mediastinal large B cell lymphoma MONDO_0004021 MONDO_0020323 NCIT_C9280
Prostate Adenocarcinoma EFO_0000673 MONDO_0005082 NCIT_C2919
Rectum Adenocarcinoma EFO_0005631 MONDO_0002169 NCIT_C9383
Renal Clear Cell Carcinoma EFO_0000349 MONDO_0005005 NCIT_C4033
@@ -116,4 +122,4 @@ Thyroid Gland Papillary Carcinoma EFO_0000641 MONDO_0005075 NCIT_C4035
Uterine Carcinosarcoma EFO_1000613 MONDO_0006485 NCIT_C42700
Uterine Corpus Endometrial Carcinoma EFO_0007532 MONDO_0000553 NCIT_C159413
Uveal Melanoma EFO_1000616 MONDO_0006486 NCIT_C7712
Wilms tumor MONDO_0006058 MONDO_0006058 NCIT_C3267
Wilms tumor EFO_1000056 MONDO_0006058 NCIT_C3267